diff --git a/LLMs-papers.md b/LLMs-papers.md
index 2908639..a5024cd 100644
--- a/LLMs-papers.md
+++ b/LLMs-papers.md
@@ -36,13 +36,13 @@
 Analyzing Formality Distribution across Languages](http://arxiv.org/abs/2302.12299v1)
 24. [LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings
 with Knowledge Distillation](http://arxiv.org/abs/2302.08387v1)
-25. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
-26. [Can Character-based Language Models Improve Downstream Task Performance
+25. [Reimagining Retrieval Augmented Language Models for Answering Queries](http://arxiv.org/abs/2306.01061v1)
+26. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
+27. [Can Character-based Language Models Improve Downstream Task Performance
 in Low-Resource and Noisy Language Scenarios?](http://arxiv.org/abs/2110.13658v1)
-27. [Learning Translation Quality Evaluation on Low Resource Languages from
+28. [Learning Translation Quality Evaluation on Low Resource Languages from
 Large Language Models](http://arxiv.org/abs/2302.03491v1)
-28. [Investigating the Translation Performance of a Large Multilingual
+29. [Investigating the Translation Performance of a Large Multilingual
 Language Model: the Case of BLOOM](http://arxiv.org/abs/2303.01911v2)
-29. [Investigating Robustness of Dialog Models to Popular Figurative Language
+30. [Investigating Robustness of Dialog Models to Popular Figurative Language
 Constructs](http://arxiv.org/abs/2110.00687v1)
-30. [DPRK-BERT: The Supreme Language Model](http://arxiv.org/abs/2112.00567v1)
diff --git a/data/2023-06-11.json b/data/2023-06-11.json
new file mode 100644
index 0000000..47abc9d
--- /dev/null
+++ b/data/2023-06-11.json
@@ -0,0 +1 @@
+[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human activity recognition (HAR) is a time series classification task that focuses on identifying the motion patterns from human sensor readings. Adequate data is essential but a major bottleneck for training a generalizable HAR model, which assists customization and optimization of online web applications. However, it is costly in time and economy to collect large-scale labeled data in reality, i.e., the low-resource challenge. Meanwhile, data collected from different persons have distribution shifts due to different living habits, body shapes, age groups, etc. The low-resource and distribution shift challenges are detrimental to HAR when applying the trained model to new unseen subjects. In this paper, we propose a novel approach called Diverse and Discriminative representation Learning (DDLearn) for generalizable low-resource HAR. DDLearn simultaneously considers diversity and discrimination learning. With the constructed self-supervised learning task, DDLearn enlarges the data diversity and explores the latent activity properties. Then, we propose a diversity preservation module to preserve the diversity of learned features by enlarging the distribution divergence between the original and augmented domains. Meanwhile, DDLearn also enhances semantic discrimination by learning discriminative representations with supervised contrastive learning.
Extensive experiments on three public HAR datasets demonstrate that our method significantly outperforms state-of-the-art methods by an average accuracy improvement of 9.5% under the low-resource distribution shift scenarios, while being a generic, explainable, and flexible framework.", "output": "Generalizable Low-Resource Activity Recognition with Diverse and Discriminative Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Non-Fungible-Token (NFT) market has experienced explosive growth in recent years. According to DappRadar, the total transaction volume on OpenSea, the largest NFT marketplace, reached 34.7 billion dollars in February 2023. However, the NFT market is mostly unregulated and there are significant concerns about money laundering, fraud and wash trading. Amateur traders and retail investors comprise a significant fraction of the NFT market. Hence it is important that researchers highlight the relevant risks involved in NFT trading. In this paper, we attempt to uncover common fraudulent behaviors such as wash trading that could mislead other traders. Using market data, we design quantitative features from the network, monetary, and temporal perspectives that are fed into K-means clustering unsupervised learning algorithm to sort traders into groups. Lastly, we discuss the clustering results' significance and how regulations can reduce undesired behaviors. Our work can potentially help regulators narrow down their search space for bad actors in the market as well as provide insights for amateur traders to protect themselves from unforeseen frauds.", "output": "Abnormal Trading Detection in the NFT Market."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretation of deep learning remains a very challenging problem. Although the Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location, it fails to provide insight into the salient features used by the model to make decisions. Furthermore, existing evaluation protocols often overlook the correlation between interpretability performance and the model's decision quality, which presents a more fundamental issue. This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM), which offers a feature-level interpretation of the model's prediction. Decom-CAM decomposes intermediate activation maps into orthogonal features using singular value decomposition and generates saliency maps by integrating them. The orthogonality of features enables CAM to capture local features and can be used to pinpoint semantic components such as eyes, noses, and faces in the input image, making it more beneficial for deep model interpretation. To ensure a comprehensive comparison, we introduce a new evaluation protocol by dividing the dataset into subsets based on classification accuracy results and evaluating the interpretability performance on each subset separately. Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly by generating more precise saliency maps across all levels of classification accuracy.
Combined with our feature-level interpretability approach, this paper could pave the way for a new direction for understanding the decision-making process of deep neural networks.", "output": "Decom-CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the problem of making Conformal Prediction (CP) intervals locally adaptive. Most existing methods focus on approximating the object-conditional validity of the intervals by partitioning or re-weighting the calibration set. Our strategy is new and conceptually different. Instead of re-weighting the calibration data, we redefine the conformity measure through a trainable change of variables, $A \\to \\phi_X(A)$, that depends explicitly on the object attributes, $X$. Under certain conditions and if $\\phi_X$ is monotonic in $A$ for any $X$, the transformations produce prediction intervals that are guaranteed to be marginally valid and have $X$-dependent sizes. We describe how to parameterize and train $\\phi_X$ to maximize the interval efficiency. Contrary to other CP-aware training methods, the objective function is smooth and can be minimized through standard gradient methods without approximations.", "output": "On training locally adaptive CP."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In empathetic conversations, individuals express their empathy towards others. Previous work has mainly focused on generating empathetic responses by utilizing the speaker's emotion. Besides, external commonsense knowledge has been applied to enhance the system's understandings of the speaker's situation. However, given an event, commonsense knowledge base contains various relations, potentially leading to confusion for the dialogue system. Consequently, inconsistencies arise among the emotion, generated response and speaker's contextual information. To this end, we propose a novel approach for empathetic response generation, which incorporates an adaptive module for commonsense knowledge selection to ensure consistency between the generated empathetic responses and the speaker's situation. This selected knowledge is used to refine the commonsense cognition and empathy expression for generated responses. Experimental results show that our approach significantly outperforms baseline models in both automatic and human evaluations, exhibiting the generation of more coherent and empathetic responses. Moreover, case studies highlight the interpretability of knowledge selection in the responses and the effectiveness of the adaptive module in our model. Code:", "output": "Improving Empathetic Dialogue Generation by Dynamically Infusing Commonsense Knowledge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Green Light Optimal Speed Advisory (GLOSA) system suggests speeds to vehicles to assist them in passing through intersections during green intervals, thus reducing traffic congestion and fuel consumption by minimizing the number of stops and idle times at intersections. However, previous research has focused on optimizing the GLOSA algorithm, neglecting the frequency of speed advisory by the GLOSA system.
Specifically, some studies provide speed advisory profile at each decision step, resulting in redundant advisory, while others calculate the optimal speed for the vehicle only once, which cannot adapt to dynamic traffic. In this paper, we propose an Adaptive Frequency GLOSA (AF-GLOSA) model based on Hybrid Proximal Policy Optimization (H-PPO), which employs an actor-critic architecture with a hybrid actor network. The hybrid actor network consists of a discrete actor that outputs advisory frequency and a continuous actor that outputs acceleration profiles. Additionally, we design a novel reward function that considers both travel efficiency and fuel consumption. The AF-GLOSA model is evaluated in comparison to traditional GLOSA and learning-based GLOSA methods in a three-lane intersection with a traffic signal in SUMO, under three different levels of traffic density. The results demonstrate that the AF-GLOSA model performs best in reducing average stop times, fuel consumption and CO2 emissions.", "output": "Adaptive Frequency Green Light Optimal Speed Advisory based on Hybrid Actor-Critic Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with organic data is challenging because interactions with people \"in the wild\" include both high quality conversations and feedback, as well as adversarial and toxic behavior. We study techniques that enable learning from helpful teachers while avoiding learning from people who are trying to trick the model into unhelpful or toxic responses. BlenderBot 3x is both preferred in conversation to BlenderBot 3, and is shown to produce safer responses in challenging situations. While our current models are still far from perfect, we believe further improvement can be achieved by continued use of the techniques explored in this work.", "output": "Improving Open Language Models by Learning from Organic Interactions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rapid advancements of the text-to-image generative model, AI-generated images (AGIs) have been widely applied to entertainment, education, social media, etc. However, considering the large quality variance among different AGIs, there is an urgent need for quality models that are consistent with human subjective ratings. To address this issue, we extensively consider various popular AGI models, generated AGI through different prompts and model parameters, and collected subjective scores at the perceptual quality and text-to-image alignment, thus building the most comprehensive AGI subjective quality database AGIQA-3K so far. Furthermore, we conduct a benchmark experiment on this database to evaluate the consistency between the current Image Quality Assessment (IQA) model and human perception, while proposing StairReward that significantly improves the assessment performance of subjective text-to-image alignment.
We believe that the fine-grained subjective scores in AGIQA-3K will inspire subsequent AGI quality models to fit human subjective perception mechanisms at both perception and alignment levels and to optimize the generation result of future AGI models. The database is released on url{", "output": "AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to \"explain\" how neural networks process natural images. We underpin this empirical finding by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.", "output": "Don't trust your eyes: on the (un)reliability of feature visualizations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rapidly increasing quality of AI-generated content makes it difficult to distinguish between human and AI-generated texts, which may lead to undesirable consequences for society. Therefore, it becomes increasingly important to study the properties of human texts that are invariant over text domains and various proficiency of human writers, can be easily calculated for any language, and can robustly separate natural and AI-generated texts regardless of the generation model and sampling method. In this work, we propose such an invariant of human texts, namely the intrinsic dimensionality of the manifold underlying the set of embeddings of a given text sample. We show that the average intrinsic dimensionality of fluent texts in natural language is hovering around the value $9$ for several alphabet-based languages and around $7$ for Chinese, while the average intrinsic dimensionality of AI-generated texts for each language is $\\approx 1.5$ lower, with a clear statistical separation between human-generated and AI-generated distributions. This property allows us to build a score-based artificial text detector.
The proposed detector's accuracy is stable over text domains, generator models, and human writer proficiency levels, outperforming SOTA detectors in model-agnostic and cross-domain scenarios by a significant margin.", "output": "Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompting large language models has gained immense popularity in recent years due to the advantage of producing good results even without the need for labelled data. However, this requires prompt tuning to get optimal prompts that lead to better model performances. In this paper, we explore the use of soft-prompt tuning on sentiment classification task to quantify the biases of large language models (LLMs) such as Open Pre-trained Transformers (OPT) and Galactica language model. Since these models are trained on real-world data that could be prone to bias toward certain groups of populations, it is important to identify these underlying issues. Using soft-prompts to evaluate bias gives us the extra advantage of avoiding the human-bias injection that can be caused by manually designed prompts. We check the model biases on different sensitive attributes using the group fairness (bias) and find interesting bias patterns. Since LLMs have been used in the industry in various applications, it is crucial to identify the biases before deploying these models in practice. We open-source our pipeline and encourage industry researchers to adapt our work to their use cases.", "output": "Soft-prompt Tuning for Large Language Models to Evaluate Bias."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data that is continuously being collected. The primary objective of this workshop is to bring together the Earth and environmental science communities as well as the multimodal representation learning communities to explore new ways of harnessing technological advancements in support of environmental monitoring. The MultiEarth Workshop also seeks to provide a common benchmark for processing multimodal remote sensing information by organizing public challenges focused on monitoring the Amazon rainforest. These challenges include estimating deforestation, detecting forest fires, translating synthetic aperture radar (SAR) images to the visible domain, and projecting environmental trends. This paper presents the challenge guidelines, datasets, and evaluation metrics. Our challenge website is available at", "output": "MultiEarth 2023 -- Multimodal Learning for Earth and Environment Workshop and Challenge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Natural Language to SQL systems (NL-to-SQL) have recently shown a significant increase in accuracy for natural language to SQL query translation. This improvement is due to the emergence of transformer-based language models, and the popularity of the Spider benchmark - the de-facto standard for evaluating NL-to-SQL systems.
The top NL-to-SQL systems reach accuracies of up to 85%. However, Spider mainly contains simple databases with few tables, columns, and entries, which does not reflect a realistic setting. Moreover, complex real-world databases with domain-specific content have little to no training data available in the form of NL/SQL-pairs leading to poor performance of existing NL-to-SQL systems. In this paper, we introduce ScienceBenchmark, a new complex NL-to-SQL benchmark for three real-world, highly domain-specific databases. For this new benchmark, SQL experts and domain experts created high-quality NL/SQL-pairs for each domain. To garner more data, we extended the small amount of human-generated data with synthetic data generated using GPT-3. We show that our benchmark is highly challenging, as the top performing systems on Spider achieve a very low performance on our benchmark. Thus, the challenge is many-fold: creating NL-to-SQL systems for highly complex domains with a small amount of hand-made training data augmented with synthetic data. To our knowledge, ScienceBenchmark is the first NL-to-SQL benchmark designed with complex real-world scientific databases, containing challenging training and test data carefully validated by domain experts.", "output": "ScienceBenchmark: A Complex Real-World Benchmark for Evaluating Natural Language to SQL Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training a 3D human keypoint detector from point clouds in a supervised manner requires large volumes of high quality labels. While it is relatively easy to capture large amounts of human point clouds, annotating 3D keypoints is expensive, subjective, error prone and especially difficult for long-tail cases (pedestrians with rare poses, scooterists, etc.). In this work, we propose GC-KPL - Geometry Consistency inspired Key Point Learning, an approach for learning 3D human joint locations from point clouds without human labels. We achieve this by our novel unsupervised loss formulations that account for the structure and movement of the human body. We show that by training on a large training set from Waymo Open Dataset without any human annotated keypoints, we are able to achieve reasonable performance as compared to the fully supervised approach. Further, the backbone benefits from the unsupervised training and is useful in downstream fewshot learning of keypoints, where fine-tuning on only 10 percent of the labeled training data gives comparable performance to fine-tuning on the entire set. We demonstrated that GC-KPL outperforms by a large margin over SoTA when trained on entire dataset and efficiently leverages large volumes of unlabeled data.", "output": "3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The integration of machine learning in medical image analysis can greatly enhance the quality of healthcare provided by physicians. The combination of human expertise and computerized systems can result in improved diagnostic accuracy. An automated machine learning approach simplifies the creation of custom image recognition models by utilizing neural architecture search and transfer learning techniques.
Medical imaging techniques are used to non-invasively create images of internal organs and body parts for diagnostic and procedural purposes. This article aims to highlight the potential applications, strategies, and techniques of AutoML in medical imaging through theoretical and empirical evidence.", "output": "AutoML Systems For Medical Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding regarding their full potential, primarily due to the black-box nature of many models and the absence of holistic evaluation studies. To address these challenges, we present INSTRUCTEVAL, a more comprehensive evaluation suite designed specifically for instruction-tuned large language models. Unlike previous works, our evaluation involves a rigorous assessment of models based on problem-solving, writing ability, and alignment to human values. We take a holistic approach to analyze various factors affecting model performance, including the pretraining foundation, instruction-tuning data, and training methods. Our findings reveal that the quality of instruction data is the most crucial factor in scaling model performance. While open-source models demonstrate impressive writing abilities, there is substantial room for improvement in problem-solving and alignment. We are encouraged by the rapid development of models by the open-source community, but we also highlight the need for rigorous evaluation to support claims made about these models. Through INSTRUCTEVAL, we aim to foster a deeper understanding of instruction-tuned models and advancements in their capabilities. INSTRUCTEVAL is publicly available at", "output": "INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The number of published research papers has experienced exponential growth in recent years, which makes it crucial to develop new methods for efficient and versatile information extraction and knowledge discovery. To address this need, we propose a Semantic Knowledge Graph (SKG) that integrates semantic concepts from abstracts and other meta-information to represent the corpus. The SKG can support various semantic queries in academic literature thanks to the high diversity and rich information content stored within. To extract knowledge from unstructured text, we develop a Knowledge Extraction Module that includes a semi-supervised pipeline for entity extraction and entity normalization. We also create an ontology to integrate the concepts with other meta information, enabling us to build the SKG. Furthermore, we design and develop a dataflow system that demonstrates how to conduct various semantic queries flexibly and interactively over the SKG.
To demonstrate the effectiveness of our approach, we conduct the research based on the visualization literature and provide real-world use cases to show the usefulness of the SKG. The dataset and codes for this work are available at", "output": "SKG: A Versatile Information Retrieval and Analysis Framework for Academic Papers with Semantic Knowledge Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground, and attempt to answer HCI questions involved in this scope, by reporting on a mixed-methods user study conducted on a recent research chatbot. We find that abstract anthropomorphic representation for the agent has a significant effect on user's perception, that offering AI explainability may have an impact on feedback rates, and that two (diegetic and extradiegetic) levels of the chat experience should be intentionally designed. We offer design recommendations and areas of further focus for the research community.", "output": "The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning models exhibit strong performance on datasets with abundant labeled samples. However, for tabular datasets with extremely high $d$-dimensional features but limited $n$ samples (i.e. $d \\gg n$), machine learning models struggle to achieve strong performance due to the risk of overfitting. Here, our key insight is that there is often abundant, auxiliary domain information describing input features which can be structured as a heterogeneous knowledge graph (KG). We propose PLATO, a method that achieves strong performance on tabular data with $d \\gg n$ by using an auxiliary KG describing input features to regularize a multilayer perceptron (MLP). In PLATO, each input feature corresponds to a node in the auxiliary KG. In the MLP's first layer, each input feature also corresponds to a weight vector. PLATO is based on the inductive bias that two input features corresponding to similar nodes in the auxiliary KG should have similar weight vectors in the MLP's first layer. PLATO captures this inductive bias by inferring the weight vector for each input feature from its corresponding node in the KG via a trainable message-passing function. Across 6 $d \\gg n$ datasets, PLATO outperforms 13 state-of-the-art baselines by up to 10.19%.", "output": "Enabling tabular deep learning when $d \\gg n$ with an auxiliary knowledge graph."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Progress in graph neural networks has grown rapidly in recent years, with many new developments in drug discovery, medical diagnosis, and recommender systems.
While this progress is significant, many networks are `black boxes' with little understanding of the `what' exactly the network is learning. Many high-stakes applications, such as drug discovery, require human-intelligible explanations from the models so that users can recognize errors and discover new knowledge. Therefore, the development of explainable AI algorithms is essential for us to reap the benefits of AI. We propose an explainability algorithm for GNNs called eXplainable Insight (XInsight) that generates a distribution of model explanations using GFlowNets. Since GFlowNets generate objects with probabilities proportional to a reward, XInsight can generate a diverse set of explanations, compared to previous methods that only learn the maximum reward sample. We demonstrate XInsight by generating explanations for GNNs trained on two graph classification tasks: classifying mutagenic compounds with the MUTAG dataset and classifying acyclic graphs with a synthetic dataset that we have open-sourced. We show the utility of XInsight's explanations by analyzing the generated compounds using QSAR modeling, and we find that XInsight generates compounds that cluster by lipophilicity, a known correlate of mutagenicity. Our results show that XInsight generates a distribution of explanations that uncovers the underlying relationships demonstrated by the model. They also highlight the importance of generating a diverse set of explanations, as it enables us to discover hidden relationships in the model and provides valuable guidance for further analysis.", "output": "XInsight: Revealing Model Insights for GNNs with Flow-based Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The use of a hypothetical generative model has been suggested for causal analysis of observational data. The very assumption of a particular model is a commitment to a certain set of variables and therefore to a certain set of possible causes. Estimating the joint probability distribution can be useful for predicting values of variables in view of the observed values of others, but it is not sufficient for inferring causal relationships. The model describes a single observable distribution and cannot describe a chain of effects of intervention that deviate from the observed distribution.", "output": "On the Use of Generative Models in Observational Causal Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Healthcare knowledge graphs (HKGs) have emerged as a promising tool for organizing medical knowledge in a structured and interpretable way, which provides a comprehensive view of medical concepts and their relationships. However, challenges such as data heterogeneity and limited coverage remain, emphasizing the need for further research in the field of HKGs. This survey paper serves as the first comprehensive overview of HKGs. We summarize the pipeline and key techniques for HKG construction (i.e., from scratch and through integration), as well as the common utilization approaches (i.e., model-free and model-based). To provide researchers with valuable resources, we organize existing HKGs (The resource is available at based on the data types they capture and application domains, supplemented with pertinent statistical information.
In the application section, we delve into the transformative impact of HKGs across various healthcare domains, spanning from fine-grained basic science research to high-level clinical decision support. Lastly, we shed light on the opportunities for creating comprehensive and accurate HKGs in the era of large language models, presenting the potential to revolutionize healthcare delivery and enhance the interpretability and reliability of clinical prediction.", "output": "A Survey on Knowledge Graphs for Healthcare: Resources, Applications, and Promises."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is essential for users to understand what their AI systems can and can't do in order to use them safely. However, the problem of enabling users to assess AI systems with evolving sequential decision making (SDM) capabilities is relatively understudied. This paper presents a new approach for modeling the capabilities of black-box AI systems that can plan and act, along with the possible effects and requirements for executing those capabilities in stochastic settings. We present an active-learning approach that can effectively interact with a black-box SDM system and learn an interpretable probabilistic model describing its capabilities. Theoretical analysis of the approach identifies the conditions under which the learning process is guaranteed to converge to the correct model of the agent; empirical evaluations on different agents and simulated scenarios show that this approach is few-shot generalizable and can effectively describe the capabilities of arbitrary black-box SDM agents in a sample-efficient manner.", "output": "Autonomous Capability Assessment of Black-Box Sequential Decision-Making Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision-Language Pretraining (VLP) has demonstrated remarkable capabilities in learning visual representations from textual descriptions of images without annotations. Yet, effective VLP demands large-scale image-text pairs, a resource that suffers scarcity in the medical domain. Moreover, conventional VLP is limited to 2D images while medical images encompass diverse modalities, often in 3D, making the learning process more challenging. To address these challenges, we present Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation (GTGM), a framework that extends VLP to 3D medical images without relying on paired textual descriptions. Specifically, GTGM utilizes large language models (LLM) to generate medical-style text from 3D medical images. This synthetic text is then used to supervise 3D visual representation learning. Furthermore, a negative-free contrastive learning objective strategy is introduced to cultivate consistent visual representations between augmented 3D medical image patches, which effectively mitigates the biases associated with strict positive-negative sample pairings. We evaluate GTGM on three imaging modalities - Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and electron microscopy (EM) over 13 datasets.
GTGM's superior performance across various medical image segmentation tasks underscores its effectiveness and versatility, by enabling VLP extension into 3D medical imagery while bypassing the need for paired text.", "output": "Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Developing artificial intelligence approaches to overcome novel, unexpected circumstances is a difficult, unsolved problem. One challenge to advancing the state of the art in novelty accommodation is the availability of testing frameworks for evaluating performance against novel situations. Recent novelty generation approaches in domains such as Science Birds and Monopoly leverage human domain expertise during the search to discover new novelties. Such approaches introduce human guidance before novelty generation occurs and yield novelties that can be directly loaded into a simulated environment. We introduce a new approach to novelty generation that uses abstract models of environments (including simulation domains) that do not require domain-dependent human guidance to generate novelties. A key result is a larger, often infinite space of novelties capable of being generated, with the trade-off being a requirement to involve human guidance to select and filter novelties post generation. We describe our Human-in-the-Loop novelty generation process using our open-source novelty generation library to test baseline agents in two domains: Monopoly and VizDoom. Our results show the Human-in-the-Loop method enables users to develop, implement, test, and revise novelties within 4 hours for both Monopoly and VizDoom domains.", "output": "Human in the Loop Novelty Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Knowledge Graph (KG) completion is the problem of extending an incomplete KG with missing facts. A key feature of Machine Learning approaches for KG completion is their ability to learn inference patterns, so that the predicted facts are the results of applying these patterns to the KG. Standard completion benchmarks, however, are not well-suited for evaluating models' abilities to learn patterns, because the training and test sets of these benchmarks are a random split of a given KG and hence do not capture the causality of inference patterns. We propose a novel approach for designing KG completion benchmarks based on the following principles: there is a set of logical rules so that the missing facts are the results of the rules' application; the training set includes both premises matching rule antecedents and the corresponding conclusions; the test set consists of the results of applying the rules to the training set; the negative examples are designed to discourage the models from learning rules not entailed by the rule set.
We use our methodology to generate several benchmarks and evaluate a wide range of existing KG completion systems. Our results provide novel insights on the ability of existing models to induce inference patterns from incomplete KGs.", "output": "Revisiting Inferential Benchmarks for Knowledge Graph Completion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Embedding-based neural retrieval is a prevalent approach to address the semantic gap problem which often arises in product search on tail queries. In contrast, popular queries typically lack context and have a broad intent where additional context from users' historical interaction can be helpful. In this paper, we share our novel approach to address both: the semantic gap problem followed by an end to end trained model for personalized semantic retrieval. We propose learning a unified embedding model incorporating graph, transformer and term-based embeddings end to end and share our design choices for optimal tradeoff between performance and efficiency. We share our learnings in feature engineering, hard negative sampling strategy, and application of transformer model, including a novel pre-training strategy and other tricks for improving search relevance and deploying such a model at industry scale. Our personalized retrieval model significantly improves the overall search experience, as measured by a 5.58% increase in search purchase rate and a 2.63% increase in site-wide conversion rate, aggregated across multiple A/B tests - on live traffic.", "output": "Unified Embedding Based Personalized Retrieval in Etsy Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have various practical applications, such as drug discovery, recommendation engines, and chip design. However, GNNs lack transparency as they cannot provide understandable explanations for their predictions. To address this issue, counterfactual reasoning is used. The main goal is to make minimal changes to the input graph of a GNN in order to alter its prediction. While several algorithms have been proposed for counterfactual explanations of GNNs, most of them have two main drawbacks. Firstly, they only consider edge deletions as perturbations. Secondly, the counterfactual explanation models are transductive, meaning they do not generalize to unseen data. In this study, we introduce an inductive algorithm called INDUCE, which overcomes these limitations. By conducting extensive experiments on several datasets, we demonstrate that incorporating edge additions leads to better counterfactual results compared to the existing methods. Moreover, the inductive modeling approach allows INDUCE to directly predict counterfactual perturbations without requiring instance-specific training.
This results in significant computational speed improvements compared to baseline methods and enables scalable counterfactual analysis for GNNs.", "output": "Empowering Counterfactual Reasoning over Graph Neural Networks through Inductivity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the significant increase in the size of spatial data, it is essential to use distributed parallel processing systems to efficiently analyze spatial data. In this paper, we first study learned spatial data partitioning, which effectively assigns groups of big spatial data to computers based on locations of data by using machine learning techniques. We formalize spatial data partitioning in the context of reinforcement learning and develop a novel deep reinforcement learning algorithm. Our learning algorithm leverages features of spatial data partitioning and prunes ineffective learning processes to find optimal partitions efficiently. Our experimental study, which uses Apache Sedona and real-world spatial data, demonstrates that our method efficiently finds partitions for accelerating distance join queries and reduces the workload run time by up to 59.4%.", "output": "Learned spatial data partitioning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the arising concerns of privacy within machine learning, federated learning (FL) was invented in 2017, in which the clients, such as mobile devices, train a model and send the update to the centralized server. Choosing clients randomly for FL can harm learning performance due to different reasons. Many studies have proposed approaches to address the challenges of client selection of FL. However, no systematic literature review (SLR) on this topic existed. This SLR investigates the state of the art of client selection in FL and answers the challenges, solutions, and metrics to evaluate the solutions. We systematically reviewed 47 primary studies. The main challenges found in client selection are heterogeneity, resource allocation, communication costs, and fairness. The client selection schemes aim to improve the original random selection algorithm by focusing on one or several of the aforementioned challenges. The most common metric used is testing accuracy versus communication rounds, as testing accuracy measures the successfulness of the learning and preferably in as few communication rounds as possible, as they are very expensive. Although several possible improvements can be made with the current state of client selection, the most beneficial ones are evaluating the impact of unsuccessful clients and gaining a more theoretical understanding of the impact of fairness in FL.", "output": "A Systematic Literature Review on Client Selection in Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies have revealed that NLP predictive models are vulnerable to adversarial attacks. Most existing studies focused on designing attacks to evaluate the robustness of NLP models in the English language alone. Literature has seen an increasing need for NLP solutions for other languages.
We, therefore, ask one natural question: whether state-of-the-art (SOTA) attack methods generalize to other languages. This paper investigates how to adapt SOTA adversarial attack algorithms in English to the Chinese language. Our experiments show that attack methods previously applied to English NLP can generate high-quality adversarial examples in Chinese when combined with proper text segmentation and linguistic constraints. In addition, we demonstrate that the generated adversarial examples can achieve high fluency and semantic consistency by focusing on the Chinese language's morphology and phonology, which in turn can be used to improve the adversarial robustness of Chinese NLP models.", "output": "Expanding Scope: Adapting English Adversarial Attacks to Chinese."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current communication networks use design methodologies that prevent the realization of maximum network efficiency. In the first place, while users' perception of satisfactory service diverges widely, current networks are designed to be a \"universal fit,\" where they are generally over-engineered to deliver services appealing to all types of users. Also, current networks lack user-level data cognitive intelligence that would enable fast personalized network decisions and actions through automation. Thus, in this article, we propose the utilization of AI, big data analytics, and real-time non-intrusive user feedback in order to enable the personalization of wireless networks. Based on each user's actual QoS requirements and context, a multi-objective formulation enables the network to micro-manage and optimize the provided QoS and user satisfaction levels simultaneously. Moreover, in order to enable user feedback tracking and measurement, we propose a user satisfaction model based on the zone of tolerance concept. Furthermore, we propose a big-data-driven and AI-based personalization framework to integrate personalization into wireless networks. Finally, we implement a personalized network prototype to demonstrate the proposed personalization concept and its potential benefits through a case study. The case study shows how personalization can be realized to enable the efficient optimization of network resources such that certain requirement levels of user satisfaction and revenue in the form of saved resources are achieved.", "output": "Big-data-driven and AI-based framework to enable personalization in wireless networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Directed evolution plays an indispensable role in protein engineering that revises existing protein sequences to attain new or enhanced functions. Accurately predicting the effects of protein variants necessitates an in-depth understanding of protein structure and function. Although large self-supervised language models have demonstrated remarkable performance in zero-shot inference using only protein sequences, these models inherently do not interpret the spatial characteristics of protein structures, which are crucial for comprehending protein folding stability and internal molecular interactions. This paper introduces a novel pre-training framework that cascades sequential and geometric analyzers for protein primary and tertiary structures.
It guides mutational directions toward desired traits by simulating natural selection on wild-type proteins and evaluates the effects of variants based on their fitness to perform the function. We assess the proposed approach using a public database and two new databases for a variety of variant effect prediction tasks, which encompass a diverse set of proteins and assays from different taxa. The prediction results achieve state-of-the-art performance over other zero-shot learning methods for both single-site mutations and deep mutations.", "output": "Multi-level Protein Representation Learning for Blind Mutational Effect Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In domain generalization (DG), the target domain is unknown when the model is being trained, and the trained model should successfully work on an arbitrary (and possibly unseen) target domain during inference. This is a difficult problem, and despite active studies in recent years, it remains a great challenge. In this paper, we take a simple yet effective approach to tackle this issue. We propose test-time style shifting, which shifts the style of the test sample (that has a large style gap with the source domains) to the nearest source domain that the model is already familiar with, before making the prediction. This strategy enables the model to handle any target domains with arbitrary style statistics, without additional model update at test-time. Additionally, we propose style balancing, which provides a great platform for maximizing the advantage of test-time style shifting by handling the DG-specific imbalance issues. The proposed ideas are easy to implement and successfully work in conjunction with various other DG schemes. Experimental results on different datasets show the effectiveness of our methods.", "output": "Test-Time Style Shifting: Handling Arbitrary Styles in Domain Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The COVID-19 pandemic led to 1.1 million deaths in the United States, despite the explosion of coronavirus research. These new findings are slow to translate to clinical interventions, leading to poorer patient outcomes and unnecessary deaths. One reason is that clinicians, overwhelmed by patients, struggle to keep pace with the rate of new coronavirus literature. A potential solution is developing a tool for evaluating coronavirus literature using large language models (LLMs) -- neural networks that are deployed for natural language processing. LLMs can be used to summarize and extract user-specified information. The greater availability and advancement of LLMs and pre-processed coronavirus literature databases provide the opportunity to assist clinicians in evaluating coronavirus literature through a coronavirus literature specific LLM (covLLM), a tool that directly takes an inputted research article and a user query to return an answer. Using the COVID-19 Open Research Dataset (CORD-19), we produced two datasets: (1) synCovid, which uses a combination of handwritten prompts and synthetic prompts generated using OpenAI, and (2) real abstracts, which contains abstract and title pairs.
covLLM was trained with LLaMA 7B as a baseline model to produce three models trained on (1) the Alpaca and synCovid datasets, (2) the synCovid dataset, and (3) the synCovid and real abstract datasets. These models were evaluated by two human evaluators and ChatGPT. Results demonstrate that training covLLM on the synCovid and abstract pairs datasets performs competitively with ChatGPT and outperforms covLLM trained primarily using the Alpaca dataset.", "output": "covLLM: Large Language Models for COVID-19 Biomedical Literature."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual question answering (VQA) is a multidisciplinary research problem that is pursued through practices of natural language processing and computer vision. Visual question answering automatically answers natural language questions according to the content of an image. Some testing questions require external knowledge to derive a solution. Such knowledge-based VQA uses various methods to retrieve features of image and text, and combine them to generate the answer. To generate knowledge-based answers, either question-dependent or image-dependent knowledge retrieval methods are used. If knowledge about all the objects in the image is derived, then not all knowledge is relevant to the question. On the other side, only question-related knowledge may lead to incorrect answers and an over-trained model that answers questions irrelevant to the image. Our proposed method takes image attributes and question features as input for the knowledge derivation module and retrieves only question-relevant knowledge about image objects which can provide accurate answers.", "output": "Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the recent progress in sports analytics, deep learning approaches have demonstrated the effectiveness of mining insights into players' tactics for improving performance quality and fan engagement. This is attributed to the availability of public ground-truth datasets. While there are a few available datasets for turn-based sports for action detection, these datasets severely lack structured source data and stroke-level records since these require high-cost labeling efforts from domain experts and are hard to detect using automatic techniques. Consequently, the development of artificial intelligence approaches is significantly hindered when existing models are applied to more challenging structured turn-based sequences. In this paper, we present ShuttleSet, the largest publicly-available badminton singles dataset with annotated stroke-level records. It contains 104 sets, 3,685 rallies, and 36,492 strokes in 44 matches between 2018 and 2021 with 27 top-ranking men's singles and women's singles players. ShuttleSet is manually annotated with a computer-aided labeling tool to increase the labeling efficiency and effectiveness of selecting the shot type with a choice of 18 distinct classes, the corresponding hitting locations, and the locations of both players at each stroke.
In the experiments, we provide multiple benchmarks (i.e., stroke influence, stroke forecasting, and movement forecasting) with baselines to illustrate the practicability of using ShuttleSet for turn-based analytics, which is expected to stimulate both the academic and sports communities. Over the past two years, a visualization platform has been deployed to illustrate the variability of analysis cases from ShuttleSet for coaches to delve into players' tactical preferences with human-interactive interfaces, which was also used by national badminton teams during multiple international high-ranking matches.", "output": "ShuttleSet: A Human-Annotated Stroke-Level Singles Dataset for Badminton Tactical Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is well-known that modern computer vision systems often exhibit behaviors misaligned with those of humans: from adversarial attacks to image corruptions, deep learning vision models suffer in a variety of settings that humans capably handle. In light of these phenomena, here we introduce another, orthogonal perspective studying the human-machine vision gap. We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision. Specifically, we study the performance and behavior of neural networks on the seemingly simple task of classifying regular polygons at varying orders of degradation along their perimeters. To this end, we implement the Automated Shape Recoverability Test for rapidly generating large-scale datasets of perimeter-degraded regular polygons, modernizing the historically manual creation of image recoverability experiments. We then investigate the capacity of neural networks to recognize and recover such degraded shapes when initialized with different priors. Ultimately, we find that neural networks' behavior on this simple task conflicts with human behavior, raising a fundamental question of the robustness and learning capabilities of modern computer vision models.", "output": "Degraded Polygons Raise Fundamental Questions of Neural Network Perception."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces FedMLSecurity, a benchmark that simulates adversarial attacks and corresponding defense mechanisms in Federated Learning (FL). As an integral module of the open-sourced library FedML that facilitates FL algorithm development and performance comparison, FedMLSecurity enhances the security assessment capacity of FedML. FedMLSecurity comprises two principal components: FedMLAttacker, which simulates attacks injected into FL training, and FedMLDefender, which emulates defensive strategies designed to mitigate the impacts of the attacks. FedMLSecurity is open-sourced and is customizable to a wide range of machine learning models (e.g., Logistic Regression, ResNet, GAN, etc.)
and federated optimizers (e.g., FedAVG, FedOPT, FedNOVA, etc.). Experimental evaluations in this paper also demonstrate the ease of application of FedMLSecurity to Large Language Models (LLMs), further reinforcing its versatility and practical utility in various scenarios.", "output": "FedMLSecurity: A Benchmark for Attacks and Defenses in Federated Learning and LLMs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Temporal graph clustering (TGC) is a crucial task in temporal graph learning. Its focus is on node clustering on temporal graphs, and it offers greater flexibility for large-scale graph structures due to the mechanism of temporal graph methods. However, the development of TGC is currently constrained by a significant problem: the lack of suitable and reliable large-scale temporal graph datasets to evaluate clustering performance. In other words, most existing temporal graph datasets are small, and even large-scale datasets contain only a limited number of available node labels. This makes evaluating models for large-scale temporal graph clustering challenging. To address this challenge, we build arXiv4TGC, a set of novel academic datasets (including arXivAI, arXivCS, arXivMath, arXivPhy, and arXivLarge) for large-scale temporal graph clustering. In particular, the largest dataset, arXivLarge, contains 1.3 million labeled available nodes and 10 million temporal edges. We further compare the clustering performance of typical temporal graph learning models on both the previous classic temporal graph datasets and the new datasets proposed in this paper. The clustering performance on arXiv4TGC is more apparent for evaluating different models, resulting in higher clustering confidence and making the datasets more suitable for large-scale temporal graph clustering. The arXiv4TGC datasets are publicly available at:", "output": "arXiv4TGC: Large-Scale Datasets for Temporal Graph Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Errors of machine learning models are costly, especially in safety-critical domains such as healthcare, where such mistakes can prevent the deployment of machine learning altogether. In these settings, conservative models -- models which can defer to human judgment when they are likely to make an error -- may offer a solution. However, detecting unusual or difficult examples is notably challenging, as it is impossible to anticipate all potential inputs at test time. To address this issue, prior work has proposed to minimize the model's confidence on an auxiliary pseudo-OOD dataset. We theoretically analyze the effect of confidence minimization and show that the choice of auxiliary dataset is critical. Specifically, if the auxiliary dataset includes samples from the OOD region of interest, confidence minimization provably separates ID and OOD inputs by predictive confidence. Taking inspiration from this result, we present data-driven confidence minimization (DCM), which minimizes confidence on an uncertainty dataset containing examples that the model is likely to misclassify at test time.
Our experiments show that DCM consistently outperforms state-of-the-art OOD detection methods on 8 ID-OOD dataset pairs, reducing FPR (at TPR 95%) by 6.3% and 58.1% on CIFAR-10 and CIFAR-100, and outperforms existing selective classification approaches on 4 datasets in conditions of distribution shift.", "output": "Conservative Prediction via Data-Driven Confidence Minimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although graph neural networks (GNNs) have achieved impressive results in graph classification, they often need abundant task-specific labels, which could be extremely costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient exploration of graph topology and the significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners. Besides, we incorporate coupled branches into a holistic multi-view contrastive learning framework, which not only incorporates graph representations learned from complementary views for enhanced understanding, but also encourages the similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on various popular datasets show that CoCo outperforms these competing baselines by 5.7% to 21.0% in general.", "output": "CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose to use a liquid time constant (LTC) network to predict the future blockage status of a millimeter wave (mmWave) link using only the received signal power as the input to the system. The LTC network is based on an ordinary differential equation (ODE) system inspired by biology and specialized for near-future prediction for time sequence observation as the input. Using an experimental dataset at 60 GHz, we show that our proposed use of LTC can reliably predict the occurrence of blockage and the length of the blockage without the need for scenario-specific data. The results show that the proposed LTC can predict with upwards of 97.85% accuracy without prior knowledge of the outdoor scenario or retraining/tuning.
These results highlight the promising gains of using LTC networks to predict time series-dependent signals, which can lead to more reliable and low-latency communication.", "output": "Blockage Prediction in Directional mmWave Links Using Liquid Time Constant Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the rise of Artificial Intelligence (AI), and its pervasiveness in our lives, has sparked a flourishing debate about the ethical principles that should lead its implementation and use in society. Driven by these concerns, we conduct a rapid review of several frameworks providing principles, guidelines, and/or tools to help practitioners in the development and deployment of Responsible AI (RAI) applications. We map each framework w.r.t. the different Software Development Life Cycle (SDLC) phases, discovering that most of these frameworks fall just in the Requirements Elicitation phase, leaving the other phases uncovered. Very few of these frameworks offer supporting tools for practitioners, and they are mainly provided by private companies. Our results reveal that there is not a \"catching-all\" framework supporting both technical and non-technical stakeholders in the implementation of real-world projects. Our findings highlight the lack of a comprehensive framework encompassing all RAI principles and all SDLC phases that could be navigated by users with different skill sets and with different goals.", "output": "A Rapid Review of Responsible AI frameworks: How to guide the development of ethical AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of DCASE 2023 Challenge Task 7 is to generate various sound clips for Foley sound synthesis (FSS) by a \"category-to-sound\" approach. \"Category\" is expressed by a single index while the corresponding \"sound\" covers diverse and different sound examples. To generate diverse sounds for a given category, we adopt VITS, a text-to-speech (TTS) model with variational inference. In addition, we apply various techniques from speech synthesis including PhaseAug and Avocodo. Different from TTS models which generate short pronunciations from phonemes and speaker identity, the category-to-sound problem requires generating diverse sounds just from a category index. To compensate for the difference while maintaining consistency within each audio clip, we heavily modified the prior encoder to enhance consistency with posterior latent variables. This introduced an additional Gaussian on the prior encoder which promotes variance within the category. With these modifications, we propose VIFS, variational inference for end-to-end Foley sound synthesis, which generates diverse high-quality sounds.", "output": "VIFS: An End-to-End Variational Inference for Foley Sound Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-vehicle pursuit (MVP), such as autonomous police vehicles pursuing suspects, is important but very challenging due to its mission- and safety-critical nature.
While multi-agent reinforcement learning (MARL) algorithms have been proposed for the MVP problem in structured grid-pattern roads, the existing algorithms use randomly selected training samples in centralized learning, which leads to homogeneous agents showing low collaboration performance. For the more challenging problem of pursuing multiple evading vehicles, these algorithms typically select a fixed target evading vehicle for pursuing vehicles without considering the dynamic traffic situation, which significantly reduces the pursuing success rate. To address the above problems, this paper proposes Progression Cognition Reinforcement Learning with Prioritized Experience for MVP (PEPCRL-MVP) in urban multi-intersection dynamic traffic scenes. PEPCRL-MVP uses a prioritization network to assess the transitions in the global experience replay buffer according to the parameters of each MARL agent. With the personalized and prioritized experience set selected via the prioritization network, diversity is introduced to the learning process of MARL, which can improve collaboration and task-related performance. Furthermore, PEPCRL-MVP employs an attention module to extract critical features from complex urban traffic environments. These features are used to develop a progression cognition method that adaptively groups pursuing vehicles. Each group efficiently targets one evading vehicle in dynamic driving environments. Extensive experiments conducted with a simulator over unstructured roads of an urban area show that PEPCRL-MVP is superior to other state-of-the-art methods. Specifically, PEPCRL-MVP improves pursuing efficiency by 3.95% over TD3-DMAP, and its success rate is 34.78% higher than that of MADDPG. Codes are open-sourced.", "output": "Progression Cognition Reinforcement Learning with Prioritized Experience for Multi-Vehicle Pursuit."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs), such as ChatGPT and GPT-4, are gaining widespread real-world use. Yet, the two LLMs are closed source, and little is known about the LLMs' performance in real-world use cases. In academia, LLM performance is often measured on benchmarks which may have leaked into ChatGPT's and GPT-4's training data. In this paper, we apply and evaluate ChatGPT and GPT-4 for the real-world task of cost-efficient extractive question answering over a text corpus that was published after the two LLMs completed training. More specifically, we extract research challenges for researchers in the field of HCI from the proceedings of the 2023 Conference on Human Factors in Computing Systems (CHI). We critically evaluate the LLMs on this practical task and conclude that the combination of ChatGPT and GPT-4 makes an excellent cost-efficient means for analyzing a text corpus at scale. Cost-efficiency is key for prototyping research ideas and analyzing text corpora from different perspectives, with implications for applying LLMs in academia and practice. For researchers in HCI, we contribute an interactive visualization of 4392 research challenges in over 90 research topics.
We share this visualization and the dataset in the spirit of open science.", "output": "Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Cost-Efficient Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce a novel semantic generative communication (SGC) framework, where generative users leverage text-to-image (T2I) generators to create images locally from downloaded text prompts, while non-generative users directly download images from a base station (BS). Although generative users help reduce downlink transmission energy at the BS, they consume additional energy for image generation and for uploading their generator state information (GSI). We formulate the problem of minimizing the total energy consumption of the BS and the users, and devise a generative user selection algorithm. Simulation results corroborate that our proposed algorithm reduces total energy by up to 54% compared to a baseline with all non-generative users.", "output": "Energy-Efficient Downlink Semantic Generative Communication with Text-to-Image Generators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Each year, wildfires destroy larger areas of Spain, threatening numerous ecosystems. Humans cause 90% of them (through negligence or deliberate provocation), and the behaviour of individuals is unpredictable. However, atmospheric and environmental variables affect the spread of wildfires, and they can be analysed using deep learning. In order to mitigate the damage of these events, we propose the novel Wildfire Assessment Model (WAM). Our aim is to anticipate the economic and ecological impact of a wildfire, assisting managers with resource allocation and decision making for dangerous regions in Spain, Castilla y León and Andalucía. The WAM uses a residual-style convolutional network architecture to perform regression over atmospheric variables and the greenness index, computing necessary resources, the control and extinction time, and the expected burnt surface area. It is first pre-trained with self-supervision over 100,000 examples of unlabelled data with a masked patch prediction objective and fine-tuned using 311 samples of wildfires. The pretraining allows the model to understand situations, outclassing baselines with a 1.4%, 3.7% and 9% improvement in estimating human, heavy and aerial resources; 21% and 10.2% in expected extinction and control time; and 18.8% in expected burnt area. Using the WAM, we provide an example assessment map of Castilla y León, visualizing the expected resources over an entire region.", "output": "Spain on Fire: A novel wildfire risk assessment model based on image satellite processing and atmospheric information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tabular data is often hidden in text, particularly in medical diagnostic reports. Traditional machine learning (ML) models designed to work with tabular data cannot effectively process information in such a form. On the other hand, large language models (LLMs), which excel at textual tasks, are probably not the best tool for modeling tabular data.
Therefore, we propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. Drawing upon the reasoning capabilities of LLMs, TEMED-LLM goes beyond traditional extraction techniques, accurately inferring tabular features even when their names are not explicitly mentioned in the text. This is achieved by combining domain-specific reasoning guidelines with a proposed data validation and reasoning correction feedback loop. By applying interpretable ML models such as decision trees and logistic regression over the extracted and validated data, we obtain end-to-end interpretable predictions. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics. Given its predictive performance, simplicity, and interpretability, TEMED-LLM underscores the potential of leveraging LLMs to improve the performance and trustworthiness of ML models in medical applications.", "output": "Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work studies how brain-inspired neural ensembles equipped with local Hebbian plasticity can perform active inference (AIF) in order to control dynamical agents. A generative model capturing the environment dynamics is learned by a network composed of two distinct Hebbian ensembles: a posterior network, which infers latent states given the observations, and a state transition network, which predicts the next expected latent state given current state-action pairs. Experimental studies are conducted using the Mountain Car environment from the OpenAI gym suite, to study the effect of the various Hebbian network parameters on the task performance. It is shown that the proposed Hebbian AIF approach outperforms the use of Q-learning, while not requiring any replay buffer, as in typical reinforcement learning systems. These results motivate further investigations of Hebbian learning for the design of AIF networks that can learn environment dynamics without the need for revisiting past buffered experiences.", "output": "Active Inference in Hebbian Learning Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Learning models are a standard solution for sensor-based Human Activity Recognition (HAR), but their deployment is often limited by labeled data scarcity and models' opacity. Neuro-Symbolic AI (NeSy) provides an interesting research direction to mitigate these issues by infusing knowledge about context information into HAR deep learning classifiers. However, existing NeSy methods for context-aware HAR require computationally expensive symbolic reasoners during classification, making them less suitable for deployment on resource-constrained devices (e.g., mobile devices). Additionally, NeSy approaches for context-aware HAR have never been evaluated on in-the-wild datasets, and their generalization capabilities in real-world scenarios are questionable. In this work, we propose a novel approach based on a semantic loss function that infuses knowledge constraints in the HAR model during the training phase, avoiding symbolic reasoning during classification.
Our results on scripted and in-the-wild datasets show the impact of different semantic loss functions in outperforming a purely data-driven model. We also compare our solution with existing NeSy methods and analyze each approach's strengths and weaknesses. Our semantic loss remains the only NeSy solution that can be deployed as a single DNN without the need for symbolic reasoning modules, reaching recognition rates close to (and in some cases better than) existing approaches.", "output": "Neuro-Symbolic Approaches for Context-Aware Human Activity Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since the rise of fair machine learning as a critical field of inquiry, many different notions on how to quantify and measure discrimination have been proposed in the literature. Some of these notions, however, were shown to be mutually incompatible. Such findings make it appear that numerous different kinds of fairness exist, thereby making a consensus on the appropriate measure of fairness harder to reach, hindering the applications of these tools in practice. In this paper, we investigate one of these key impossibility results that relates the notions of statistical and predictive parity. Specifically, we derive a new causal decomposition formula for the fairness measures associated with predictive parity, and obtain a novel insight into how this criterion is related to statistical parity through the legal doctrines of disparate treatment, disparate impact, and the notion of business necessity. Our results show that, through a more careful causal analysis, the notions of statistical and predictive parity are not really mutually exclusive, but complementary, spanning a spectrum of fairness notions through the concept of business necessity. Finally, we demonstrate the importance of our findings on a real-world example.", "output": "Reconciling Predictive and Statistical Parity: A Causal Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have achieved great success in general domains of natural language processing. In this paper, we bring LLMs to the realm of geoscience, with the objective of advancing research and applications in this field. To this end, we present the first-ever LLM in geoscience, K2, alongside a suite of resources developed to further promote LLM research within geoscience. For instance, we have curated the first geoscience instruction tuning dataset, GeoSignal, which aims to align LLM responses to geoscience-related user queries. Additionally, we have established the first geoscience benchmark, GeoBenchmark, to evaluate LLMs in the context of geoscience. In this work, we experiment with a complete recipe to adapt a pretrained general-domain LLM to the geoscience domain. Specifically, we further train the LLaMA-7B model on over 1 million pieces of geoscience literature and utilize GeoSignal's supervised data to fine-tune the model. Moreover, we share a protocol that can efficiently gather domain-specific data and construct domain-supervised data, even in situations where manpower is scarce.
Experiments conducted on the GeoBenchmark demonstrate the effectiveness of our approach and datasets.", "output": "Learning A Foundation Language Model for Geoscience Knowledge Understanding and Utilization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As society transitions towards an AI-based decision-making infrastructure, an ever-increasing number of decisions once under the control of humans are now delegated to automated systems. Even though such developments make various parts of society more efficient, a great deal of care needs to be taken to make such automated decision-making systems fair and equitable, namely, taking into account sensitive attributes such as gender, race, and religion. In this paper, we study a specific decision-making task called outcome control in which an automated system aims to optimize an outcome variable $Y$ while being fair and equitable. The interest in such a setting ranges from interventions related to criminal justice and welfare, all the way to clinical decision-making and public health. In this paper, we first analyze through causal lenses the notion of benefit, which captures how much a specific individual would benefit from a positive decision, counterfactually speaking, when contrasted with an alternative, negative one. We introduce the notion of benefit fairness, which can be seen as the minimal fairness requirement in decision-making, and develop an algorithm for satisfying it. We then note that the benefit itself may be influenced by the protected attribute, and propose causal tools which can be used to analyze this. Finally, if some of the variations of the protected attribute in the benefit are considered as discriminatory, the notion of benefit fairness may need to be strengthened, which leads us to articulate a notion of causal benefit fairness. Using this notion, we develop a new optimization procedure capable of maximizing $Y$ while ascertaining causal fairness in the decision process.", "output": "Causal Fairness for Outcome Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks. It leverages extra learnable tokens, known as prompts, which steer the frozen pretrained ViTs. Although VPT has demonstrated its applicability with supervised vision transformers, it often underperforms with self-supervised ones. Through empirical observations, we deduce that the effectiveness of VPT hinges largely on the ViT blocks with which the prompt tokens interact. Specifically, VPT shows improved performance on image classification tasks for MAE and MoCo v3 when the prompt tokens are inserted into later blocks rather than the first block. These observations suggest that there exists an optimal location of blocks for the insertion of prompt tokens. Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens. With our method, prompt tokens are selectively influenced by blocks that require steering for task adaptation.
Our method outperforms VPT variants in FGVC and VTAB image classification and ADE20K semantic segmentation. The code is available at", "output": "Improving Visual Prompt Tuning for Self-supervised Vision Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately measuring discrimination is crucial to faithfully assessing the fairness of trained machine learning (ML) models. Any bias in measuring discrimination leads to either amplification or underestimation of the existing disparity. Several sources of bias exist, and it is assumed that bias resulting from machine learning is borne equally by different groups (e.g. females vs males, whites vs blacks, etc.). If, however, bias is borne differently by different groups, it may exacerbate discrimination against specific sub-populations. Sampling bias is inconsistently used in the literature to describe bias due to the sampling procedure. In this paper, we attempt to disambiguate this term by introducing clearly defined variants of sampling bias, namely, sample size bias (SSB) and underrepresentation bias (URB). We also show how discrimination can be decomposed into variance, bias, and noise. Finally, we challenge the commonly accepted mitigation approach that discrimination can be addressed by collecting more samples of the underrepresented group.", "output": "Shedding light on underrepresentation and Sampling Bias in machine learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We establish a novel relation between delete-free planning, an important task for the AI Planning community also known as relaxed planning, and logic programming. We show that, given a planning problem, all subsets of actions that could be ordered to produce relaxed plans for the problem can be bijectively captured with stable models of a logic program describing the corresponding relaxed planning problem. We also consider the supported model semantics of logic programs, and introduce one causal and one diagnostic encoding of the relaxed planning problem as logic programs, both capturing relaxed plans with their supported models. Our experimental results show that these new encodings can provide major performance gains when computing optimal relaxed plans, with our diagnostic encoding outperforming state-of-the-art approaches to relaxed planning regardless of the given time limit when measured on a wide collection of STRIPS planning benchmarks.", "output": "Capturing (Optimal) Relaxed Plans with Stable and Supported Models of Logic Programs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the fundamental challenges found throughout the data sciences is to explain why things happen in specific ways, or through which mechanisms a certain variable $X$ exerts influences over another variable $Y$. In statistics and machine learning, significant efforts have been put into developing machinery to estimate correlations across variables efficiently. In causal inference, a large body of literature is concerned with the decomposition of causal effects under the rubric of mediation analysis. However, many variations are spurious in nature, including different phenomena throughout the applied sciences.
Despite the statistical power to estimate correlations and the identification power to decompose causal effects, there is still little understanding of the properties of spurious associations and how they can be decomposed in terms of the underlying causal mechanisms. In this manuscript, we develop formal tools for decomposing spurious variations in both Markovian and Semi-Markovian models. We prove the first results that allow a non-parametric decomposition of spurious effects and provide sufficient conditions for the identification of such decompositions. The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine, and we empirically demonstrate its use on a real-world dataset.", "output": "A Causal Framework for Decomposing Spurious Variations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can help improve translation quality. However, there has always been the assumption that the translation model also learns an implicit target-side language model during training, which interferes with the external language model at decoding time. Recently, some works on automatic speech recognition have demonstrated that, if the implicit language model is neutralized in decoding, further improvements can be gained when integrating an external language model. In this work, we transfer this concept to the task of machine translation and compare with the most prominent way of including additional monolingual data, namely back-translation. We find that accounting for the implicit language model significantly boosts the performance of language model fusion, although this approach is still outperformed by back-translation.", "output": "Improving Language Model Integration for Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The application of Algorithmic Recourse in decision-making is a promising field that offers practical solutions to reverse unfavorable decisions. However, the inability of these methods to consider potential dependencies among variables poses a significant challenge due to the assumption of feature independence. Recent advancements have incorporated knowledge of causal dependencies, thereby enhancing the quality of the recommended recourse actions. Despite these improvements, the inability to incorporate the temporal dimension remains a significant limitation of these approaches. This is particularly problematic as identifying and addressing the root causes of undesired outcomes requires understanding time-dependent relationships between variables. In this work, we motivate the need to integrate the temporal dimension into causal algorithmic recourse methods to enhance recommendations' plausibility and reliability.
The experimental evaluation highlights the significance of the role of time in this field.", "output": "The Importance of Time in Causal Algorithmic Recourse."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Instruction tuning large language models (LLMs) remains a challenging task, owing to the complexity of hyperparameter selection and the difficulty involved in evaluating the tuned models. To determine the optimal hyperparameters, an automatic, robust, and reliable evaluation benchmark is essential. However, establishing such a benchmark is not a trivial task due to the challenges associated with evaluation accuracy and privacy protection. In response to these challenges, we introduce a judge large language model, named PandaLM, which is trained to distinguish the superior model given several LLMs. PandaLM's focus extends beyond just the objective correctness of responses, which is the main focus of traditional evaluation datasets. It addresses vital subjective factors such as relative conciseness, clarity, adherence to instructions, comprehensiveness, and formality. To ensure the reliability of PandaLM, we collect a diverse human-annotated test dataset, where all contexts are generated by humans and labels are aligned with human preferences. Our results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. PandaLM enables the evaluation of LLMs to be fairer and at a lower cost, evidenced by significant improvements achieved by models tuned through PandaLM compared to their counterparts trained with Alpaca's default hyperparameters. In addition, PandaLM does not depend on API-based evaluations, thus avoiding potential data leakage. All resources of PandaLM are released at", "output": "PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The federated learning (FL) technique was initially developed to mitigate data privacy issues that can arise in the traditional machine learning paradigm. While FL ensures that a user's data always remain with the user, the gradients of the locally trained models must be communicated with the centralized server to build the global model. This results in privacy leakage, where the server can infer private information of the users' data from the shared gradients. To mitigate this flaw, next-generation FL architectures proposed encryption and anonymization techniques to protect the model updates from the server. However, this approach creates other challenges, such as a malicious user sabotaging the global model by sharing false gradients. Since the gradients are encrypted, the server is unable to identify and eliminate rogue users, which would protect the global model. Therefore, to mitigate both attacks, this paper proposes a novel fully homomorphic encryption (FHE) based scheme suitable for FL.
We modify the one-to-one single-key Cheon-Kim-Kim-Song (CKKS)-based FHE scheme into a distributed multi-key additive homomorphic encryption scheme that supports model aggregation in FL. We employ a novel aggregation scheme within the encrypted domain, utilizing users' non-poisoning rates, to effectively address data poisoning attacks while ensuring privacy is preserved by the proposed encryption scheme. Rigorous security, privacy, convergence, and experimental analyses have been provided to show that FheFL is novel, secure, and private, and achieves comparable accuracy at reasonable computational cost.", "output": "FheFL: Fully Homomorphic Encryption Friendly Privacy-Preserving Federated Learning with Byzantine Users."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compared to sentence-level systems, document-level neural machine translation (NMT) models produce a more consistent output across a document and are able to better resolve ambiguities within the input. There are many works on document-level NMT, mostly focusing on modifying the model architecture or training strategy to better accommodate the additional context input. On the other hand, in most works, the question of how to perform search with the trained model is scarcely discussed, sometimes not mentioned at all. In this work, we aim to answer the question of how to best utilize a context-aware translation model in decoding. We start with the most popular document-level NMT approach and compare different decoding schemes, some from the literature and others proposed by us. In the comparison, we use both standard automatic metrics and specific linguistic phenomena on three standard document-level translation benchmarks. We find that most commonly used decoding strategies perform similarly to each other and that higher-quality context information has the potential to further improve the translation.", "output": "On Search Strategies for Document-Level Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Explainable Artificial Intelligence (XAI) fills the role of a critical interface fostering interactions between sophisticated intelligent systems and diverse individuals, including data scientists, domain experts, end-users, and more. It aids in deciphering the intricate internal mechanisms of \"black box\" Machine Learning (ML), rendering the reasons behind their decisions more understandable. However, current research in XAI primarily focuses on two aspects: ways to facilitate user trust, or to debug and refine the ML model. The majority of it falls short of recognising the diverse types of explanations needed in broader contexts, as different users and varied application areas necessitate solutions tailored to their specific needs. One such domain is Predictive Maintenance (PdM), an exploding area of research under the Industry 4.0 & 5.0 umbrella. This position paper highlights the gap between existing XAI methodologies and the specific requirements for explanations within industrial applications, particularly the Predictive Maintenance field. Despite explainability's crucial role, this subject remains a relatively under-explored area, making this paper a pioneering attempt to bring relevant challenges to the research community's attention.
We provide an overview of predictive maintenance tasks and accentuate the need and varying purposes for corresponding explanations. We then list and describe XAI techniques commonly employed in the literature, discussing their suitability for PdM tasks. Finally, to make the ideas and claims more concrete, we demonstrate XAI applied in four specific industrial use cases: commercial vehicles, metro trains, steel plants, and wind farms, spotlighting areas requiring further research.", "output": "Explainable Predictive Maintenance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image anonymization is widely adopted in practice to comply with privacy regulations in many regions. However, anonymization often degrades the quality of the data, reducing its utility for computer vision development. In this paper, we investigate the impact of image anonymization for training computer vision models on key computer vision tasks (detection, instance segmentation, and pose estimation). Specifically, we benchmark the recognition drop on common detection datasets, where we evaluate both traditional and realistic anonymization for faces and full bodies. Our comprehensive experiments reflect that traditional image anonymization substantially impacts final model performance, particularly when anonymizing the full body. Furthermore, we find that realistic anonymization can mitigate this decrease in performance, where our experiments reflect a minimal performance drop for face anonymization. Our study demonstrates that realistic anonymization can enable privacy-preserving computer vision development with minimal performance degradation across a range of important computer vision benchmarks.", "output": "Does Image Anonymization Impact Computer Vision Training?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quality Diversity (QD) algorithms have been proposed to search for a large collection of both diverse and high-performing solutions instead of a single set of local optima. While early QD algorithms view the objective and descriptor functions as black-box functions, novel tools have been introduced to use gradient information to accelerate the search and improve the overall performance of those algorithms over continuous input spaces. However, a broad range of applications involve discrete spaces, such as drug discovery or image generation. Exploring those spaces is challenging as they are combinatorially large and gradients cannot be used in the same manner as in continuous spaces. We introduce MAP-Elites with a Gradient-Informed Discrete Emitter (ME-GIDE), which extends QD optimisation with differentiable functions over discrete search spaces. ME-GIDE leverages the gradient information of the objective and descriptor functions with respect to its discrete inputs to propose gradient-informed updates that guide the search towards a diverse set of high-quality solutions.
We evaluate our method on challenging benchmarks including protein design and discrete latent space illumination and find that our method outperforms state-of-the-art QD algorithms on all benchmarks.", "output": "Gradient-Informed Quality Diversity for the Illumination of Discrete Spaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of optimizing a grey-box objective function, i.e., a nested function composed of both black-box and white-box functions. A general formulation for such grey-box problems is given, which covers the existing grey-box optimization formulations as special cases. We then design an optimism-driven algorithm to solve it. Under certain regularity assumptions, our algorithm achieves a similar regret bound to that of the standard black-box Bayesian optimization algorithm, up to a constant multiplicative term depending on the Lipschitz constants of the functions considered. We further extend our method to the constrained case and discuss several special cases. For the commonly used kernel functions, the regret bounds allow us to derive a convergence rate to the optimal solution. Experimental results show that our grey-box optimization method empirically improves the speed of finding the global optimal solution significantly, as compared to the standard black-box optimization algorithm.", "output": "Bayesian Optimization of Expensive Nested Grey-Box Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The emergence of large-language models (LLMs) that excel at code generation and commercial products such as GitHub's Copilot has sparked interest in human-AI pair programming (referred to as \"pAIr programming\") where an AI system collaborates with a human programmer. While traditional pair programming between humans has been extensively studied, it remains uncertain whether its findings can be applied to human-AI pair programming. We compare human-human and human-AI pair programming, exploring their similarities and differences in interaction, measures, benefits, and challenges. We find that the effectiveness of both approaches is mixed in the literature (though the measures used for pAIr programming are not as comprehensive). We summarize moderating factors on the success of human-human pair programming, which provides opportunities for pAIr programming research. For example, mismatched expertise makes pair programming less productive; therefore, well-designed AI programming assistants may adapt to differences in expertise levels.", "output": "Is AI the better programming partner? Human-Human Pair Programming vs. Human-AI pAIr Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional robot task planning methods face challenges when dealing with highly unstructured environments and complex tasks. We propose a task planning method that combines human expertise with an LLM and have designed an LLM prompt template, Think_Net_Prompt, with stronger expressive power to represent structured professional knowledge.
We further propose a method to progressively decompose tasks and generate a task tree to reduce the planning volume for each task, and we have designed a strategy to decouple robot task planning. By dividing different planning entities and separating the task from the actual machine binding process, the task planning process becomes more flexible. Research results show that our method performs well in handling specified code formats, understanding the relationship between tasks and subtasks, and extracting parameters from text descriptions. However, there are also problems, such as the limited complexity of task logic handling, and ambiguity in the quantity of parts and the precise location of assembly. Improving the precision of task descriptions and cognitive structure can bring certain improvements.", "output": "Robot Task Planning Based on Large Language Model Representing Knowledge with Directed Graph Structures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Owing to the impressive dot-product attention, Transformers have been the dominant architectures in various natural language processing (NLP) tasks. Recently, the Receptance Weighted Key Value (RWKV) architecture follows a non-transformer architecture to eliminate the drawbacks of dot-product attention, where memory and computational complexity exhibit quadratic scaling with sequence length. Although RWKV has exploited a linear tensor-product attention mechanism and achieved parallelized computations by deploying the time-sequential mode, it fails to capture long-range dependencies because of its limitation on looking back at previous information, compared with the full information obtained by direct interactions in the standard transformer. Therefore, the paper devises the Retrospected Receptance Weighted Key Value (RRWKV) architecture by incorporating the retrospecting ability into the RWKV to effectively absorb information, which maintains memory and computational efficiency as well.", "output": "RRWKV: Capturing Long-range Dependencies in RWKV."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document-level context for neural machine translation (NMT) is crucial to improve the translation consistency and cohesion, the translation of ambiguous inputs, as well as several other linguistic phenomena. Many works have been published on the topic of document-level NMT, but most restrict the system to only local context, typically including just the one or two preceding sentences as additional information. This might be enough to resolve some ambiguous inputs, but it is probably not sufficient to capture some document-level information like the topic or style of a conversation. When increasing the context size beyond just the local context, there are two challenges: (i) the memory usage increases exponentially, and (ii) the translation performance starts to degrade. We argue that the widely-used attention mechanism is responsible for both issues. Therefore, we propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption. For evaluation, we utilize targeted test sets in combination with novel evaluation techniques to analyze the translations with regard to specific discourse-related phenomena.
We find that our approach is a good compromise between sentence-level NMT and attending to the full context, especially in low-resource scenarios.", "output": "Improving Long Context Document-Level Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intelligent systems that aim at mastering language as humans do must deal with its semantic underspecification, namely, the possibility for a linguistic signal to convey only part of the information needed for communication to succeed. Consider the usages of the pronoun they, which can leave the gender and number of its referent(s) underspecified. Semantic underspecification is not a bug but a crucial language feature that boosts its storage and processing efficiency. Indeed, human speakers can quickly and effortlessly integrate semantically-underspecified linguistic signals with a wide range of non-linguistic information, e.g., the multimodal context, social or cultural conventions, and shared knowledge. Standard NLP models have, in principle, no or limited access to such extra information, while multimodal systems grounding language into other modalities, such as vision, are naturally equipped to account for this phenomenon. However, we show that they struggle with it, which could negatively affect their performance and lead to harmful consequences when used for applications. In this position paper, we argue that our community should be aware of semantic underspecification if it aims to develop language technology that can successfully interact with human users. We discuss some applications where mastering it is crucial and outline a few directions toward achieving this goal.", "output": "Dealing with Semantic Underspecification in Multimodal NLP."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Variational Autoencoder (VAE) is a seminal approach in deep generative modeling with latent variables. Interpreting its reconstruction process as a nonlinear transformation of samples from the latent posterior distribution, we apply the Unscented Transform (UT) -- a well-known distribution approximation used in the Unscented Kalman Filter (UKF) from the field of filtering. A finite set of statistics called sigma points, sampled deterministically, provides a more informative and lower-variance posterior representation than the ubiquitous noise-scaling of the reparameterization trick, while ensuring higher-quality reconstruction. We further boost the performance by replacing the Kullback-Leibler (KL) divergence with the Wasserstein distribution metric that allows for a sharper posterior. Inspired by the two components, we derive a novel, deterministic-sampling flavor of the VAE, the Unscented Autoencoder (UAE), trained purely with regularization-like terms on the per-sample posterior.
We empirically show competitive performance in Fréchet Inception Distance (FID) scores over closely-related models, in addition to a lower training variance than the VAE.", "output": "Unscented Autoencoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient for downstream tasks. However, in many real-world settings, task-relevant information is also contained in modality-unique regions: information that is only present in one modality but still relevant to the task. How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy. FactorCL is built from three new contributions: (1) factorizing task-relevant information into shared and unique representations, (2) capturing task-relevant information via maximizing MI lower bounds and removing task-irrelevant information via minimizing MI upper bounds, and (3) multimodal data augmentations to approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.", "output": "Factorized Contrastive Learning: Going Beyond Multi-view Redundancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adverse Event (ADE) extraction is one of the core tasks in digital pharmacovigilance, especially when applied to informal texts. This task has been addressed by the Natural Language Processing community using large pre-trained language models, such as BERT. Despite the great number of Transformer-based architectures used in the literature, it is unclear which of them has better performance and why. Therefore, in this paper we perform an extensive evaluation and analysis of 19 Transformer-based models for ADE extraction on informal texts. We compare the performance of all the considered models on two datasets with increasing levels of informality (forum posts and tweets). We also combine the purely Transformer-based models with two commonly-used additional processing layers (CRF and LSTM), and analyze their effect on the models' performance. Furthermore, we use a well-established feature importance technique (SHAP) to correlate the performance of the models with a set of features that describe them: model category (AutoEncoding, AutoRegressive, Text-to-Text), pretraining domain, training from scratch, and model size in number of parameters.
At the end of our analyses, we identify a list of take-home messages that can be derived from the experimental data.", "output": "Extensive Evaluation of Transformer-based Architectures for Adverse Drug Events Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To improve the performance in identifying the faults under strong noise for rotating machinery, this paper presents a dynamic feature reconstruction signal graph method, which plays the key role in the proposed end-to-end fault diagnosis model. Specifically, the original mechanical signal is first decomposed by wavelet packet decomposition (WPD) to obtain multiple subbands including the coefficient matrix. Then, with the two originally defined feature extraction factors MDD and DDD, a dynamic feature selection method based on the L2 energy norm (DFSL) is proposed, which can dynamically select the feature coefficient matrix of WPD based on the difference in the distribution of norm energy, enabling each sub-signal to take adaptive signal reconstruction. Next, the coefficient matrices of the optimal feature sub-bands are reconstructed and reorganized to obtain the feature signal graphs. Finally, deep features are extracted from the feature signal graphs by a 2D convolutional neural network (2D-CNN). Experimental results on a public bearing data platform and our laboratory robot grinding platform show that this method is better than the existing methods under different noise intensities.", "output": "Fault Identification of Rotating Machinery Based on Dynamic Feature Reconstruction Signal Graph."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or via upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, while being conditioned on textual description or melodic features, allowing better control over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at", "output": "Simple and Controllable Music Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concentration of drivers on traffic is a vital safety issue; thus, monitoring a driver being on the road becomes an essential requirement. The key purpose of supervision is to detect abnormal behaviours of the driver and promptly send warnings to him or her to avoid incidents related to traffic accidents.
In this paper, to meet this requirement, the authors draw on radar sensor applications and first use a small-sized millimetre-wave radar installed at the steering wheel of the vehicle to collect signals from different head movements of the driver. The received signals consist of the reflection patterns that change in response to the head movements of the driver. Then, in order to distinguish these different movements, a classifier based on the measured signal of the radar sensor is designed. However, since the collected data set is not large, the authors propose one-shot learning to classify four cases of the driver's head movements. The experimental results indicate that the proposed method can classify the four types of cases according to the various head movements of the driver with a high accuracy reaching up to 100%. In addition, the classification performance of the proposed method is significantly better than that of the convolutional neural network model.", "output": "One shot learning based drivers head movement identification using a millimetre wave radar sensor."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When we exercise sequences of actions, their execution becomes more fluent and precise. Here, we consider the possibility that exercised action sequences can also be used to make planning faster and more accurate by focusing expansion of the search tree on paths that have been frequently used in the past, and by reducing deep planning problems to shallow ones via multi-step jumps in the tree. To capture such sequences, we use a flexible Bayesian action chunking mechanism which finds and exploits statistically reliable structure at different scales. This gives rise to shorter or longer routines that can be embedded into a Monte-Carlo tree search planner. We show the benefits of this scheme using a physical construction task patterned after tangrams.", "output": "Habits of Mind: Reusing Action Sequences for Efficient Planning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasing availability of graph-structured data motivates the task of optimising over functions defined on the node set of graphs. Traditional graph search algorithms can be applied in this case, but they may be sample-inefficient and do not make use of information about the function values; on the other hand, Bayesian optimisation is a class of promising black-box solvers with superior sample efficiency, but it has scarcely been applied to such novel setups. To fill this gap, we propose a novel Bayesian optimisation framework that optimises over functions defined on generic, large-scale and potentially unknown graphs. Through the learning of suitable kernels on graphs, our framework has the advantage of adapting to the behaviour of the target function. The local modelling approach further guarantees the efficiency of our method.
Extensive experiments on both synthetic and real-world graphs demonstrate the effectiveness of the proposed optimisation framework.", "output": "Bayesian Optimisation of Functions on Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The introduction of computerized medical records in hospitals has reduced burdensome operations like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting them from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation, using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Large Language Model for this task. Moreover, we conducted several experiments with three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77%, Precision 83.16%, Recall 86.44%. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a \"few-shot\" approach. This allowed us to establish methodological guidelines that pave the way for future implementations in this field and allow Italian hospitals to tap into important research opportunities.", "output": "Advancing Italian Biomedical Information Extraction with Large Language Models: Methodological Insights and Multicenter Practical Application."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-dimensional deep neural network representations of images and concepts can be aligned to predict human annotations of diverse stimuli. However, such alignment requires the costly collection of behavioral responses, such that, in practice, the deep-feature spaces are only ever sparsely sampled. Here, we propose an active learning approach to adaptively sampling experimental stimuli to efficiently learn a Bayesian matrix factorization model with deep side information. We observe a significant efficiency gain over a passive baseline. Furthermore, with a sequential batched sampling strategy, the algorithm is applicable not only to small datasets collected from traditional laboratory experiments but also to settings where large-scale crowdsourced data collection is needed to accurately align the high-dimensional deep feature representations derived from pre-trained networks.", "output": "Actively learning a Bayesian matrix fusion model with deep side information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over-generalization is a thorny issue in cognitive science, where people may become overly cautious due to past experiences.
Agents in multi-agent reinforcement learning (MARL) have also been found to suffer from relative over-generalization (RO) as people do, getting stuck in sub-optimal cooperation. Recent methods have shown that assigning reasoning ability to agents can mitigate RO algorithmically and empirically, but there has been a lack of theoretical understanding of RO, let alone designing provably RO-free methods. This paper first proves that RO can be avoided when the MARL method satisfies a consistent reasoning requirement under certain conditions. Then we introduce a novel reasoning framework, called negotiated reasoning, that first builds the connection between reasoning and RO with theoretical justifications. After that, we propose an instantiated algorithm, Stein variational negotiated reasoning (SVNR), which uses Stein variational gradient descent to derive a negotiation policy that provably avoids RO in MARL under maximum entropy policy iteration. The method is further parameterized with neural networks for amortized learning, making computation efficient. Numerical experiments on many RO-challenged environments demonstrate the superiority and efficiency of SVNR compared to state-of-the-art methods in addressing RO.", "output": "Negotiated Reasoning: On Provably Addressing Relative Over-Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images. We show how such generative concepts can accurately represent the content of images, be recombined and composed to generate new artistic and hybrid images, and be further used as a representation for downstream classification tasks.", "output": "Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There are increasing concerns about malicious attacks on autonomous vehicles. In particular, inaudible voice command attacks pose a significant threat as voice commands become available in autonomous driving systems. How to empirically defend against these inaudible attacks remains an open question. Previous research investigates utilizing deep learning-based multimodal fusion for defense, without considering the model uncertainty in trustworthiness. As deep learning has been applied to increasingly sensitive tasks, uncertainty measurement is crucial in helping improve model robustness, especially in mission-critical scenarios. In this paper, we propose the Multimodal Fusion Framework (MFF) as an intelligent security system to defend against inaudible voice command attacks. MFF fuses heterogeneous audio-vision modalities using VGG family neural networks and achieves a detection accuracy of 92.25% in the comparative fusion method empirical study.
Additionally, extensive experiments on audio-vision tasks reveal the model's uncertainty. We measure calibration errors using Expected Calibration Errors and estimate the predictive distribution for the proposed models using Monte-Carlo Dropout. Our findings show empirically how to train robust multimodal models, improve standard accuracy, and provide a further step toward interpretability. Finally, we discuss the pros and cons of our approach and its applicability to Advanced Driver Assistance Systems.", "output": "Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents the ADAIO team's system entry in the Building Educational Applications (BEA) 2023 Shared Task on Generating AI Teacher Responses in Educational Dialogues. The task aims to assess the performance of state-of-the-art generative models as AI teachers in producing suitable responses within a student-teacher dialogue. Our system comprises evaluating various baseline models using OpenAI GPT-3 and designing diverse prompts for the OpenAI models for teacher response generation. After the challenge, our system achieved second place by employing a few-shot prompt-based approach with the OpenAI text-davinci-003 model. The results highlight the few-shot learning capabilities of large language models, particularly OpenAI's GPT-3, in the role of AI teachers.", "output": "The ADAIO System at the BEA-2023 Shared Task on Generating AI Teacher Responses in Educational Dialogues."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Car-following is a control process in which a following vehicle (FV) adjusts its acceleration to keep a safe distance from the lead vehicle (LV). Recently, there has been a boom in data-driven models that enable more accurate modeling of car-following through real-world driving datasets. Although there are several public datasets available, their formats are not always consistent, making it challenging to determine the state-of-the-art models and how well a new model performs compared to existing ones. In contrast, research fields such as image recognition and object detection have benchmark datasets like ImageNet, Microsoft COCO, and KITTI. To address this gap and promote the development of microscopic traffic flow modeling, we establish a public benchmark dataset for car-following behavior modeling. The benchmark consists of more than 80K car-following events extracted from five public driving datasets using the same criteria. These events cover diverse situations including different road types, various weather conditions, and mixed traffic flows with autonomous vehicles. Moreover, to give an overview of current progress in car-following modeling, we implemented and tested representative baseline models with the benchmark. Results show that the deep deterministic policy gradient (DDPG) based model performs competitively, with a lower MSE for spacing compared to the traditional intelligent driver model (IDM) and Gazis-Herman-Rothery (GHR) models, and a smaller collision rate compared to fully connected neural network (NN) and long short-term memory (LSTM) models in most datasets.
The established benchmark will provide researchers with consistent data formats and metrics for cross-comparing different car-following models, promoting the development of more accurate models. We open-source our dataset and implementation code in", "output": "FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many complex multi-agent systems such as robot swarm control and autonomous vehicle coordination can be modeled as Multi-Agent Reinforcement Learning (MARL) tasks. QMIX, a widely popular MARL algorithm, has been used as a baseline for benchmark environments, e.g., the Starcraft Multi-Agent Challenge (SMAC) and Difficulty-Enhanced Predator-Prey (DEPP). Recent variants of QMIX target relaxing the monotonicity constraint of QMIX, allowing for performance improvement in SMAC. In this paper, we investigate the code-level optimizations of these variants and the monotonicity constraint. (1) We find that such improvements of the variants are significantly affected by various code-level optimizations. (2) The experiment results show that QMIX with normalized optimizations outperforms other works in SMAC; (3) beyond the common wisdom from these works, the monotonicity constraint can improve sample efficiency in SMAC and DEPP. We also discuss why monotonicity constraints work well in purely cooperative tasks with a theoretical analysis. We open-source the code at", "output": "Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or pre-defined rules, which inevitably result in a cumbersome and expensive learning process. In this paper, we introduce a novel initiative advisor-in-the-loop actor-critic framework, termed Ask-AC, that replaces the unilateral advisor-guidance mechanism with a bidirectional learner-initiative one, and thereby enables a customized and efficacious message exchange between learner and advisor. At the heart of Ask-AC are two complementary components, namely an action requester and an adaptive state selector, that can be readily incorporated into various discrete actor-critic architectures. The former component allows the agent to initiatively seek advisor intervention in the presence of uncertain states, while the latter identifies the unstable states potentially missed by the former, especially when the environment changes, and then learns to promote the ask action on such states.
Experimental results on both stationary and non-stationary environments and across different actor-critic backbones demonstrate that the proposed framework significantly improves the learning efficiency of the agent, and achieves performance on par with that obtained by continuous advisor monitoring.", "output": "Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of physics-informed neural networks (PINNs) in approximating partial differential equations (PDEs), PINNs can sometimes fail to converge to the correct solution in problems involving complicated PDEs. This is reflected in several recent studies on characterizing the \"failure modes\" of PINNs, although a thorough understanding of the connection between PINN failure modes and sampling strategies is missing. In this paper, we provide a novel perspective on failure modes of PINNs by hypothesizing that training PINNs relies on successful \"propagation\" of the solution from initial and/or boundary condition points to interior points. We show that PINNs with poor sampling strategies can get stuck at trivial solutions if there are propagation failures, characterized by highly imbalanced PDE residual fields. To mitigate propagation failures, we propose a novel Retain-Resample-Release sampling (R3) algorithm that can incrementally accumulate collocation points in regions of high PDE residuals with little to no computational overhead. We provide an extension of R3 sampling to respect the principle of causality while solving time-dependent PDEs. We theoretically analyze the behavior of R3 sampling and empirically demonstrate its efficacy and efficiency in comparison with baselines on a variety of PDE problems.", "output": "Mitigating Propagation Failures in Physics-informed Neural Networks using Retain-Resample-Release (R3) Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Evacuation planning is a crucial part of disaster management. However, joint optimization of its two essential components, routing and scheduling, with objectives such as minimizing average evacuation time or evacuation completion time, is a computationally hard problem. To approach it, we present MIP-LNS, a scalable optimization method that utilizes heuristic search with mathematical optimization and can optimize a variety of objective functions. We also present the method MIP-LNS-SIM, where we combine agent-based simulation with MIP-LNS to estimate delays due to congestion, as well as find optimized plans considering such delays. We use Harris County in Houston, Texas, as our study area. We show that, within a given time limit, MIP-LNS finds better solutions than existing methods in terms of three different metrics. However, when congestion-dependent delay is considered, MIP-LNS-SIM outperforms MIP-LNS in multiple performance metrics.
In addition, MIP-LNS-SIM has a significantly lower percent error in estimated evacuation completion time compared to MIP-LNS.", "output": "Simulation-Assisted Optimization for Large-Scale Evacuation Planning with Congestion-Dependent Delays."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The problem of anticipating human actions is an inherently uncertain one. However, we can reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here, we present an action anticipation model that leverages goal information for the purpose of reducing the uncertainty in future predictions. Since we do not possess goal information or the observed actions during inference, we resort to visual representation to encapsulate information about both actions and goals. Through this, we derive a novel concept called abstract goal which is conditioned on observed sequences of visual features for action anticipation. We design the abstract goal as a distribution whose parameters are estimated using a variational recurrent network. We sample multiple candidates for the next action and introduce a goal consistency measure to determine the best candidate that follows from the abstract goal. Our method obtains impressive results on the very challenging Epic-Kitchens55 (EK55), EK100, and EGTEA Gaze+ datasets. We obtain absolute improvements of +13.69, +11.24, and +5.19 for Top-1 verb, Top-1 noun, and Top-1 action anticipation accuracy, respectively, over prior state-of-the-art methods for seen kitchens (S1) of EK55. Similarly, we also obtain significant improvements in the unseen kitchens (S2) set for Top-1 verb (+10.75), noun (+5.84) and action (+2.87) anticipation. A similar trend is observed for the EGTEA Gaze+ dataset, where absolute improvements of +9.9, +13.1 and +6.8 are obtained for noun, verb, and action anticipation. As of the submission of this paper, our method is the new state-of-the-art for action anticipation in EK55 and EGTEA Gaze+. Code available at", "output": "Predicting the Next Action by Modeling the Abstract Goal."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fair representation learning (FRL) is a popular class of methods aiming to produce fair classifiers via data preprocessing. Recent regulatory directives stress the need for FRL methods that provide practical certificates, i.e., provable upper bounds on the unfairness of any downstream classifier trained on preprocessed data, which directly provides assurance in a practical scenario. Creating such FRL methods is an important challenge that remains unsolved. In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates. FARE is based on our key insight that restricting the representation space of the encoder enables the derivation of practical guarantees, while still permitting favorable accuracy-fairness tradeoffs for suitable instantiations, such as one we propose based on fair trees. To produce a practical certificate, we develop and apply a statistical procedure that computes a finite sample high-confidence upper bound on the unfairness of any downstream classifier trained on FARE embeddings.
In our comprehensive experimental evaluation, we demonstrate that FARE produces practical certificates that are tight and often even comparable with purely empirical results obtained by prior methods, which establishes the practical value of our approach.", "output": "FARE: Provably Fair Representation Learning with Practical Certificates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classical artificial neural networks have witnessed widespread successes in machine-learning applications. Here, we propose fermion neural networks (FNNs) whose physical properties, such as local density of states or conditional conductance, serve as outputs, once the inputs are incorporated as an initial layer. Comparable to back-propagation, we establish an efficient optimization, which entitles FNNs to competitive performance on challenging machine-learning benchmarks. FNNs also directly apply to quantum systems, including hard ones with interactions, and offer in-situ analysis without preprocessing or presumption. Following machine learning, FNNs precisely determine topological phases and emergent charge orders. Their quantum nature also brings various advantages: quantum correlation entitles more general network connectivity and insight into the vanishing gradient problem, quantum entanglement opens up novel avenues for interpretable machine learning, etc.", "output": "A fermion neural network with efficient optimization and quantum applicability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When people think of everyday things like an egg, they typically have a mental image associated with it. This allows them to correctly judge, for example, that \"the yolk surrounds the shell\" is a false statement. Do language models similarly have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the relationships between these parts, expressed as 11,720 \"X relation Y?\" true/false questions. Using these questions as probes, we observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these everyday things, but do not have fully coherent \"parts mental models\" (54-59% accurate, 19-43% conditional constraint violation). We propose an extension where we add a constraint satisfaction layer on top of the LM's raw predictions to apply commonsense constraints. As well as removing inconsistencies, we find that this also significantly improves accuracy (by 16-20%), suggesting how the incoherence of the LM's pictures of everyday things can be significantly reduced.", "output": "Do language models have coherent mental models of everyday things?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, indoor human presence detection based on supervised learning (SL) and channel state information (CSI) has attracted much attention. However, existing studies that rely on spatial information of CSI are susceptible to environmental changes which degrade prediction accuracy. Moreover, SL-based methods require time-consuming data labeling for retraining models.
Therefore, it is imperative to design a continuously monitored model using a semi-supervised learning (SSL) based scheme. In this paper, we conceive a bifold teacher-student (BTS) learning approach for indoor human presence detection in an adjoining two-room scenario. The proposed SSL-based primal-dual teacher-student network intelligently learns spatial and temporal features from labeled and unlabeled CSI datasets. Additionally, the enhanced penalized loss function leverages entropy and distance measures to distinguish drifted data, i.e., features of new datasets affected by time-varying effects and altered from the original distribution. Experimental results demonstrate that the proposed BTS system sustains asymptotic accuracy after retraining the model with unlabeled data. Furthermore, BTS outperforms existing SSL-based models in terms of the highest detection accuracy while achieving the asymptotic performance of SL-based methods.", "output": "BTS: Bifold Teacher-Student in Semi-Supervised Learning for Indoor Two-Room Presence Detection Under Time-Varying CSI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans naturally exploit haptic feedback during contact-rich tasks like loading a dishwasher or stocking a bookshelf. Current robotic systems focus on avoiding unexpected contact, often relying on strategically placed environment sensors. Recently, contact-exploiting manipulation policies have been trained in simulation and deployed on real robots. However, they require some form of real-world adaptation to bridge the sim-to-real gap, which might not be feasible in all scenarios. In this paper we train a contact-exploiting manipulation policy in simulation for the contact-rich household task of loading plates into a slotted holder, which transfers without any fine-tuning to the real robot. We investigate various factors necessary for this zero-shot transfer, like time delay modeling, memory representation, and domain randomization. Our policy transfers with minimal sim-to-real gap and significantly outperforms heuristic and learnt baselines. It also generalizes to plates of different sizes and weights. Demonstration videos and code are available at", "output": "Zero-Shot Transfer of Haptics-Based Object Insertion Policies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The optimized certainty equivalent (OCE) is a family of risk measures that cover important examples such as entropic risk, conditional value-at-risk and mean-variance models. In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound. We derive an upper bound on the regret of the proposed algorithm, and also establish a minimax lower bound.
Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.", "output": "Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In digital marketing, experimenting with new website content is one of the key levers to improve customer engagement. However, creating successful marketing content is a manual and time-consuming process that lacks clear guiding principles. This paper seeks to close the loop between content creation and online experimentation by offering marketers AI-driven actionable insights based on historical data to improve their creative process. We present a neural-network-based system that scores and extracts insights from a marketing content design: namely, a multimodal neural network predicts the attractiveness of marketing content, and a post-hoc attribution method generates actionable insights for marketers to improve their content in specific marketing locations. Our insights not only point out the advantages and drawbacks of a given current content, but also provide design recommendations based on historical data. We show that our scoring model and insights work well both quantitatively and qualitatively.", "output": "Neural Insights for Digital Marketing Content Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. Further, HCCV datasets constructed through nonconsensual web scraping lack the necessary metadata for comprehensive fairness and robustness evaluations. Current remedies address issues post hoc, lack persuasive justification for adoption, or fail to provide proper contextualization for appropriate application. Our research focuses on proactive, domain-specific recommendations for curating HCCV datasets, addressing privacy and bias. We adopt an ante hoc reflective perspective and draw from current practices and guidelines, guided by the ethical framework of principlism.", "output": "Principlism Guided Responsible Data Curation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the grand challenges of cell biology is inferring the gene regulatory network (GRN), which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic, so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2), learning complex Bayesian posteriors over DAGs, but not both.
In this paper we leverage the fact that it is possible to estimate the \"velocity\" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.", "output": "DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. In particular, we investigate why certain types of composition fail using current techniques and present a number of solutions. We conclude that the sampler (not the model) is responsible for this failure and propose new samplers, inspired by MCMC, which enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers. Intriguingly, we find these samplers lead to notable improvements in compositional generation across a wide set of problems such as classifier-guided ImageNet modeling and compositional text-to-image generation.", "output": "Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When interacting with people, AI agents do not just influence the state of the world -- they also influence the actions people take in response to the agent, and even their underlying intentions and strategies. Accounting for and leveraging this influence has mostly been studied in settings where it is sufficient to assume that human behavior is near-optimal: competitive games, or general-sum settings like autonomous driving alongside human drivers. Instead, we focus on influence in settings where there is a need to capture human suboptimality. For instance, imagine a collaborative task in which, due either to cognitive biases or lack of information, people do not perform very well -- how could an agent influence them towards more optimal behavior? Assuming near-optimal human behavior will not work here, and so the agent needs to learn from real human data.
But experimenting online with humans is potentially unsafe, and creating a high-fidelity simulator of the environment is often impractical. Hence, we focus on learning from an offline dataset of human-human interactions. Our observation is that offline reinforcement learning (RL) can learn to effectively influence suboptimal humans by extending and combining elements of observed human-human behavior. We demonstrate that offline RL can solve two challenges with effective influence. First, we show that by learning from a dataset of suboptimal human-human interaction on a variety of tasks -- none of which contains examples of successful influence -- an agent can learn influence strategies to steer humans towards better performance even on new tasks. Second, we show that by also modeling and conditioning on human behavior, offline RL can learn to affect not just the human's actions but also their underlying strategy, and adapt to changes in their strategy.", "output": "Learning to Influence Human Behavior with Offline Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The use of Shap scores has become widespread in Explainable AI. However, their computation is in general intractable, in particular when done with a black-box classifier, such as a neural network. Recent research has unveiled classes of open-box Boolean Circuit classifiers for which Shap can be computed efficiently. We show how to transform binary neural networks into those circuits for efficient Shap computation. We use logic-based knowledge compilation techniques. The performance gain is huge, as we show in the light of our experiments.", "output": "Efficient Computation of Shap Explanation Scores for Neural Network Classifiers via Knowledge Compilation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper addresses fine-grained object detection in scenarios with limited computing resources, such as edge computing. In particular, we focus on a scenario where a single image contains objects of the same category but varying sizes, and we desire an algorithm that can not only recognize the physical class of objects but also detect their size. Deep learning (DL), particularly through the use of deep neural networks (DNNs), has become the primary approach to object detection. However, obtaining accurate fine-grained detection requires a large DNN model and a significant amount of annotated data, presenting a challenge to solving our problem, particularly for resource-constrained scenarios. To this end, we propose an approach that utilizes commonsense knowledge to assist a coarse-grained object detector in achieving accurate size-related fine-grained detection results. Specifically, we introduce a commonsense knowledge inference module (CKIM) that processes the coarse-grained labels produced by a benchmark coarse-grained DL detector to generate size-related fine-grained labels. Our CKIM explores both crisp-rule and fuzzy-rule based inference methods, with the latter being employed to handle ambiguity in the target size-related labels. We implement our method based on two modern DL detectors, including Mobilenet-SSD and YOLOv7-tiny. Experimental results demonstrate that our approach achieves accurate fine-grained detections with a reduced amount of annotated data and smaller model size.
Our code is available at ", "output": "Commonsense Knowledge Assisted Deep Learning with Application to Size-Related Fine-Grained Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogous to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce MACHIAVELLI, a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios that center on social decision-making. Scenario labeling is automated with LMs, which are more performant than human annotators. We mathematize dozens of harmful behaviors and use our annotations to evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations. We observe some tension between maximizing reward and behaving ethically. To improve this trade-off, we investigate LM-based methods to steer agents towards less harmful behaviors. Our results show that agents can both act competently and morally, so concrete progress can currently be made in machine ethics -- designing agents that are Pareto improvements in both safety and capabilities.", "output": "Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training. Nevertheless, it is notoriously difficult to control their generation to satisfy the various constraints required by different applications. In this work, we present InstructCTG, a controlled text generation framework that incorporates different constraints by conditioning on natural language descriptions and demonstrations of the constraints. In particular, we first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple heuristics. We then verbalize the constraints into natural language instructions to form weakly supervised training data. By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints. Compared to existing search-based or score-based methods, InstructCTG is more flexible with respect to different constraint types and has a much smaller impact on generation quality and speed because it does not modify the decoding procedure.
Additionally, InstructCTG allows the model to adapt to new constraints without re-training through the use of the few-shot task generalization and in-context learning abilities of instruction-tuned language models.", "output": "Controlled Text Generation with Natural Language Instructions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Research on prompt engineering has seen a notable surge in recent years, primarily driven by advancements in pre-trained language models and large language models. However, a critical issue has been identified within this domain: the inadequate robustness and high sensitivity of these models towards Prompt Templates, particularly in lesser-studied languages such as Japanese. This paper explores this issue through a comprehensive evaluation of several representative Large Language Models (LLMs) and a widely-utilized pre-trained model (PLM). These models are scrutinized using a benchmark dataset in Japanese, with the aim of assessing and analyzing the performance of the current multilingual models in this context. Our experimental results reveal startling discrepancies. A simple modification in the sentence structure of the Prompt Template led to a drastic drop in the accuracy of GPT-4 from 49.21 to 25.44. This observation underscores the fact that even the high-performing GPT-4 model encounters significant stability issues when dealing with diverse Japanese prompt templates, rendering the consistency of the model's output results questionable. In light of these findings, we conclude by proposing potential research trajectories to further enhance the development and performance of Large Language Models in their current stage.", "output": "Sensitivity and Robustness of Large Language Models to Prompt Template in Japanese Text Classification Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The proximal policy optimization (PPO) algorithm stands as one of the most successful methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is unclear whether PPO or its optimistic variants can effectively solve linear Markov decision processes (MDPs), which are arguably the simplest models in RL with function approximation. To bridge this gap, we propose an optimistic variant of PPO for episodic adversarial linear MDPs with full-information feedback, and establish a $\\tilde{\\mathcal{O}}(d^{3/4}H^2K^{3/4})$ regret for it. Here $d$ is the ambient dimension of linear MDPs, $H$ is the length of each episode, and $K$ is the number of episodes. Compared with existing policy-based algorithms, we achieve the state-of-the-art regret bound in both stochastic linear MDPs and adversarial linear MDPs with full information.
Additionally, our algorithm design features a novel multi-batched updating mechanism, and the theoretical analysis utilizes a new covering number argument on value and policy classes, which might be of independent interest.", "output": "A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) do not differentially represent numbers, which are pervasive in text. In contrast, neuroscience research has identified distinct neural representations for numbers and words. In this work, we investigate how well popular LLMs capture the magnitudes of numbers (e.g., that $4 < 5$) from a behavioral lens. Prior research on the representational capabilities of LLMs evaluates whether they show human-level performance, for instance, high overall accuracy on standard benchmarks. Here, we ask a different question, one inspired by cognitive science: How closely do the number representations of LLMs correspond to those of human language users, who typically demonstrate the distance, size, and ratio effects? We depend on a linking hypothesis to map the similarities among the model embeddings of number words and digits to human response times. The results reveal surprisingly human-like representations across language models of different architectures, despite the absence of the neural circuitry that directly supports these representations in the human brain. This research shows the utility of understanding LLMs using behavioral benchmarks and points the way to future work on the number representations of LLMs and their cognitive plausibility.", "output": "Numeric Magnitude Comparison Effects in Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Identifying molecules that exhibit some pre-specified properties is a difficult problem to solve. In the last few years, deep generative models have been used for molecule generation. Deep Graph Variational Autoencoders are among the most powerful machine learning tools with which it is possible to address this problem. However, existing methods struggle to capture the true data distribution and tend to be computationally expensive. In this work, we propose RGCVAE, an efficient and effective Graph Variational Autoencoder based on: (i) an encoding network exploiting a new powerful Relational Graph Isomorphism Network; (ii) a novel probabilistic decoding component. Compared to several state-of-the-art VAE methods on two widely adopted datasets, RGCVAE shows state-of-the-art molecule generation performance while being significantly faster to train.", "output": "RGCVAE: Relational Graph Conditioned Variational Autoencoder for Molecule Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Conversion rate (CVR) prediction is one of the core components in online recommender systems, and various approaches have been proposed to obtain accurate and well-calibrated CVR estimation. However, we observe that a well-trained CVR prediction model often performs sub-optimally during sales promotions.
This can be largely ascribed to the problem of data distribution shift, under which conventional methods no longer work. To this end, we seek to develop alternative modeling techniques for CVR prediction. Observing similar purchase patterns across different promotions, we propose reusing historical promotion data to capture the promotional conversion patterns. Herein, we propose a novel Historical Data Reuse (HDR) approach that first retrieves historically similar promotion data and then fine-tunes the CVR prediction model with the acquired data for better adaptation to the promotion mode. HDR consists of three components: an automated data retrieval module that seeks similar data from historical promotions, a distribution shift correction module that re-weights the retrieved data for better alignment with the target promotion, and a TransBlock module that quickly fine-tunes the original model for better adaptation to the promotion mode. Experiments conducted with real-world data demonstrate the effectiveness of HDR, as it improves both ranking and calibration metrics to a large extent. HDR has also been deployed on the display advertising system in Alibaba, bringing a lift of 9% RPM and 16% CVR during the Double 11 Sales in 2022.", "output": "Capturing Conversion Rate Fluctuation during Sales Promotions: A Novel Historical Data Reuse Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The emergent few-shot reasoning capabilities of Large Language Models (LLMs) have excited the natural language and machine learning community over recent years. Despite numerous successful applications, the underlying mechanism of such in-context capabilities still remains unclear. In this work, we hypothesize that the learned semantics of language tokens do most of the heavy lifting during the reasoning process. Different from humans' symbolic reasoning process, the semantic representations of LLMs could create strong connections among tokens, thus composing a superficial logical chain. To test our hypothesis, we decouple semantics from the language reasoning process and evaluate three kinds of reasoning abilities, i.e., deduction, induction and abduction. Our findings reveal that semantics play a vital role in LLMs' in-context reasoning -- LLMs perform significantly better when semantics are consistent with commonsense but struggle to solve symbolic or counter-commonsense reasoning tasks by leveraging in-context new knowledge. These surprising observations question whether modern LLMs have mastered the inductive, deductive and abductive reasoning abilities as in human intelligence, and motivate research on unveiling the magic existing within black-box LLMs. On the whole, our analysis provides a novel perspective on the role of semantics in developing and evaluating language models' reasoning abilities. Code is available at", "output": "Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample.
As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that, surprisingly, it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 16s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.", "output": "Parallel Sampling of Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Forecasting physical signals over long time ranges is among the most challenging tasks in Partial Differential Equations (PDEs) research. To circumvent limitations of traditional solvers, many different Deep Learning methods have been proposed. They are all based on auto-regressive methods and exhibit stability issues. Drawing inspiration from the stability property of implicit numerical schemes, we introduce a stable auto-regressive implicit neural network. We develop a theory based on the stability definition of schemes to ensure the stability of this network's forecasts. It leads us to introduce hard constraints on its weights and propagate the dynamics in the latent space. Our experimental results validate our stability property, and show improved results at long-term forecasting for two transport PDEs.", "output": "Stability of implicit neural networks for long-term forecasting in dynamical systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of large language models (LLMs) such as ChatGPT has attracted a lot of attention recently. However, their evaluation on benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative outputs produced by this model against the ground truth. In this paper, we aim to present a thorough evaluation of ChatGPT's performance on diverse academic datasets, covering tasks like question-answering, text summarization, code generation, commonsense reasoning, mathematical problem-solving, machine translation, bias detection, and ethical considerations. Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets. This makes our work the largest evaluation of ChatGPT in NLP benchmarks. In short, our study aims to validate the strengths and weaknesses of ChatGPT in various tasks and provide insights for future research using LLMs.
We also report a new emergent ability to follow multi-query instructions that we mostly found in ChatGPT and other instruction-tuned models. Our extensive evaluation shows that even though ChatGPT is capable of performing a wide variety of tasks, and may obtain impressive performance in several benchmark datasets, it is still far from achieving the ability to reliably solve many challenging tasks. By providing a thorough assessment of ChatGPT's performance across diverse NLP tasks, this paper sets the stage for a targeted deployment of ChatGPT-like LLMs in real-world applications.", "output": "A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a sequential regression setting, a decision-maker may be primarily concerned with whether the future observation will increase or decrease compared to the current one, rather than the actual value of the future observation. In this context, we introduce the notion of parity calibration, which captures the goal of calibrated forecasting for the increase-decrease (or \"parity\") event in a timeseries. Parity probabilities can be extracted from a forecasted distribution for the output, but we show that such a strategy leads to theoretical unpredictability and poor practical performance. We then observe that although the original task was regression, parity calibration can be expressed as binary calibration. Drawing on this connection, we use an online binary calibration method to achieve parity calibration. We demonstrate the effectiveness of our approach on real-world case studies in epidemiology, weather forecasting, and model-based control in nuclear fusion.", "output": "Parity Calibration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The growing interest in language-conditioned robot manipulation aims to develop robots capable of understanding and executing complex tasks, with the objective of enabling robots to interpret language commands and manipulate objects accordingly. While language-conditioned approaches demonstrate impressive capabilities for addressing tasks in familiar environments, they encounter limitations in adapting to unfamiliar environment settings. In this study, we propose a general-purpose, language-conditioned approach that combines base skill priors and imitation learning under unstructured data to enhance the algorithm's generalization in adapting to unfamiliar environments. We assess our model's performance in both simulated and real-world environments using a zero-shot setting. In the simulated environment, the proposed approach surpasses previously reported scores for the CALVIN benchmark, especially in the challenging Zero-Shot Multi-Environment setting. The average completed task length, indicating the average number of tasks the agent can continuously complete, improves more than 2.5 times compared to the state-of-the-art method HULC. In addition, we conduct a zero-shot evaluation of our policy in a real-world setting, following training exclusively in simulated environments without additional specific adaptations.
In this evaluation, we set up ten tasks and achieved an average 30% improvement with our approach compared to the current state-of-the-art approach, demonstrating a high generalization capability in both simulated environments and the real world. For further details, including access to our code and videos, please refer to", "output": "Language-Conditioned Imitation Learning with Base Skill Priors under Unstructured Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We tackle the problem of graph out-of-distribution (OOD) generalization. Existing graph OOD algorithms either rely on restricted assumptions or fail to exploit environment information in training data. In this work, we propose to simultaneously incorporate label and environment causal independence (LECI) to fully make use of label and environment information, thereby addressing the challenges faced by prior methods on identifying causal and invariant subgraphs. We further develop an adversarial training strategy to jointly optimize these two properties for causal subgraph discovery with theoretical guarantees. Extensive experiments and analysis show that LECI significantly outperforms prior methods on both synthetic and real-world datasets, establishing LECI as a practical and effective solution for graph OOD generalization.", "output": "Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Integrating ethical practices into the development process of artificial intelligence (AI) is essential to ensure safe, fair, and responsible operation. AI ethics involves applying ethical principles to the entire life cycle of AI systems. This is essential to mitigate potential risks and harms associated with AI, such as algorithm biases. To achieve this goal, responsible design patterns (RDPs) are critical for Machine Learning (ML) pipelines to guarantee ethical and fair outcomes. In this paper, we propose a comprehensive framework incorporating RDPs into ML pipelines to mitigate risks and ensure the ethical development of AI systems. Our framework comprises new responsible AI design patterns for ML pipelines identified through a survey of AI ethics and data management experts and validated through real-world scenarios with expert feedback. The framework guides AI developers, data scientists, and policy-makers to implement ethical practices in AI development and deploy responsible AI systems in production.", "output": "Responsible Design Patterns for Machine Learning Pipelines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in many real-world scenarios, leading to erroneous predictions. To tackle this issue, an existing test-time adaptation (TTA) method has recently been proposed to adapt the pre-trained ASR model on unlabeled test instances without source data. Despite decent performance gain, this work relies solely on naive greedy decoding and performs adaptation across timesteps at a frame level, which may not be optimal given the sequential nature of the model output.
Motivated by this, we propose a novel TTA framework, dubbed SGEM, for general ASR models. To treat the sequential output, SGEM first exploits beam search to explore candidate output logits and selects the most plausible one. Then, it utilizes generalized entropy minimization and negative sampling as unsupervised objectives to adapt the model. SGEM achieves state-of-the-art performance for three mainstream ASR models under various domain shifts.", "output": "SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One way to expand the available dataset for training AI models in the medical field is through the use of Generative Adversarial Networks (GANs) for data augmentation. GANs work by employing a generator network to create new data samples that are then assessed by a discriminator network to determine their similarity to real samples. The discriminator network is taught to differentiate between actual and synthetic samples, while the generator system is trained to generate data that closely resemble real ones. The process is repeated until the generator network can produce synthetic data that is indistinguishable from genuine data. GANs have been utilized in medical image analysis for various tasks, including data augmentation, image creation, and domain adaptation. They can generate synthetic samples that can be used to increase the available dataset, especially in cases where obtaining large amounts of genuine data is difficult or unethical. However, it is essential to note that the use of GANs in medical imaging is still an active area of research to ensure that the produced images are of high quality and suitable for use in clinical settings.", "output": "Generative Adversarial Networks for Data Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The field of medical imaging is an essential aspect of the medical sciences, involving various forms of radiation to capture images of the internal tissues and organs of the body. These images provide vital information for clinical diagnosis, and in this chapter, we will explore the use of X-ray, MRI, and nuclear imaging in detecting severe illnesses. However, manual evaluation and storage of these images can be a challenging and time-consuming process. To address this issue, artificial intelligence (AI)-based techniques, particularly deep learning (DL), have become increasingly popular for systematic feature extraction and classification from imaging modalities, thereby aiding doctors in making rapid and accurate diagnoses. In this review study, we will focus on how AI-based approaches, particularly the use of Convolutional Neural Networks (CNN), can assist in disease detection through medical imaging technology. CNN is a commonly used approach for image analysis due to its ability to extract features from raw input images, and as such, will be the primary area of discussion in this study.
We therefore center our discussion on the use of CNNs to diagnose ailments through medical imaging technology.", "output": "Case Studies on X-Ray Imaging, MRI and Nuclear Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we strive to develop an interpretable GNN inference paradigm, termed MSInterpreter, which can serve as a plug-and-play scheme readily applicable to various GNN baselines. Unlike most existing explanation methods, MSInterpreter provides a Message-passing Selection scheme (MSScheme) to select the critical paths for GNNs' message aggregations, which aims at reaching self-explanation instead of post-hoc explanations. In detail, the elaborate MSScheme is designed to calculate weight factors of message aggregation paths by considering the vanilla structure and node embedding components, where the structure base aims at weight factors among node-induced substructures; on the other hand, the node embedding base focuses on weight factors via node embeddings obtained by one-layer GNN. Finally, we demonstrate the effectiveness of our approach on graph classification benchmarks.", "output": "Message-passing selection: Towards interpretable GNNs for graph classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising the identity. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in the pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images and generalising to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. More results can be found on our project page:", "output": "DreamSparse: Escaping from Plato's Cave with 2D Diffusion Model Given Sparse Views."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets.
Recent studies have demonstrated that LLMs can assist an agent in solving complex sequential decision making tasks in embodied environments by providing high-level instructions. However, interacting with LLMs can be time-consuming, as in many practical scenarios, they require a significant amount of storage space that can only be deployed on remote cloud server nodes. Additionally, using commercial LLMs can be costly since they may charge based on usage frequency. In this paper, we explore how to enable intelligent cost-effective interactions between the agent and an LLM. We propose a reinforcement learning based mediator model that determines when it is necessary to consult LLMs for high-level instructions to accomplish a target task. Experiments on 4 MiniGrid environments that entail planning sub-goals demonstrate that our method can learn to solve target tasks with only a few necessary interactions with an LLM, significantly reducing interaction costs in testing environments, compared with baseline methods. Experimental results also suggest that by learning a mediator model to interact with the LLM, the agent's performance becomes more robust against partial observability of the environment. Our code is available at", "output": "Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.", "output": "Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks such as image classification.
However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only the image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower the pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion is able to encode not only bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate GeoDiffusion outperforms previous L2I methods while being 4x faster to train. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors.", "output": "Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Generative Diffusion Models (GDMs) have showcased their remarkable capabilities in learning and generating images.
A large community of GDMs has naturally emerged, further promoting the diversified applications of GDMs in various fields. However, this unrestricted proliferation has raised serious concerns about copyright protection. For example, artists including painters and photographers are becoming increasingly concerned that GDMs could effortlessly replicate their unique creative works without authorization. In response to these challenges, we introduce a novel watermarking scheme, DiffusionShield, tailored for GDMs. DiffusionShield protects images from copyright infringement by GDMs through encoding the ownership information into an imperceptible watermark and injecting it into the images. Its watermark can be easily learned by GDMs and will be reproduced in their generated images. By detecting the watermark from generated images, copyright infringement can be exposed with evidence. Benefiting from the uniformity of the watermarks and the joint optimization method, DiffusionShield ensures low distortion of the original image, high watermark detection performance, and the ability to embed lengthy messages. We conduct rigorous and comprehensive experiments to show the effectiveness of DiffusionShield in defending against infringement by GDMs and its superiority over traditional watermarking methods.", "output": "DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Gait recognition aims at identifying pedestrians at a long distance by their biometric gait patterns. It is inherently challenging due to the various covariates and the properties of silhouettes (textureless and colorless), which result in two kinds of pair-wise hard samples: the same pedestrian could have distinct silhouettes (intra-class diversity) and different pedestrians could have similar silhouettes (inter-class similarity). In this work, we propose to solve the hard sample issue with a Memory-augmented Progressive Learning network (GaitMPL), including Dynamic Reweighting Progressive Learning module (DRPL) and Global Structure-Aligned Memory bank (GSAM). Specifically, DRPL reduces the learning difficulty of hard samples by easy-to-hard progressive learning. GSAM further augments DRPL with a structure-aligned memory mechanism, which maintains and models the feature distribution of each ID. Experiments on two commonly used datasets, CASIA-B and OU-MVLP, demonstrate the effectiveness of GaitMPL. On CASIA-B, we achieve the state-of-the-art performance, i.e., 88.0% on the most challenging condition (Clothing) and 93.3% on the average condition, which outperforms the other methods by at least 3.8% and 1.4%, respectively.", "output": "GaitMPL: Gait Recognition with Memory-Augmented Progressive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the impressive performance in visual grounding, the prevailing approaches usually exploit the visual backbone in a passive way, i.e., the visual backbone extracts features with fixed weights without expression-related hints. The passive perception may lead to mismatches (e.g., redundant and missing), limiting further performance improvement. Ideally, the visual backbone should actively extract visual features since the expressions already provide the blueprint of desired visual features. The active perception can take expressions as priors to extract relevant visual features, which can effectively alleviate the mismatches. Inspired by this, we propose an active perception Visual Grounding framework based on Language Adaptive Weights, called VG-LAW. The visual backbone serves as an expression-specific feature extractor through dynamic weights generated for various expressions. Benefiting from the specific and relevant visual features extracted from the language-aware visual backbone, VG-LAW does not require additional modules for cross-modal interaction.
Along with a neat multi-task head, VG-LAW can be competent in referring expression comprehension and segmentation jointly. Extensive experiments on four representative datasets, i.e., RefCOCO, RefCOCO+, RefCOCOg, and ReferItGame, validate the effectiveness of the proposed framework and demonstrate state-of-the-art performance.", "output": "Language Adaptive Weight Generation for Multi-task Visual Grounding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The emergence of smart cities demands harnessing advanced technologies like the Internet of Things (IoT) and Artificial Intelligence (AI) and promises to unlock cities' potential to become more sustainable, efficient, and ultimately livable for their inhabitants. This work introduces an intelligent city management system that provides a data-driven approach to three use cases: (i) analyze traffic information to reduce the risk of traffic collisions and improve driver and pedestrian safety, (ii) identify when and where energy consumption can be reduced to improve cost savings, and (iii) detect maintenance issues like potholes in the city's roads and sidewalks, as well as the beginning of hazards like floods and fires. A case study in Aveiro City demonstrates the system's effectiveness in generating actionable insights that enhance security, energy efficiency, and sustainability, while highlighting the potential of AI and IoT-driven solutions for smart city development.", "output": "From Data to Action: Exploring AI and IoT-driven Solutions for Smarter Cities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a simple yet effective transformer framework for self-supervised learning called DenseDINO to learn dense visual representations. To exploit the spatial information that dense prediction tasks require but is neglected by existing self-supervised transformers, we introduce point-level supervision across views in a novel token-based way. Specifically, DenseDINO introduces some extra input tokens called reference tokens to match the point-level features with the position prior. With the reference token, the model could maintain spatial consistency and deal with multi-object complex scene images, thus generalizing better on dense prediction tasks. Compared with the vanilla DINO, our approach obtains competitive performance when evaluated on classification in ImageNet and achieves a large margin (+7.2% mIoU) improvement in semantic segmentation on PascalVOC under the linear probing protocol for segmentation.", "output": "DenseDINO: Boosting Dense Self-Supervised Learning with Token-Based Point-Level Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Positron emission tomography (PET) is an important functional medical imaging technique often used in the evaluation of certain brain disorders, whose reconstruction problem is ill-posed. The vast majority of reconstruction methods in PET imaging, both iterative and deep learning, return a single estimate without quantifying the associated uncertainty. Due to ill-posedness and noise, a single solution can be misleading or inaccurate.
Thus, providing a measure of uncertainty in PET image reconstruction can help medical practitioners in making critical decisions. This paper proposes a deep learning-based method for uncertainty quantification in PET image reconstruction via posterior sampling. The method is based on training a conditional generative adversarial network whose generator approximates sampling from the posterior in Bayesian inversion. The generator is conditioned on reconstruction from a low-dose PET scan obtained by a conventional reconstruction method and a high-quality magnetic resonance image and learned to estimate a corresponding standard-dose PET scan reconstruction. We show that the proposed model generates high-quality posterior samples and yields physically-meaningful uncertainty estimates.", "output": "Estimating Uncertainty in PET Image Reconstruction via Deep Posterior Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose to formulate point cloud extraction from ultrasound volumes as an image segmentation problem. Through this convenient formulation, a quick prototype exploring various variants of the U-Net architecture was developed and evaluated. This report documents the experimental results compiled using a training dataset of 5 labelled ultrasound volumes and 84 unlabelled volumes that was completed in a two-week period as part of a submission to an open challenge entitled ``Deep Learning in Ultrasound Image Analysis''. Source code is shared with the research community at this GitHub URL", "output": "SMRVIS: Point cloud extraction from 3-D ultrasound for non-destructive testing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The astounding performance of Transformers in natural language processing (NLP) has inspired researchers to explore their utilization in computer vision tasks. Like other computer vision tasks, DEtection TRansformer (DETR) introduces transformers for object detection tasks by considering the detection as a set prediction problem without needing proposal generation and post-processing steps. It is a state-of-the-art (SOTA) method for object detection, particularly in scenarios where the number of objects in an image is relatively small. Despite the success of DETR, it suffers from slow training convergence and performance drops for small objects. Therefore, many improvements are proposed to address these issues, leading to immense refinement in DETR. Since 2020, transformer-based object detection has attracted increasing interest and demonstrated impressive performance. Although numerous surveys have been conducted on transformers in vision in general, a review regarding advancements made in 2D object detection using transformers is still missing. This paper gives a detailed review of twenty-one papers about recent developments in DETR. We begin with the basic modules of Transformers, such as self-attention, object queries and input features encoding. Then, we cover the latest advancements in DETR, including backbone modification, query design and attention refinement. We also compare all detection transformers in terms of performance and network design. We hope this study will increase researchers' interest in solving existing challenges towards applying transformers in the object detection domain.
Researchers can follow newer improvements in detection transformers on this webpage available at:", "output": "2D Object Detection with Transformers: A Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We systematically study a wide variety of image-based generative models spanning semantically-diverse datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing to 16 modern metrics for evaluating the overall performance, fidelity, diversity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self-supervised feature extractors, find that the semantic information encoded by individual networks strongly depends on their training procedure, and show that DINOv2-ViT-L/14 allows for much richer evaluation of generative models. Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet. However, our experiments show that current metrics do not properly detect memorization; none in the literature is able to separate memorization from other phenomena such as underfitting or mode shrinkage. To facilitate further development of generative models and their evaluation we release all generated image datasets, human evaluation data, and a modular library to compute 16 common metrics for 8 different encoders at", "output": "Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts, we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts, 5K unique concept compositions, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in ground truth images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our human study shows that CCD is highly correlated with human understanding of concepts. Our results point to a trade-off between learning the concepts and preserving the compositionality, which existing approaches struggle to overcome.", "output": "ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed. Our method, coined DiViNet, operates in two stages. The first stage learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help ``stitch'' the surfaces over sparse regions. We demonstrate that our approach is not only able to complete the surface geometry but also reconstructs surface details to a reasonable extent from few disparate input views. On the DTU and BlendedMVS datasets, our approach achieves the best reconstruction quality among existing methods in the presence of such sparse views, and performs on par, if not better, with competing methods when dense views are employed as inputs.", "output": "DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point cloud registration is a fundamental problem in computer vision that aims to estimate the transformation between corresponding sets of points. Non-rigid registration, in particular, involves addressing challenges including various levels of deformation, noise, outliers, and data incompleteness. This paper introduces Robust-DefReg, a robust non-rigid point cloud registration method based on graph convolutional networks (GCNNs). Robust-DefReg is a coarse-to-fine registration approach within an end-to-end pipeline, leveraging the advantages of both coarse and fine methods. The method learns global features to find correspondences between source and target point clouds, to enable appropriate initial alignment, and subsequently fine registration. The simultaneous achievement of high accuracy and robustness across all challenges is reported less frequently in existing studies, making it a key objective of the Robust-DefReg method. The proposed method achieves high accuracy in large deformations while maintaining computational efficiency. This method possesses three primary attributes: high accuracy, robustness to different challenges, and computational efficiency. The experimental results show that the proposed Robust-DefReg holds significant potential as a foundational architecture for future investigations in non-rigid point cloud registration.
The source code of Robust-DefReg is available.", "output": "Robust-DefReg: A Robust Deformable Point Cloud Registration Method based on Graph Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nested pairwise frames is a method for relative benchmarking of cell or tissue digital pathology models against manual pathologist annotations on a set of sampled patches. At a high level, the method compares agreement between a candidate model and pathologist annotations with agreement among pathologists' annotations. This evaluation framework addresses fundamental issues of data size and annotator variability in using manual pathologist annotations as a source of ground truth for model validation. We implemented nested pairwise frames evaluation for tissue classification, cell classification, and cell count prediction tasks and show results for cell and tissue models deployed on an H&E-stained melanoma dataset.", "output": "Improved statistical benchmarking of digital pathology models using pairwise frames evaluation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale joint training of multimodal models, e.g., CLIP, has demonstrated great performance in many vision-language tasks. However, image-text pairs for pre-training are restricted to the intersection of images and texts, limiting their ability to cover a large distribution of real-world data, where noise can also be introduced as misaligned pairs during pre-processing. Conversely, unimodal models trained on text or image data alone through unsupervised techniques can achieve broader coverage of diverse real-world data and are not constrained by the requirement of simultaneous presence of image and text. In this paper, we demonstrate that using large-scale unsupervised unimodal models as pre-training can enhance the zero-shot performance of image-text pair models. Our thorough studies validate that models pre-trained as such can learn rich representations of both modalities, improving their ability to understand how images and text relate to each other. Our experiments show that unimodal pre-training outperforms state-of-the-art CLIP-based models by 6.5% (52.3% → 58.8%) on PASCAL-5$^i$ and 6.2% (27.2% → 33.4%) on COCO-20$^i$ semantic segmentation under the zero-shot setting respectively. By learning representations of both modalities, unimodal pre-training offers broader coverage, reduced misalignment errors, and the ability to capture more complex features and patterns in the real-world data, resulting in better performance especially for zero-shot vision-language tasks.", "output": "UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rapid advancements of the text-to-image generative model, AI-generated images (AGIs) have been widely applied to entertainment, education, social media, etc. However, considering the large quality variance among different AGIs, there is an urgent need for quality models that are consistent with human subjective ratings.
To address this issue, we extensively consider various popular AGI models, generate AGIs through different prompts and model parameters, and collect subjective scores for perceptual quality and text-to-image alignment, thus building the most comprehensive AGI subjective quality database AGIQA-3K so far. Furthermore, we conduct a benchmark experiment on this database to evaluate the consistency between the current Image Quality Assessment (IQA) model and human perception, while proposing StairReward that significantly improves the assessment performance of subjective text-to-image alignment. We believe that the fine-grained subjective scores in AGIQA-3K will inspire subsequent AGI quality models to fit human subjective perception mechanisms at both perception and alignment levels and to optimize the generation result of future AGI models. The database is released at", "output": "AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to \"explain\" how neural networks process natural images. We underpin this empirical finding by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.", "output": "Don't trust your eyes: on the (un)reliability of feature visualizations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A major bottleneck of interdisciplinary computer vision (CV) research is the lack of a framework that eases the reuse and abstraction of state-of-the-art CV models by CV and non-CV researchers alike. We present here BU-CVKit, a computer vision framework that allows the creation of research pipelines with chainable Processors. The community can create plugins of their work for the framework, hence improving the re-usability, accessibility, and exposure of their work with minimal overhead. Furthermore, we provide MuSeqPose Kit, a user interface for the pose estimation package of BU-CVKit, which automatically scans for installed plugins and programmatically generates an interface for them based on the metadata provided by the user. It also provides software support for standard pose estimation features such as annotations, 3D reconstruction, reprojection, and camera calibration.
Finally, we show examples of behavioral neuroscience pipelines built from the sample plugins created for our framework.", "output": "BU-CVKit: Extendable Computer Vision Framework for Species Independent Tracking and Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) is the second annual CVPR workshop aimed at the monitoring and analysis of the health of Earth ecosystems by leveraging the vast amount of remote sensing data that is continuously being collected. The primary objective of this workshop is to bring together the Earth and environmental science communities as well as the multimodal representation learning communities to explore new ways of harnessing technological advancements in support of environmental monitoring. The MultiEarth Workshop also seeks to provide a common benchmark for processing multimodal remote sensing information by organizing public challenges focused on monitoring the Amazon rainforest. These challenges include estimating deforestation, detecting forest fires, translating synthetic aperture radar (SAR) images to the visible domain, and projecting environmental trends. This paper presents the challenge guidelines, datasets, and evaluation metrics. Our challenge website is available at", "output": "MultiEarth 2023 -- Multimodal Learning for Earth and Environment Workshop and Challenge."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid advancement of generative models, facilitating the creation of hyper-realistic images from textual descriptions, has concurrently escalated critical societal concerns such as misinformation. Traditional fake detection mechanisms, although providing some mitigation, fall short in attributing responsibility for the malicious use of synthetic images. This paper introduces a novel approach to model fingerprinting that assigns responsibility for the generated images, thereby serving as a potential countermeasure to model misuse. Our method modifies generative models based on each user's unique digital fingerprint, imprinting a unique identifier onto the resultant content that can be traced back to the user. This approach, incorporating fine-tuning into Text-to-Image (T2I) tasks using the Stable Diffusion Model, demonstrates near-perfect attribution accuracy with a minimal impact on output quality. We rigorously scrutinize our method's secrecy under two distinct scenarios: one where a malicious user attempts to detect the fingerprint, and another where a user possesses a comprehensive understanding of our method. We also evaluate the robustness of our approach against various image post-processing manipulations typically executed by end-users. Through extensive evaluation of the Stable Diffusion models, our method presents a promising and novel avenue for accountable model distribution and responsible use.", "output": "WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training a 3D human keypoint detector from point clouds in a supervised manner requires large volumes of high quality labels.
While it is relatively easy to capture large amounts of human point clouds, annotating 3D keypoints is expensive, subjective, error prone and especially difficult for long-tail cases (pedestrians with rare poses, scooterists, etc.). In this work, we propose GC-KPL - Geometry Consistency inspired Key Point Learning, an approach for learning 3D human joint locations from point clouds without human labels. We achieve this by our novel unsupervised loss formulations that account for the structure and movement of the human body. We show that by training on a large training set from the Waymo Open Dataset without any human annotated keypoints, we are able to achieve reasonable performance as compared to the fully supervised approach. Further, the backbone benefits from the unsupervised training and is useful in downstream few-shot learning of keypoints, where fine-tuning on only 10 percent of the labeled training data gives comparable performance to fine-tuning on the entire set. We demonstrate that GC-KPL outperforms SoTA by a large margin when trained on the entire dataset and efficiently leverages large volumes of unlabeled data.", "output": "3D Human Keypoints Estimation From Point Clouds in the Wild Without Human Labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computational modeling of Multiresolution-Fractional Brownian motion (fBm) has been effective in stochastic multiscale fractal texture feature extraction and machine learning of abnormal brain tissue segmentation. Further, deep multiresolution methods have been used for pixel-wise brain tissue segmentation. Robust tissue segmentation and volumetric measurement may provide more objective quantification of disease burden and offer improved tracking of treatment response for the disease. However, we posit that computational modeling of deep multiresolution fractal texture features may offer elegant feature learning. Consequently, this work proposes novel modeling of a Multiresolution Fractal Deep Neural Network (MFDNN) and its computational implementation that mathematically combines a multiresolution fBm model and deep multiresolution analysis. The proposed full 3D MFDNN model offers the desirable properties of estimating multiresolution stochastic texture features by analyzing large amounts of raw MRI image data for brain tumor segmentation. We apply the proposed MFDNN to estimate stochastic deep multiresolution fractal texture features for tumor tissues in brain MRI images. The MFDNN model is evaluated using 1251 patient cases for brain tumor segmentation using the most recent BRATS 2021 Challenges dataset. The evaluation of the proposed model using Dice overlap score, Hausdorff distance and associated uncertainty estimation offers either better or comparable performances in abnormal brain tissue segmentation when compared to the state-of-the-art methods in the literature.
Index Terms: Computational Modeling, Multiresolution Fractional Brownian Motion (fBm), Deep Multiresolution Analysis, Fractal Dimension (FD), Texture Features, Brain tumor segmentation, Deep Learning.", "output": "Computational Modeling of Deep Multiresolution-Fractal Texture and Its Application to Abnormal Brain Tissue Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Presenting whole slide images (WSIs) as a graph will enable a more efficient and accurate learning framework for cancer diagnosis. Due to the fact that a single WSI consists of billions of pixels and there is a lack of vast annotated datasets required for computational pathology, the problem of learning from WSIs using typical deep learning approaches such as convolutional neural network (CNN) is challenging. Additionally, WSIs down-sampling may lead to the loss of data that is essential for cancer detection. A novel two-stage learning technique is presented in this work. Since context, such as topological features in the tumor surroundings, may hold important information for cancer grading and diagnosis, a graph representation capturing all dependencies among regions in the WSI is very intuitive. Graph convolutional network (GCN) is deployed to include context from the tumor and adjacent tissues, and self-supervised learning is used to enhance training through unlabeled data. More specifically, the entire slide is presented as a graph, where the nodes correspond to the patches from the WSI. The proposed framework is then tested using WSIs from prostate and kidney cancers. To assess the performance improvement through the self-supervised mechanism, the proposed context-aware model is tested with and without use of the pre-trained self-supervised layer. The overall model is also compared with multi-instance learning (MIL) based and other existing approaches.", "output": "Context-Aware Self-Supervised Learning of Whole Slide Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a novel framework called RefineVIS for Video Instance Segmentation (VIS) that achieves good object association between frames and accurate segmentation masks by iteratively refining the representations using sequence context. RefineVIS learns two separate representations on top of an off-the-shelf frame-level image instance segmentation model: an association representation responsible for associating objects across frames and a segmentation representation that produces accurate segmentation masks. Contrastive learning is utilized to learn temporally stable association representations. A Temporal Attention Refinement (TAR) module learns discriminative segmentation representations by exploiting temporal relationships and a novel temporal contrastive denoising technique. Our method supports both online and offline inference. It achieves state-of-the-art video instance segmentation accuracy on the YouTube-VIS 2019 (64.4 AP), YouTube-VIS 2021 (61.4 AP), and OVIS (46.1 AP) datasets.
The visualization shows that the TAR module can generate more accurate instance segmentation masks, particularly for challenging cases such as highly occluded objects.", "output": "RefineVIS: Video Instance Segmentation with Temporal Attention Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision-Language Pretraining (VLP) has demonstrated remarkable capabilities in learning visual representations from textual descriptions of images without annotations. Yet, effective VLP demands large-scale image-text pairs, a resource that suffers scarcity in the medical domain. Moreover, conventional VLP is limited to 2D images while medical images encompass diverse modalities, often in 3D, making the learning process more challenging. To address these challenges, we present Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation (GTGM), a framework that extends VLP to 3D medical images without relying on paired textual descriptions. Specifically, GTGM utilizes large language models (LLM) to generate medical-style text from 3D medical images. This synthetic text is then used to supervise 3D visual representation learning. Furthermore, a negative-free contrastive learning objective strategy is introduced to cultivate consistent visual representations between augmented 3D medical image patches, which effectively mitigates the biases associated with strict positive-negative sample pairings. We evaluate GTGM on three imaging modalities - Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and electron microscopy (EM) over 13 datasets. GTGM's superior performance across various medical image segmentation tasks underscores its effectiveness and versatility, by enabling VLP extension into 3D medical imagery while bypassing the need for paired text.", "output": "Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we address the challenges posed by the substantial training time and memory consumption associated with video transformers, focusing on the ViViT (Video Vision Transformer) model, in particular the Factorised Encoder version, as our baseline for action recognition tasks. The factorised encoder variant follows the late-fusion approach that is adopted by many state of the art approaches. Despite standing out for its favorable speed/accuracy tradeoffs among the different variants of ViViT, its considerable training time and memory requirements still pose a significant barrier to entry. Our method is designed to lower this barrier and is based on the idea of freezing the spatial transformer during training. This leads to a low accuracy model if naively done. But we show that by (1) appropriately initializing the temporal transformer (a module responsible for processing temporal information) and (2) introducing a compact adapter model connecting frozen spatial representations (a module that selectively focuses on regions of the input image) to the temporal transformer, we can enjoy the benefits of freezing the spatial transformer without sacrificing accuracy.
Through extensive experimentation over 6 benchmarks, we demonstrate that our proposed training strategy significantly reduces training costs (by approximately 50%) and memory consumption while maintaining or slightly improving performance by up to 1.79% compared to the baseline model. Our approach additionally unlocks the capability to utilize larger image transformer models as our spatial transformer and to access more frames with the same memory consumption.", "output": "Optimizing ViViT Training: Time and Memory Reduction for Action Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.", "output": "Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detection of artificial objects from underwater imagery gathered by Autonomous Underwater Vehicles (AUVs) is a key requirement for many subsea applications. Real-world AUV image datasets tend to be very large and unlabelled. Furthermore, such datasets are typically imbalanced, containing few instances of objects of interest, particularly when searching for unusual objects in a scene. It is, therefore, difficult to fit models capable of reliably detecting these objects. Given these factors, we propose to treat artificial objects as anomalies and detect them through a semi-supervised framework based on Variational Autoencoders (VAEs). We develop a method which clusters image data in a learned low-dimensional latent space and extracts images that are likely to contain anomalous features. We also devise an anomaly score based on extracting poorly reconstructed regions of an image. We demonstrate that by applying both methods to large image datasets, human operators can be shown candidate anomalous samples with a low false positive rate to identify objects of interest. We apply our approach to real seafloor imagery gathered by an AUV and evaluate its sensitivity to the dimensionality of the latent representation used by the VAE.
We evaluate the precision-recall tradeoff and demonstrate that by choosing an appropriate latent dimensionality and threshold, we are able to achieve an average precision of 0.64 on unlabelled datasets.", "output": "A Semi-supervised Object Detection Algorithm for Underwater Imagery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model. Previous studies typically process multi-task features in a more local way, and thus cannot effectively learn spatially global and cross-task interactions, which hampers the models' ability to fully leverage the consistency of various tasks in multi-task learning. To tackle this problem, we propose an Inverted Pyramid multi-task Transformer, capable of modeling cross-task interaction among spatial features of different tasks in a global context. Specifically, we first utilize a transformer encoder to capture task-generic features for all tasks. And then, we design a transformer decoder to establish spatial and cross-task interaction globally, and a novel UP-Transformer block is devised to increase the resolutions of multi-task features gradually and establish cross-task interaction at different scales. Furthermore, two types of Cross-Scale Self-Attention modules, i.e., Fusion Attention and Selective Attention, are proposed to efficiently facilitate cross-task interaction across different feature scales. An Encoder Feature Aggregation strategy is further introduced to better model multi-scale information in the decoder. Comprehensive experiments on several 2D/3D multi-task benchmarks clearly demonstrate our proposal's effectiveness, establishing significant state-of-the-art performances.", "output": "InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to reinterpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide straight-forward convergence analysis of the DDIM sampler under simple assumptions on the projection-error of the denoiser. Finally, we propose a new sampler based on two simple modifications to DDIM using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high quality samples on latent diffusion models.", "output": "Interpreting and Improving Diffusion Models Using the Euclidean Distance Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-dataset training provides a viable solution for exploiting heterogeneous large-scale datasets without extra annotation cost.
In this work, we propose a scalable multi-dataset detector (ScaleDet) that can scale up its generalization across datasets when increasing the number of training datasets. Unlike existing multi-dataset learners that mostly rely on manual relabelling efforts or sophisticated optimizations to unify labels across datasets, we introduce a simple yet scalable formulation to derive a unified semantic label space for multi-dataset training. ScaleDet is trained by visual-textual alignment to learn the label assignment with label semantic similarities across datasets. Once trained, ScaleDet can generalize well on any given upstream and downstream datasets with seen and unseen classes. We conduct extensive experiments using LVIS, COCO, Objects365, and OpenImages as upstream datasets, and 13 datasets from Object Detection in the Wild (ODinW) as downstream datasets. Our results show that ScaleDet achieves compelling strong model performance with an mAP of 50.7 on LVIS, 58.8 on COCO, 46.8 on Objects365, 76.2 on OpenImages, and 71.8 on ODinW, surpassing state-of-the-art detectors with the same backbone.", "output": "ScaleDet: A Scalable Multi-Dataset Object Detector."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an approach to obtain a personalized generative prior with explicit control over a set of attributes. We build upon MyStyle, a recently introduced method that tunes the weights of a pre-trained StyleGAN face generator on a few images of an individual. This system allows synthesizing, editing, and enhancing images of the target individual with high fidelity to their facial features. However, MyStyle does not demonstrate precise control over the attributes of the generated images. We propose to address this problem through a novel optimization system that organizes the latent space in addition to tuning the generator. Our key contribution is to formulate a loss that arranges the latent codes, corresponding to the input images, along a set of specific directions according to their attributes. We demonstrate that our approach, dubbed MyStyle++, is able to synthesize, edit, and enhance images of an individual with great control over the attributes, while preserving the unique facial characteristics of that individual.", "output": "MyStyle++: A Controllable Personalized Generative Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to data's unavailability or large size, and the high computational and human labor costs of training machine learning models, it is a common practice to rely on open source pre-trained models whenever possible. However, this practice is worrisome from the security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model such that the model's behavior can be controlled by the attacker when the trigger is present in the input. In this paper, we present our preliminary work on a novel method for Trojan model detection. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature.
Our method achieves state-of-the-art performance on two public datasets.", "output": "Trojan Model Detection Using Activation Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ShaDDR, an example-based deep generative neural network which produces a high-resolution textured 3D shape through geometry detailization and conditional texture generation applied to an input coarse voxel shape. Trained on a small set of detailed and textured exemplar shapes, our method learns to detailize the geometry via multi-resolution voxel upsampling and generate textures on voxel surfaces via differentiable rendering against exemplar texture images from a few views. The generation is real-time, taking less than 1 second to produce a 3D model with voxel resolutions up to 512^3. The generated shape preserves the overall structure of the input coarse voxel model, while the style of the generated geometric details and textures can be manipulated through learned latent codes. In the experiments, we show that our method can generate higher-resolution shapes with plausible and improved geometric details and clean textures compared to prior works. Furthermore, we showcase the ability of our method to learn geometric details and textures from shapes reconstructed from real-world photos. In addition, we have developed an interactive modeling application to demonstrate the generalizability of our method to various user inputs and the controllability it offers, allowing users to interactively sculpt a coarse voxel shape to define the overall structure of the detailized 3D shape.", "output": "ShaDDR: Real-Time Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fine-grained visual categorization (FGVC) is a challenging task due to similar visual appearances between various species. Previous studies always implicitly assume that the training and test data have the same underlying distributions, and that features extracted by modern backbone architectures remain discriminative and generalize well to unseen test data. However, we empirically justify that these conditions are not always true on benchmark datasets. To this end, we combine the merits of invariant risk minimization (IRM) and the information bottleneck (IB) principle to learn invariant and minimum sufficient (IMS) representations for FGVC, such that the overall model can always discover the most succinct and consistent fine-grained features. We apply the matrix-based Rényi's α-order entropy to simplify and stabilize the training of IB; we also design a \"soft\" environment partition scheme to make IRM applicable to the FGVC task. To the best of our knowledge, we are the first to address the problem of FGVC from a generalization perspective and develop a new information-theoretic solution accordingly.
Extensive experiments demonstrate the consistent performance gain offered by our IMS.", "output": "Coping with Change: Learning Invariant and Minimum Sufficient Representations for Fine-Grained Visual Categorization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since its inception, Vision Transformer (ViT) has emerged as a prevalent model in the computer vision domain. Nonetheless, the multi-head self-attention (MHSA) mechanism in ViT is computationally expensive due to its calculation of relationships among all tokens. Although some techniques mitigate computational overhead by discarding tokens, this also results in the loss of potential information from those tokens. To tackle these issues, we propose a novel token pruning method that retains information from non-crucial tokens by merging them with more crucial tokens, thereby mitigating the impact of pruning on model performance. Crucial and non-crucial tokens are identified by their importance scores and merged based on similarity scores. Furthermore, multi-scale features are exploited to represent images, which are fused prior to token pruning to produce richer feature representations. Importantly, our method can be seamlessly integrated with various ViTs, enhancing their adaptability. Experimental evidence substantiates the efficacy of our approach in reducing the influence of token pruning on model performance. For instance, on the ImageNet dataset, it achieves a remarkable 33% reduction in computational costs while only incurring a 0.1% decrease in accuracy on DeiT-S.", "output": "Muti-Scale And Token Mergence: Make Your ViT More Efficient."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. We formulate the underlying data-generating process as a hierarchical latent variable model and show that under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model, explaining why MAE can extract high-level information from pixels. Further, we show how key hyperparameters in MAE (the masking ratio and the patch size) determine which true latent variables are to be recovered, therefore influencing the level of semantic information in the representation. Specifically, extremely large or small masking ratios inevitably lead to low-level representations. Our theory offers coherent explanations of existing empirical observations and provides insights for potential empirical improvements and fundamental limitations of the masking-reconstruction paradigm.
We conduct extensive experiments to validate our theoretical insights.", "output": "Understanding Masked Autoencoders via Hierarchical Latent Variable Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks have been widely used in medical image analysis, and medical image segmentation is one of the most important tasks. U-shaped neural networks with an encoder-decoder structure are prevailing and have succeeded greatly in various segmentation tasks. While CNNs treat an image as a grid of pixels in Euclidean space and Transformers recognize an image as a sequence of patches, graph-based representation is more generalized and can construct connections for each part of an image. In this paper, we propose a novel ViG-UNet, a graph neural network-based U-shaped architecture with the encoder, the decoder, the bottleneck, and skip connections. The downsampling and upsampling modules are also carefully designed. The experimental results on the ISIC 2016, ISIC 2017, and Kvasir-SEG datasets demonstrate that our proposed architecture outperforms most existing classic and state-of-the-art U-shaped networks.", "output": "ViG-UNet: Vision Graph Neural Networks for Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In domain generalization (DG), the target domain is unknown when the model is being trained, and the trained model should successfully work on an arbitrary (and possibly unseen) target domain during inference. This is a difficult problem, and despite active studies in recent years, it remains a great challenge. In this paper, we take a simple yet effective approach to tackle this issue. We propose test-time style shifting, which shifts the style of the test sample (that has a large style gap with the source domains) to the nearest source domain that the model is already familiar with, before making the prediction. This strategy enables the model to handle any target domains with arbitrary style statistics, without additional model updates at test time. Additionally, we propose style balancing, which provides a great platform for maximizing the advantage of test-time style shifting by handling the DG-specific imbalance issues. The proposed ideas are easy to implement and successfully work in conjunction with various other DG schemes. Experimental results on different datasets show the effectiveness of our methods.", "output": "Test-Time Style Shifting: Handling Arbitrary Styles in Domain Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately detecting lane lines in 3D space is crucial for autonomous driving. Existing methods usually first transform image-view features into bird-eye-view (BEV) with the aid of inverse perspective mapping (IPM), and then detect lane lines based on the BEV features. However, IPM ignores the changes in road height, leading to inaccurate view transformations. Additionally, the two separate stages of the process can cause cumulative errors and increased complexity. To address these limitations, we propose an efficient transformer for 3D lane detection.
Different from the vanilla transformer, our model contains a decomposed cross-attention mechanism to simultaneously learn lane and BEV representations. The mechanism decomposes the cross-attention between image-view and BEV features into the one between image-view and lane features, and the one between lane and BEV features, both of which are supervised with ground-truth lane lines. Our method obtains 2D and 3D lane predictions by applying the lane features to the image-view and BEV features, respectively. This allows for a more accurate view transformation than IPM-based methods, as the view transformation is learned from data with a supervised cross-attention. Additionally, the cross-attention between lane and BEV features enables them to adjust to each other, resulting in more accurate lane detection than the two separate stages. Finally, the decomposed cross-attention is more efficient than the original one. Experimental results on OpenLane and ONCE-3DLanes demonstrate the state-of-the-art performance of our method.", "output": "An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Though self-supervised learning (SSL) has been widely studied as a promising technique for representation learning, it doesn't generalize well on long-tailed datasets due to the majority classes dominating the feature space. Recent work shows that the long-tailed learning performance could be boosted by sampling extra in-domain (ID) data for self-supervised training; however, large-scale ID data which can rebalance the minority classes are expensive to collect. In this paper, we propose an alternative but easy-to-use and effective solution, Contrastive with Out-of-distribution (OOD) data for Long-Tail learning (COLT), which can effectively exploit OOD data to dynamically re-balance the feature space. We empirically identify the counter-intuitive usefulness of OOD samples in SSL long-tailed learning and principally design a novel SSL method. Concretely, we first localize the 'head' and 'tail' samples by assigning a tailness score to each OOD sample based on its neighborhoods in the feature space. Then, we propose an online OOD sampling strategy to dynamically re-balance the feature space. Finally, we enforce the model to be capable of distinguishing ID and OOD samples by a distribution-level supervised contrastive loss. Extensive experiments are conducted on various datasets and several state-of-the-art SSL frameworks to verify the effectiveness of the proposed method. The results show that our method significantly improves the performance of SSL on long-tailed datasets by a large margin, and even outperforms previous work which uses external ID data. Our code is available at", "output": "On the Effectiveness of Out-of-Distribution Data in Self-Supervised Long-Tail Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual question answering (VQA) is a multidisciplinary research problem pursued through practices of natural language processing and computer vision. Visual question answering automatically answers natural language questions according to the content of an image. Some testing questions require external knowledge to derive a solution.
Such knowledge-based VQA uses various methods to retrieve features of the image and text and combines them to generate the answer. To generate knowledge-based answers, either question-dependent or image-dependent knowledge retrieval methods are used. If knowledge about all the objects in the image is derived, then not all of that knowledge is relevant to the question. On the other hand, relying only on question-related knowledge may lead to incorrect answers and an overtrained model that answers questions irrelevant to the image. Our proposed method takes image attributes and question features as input for a knowledge derivation module and retrieves only question-relevant knowledge about image objects, which can provide accurate answers.", "output": "Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we propose a novel activation mechanism aimed at establishing layer-level activation (LayerAct) functions. These functions are designed to be more noise-robust compared to traditional element-level activation functions by reducing the layer-level fluctuation of the activation outputs due to shifts in inputs. Moreover, the LayerAct functions achieve a zero-like mean activation output without restricting the activation output space. We present an analysis and experiments demonstrating that LayerAct functions exhibit superior noise-robustness compared to element-level activation functions, and empirically show that these functions have a zero-like mean activation. Experimental results on three benchmark image classification tasks show that LayerAct functions excel in handling noisy image datasets, outperforming element-level activation functions, while the performance on clean datasets is also superior in most cases.", "output": "Layer-level activation mechanism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the domain of remote sensing image interpretation, road extraction from high-resolution aerial imagery has already been a hot research topic. Although deep CNNs have presented excellent results for semantic segmentation, the efficiency and capabilities of vision transformers are yet to be fully researched. As such, for accurate road extraction, a deep semantic segmentation neural network that utilizes the abilities of residual learning, HetConvs, UNet, and vision transformers, which is called ResUNetFormer, is proposed in this letter. The developed ResUNetFormer is evaluated against various cutting-edge deep learning-based road extraction techniques on the public Massachusetts road dataset. Statistical and visual results demonstrate the superiority of ResUNetFormer over state-of-the-art CNNs and vision transformers for segmentation.
The code will be made available publicly at", "output": "Neighborhood Attention Makes the Encoder of ResUNet Stronger for Accurate Road Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is well-known that modern computer vision systems often exhibit behaviors misaligned with those of humans: from adversarial attacks to image corruptions, deep learning vision models suffer in a variety of settings that humans capably handle. In light of these phenomena, here we introduce another, orthogonal perspective studying the human-machine vision gap. We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision. Specifically, we study the performance and behavior of neural networks on the seemingly simple task of classifying regular polygons at varying orders of degradation along their perimeters. To this end, we implement the Automated Shape Recoverability Test for rapidly generating large-scale datasets of perimeter-degraded regular polygons, modernizing the historically manual creation of image recoverability experiments. We then investigate the capacity of neural networks to recognize and recover such degraded shapes when initialized with different priors. Ultimately, we find that neural networks' behavior on this simple task conflicts with human behavior, raising a fundamental question of the robustness and learning capabilities of modern computer vision models.", "output": "Degraded Polygons Raise Fundamental Questions of Neural Network Perception."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reenacting facial images is an important task that can find numerous applications. We propose IFaceUV, a fully differentiable pipeline that properly combines 2D and 3D information to conduct the facial reenactment task. The three-dimensional morphable face models (3DMMs) and corresponding UV maps are utilized to intuitively control facial motions and textures, respectively. Two-dimensional techniques based on 2D image warping are further required to compensate for missing components of the 3DMMs, such as backgrounds, ears, hair, etc. In our pipeline, we first extract 3DMM parameters and corresponding UV maps from the source and target images. Then, initial UV maps are refined by the UV map refinement network and rendered to the image with the motion-manipulated 3DMM parameters. In parallel, we warp the source image according to the 2D flow field obtained from the 2D warping network. The rendered and warped images are combined in the final editing network to generate the final reenactment image. Additionally, we tested our model on the audio-driven facial reenactment task.
Extensive qualitative and quantitative experiments illustrate the remarkable performance of our method compared to other state-of-the-art methods.", "output": "IFaceUV: Intuitive Motion Facial Image Generation by Identity Preservation via UV map."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf, that is readily applicable to street view images in widely used autonomous driving datasets, such as Waymo-perception sequences, without necessarily requiring LiDAR data. As neural rendering research expands rapidly, its integration into street views has started to draw interest. Existing approaches on street views either mainly focus on novel view synthesis with little exploration of the scene geometry, or rely heavily on dense LiDAR data when investigating reconstruction. Neither of them investigates multi-view implicit surface reconstruction, especially under settings without LiDAR data. Our method extends prior object-centric neural surface reconstruction techniques to address the unique challenges posed by the unbounded street views that are captured with non-object-centric, long and narrow camera trajectories. We delimit the unbounded space into three parts, close-range, distant-view and sky, with aligned cuboid boundaries, and adapt cuboid/hyper-cuboid hash-grids along with a road-surface initialization scheme for finer and disentangled representation. To further address the geometric errors arising from textureless regions and insufficient viewing angles, we adopt geometric priors that are estimated using general-purpose monocular models. Coupled with our implementation of an efficient and fine-grained multi-stage ray marching strategy, we achieve state-of-the-art reconstruction quality in both geometry and appearance within only one to two hours of training time with a single RTX 3090 GPU for each street view sequence. Furthermore, we demonstrate that the reconstructed implicit surfaces have rich potential for various downstream tasks, including ray tracing and LiDAR simulation.", "output": "StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have achieved impressive results in generating diverse and realistic data by employing multi-step denoising processes. However, the need for accommodating significant variations in input noise at each time-step has led to diffusion models requiring a large number of parameters for their denoisers. We have observed that diffusion models effectively act as filters for different frequency ranges at each time-step noise. While some previous works have introduced multi-expert strategies, assigning denoisers to different noise intervals, they overlook the importance of specialized operations for high and low frequencies. For instance, self-attention operations are effective at handling low-frequency components (low-pass filters), while convolutions excel at capturing high-frequency features (high-pass filters). In other words, existing diffusion models employ denoisers with the same architecture, without considering the optimal operations for each time-step noise.
To address this limitation, we propose a novel approach called Multi-architecturE Multi-Expert (MEME), which consists of multiple experts with specialized architectures tailored to the operations required at each time-step interval. Through extensive experiments, we demonstrate that MEME outperforms large competitors in terms of both generation performance and computational efficiency.", "output": "Multi-Architecture Multi-Expert Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of the multi-media internet, visual characteristics have become an important factor affecting user interests. Thus, incorporating visual features is a promising direction for further performance improvements in click-through rate (CTR) prediction. However, we found that simply injecting the image embeddings trained with established pre-training methods brings only marginal improvements. We attribute the failure to two reasons: first, the pre-training methods are designed for well-defined computer vision tasks concentrating on semantic features, and they cannot learn personalized interest in recommendations. Second, pre-trained image embeddings containing only semantic information have little information gain, considering we already have semantic features such as categories and item titles as inputs in the CTR prediction task. We argue that a pre-training method tailored for recommendation is necessary for further improvements. To this end, we propose a recommendation-aware image pre-training method that can learn visual features from user click histories. Specifically, we propose a user interest reconstruction module to mine visual features related to user interests from behavior histories. We further propose a contrastive training method to avoid collapsing of the embedding vectors. We conduct extensive experiments to verify that our method can learn users' visual interests, and our method achieves a 0.46% improvement in offline AUC and a 0.88% improvement in Taobao online GMV with p-value < 0.01.", "output": "COURIER: Contrastive User Intention Reconstruction for Large-Scale Pre-Train of Image Features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a novel Sequence-to-Sequence (Seq2Seq) model based on a transformer-based attention mechanism and temporal pooling for Non-Intrusive Load Monitoring (NILM) of smart buildings. The paper aims to improve the accuracy of NILM by using a deep learning-based method. The proposed method uses a Seq2Seq model with a transformer-based attention mechanism to capture the long-term dependencies of NILM data. Additionally, temporal pooling is used to improve the model's accuracy by capturing both the steady-state and transient behavior of appliances. The paper evaluates the proposed method on a publicly available dataset and compares the results with other state-of-the-art NILM techniques.
The results demonstrate that the proposed method outperforms the existing methods in terms of both accuracy and computational efficiency.", "output": "Sequence-to-Sequence Model with Transformer-based Attention Mechanism and Temporal Pooling for Non-Intrusive Load Monitoring."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD). The extremely high resolution and limited availability of region-level annotations make it challenging to employ deep learning methods for WSI-based digital diagnosis. Multiple instance learning (MIL) is a powerful tool to address the weak annotation problem, while Transformer has shown great success in the field of visual tasks. The combination of both should provide new insights for deep learning-based image diagnosis. However, due to the limitations of single-level MIL and the attention mechanism's constraints on sequence length, directly applying Transformer to WSI-based MIL tasks is not practical. To tackle this issue, we propose a Multi-level MIL with Transformer (MMIL-Transformer) approach. By introducing a hierarchical structure to MIL, this approach enables efficient handling of MIL tasks that involve a large number of instances. To validate its effectiveness, we conducted a set of experiments on the WSI classification task, where MMIL-Transformer demonstrates superior performance compared to existing state-of-the-art methods. Our proposed approach achieves a test AUC of 94.74% and a test accuracy of 93.41% on the CAMELYON16 dataset, and a test AUC of 99.04% and a test accuracy of 94.37% on the TCGA-NSCLC dataset. All code and pre-trained models are available at: ", "output": "Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many information processing systems, it may be desirable to ensure that any change of the input, whether by shifting or scaling, results in a corresponding change in the system response. While deep neural networks are gradually replacing all traditional automatic processing methods, they surprisingly do not guarantee such a normalization-equivariance (scale + shift) property, which can be detrimental in many applications. To address this issue, we propose a methodology for adapting existing neural networks so that normalization-equivariance holds by design. Our main claim is that not only ordinary convolutional layers, but also all activation functions, including the ReLU (rectified linear unit), which are applied element-wise to the pre-activated neurons, should be completely removed from neural networks and replaced by better-conditioned alternatives. To this end, we introduce affine-constrained convolutions and channel-wise sort pooling layers as surrogates and show that these two architectural modifications do preserve normalization-equivariance without loss of performance.
Experimental results in image denoising show that normalization-equivariant neural networks, in addition to their better conditioning, also provide much better generalization across noise levels.", "output": "Normalization-Equivariant Neural Networks with Application to Image Denoising."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Each year, wildfires destroy larger areas of Spain, threatening numerous ecosystems. Humans cause 90% of them (through negligence or deliberately), and the behaviour of individuals is unpredictable. However, atmospheric and environmental variables affect the spread of wildfires, and they can be analysed by using deep learning. In order to mitigate the damage of these events, we propose the novel Wildfire Assessment Model (WAM). Our aim is to anticipate the economic and ecological impact of a wildfire, assisting managers in resource allocation and decision making for dangerous regions in Spain, Castilla y León and Andalucía. The WAM uses a residual-style convolutional network architecture to perform regression over atmospheric variables and the greenness index, computing necessary resources, the control and extinction time, and the expected burnt surface area. It is first pre-trained with self-supervision over 100,000 examples of unlabelled data with a masked patch prediction objective and fine-tuned using 311 samples of wildfires. The pretraining allows the model to understand situations, outclassing baselines with 1.4%, 3.7%, and 9% improvements in estimating human, heavy, and aerial resources; 21% and 10.2% in expected extinction and control time; and 18.8% in expected burnt area. Using the WAM, we provide an example assessment map of Castilla y León, visualizing the expected resources over an entire region.", "output": "Spain on Fire: A novel wildfire risk assessment model based on image satellite processing and atmospheric information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing pruning methods utilize the importance of each weight based on specified criteria only when searching for a sparse structure, but do not utilize it during training. In this work, we propose a novel approach - the Magnitude Attention-based Dynamic Pruning (MAP) method - which applies the importance of weights throughout both the forward and backward paths to explore sparse model structures dynamically. Magnitude attention is defined based on the magnitude of weights as continuous real-valued numbers, enabling a seamless transition from a redundant to an effective sparse network by promoting efficient exploration. Additionally, the attention mechanism ensures more effective updates for important layers within the sparse network.
In later stages of training, our approach shifts from exploration to exploitation, exclusively updating the sparse model composed of crucial weights based on the explored structure, resulting in pruned models that not only achieve performance comparable to dense models but also outperform previous pruning methods on CIFAR-10/100 and ImageNet.", "output": "Magnitude Attention-based Dynamic Pruning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-task visual perception has a wide range of applications in scene understanding, such as autonomous driving. In this work, we devise an efficient unified framework to solve multiple common perception tasks, including instance segmentation, semantic segmentation, monocular 3D detection, and depth estimation. Simply sharing the same visual feature representations for these tasks impairs the performance of tasks, while independent task-specific feature extractors lead to parameter redundancy and latency. Thus, we design two feature-merge branches to learn feature basis, which can be useful to, and thus shared by, multiple perception tasks. Then, each task takes the corresponding feature basis as the input of the prediction task head to fulfill a specific task. In particular, one feature-merge branch is designed for instance-level recognition and the other for dense predictions. To enhance inter-branch communication, the instance branch passes pixel-wise spatial information of each instance to the dense branch using efficient dynamic convolution weighting. Moreover, a simple but effective dynamic routing mechanism is proposed to isolate task-specific features and leverage common properties among tasks. Our proposed framework, termed D2BNet, demonstrates a unique approach to parameter-efficient predictions for multi-task perception. In addition, as tasks benefit from co-training with each other, our solution achieves on-par results in partially labeled settings on nuScenes and outperforms previous works for 3D detection and depth estimation on the Cityscapes dataset with full supervision.", "output": "A Dynamic Feature Interaction Framework for Multi-task Visual Perception."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks. It leverages extra learnable tokens, known as prompts, which steer the frozen pretrained ViTs. Although VPT has demonstrated its applicability with supervised vision transformers, it often underperforms with self-supervised ones. Through empirical observations, we deduce that the effectiveness of VPT hinges largely on the ViT blocks with which the prompt tokens interact. Specifically, VPT shows improved performance on image classification tasks for MAE and MoCo v3 when the prompt tokens are inserted into later blocks rather than the first block. These observations suggest that there exists an optimal location of blocks for the insertion of prompt tokens. Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens.
With our method, prompt tokens are selectively influenced by blocks that require steering for task adaptation. Our method outperforms VPT variants in FGVC and VTAB image classification and in ADE20K semantic segmentation. The code is available at", "output": "Improving Visual Prompt Tuning for Self-supervised Vision Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a review of techniques used for the detection and tracking of UAVs or drones. There are different techniques that depend on collecting measurements of the position, velocity, and image of the UAV, and then using them in detection and tracking. Hybrid detection techniques are also presented. The paper is a quick reference for a wide spectrum of methods that are used in the drone detection process.", "output": "A review of UAV Visual Detection and Tracking Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The augmentation parameters matter to few-shot semantic segmentation since they directly affect the training outcome by feeding the networks with varying perturbed samples. However, searching for optimal augmentation parameters for few-shot segmentation models without annotations is a challenge that current methods fail to address. In this paper, we first propose a framework to determine the \"optimal\" parameters without human annotations by solving a distribution-matching problem between the intra-instance and intra-class similarity distributions, with the intra-instance similarity describing the similarity between the original sample of a particular anatomy and its augmented ones, and the intra-class similarity representing the similarity between the selected sample and the others in the same class. Extensive experiments demonstrate the superiority of our optimized augmentation in boosting few-shot segmentation models. We greatly improve the top competing method by 1.27% and 1.11% on the Abd-MRI and Abd-CT datasets, respectively, and even achieve a significant improvement for SSL-ALP on the left kidney by 3.39% on the Abd-CT dataset.", "output": "Unsupervised augmentation optimization for few-shot medical image segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work considers supervised learning to count from images and their corresponding point annotations. Where density-based counting methods typically use the point annotations only to create Gaussian-density maps, which act as the supervision signal, the starting point of this work is that point annotations have counting potential beyond density map generation. We introduce two methods that repurpose the available point annotations to enhance counting performance. The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both input and density images to enhance the network's robustness to occlusions. The second method, foreground distillation, generates foreground masks from the point annotations, from which we train an auxiliary network on images with blacked-out backgrounds. By doing so, it learns to extract foreground counting knowledge without interference from the background.
These methods can be seamlessly integrated with existing counting advances and are adaptable to different loss functions. We demonstrate complementary effects of the approaches, allowing us to achieve robust counting results even in challenging scenarios such as background clutter, occlusion, and varying crowd densities. Our proposed approach achieves strong counting results on multiple datasets, including ShanghaiTech Part_A and Part_B, UCF_QNRF, JHU-Crowd++, and NWPU-Crowd.", "output": "Focus for Free in Density-Based Counting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image anonymization is widely adopted in practice to comply with privacy regulations in many regions. However, anonymization often degrades the quality of the data, reducing its utility for computer vision development. In this paper, we investigate the impact of image anonymization for training computer vision models on key computer vision tasks (detection, instance segmentation, and pose estimation). Specifically, we benchmark the recognition drop on common detection datasets, where we evaluate both traditional and realistic anonymization for faces and full bodies. Our comprehensive experiments reflect that traditional image anonymization substantially impacts final model performance, particularly when anonymizing the full body. Furthermore, we find that realistic anonymization can mitigate this decrease in performance, where our experiments reflect a minimal performance drop for face anonymization. Our study demonstrates that realistic anonymization can enable privacy-preserving computer vision development with minimal performance degradation across a range of important computer vision benchmarks.", "output": "Does Image Anonymization Impact Computer Vision Training?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Mesogeos, a large-scale multi-purpose dataset for wildfire modeling in the Mediterranean. Mesogeos integrates variables representing wildfire drivers (meteorology, vegetation, human activity) and historical records of wildfire ignitions and burned areas for 17 years (2006-2022). It is designed as a cloud-friendly spatio-temporal dataset, namely a datacube, harmonizing all variables in a grid of 1 km x 1 km x 1-day resolution. The datacube structure offers opportunities to assess machine learning (ML) usage in various wildfire modeling tasks. We extract two ML-ready datasets that establish distinct tracks to demonstrate this potential: (1) short-term wildfire danger forecasting and (2) final burned area estimation given the point of ignition. We define appropriate metrics and baselines to evaluate the performance of models in each track.
By publishing the datacube, along with the code to create the ML datasets and models, we encourage the community to foster the implementation of additional tracks for mitigating the increasing threat of wildfires in the Mediterranean.", "output": "Mesogeos: A multi-purpose dataset for data-driven wildfire modeling in the Mediterranean."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reconstructing category-specific objects from a single image is a challenging task that requires inferring the geometry and appearance of an object from a limited viewpoint. Existing methods typically rely on local feature retrieval based on re-projection with known camera intrinsics, which are slow and prone to distortion at viewpoints distant from the input image. In this paper, we present Variable Radiance Field (VRF), a novel framework that can efficiently reconstruct category-specific objects from a single image without known camera parameters. Our key contributions are: (1) We parameterize the geometry and appearance of the object using a multi-scale global feature extractor, which avoids frequent point-wise feature retrieval and camera dependency. We also propose a contrastive learning-based pretraining strategy to improve the feature extractor. (2) We reduce the geometric complexity of the object by learning a category template, and use hypernetworks to generate a small neural radiance field for fast and instance-specific rendering. (3) We align each training instance to the template space using a learned similarity transformation, which enables semantic-consistent learning across different objects. We evaluate our method on the CO3D dataset and show that it outperforms existing methods in terms of quality and speed. We also demonstrate its applicability to shape interpolation and object placement tasks.", "output": "Variable Radiance Field for Real-Life Category-Specifc Reconstruction from Single Image."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Egocentric action recognition is essential for healthcare and assistive technology that relies on egocentric cameras because it allows for the automatic and continuous monitoring of activities of daily living (ADLs) without requiring any conscious effort from the user. This study explores the feasibility of using 2D hand and object pose information for egocentric action recognition. While current literature focuses on 3D hand pose information, our work shows that using 2D skeleton data is a promising approach for hand-based action classification, might offer privacy enhancement, and could be less computationally demanding. The study uses a state-of-the-art transformer-based method to classify sequences and achieves validation results of 94%, outperforming other existing solutions. The accuracy on the test subset drops to 76%, indicating the need for further generalization improvement.
This research highlights the potential of 2D hand and object pose information for action recognition tasks and offers a promising alternative to 3D-based methods.", "output": "Human Action Recognition in Egocentric Perspective Using 2D Object and Hands Pose."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As a result, the increasing computational cost is becoming unaffordable. In this paper, we investigate how to prune large-scale datasets, and thus produce an informative subset for training sophisticated deep models with negligible performance drop. We propose a simple yet effective dataset pruning method by exploring both the prediction uncertainty and training dynamics. To our knowledge, this is the first work to study dataset pruning on large-scale datasets, i.e., ImageNet-1K and ImageNet-21K, and advanced models, i.e., Swin Transformer and ConvNeXt. Extensive experimental results indicate that our method outperforms the state of the art and achieves a 75% lossless compression ratio on both ImageNet-1K and ImageNet-21K. The code and pruned datasets are available at ", "output": "Large-scale Dataset Pruning with Dynamic Uncertainty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The remarkable capabilities of pretrained image diffusion models have been utilized not only for generating fixed-size images but also for creating panoramas. However, naive stitching of multiple images often results in visible seams. Recent techniques have attempted to address this issue by performing joint diffusions in multiple windows and averaging latent features in overlapping regions. However, these approaches, which focus on seamless montage generation, often yield incoherent outputs by blending different scenes within a single image. To overcome this limitation, we propose SyncDiffusion, a plug-and-play module that synchronizes multiple diffusions through gradient descent from a perceptual similarity loss. Specifically, we compute the gradient of the perceptual loss using the predicted denoised images at each denoising step, providing meaningful guidance for achieving coherent montages. Our experimental results demonstrate that our method produces significantly more coherent outputs compared to previous methods (66.35% vs. 33.65% in our user study) while still maintaining fidelity (as assessed by GIQA) and compatibility with the input prompt (as measured by CLIP score).", "output": "SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of evaluating general intelligence for large language models (LLMs), as they inherently demand a much wider range of abilities, such as language understanding, domain knowledge, and problem-solving skills.
To this end, we introduce M3Exam, a novel benchmark sourced from real and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. M3Exam exhibits three unique characteristics: (1) multilingualism, encompassing questions from multiple countries that require strong multilingual proficiency and cultural knowledge; (2) multimodality, accounting for the multimodal nature of many exam questions to test the model's multimodal understanding capability; and (3) multilevel structure, featuring exams from three critical educational periods to comprehensively assess a model's proficiency at different levels. In total, M3Exam contains 12,317 questions in 9 diverse languages with three educational levels, where about 23% of the questions require processing images for successful solving. We assess the performance of top-performing LLMs on M3Exam and find that current models, including GPT-4, still struggle with multilingual text, particularly in low-resource and non-Latin script languages. Multimodal LLMs also perform poorly with complex multimodal questions. We believe that M3Exam can be a valuable resource for comprehensively evaluating LLMs by examining their multilingual and multimodal abilities and tracking their development. Data and evaluation code is available at url{", "output": "M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe, helping fashion designers in real-time visualization by giving them a basic customized structure of how a specific design preference would look in real life and what further improvements can be made for enhanced customer satisfaction. Moreover, users can alone interact and generate fashionable images by just giving a few simple prompts. Recently, diffusion models have gained popularity as generative models owing to their flexibility and generation of realistic images from Gaussian noise. Latent diffusion models are a type of generative model that use diffusion processes to model the generation of complex data, such as images, audio, or text. They are called \"latent\" because they learn a hidden representation, or latent variable, of the data that captures its underlying structure. We propose a method exploiting the equivalence between diffusion models and energy-based models (EBMs) and suggesting ways to compose multiple probability distributions. We describe a pipeline on how our method can be used specifically for new fashionable outfit generation and virtual try-on using LLM-guided text-to-image generation. Our results indicate that using an LLM to refine the prompts to the latent diffusion model assists in generating globally creative and culturally diversified fashion styles and reducing bias.", "output": "Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Characteristics such as low contrast and significant organ shape variations are often exhibited in medical images. The improvement of segmentation performance in medical imaging is limited by the generally insufficient adaptive capabilities of existing attention mechanisms. 
An efficient Channel Prior Convolutional Attention (CPCA) method is proposed in this paper, supporting the dynamic distribution of attention weights in both channel and spatial dimensions. Spatial relationships are effectively extracted while preserving the channel prior by employing a multi-scale depth-wise convolutional module. The ability to focus on informative channels and important regions is possessed by CPCA. A segmentation network called CPCANet for medical image segmentation is proposed based on CPCA. CPCANet is validated on two publicly available datasets. Improved segmentation performance is achieved by CPCANet while requiring fewer computational resources through comparisons with state-of-the-art algorithms. Our code is publicly available at url{", "output": "Channel prior convolutional attention for medical image segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have been remarkably successful in data synthesis. Such successes have also driven diffusion models to apply to sensitive data, such as human face data, but this might bring about severe privacy concerns. In this work, we systematically present the first privacy study about property inference attacks against diffusion models, in which adversaries aim to extract sensitive global properties of the training set from a diffusion model, such as the proportion of the training data for certain sensitive properties. Specifically, we consider the most practical attack scenario: adversaries are only allowed to obtain synthetic data. Under this realistic scenario, we evaluate the property inference attacks on different types of samplers and diffusion models. A broad range of evaluations shows that various diffusion models and their samplers are all vulnerable to property inference attacks. Furthermore, one case study on off-the-shelf pre-trained diffusion models also demonstrates the effectiveness of the attack in practice. Finally, we propose a new model-agnostic plug-in method PriSampler to mitigate the property inference of diffusion models. PriSampler can be directly applied to well-trained diffusion models and support both stochastic and deterministic sampling. Extensive experiments illustrate the effectiveness of our defense and it makes adversaries infer the proportion of properties as close as random guesses. PriSampler also shows its significantly superior performance to diffusion models trained with differential privacy on both model utility and defense performance.", "output": "PriSampler: Mitigating Property Inference of Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer-based attack adopts the adversarial examples generated on the surrogate model to attack various models, making it applicable in the physical world and attracting increasing interest. Recently, various adversarial attacks have emerged to boost adversarial transferability from different perspectives. In this work, inspired by the fact that flat local minima are correlated with good generalization, we assume and empirically validate that adversarial examples at a flat local region tend to have good transferability by introducing a penalized gradient norm to the original loss function. 
Since directly optimizing the gradient regularization norm is computationally expensive and intractable for generating adversarial examples, we propose an approximation optimization method to simplify the gradient update of the objective function. Specifically, we randomly sample an example and adopt the first-order gradient to approximate the second-order Hessian matrix, which makes computing more efficient by interpolating two Jacobian matrices. Meanwhile, in order to obtain a more stable gradient direction, we randomly sample multiple examples and average the gradients of these examples to reduce the variance due to random sampling during the iterative process. Extensive experimental results on the ImageNet-compatible dataset show that the proposed method can generate adversarial examples at flat local regions, and significantly improve the adversarial transferability on either normally trained models or adversarially trained models than the state-of-the-art attacks.", "output": "Boosting Adversarial Transferability by Achieving Flat Local Maxima."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative adversarial networks (GANs) have shown remarkable success in image synthesis, making GAN models themselves commercially valuable to legitimate model owners. Therefore, it is critical to technically protect the intellectual property of GANs. Prior works need to tamper with the training set or training process, and they are not robust to emerging model extraction attacks. In this paper, we propose a new ownership protection method based on the common characteristics of a target model and its stolen models. Our method can be directly applicable to all well-trained GANs as it does not require retraining target models. Extensive experimental results show that our new method can achieve the best protection performance, compared to the state-of-the-art methods. Finally, we demonstrate the effectiveness of our method with respect to the number of generations of model extraction attacks, the number of generated samples, different datasets, as well as adaptive attacks.", "output": "Ownership Protection of Generative Adversarial Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised person re-identification has achieved great success through the self-improvement of individual neural networks. However, limited by the lack of diversity of discriminant information, a single network has difficulty learning sufficient discrimination ability by itself under unsupervised conditions. To address this limit, we develop a population-based evolutionary gaming (PEG) framework in which a population of diverse neural networks is trained concurrently through selection, reproduction, mutation, and population mutual learning iteratively. Specifically, the selection of networks to preserve is modeled as a cooperative game and solved by the best-response dynamics, then the reproduction and mutation are implemented by cloning and fluctuating hyper-parameters of networks to learn more diversity, and population mutual learning improves the discrimination of networks by knowledge distillation from each other within the population. 
In addition, we propose a cross-reference scatter (CRS) to approximately evaluate re-ID models without labeled samples and adopt it as the criterion of network selection in PEG. CRS measures a model's performance by indirectly estimating the accuracy of its predicted pseudo-labels according to the cohesion and separation of the feature space. Extensive experiments demonstrate that (1) CRS approximately measures the performance of models without labeled samples; (2) and PEG produces new state-of-the-art accuracy for person re-identification, indicating the great potential of population-based network cooperative training for unsupervised learning.", "output": "Population-Based Evolutionary Gaming for Unsupervised Person Re-identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Exploring robust and efficient association methods has always been an important issue in multiple-object tracking (MOT). Although existing tracking methods have achieved impressive performance, congestion and frequent occlusions still pose challenging problems in multi-object tracking. We reveal that performing sparse decomposition on dense scenes is a crucial step to enhance the performance of associating occluded targets. To this end, we propose a pseudo-depth estimation method for obtaining the relative depth of targets from 2D images. Secondly, we design a depth cascading matching (DCM) algorithm, which can use the obtained depth information to convert a dense target set into multiple sparse target subsets and perform data association on these sparse target subsets in order from near to far. By integrating the pseudo-depth method and the DCM strategy into the data association process, we propose a new tracker, called SparseTrack. SparseTrack provides a new perspective for solving the challenging crowded scene MOT problem. Only using IoU matching, SparseTrack achieves comparable performance with the state-of-the-art (SOTA) methods on the MOT17 and MOT20 benchmarks. Code and models are publicly available at url{", "output": "SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Considering the balance of performance and efficiency, sampled point and voxel methods are usually employed to down-sample dense events into sparse ones. After that, one popular way is to leverage a graph model which treats the sparse points/voxels as nodes and adopts graph neural networks (GNNs) to learn the representation for event data. Although good performance can be obtained, however, their results are still limited mainly due to two issues. (1) Existing event GNNs generally adopt the additional max (or mean) pooling layer to summarize all node embeddings into a single graph-level representation for the whole event data representation. However, this approach fails to capture the importance of graph nodes and also fails to be fully aware of the node representations. (2) Existing methods generally employ either a sparse point or voxel graph representation model which thus lacks consideration of the complementary between these two types of representation models. To address these issues, in this paper, we propose a novel dual point-voxel absorbing graph representation learning for event stream data representation. 
To be specific, given the input event stream, we first transform it into the sparse event cloud and voxel grids and build dual absorbing graph models for them respectively. Then, we design a novel absorbing graph convolutional network (AGCN) for our dual absorbing graph representation and learning. The key aspect of the proposed AGCN is its ability to effectively capture the importance of nodes and thus be fully aware of node representations in summarizing all node representations through the introduced absorbing nodes. Finally, the event representations of dual learning branches are concatenated together to extract the complementary information of two cues. The output is then fed into a linear layer for event data classification.", "output": "Point-Voxel Absorbing Graph Representation Learning for Event Stream based Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intelligent systems that aim at mastering language as humans do must deal with its semantic underspecification, namely, the possibility for a linguistic signal to convey only part of the information needed for communication to succeed. Consider the usages of the pronoun they, which can leave the gender and number of its referent(s) underspecified. Semantic underspecification is not a bug but a crucial language feature that boosts its storage and processing efficiency. Indeed, human speakers can quickly and effortlessly integrate semantically-underspecified linguistic signals with a wide range of non-linguistic information, e.g., the multimodal context, social or cultural conventions, and shared knowledge. Standard NLP models have, in principle, no or limited access to such extra information, while multimodal systems grounding language into other modalities, such as vision, are naturally equipped to account for this phenomenon. However, we show that they struggle with it, which could negatively affect their performance and lead to harmful consequences when used for applications. In this position paper, we argue that our community should be aware of semantic underspecification if it aims to develop language technology that can successfully interact with human users. We discuss some applications where mastering it is crucial and outline a few directions toward achieving this goal.", "output": "Dealing with Semantic Underspecification in Multimodal NLP."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scene analysis is essential for enabling autonomous systems, such as mobile robots, to operate in real-world environments. However, obtaining a comprehensive understanding of the scene requires solving multiple tasks, such as panoptic segmentation, instance orientation estimation, and scene classification. Solving these tasks given limited computing and battery capabilities on mobile platforms is challenging. To address this challenge, we introduce an efficient multi-task scene analysis approach, called EMSAFormer, that uses an RGB-D Transformer-based encoder to simultaneously perform the aforementioned tasks. Our approach builds upon the previously published EMSANet. However, we show that the dual CNN-based encoder of EMSANet can be replaced with a single Transformer-based encoder. 
To achieve this, we investigate how information from both RGB and depth data can be effectively incorporated in a single encoder. To accelerate inference on robotic hardware, we provide a custom NVIDIA TensorRT extension enabling highly optimized inference for our EMSAFormer approach. Through extensive experiments on the commonly used indoor datasets NYUv2, SUNRGB-D, and ScanNet, we show that our approach achieves state-of-the-art performance while still enabling inference with up to 39.1 FPS on an NVIDIA Jetson AGX Orin 32 GB.", "output": "Efficient Multi-Task Scene Analysis with RGB-D Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rapid development of geometric deep learning techniques, many mesh-based convolutional operators have been proposed to bridge irregular mesh structures and popular backbone networks. In this paper, we show that while convolutions are helpful, a simple architecture based exclusively on multi-layer perceptrons (MLPs) is competent enough to deal with mesh classification and semantic segmentation. Our new network architecture, named Mesh-MLP, takes mesh vertices equipped with the heat kernel signature (HKS) and dihedral angles as the input, replaces the convolution module of a ResNet with Multi-layer Perceptron (MLP), and utilizes layer normalization (LN) to perform the normalization of the layers. The all-MLP architecture operates in an end-to-end fashion and does not include a pooling module. Extensive experimental results on the mesh classification/segmentation tasks validate the effectiveness of the all-MLP architecture.", "output": "Mesh-MLP: An all-MLP Architecture for Mesh Classification and Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning-based medical image segmentation models suffer from performance degradation when deployed to a new healthcare center. To address this issue, unsupervised domain adaptation and multi-source domain generalization methods have been proposed, which, however, are less favorable for clinical practice due to the cost of acquiring target-domain data and the privacy concerns associated with redistributing the data from multiple source domains. In this paper, we propose a Channel-level Contrastive Single Domain Generalization (C$^2$SDG) model for medical image segmentation. In C$^2$SDG, the shallower features of each image and its style-augmented counterpart are extracted and used for contrastive training, resulting in the disentangled style representations and structure representations. The segmentation is performed based solely on the structure representations. Our method is novel in the contrastive perspective that enables channel-wise feature disentanglement using a single source domain. We evaluated C$^2$SDG against six SDG methods on a multi-domain joint optic cup and optic disc segmentation benchmark. Our results suggest the effectiveness of each module in C$^2$SDG and also indicate that C$^2$SDG outperforms the baseline and all competing methods with a large margin. 
The code will be available at url{", "output": "Devil is in Channels: Contrastive Single Domain Generalization for Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Variational Autoencoder (VAE) is a seminal approach in deep generative modeling with latent variables. Interpreting its reconstruction process as a nonlinear transformation of samples from the latent posterior distribution, we apply the Unscented Transform (UT) -- a well-known distribution approximation used in the Unscented Kalman Filter (UKF) from the field of filtering. A finite set of statistics called sigma points, sampled deterministically, provides a more informative and lower-variance posterior representation than the ubiquitous noise-scaling of the reparameterization trick, while ensuring higher-quality reconstruction. We further boost the performance by replacing the Kullback-Leibler (KL) divergence with the Wasserstein distribution metric that allows for a sharper posterior. Inspired by the two components, we derive a novel, deterministic-sampling flavor of the VAE, the Unscented Autoencoder (UAE), trained purely with regularization-like terms on the per-sample posterior. We empirically show competitive performance in Fréchet Inception Distance (FID) scores over closely-related models, in addition to a lower training variance than the VAE.", "output": "Unscented Autoencoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current robotic hand manipulation narrowly operates with objects in predictable positions in limited environments. Thus, when the location of the target object deviates severely from the expected location, a robot sometimes responds in an unexpected way, especially when it operates with a human. For safe robot operation, we propose the EXit-aware Object Tracker (EXOT) on a robot hand camera that recognizes an object's absence during manipulation. The robot decides whether to proceed by examining the tracker's bounding box output containing the target object. We adopt an out-of-distribution classifier for more accurate object recognition since trackers can mistrack a background as a target object. To the best of our knowledge, our method is the first approach of applying an out-of-distribution classification technique to a tracker output. We evaluate our method on the first-person video benchmark dataset, TREK-150, and on the custom dataset, RMOT-223, that we collect from the UR5e robot. Then we test our tracker on the UR5e robot in real-time with a conveyor-belt sushi task, to examine the tracker's ability to track target dishes and to determine the exit status. Our tracker shows 38% higher exit-aware performance than a baseline method. The dataset and the code will be released at ", "output": "EXOT: Exit-aware Object Tracker for Safe Robotic Manipulation of Moving Object."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). 
Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient for downstream tasks. However, in many real-world settings, task-relevant information is also contained in modality-unique regions: information that is only present in one modality but still relevant to the task. How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy. FactorCL is built from three new contributions: (1) factorizing task-relevant information into shared and unique representations, (2) capturing task-relevant information via maximizing MI lower bounds and removing task-irrelevant information via minimizing MI upper bounds, and (3) multimodal data augmentations to approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.", "output": "Factorized Contrastive Learning: Going Beyond Multi-view Redundancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, as a fundamental and classic machine learning problem, still lacks effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models such as CLIP and cluster images effectively and efficiently at scale. We show that the pre-trained features are significantly more structured by further optimizing the rate reduction objective. The resulting features may significantly improve the clustering accuracy, e.g., from 57% to 66% on ImageNet-1k. Furthermore, by leveraging CLIP's image-text binding, we show how the new clustering method leads to a simple yet effective self-labeling algorithm that successfully works on unlabeled large datasets such as MS-COCO and LAION-Aesthetics. We will release the code in ", "output": "Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Structural magnetic resonance imaging (sMRI) has shown great clinical value and has been widely used in deep learning (DL) based computer-aided brain disease diagnosis. Previous approaches focused on local shapes and textures in sMRI that may be significant only within a particular domain. The learned representations are likely to contain spurious information and have a poor generalization ability in other diseases and datasets. To facilitate capturing meaningful and robust features, it is necessary to first comprehensively understand the intrinsic pattern of the brain that is not restricted within a single data/task domain. Considering that the brain is a complex connectome of interlinked neurons, the connectional properties in the brain have strong biological significance, which is shared across multiple domains and covers most pathological information. 
In this work, we propose a connectional style contextual representation learning model (CS-CRL) to capture the intrinsic pattern of the brain, used for multiple brain disease diagnosis. Specifically, it has a vision transformer (ViT) encoder and leverages mask reconstruction as the proxy task and Gram matrices to guide the representation of connectional information. It facilitates the capture of global context and the aggregation of features with biological plausibility. The results indicate that CS-CRL achieves superior accuracy in multiple brain disease diagnosis tasks across six datasets and three diseases and outperforms state-of-the-art models. Furthermore, we demonstrate that CS-CRL captures more brain-network-like properties, better aggregates features, is easier to optimize and is more robust to noise, which explains its superiority in theory. Our source code will be released soon.", "output": "Connectional-Style-Guided Contextual Representation Learning for Brain Disease Diagnosis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The quality of three-dimensional reconstruction is a key factor affecting the effectiveness of its application in areas such as virtual reality (VR) and augmented reality (AR) technologies. Neural Radiance Fields (NeRF) can generate realistic images from any viewpoint. It simultaneously reconstructs the shape, lighting, and materials of objects, and without surface defects, which breaks down the barrier between virtuality and reality. The potential spatial correspondences displayed by NeRF between reconstructed scenes and real-world scenes offer a wide range of practical applications possibilities. Despite significant progress in 3D reconstruction since NeRF were introduced, there remains considerable room for exploration and experimentation. NeRF-based models are susceptible to interference issues caused by colored \"fog\" noise. Additionally, they frequently encounter instabilities and failures while attempting to reconstruct unbounded scenes. Moreover, the model takes a significant amount of time to converge, making it even more challenging to use in such scenarios. Our approach, coined Enhance-NeRF, which adopts joint color to balance low and high reflectivity objects display, utilizes a decoding architecture with prior knowledge to improve recognition, and employs multi-layer performance evaluation mechanisms to enhance learning capacity. It achieves reconstruction of outdoor scenes within one hour under single-card condition. Based on experimental results, Enhance-NeRF partially enhances fitness capability and provides some support to outdoor scene reconstruction. The Enhance-NeRF method can be used as a plug-and-play component, making it easy to integrate with other NeRF-based models. The code is available at:", "output": "Enhance-NeRF: Multiple Performance Evaluation for Neural Radiance Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we present a pipeline to reconstruct the 3D pose of a horse from 4 simultaneous surveillance camera recordings. Our environment poses interesting challenges to tackle, such as limited field view of the cameras and a relatively closed and small environment. 
The pipeline consists of training a 2D markerless pose estimation model to work on every viewpoint, then applying it to the videos and performing triangulation. We present numerical evaluation of the results (error analysis), as well as show the utility of the achieved poses in downstream tasks of selected behavioral predictions. Our analysis of the predictive model for equine behavior showed a bias towards pain-induced horses, which aligns with our understanding of how behavior varies across painful and healthy subjects.", "output": "Predictive Modeling of Equine Activity Budgets Using a 3D Skeleton Reconstructed from Surveillance Recordings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a real-time GeoAI workflow for large-scale image analysis and the segmentation of Arctic permafrost features at a fine-granularity. Very high-resolution (0.5m) commercial imagery is used in this analysis. To achieve real-time prediction, our workflow employs a lightweight, deep learning-based instance segmentation model, SparseInst, which introduces and uses Instance Activation Maps to accurately locate the position of objects within the image scene. Experimental results show that the model can achieve better accuracy of prediction at a much faster inference speed than the popular Mask-RCNN model.", "output": "Real-time GeoAI for High-resolution Mapping and Segmentation of Arctic Permafrost Features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Almost all advanced face swapping approaches use reconstruction as the proxy task, i.e., supervision only exists when the target and source belong to the same person. Otherwise, lacking pixel-level supervision, these methods struggle for source identity preservation. This paper proposes to construct reliable supervision, dubbed cycle triplets, which serves as the image-level guidance when the source identity differs from the target one during training. Specifically, we use face reenactment and blending techniques to synthesize the swapped face from real images in advance, where the synthetic face preserves source identity and target attributes. However, there may be some artifacts in such a synthetic face. To avoid the potential artifacts and drive the distribution of the network output close to the natural one, we reversely take synthetic images as input while the real face as reliable supervision during the training stage of face swapping. Besides, we empirically find that the existing methods tend to lose lower-face details like face shape and mouth from the source. This paper additionally designs a FixerNet, providing discriminative embeddings of lower faces as an enhancement. Our face swapping framework, named ReliableSwap, can boost the performance of any existing face swapping network with negligible overhead. Extensive experiments demonstrate the efficacy of our ReliableSwap, especially in identity preservation. 
The project page is ", "output": "ReliableSwap: Boosting General Face Swapping Via Reliable Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image? We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects, and lighting from kitchen scenes, and discovering image classes given ImageNet images. We show how such generative concepts can accurately represent the content of images, be recombined and composed to generate new artistic and hybrid images, and be further used as a representation for downstream classification tasks.", "output": "Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The definition of anomaly detection is the identification of an unexpected event. Real-time detection of extreme events such as wildfires, cyclones, or floods using satellite data has become crucial for disaster management. Although several earth-observing satellites provide information about disasters, satellites in the geostationary orbit provide data at intervals as frequent as every minute, effectively creating a video from space. There are many techniques that have been proposed to identify anomalies in surveillance videos; however, the available datasets do not have dynamic behavior, so we discuss an anomaly framework that can work on very high-frequency datasets to find very fast-moving anomalies. In this work, we present a diffusion model which does not need any motion component to capture the fast-moving anomalies and outperforms the other baseline methods.", "output": "Anomaly Detection in Satellite Videos using Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Car-following is a control process in which a following vehicle (FV) adjusts its acceleration to keep a safe distance from the lead vehicle (LV). Recently, there has been a booming of data-driven models that enable more accurate modeling of car-following through real-world driving datasets. Although there are several public datasets available, their formats are not always consistent, making it challenging to determine the state-of-the-art models and how well a new model performs compared to existing ones. In contrast, research fields such as image recognition and object detection have benchmark datasets like ImageNet, Microsoft COCO, and KITTI. To address this gap and promote the development of microscopic traffic flow modeling, we establish a public benchmark dataset for car-following behavior modeling. The benchmark consists of more than 80K car-following events extracted from five public driving datasets using the same criteria. These events cover diverse situations including different road types, various weather conditions, and mixed traffic flows with autonomous vehicles. 
Moreover, to give an overview of current progress in car-following modeling, we implemented and tested representative baseline models with the benchmark. Results show that the deep deterministic policy gradient (DDPG) based model performs competitively with a lower MSE for spacing compared to traditional intelligent driver model (IDM) and Gazis-Herman-Rothery (GHR) models, and a smaller collision rate compared to fully connected neural network (NN) and long short-term memory (LSTM) models in most datasets. The established benchmark will provide researchers with consistent data formats and metrics for cross-comparing different car-following models, promoting the development of more accurate models. We open-source our dataset and implementation code in", "output": "FollowNet: A Comprehensive Benchmark for Car-Following Behavior Modeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The field of image blending has gained significant popularity in recent years due to its ability to create visually stunning content. The main objective of image blending is to merge an object from one image onto another seamlessly, with minor masking adjustments. With the recent development of SAM, which can detect and segment targets in images automatically, our approach (1) combines semantic object detection and segmentation with corresponding mask generation to automatically fuse images and (2) introduces the use of PAN for further quality enhancement during the fusion process. Our approach surpasses many classical visual fusion models in various performance indicators such as PSNR, SSIM, and Realism. Notably, our process is highly efficient and speedy, making it widely applicable in industrial settings. This new process has the potential to revolutionize visual content creation and improve productivity across various industries.", "output": "Automatic Image Blending Algorithm Based on SAM and DINO."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a new large-scale image restoration dataset, called HQ-50K, which contains 50,000 high-quality images with rich texture details and semantic diversity. We analyze existing image restoration datasets from five different perspectives, including data scale, resolution, compression rates, texture details, and semantic coverage. However, we find that all of these datasets are deficient in some aspects. In contrast, HQ-50K considers all of these five aspects during the data curation process and meets all requirements. We also present a new Degradation-Aware Mixture of Expert (DAMoE) model, which enables a single model to handle multiple corruption types and unknown levels. Our extensive experiments demonstrate that HQ-50K consistently improves the performance on various image restoration tasks, such as super-resolution, denoising, dejpeg, and deraining. Furthermore, our proposed DAMoE, trained on our dataset, outperforms existing state-of-the-art unified models designed for multiple restoration tasks and levels. 
The dataset and code are available at url{", "output": "HQ-50K: A Large-scale, High-quality Dataset for Image Restoration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present a simple yet effective provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning. Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch. The individual-level weight of sampled data is systematically proportional to the exponential of a scaled loss value of the data, where the scaling factor is interpreted as the regularization parameter in the framework of distributionally robust optimization (DRO). Depending on whether the scaling factor is positive or negative, ABSGD is guaranteed to converge to a stationary point of an information-regularized min-max or min-min DRO problem, respectively. Compared with existing class-level weighting schemes, our method can capture the diversity between individual examples within each class. Compared with existing individual-level weighting methods using meta-learning that require three backward propagations for computing mini-batch stochastic gradients, our method is more efficient with only one backward propagation at each iteration as in standard deep learning methods. ABSGD is flexible enough to combine with other robust losses without any additional cost. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method. Code is available at: url{", "output": "Attentional-Biased Stochastic Gradient Descent."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural network-based image compression has been extensively studied. However, the model robustness which is crucial to practical application is largely overlooked. We propose to examine the robustness of prevailing learned image compression models by injecting negligible adversarial perturbation into the original source image. Severe distortion in decoded reconstruction reveals the general vulnerability in existing methods regardless of their settings (e.g., network architecture, loss function, quality scale). A variety of defense strategies including geometric self-ensemble based pre-processing, and adversarial training, are investigated against the adversarial attack to improve the model's robustness. Later the defense efficiency is further exemplified in real-life image recompression case studies. Overall, our methodology is simple, effective, and generalizable, making it attractive for developing robust learned image compression solutions. All materials are made publicly accessible at for reproducible research.", "output": "Towards Robust Neural Image Compression: Adversarial Attack and Model Finetuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Domain generalization (DG) for person re-identification (ReID) is a challenging problem, as access to target domain data is not permitted during the training process. Most existing DG ReID methods update the feature extractor and classifier parameters based on the same features. 
This common practice causes the model to overfit to existing feature styles in the source domain, resulting in sub-optimal generalization ability on target domains. To solve this problem, we propose a novel style interleaved learning (IL) framework. Unlike conventional learning strategies, IL incorporates two forward propagations and one backward propagation for each iteration. We employ the features of interleaved styles to update the feature extractor and classifiers using different forward propagations, which helps to prevent the model from overfitting to certain domain styles. To generate interleaved feature styles, we further propose a new feature stylization approach. It produces a wide range of meaningful styles that are both different and independent from the original styles in the source domain, which caters to the IL methodology. Extensive experimental results show that our model not only consistently outperforms state-of-the-art methods on large-scale benchmarks for DG ReID, but also has clear advantages in computational efficiency. The code is available at", "output": "Style Interleaved Learning for Generalizable Person Re-identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, that aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions in inference time. We first build a baseline method by directly adopting pre-trained CLIP models without finetuning or distillation. We then develop MaskCLIP, a Transformer-based approach with a MaskCLIP Visual Encoder, which is an encoder-only module that seamlessly integrates mask tokens with a pre-trained ViT CLIP model for semantic/instance segmentation and class prediction. MaskCLIP learns to efficiently and effectively utilize pre-trained partial/dense CLIP features within the MaskCLIP Visual Encoder that avoids the time-consuming student-teacher training process. MaskCLIP outperforms previous methods for semantic/instance/panoptic segmentation on ADE20K and PASCAL datasets. We show qualitative illustrations for MaskCLIP with online custom categories. Project website: ", "output": "Open-Vocabulary Universal Image Segmentation with MaskCLIP."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The problem of anticipating human actions is an inherently uncertain one. However, we can reduce this uncertainty if we have a sense of the goal that the actor is trying to achieve. Here, we present an action anticipation model that leverages goal information for the purpose of reducing the uncertainty in future predictions. Since we do not possess goal information or the observed actions during inference, we resort to visual representation to encapsulate information about both actions and goals. Through this, we derive a novel concept called abstract goal which is conditioned on observed sequences of visual features for action anticipation. We design the abstract goal as a distribution whose parameters are estimated using a variational recurrent network. 
We sample multiple candidates for the next action and introduce a goal consistency measure to determine the best candidate that follows from the abstract goal. Our method obtains impressive results on the very challenging Epic-Kitchens55 (EK55), EK100, and EGTEA Gaze+ datasets. We obtain absolute improvements of +13.69, +11.24, and +5.19 for Top-1 verb, Top-1 noun, and Top-1 action anticipation accuracy respectively over prior state-of-the-art methods for seen kitchens (S1) of EK55. Similarly, we also obtain significant improvements in the unseen kitchens (S2) set for Top-1 verb (+10.75), noun (+5.84) and action (+2.87) anticipation. Similar trend is observed for EGTEA Gaze+ dataset, where absolute improvement of +9.9, +13.1 and +6.8 is obtained for noun, verb, and action anticipation. It is through the submission of this paper that our method is currently the new state-of-the-art for action anticipation in EK55 and EGTEA Gaze+. Code available at", "output": "Predicting the Next Action by Modeling the Abstract Goal."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current multimodal models, aimed at solving Vision and Language (V+L) tasks, predominantly repurpose Vision Encoders (VE) as feature extractors. While many VEs -- of different architectures, trained on different data and objectives -- are publicly available, they are not designed for the downstream V+L tasks. Nonetheless, most current work assumes that a single pre-trained VE can serve as a general-purpose encoder. In this work, we focus on analysis and aim to understand whether the information stored within different VEs is complementary, i.e. if providing the model with features from multiple VEs can improve the performance on a target task, and how they are combined. We exhaustively experiment with three popular VEs on six downstream V+L tasks and analyze the attention and VE-dropout patterns. Our analyses suggest that diverse VEs complement each other, resulting in improved downstream V+L task performance, where the improvements are not due to simple ensemble effects (i.e. the performance does not always improve when increasing the number of encoders). We demonstrate that future VEs, which are not repurposed, but explicitly designed for V+L tasks, have the potential of improving performance on the target V+L tasks.", "output": "One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For the task of change detection (CD) in remote sensing images, deep convolution neural networks (CNNs)-based methods have recently aggregated transformer modules to improve the capability of global feature extraction. However, they suffer degraded CD performance on small changed areas due to the simple single-scale integration of deep CNNs and transformer modules. To address this issue, we propose a hybrid network based on multi-scale CNN-transformer structure, termed MCTNet, where the multi-scale global and local information are exploited to enhance the robustness of the CD performance on changed areas with different sizes. 
Especially, we design the ConvTrans block to adaptively aggregate global features from transformer modules and local features from CNN layers, which provides abundant global-local features with different scales. Experimental results demonstrate that our MCTNet achieves better detection performance than existing state-of-the-art CD methods.", "output": "MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To deploy and operate deep neural models in production, the quality of their predictions, which might be contaminated benignly or manipulated maliciously by input distributional deviations, must be monitored and assessed. Specifically, we study the case of monitoring the healthy operation of a deep neural network (DNN) receiving a stream of data, with the aim of detecting input distributional deviations over which the quality of the network's predictions is potentially damaged. Using selective prediction principles, we propose a distribution deviation detection method for DNNs. The proposed method is derived from a tight coverage generalization bound computed over a sample of instances drawn from the true underlying distribution. Based on this bound, our detector continuously monitors the operation of the network out-of-sample over a test window and fires off an alarm whenever a deviation is detected. Our novel detection method performs on-par or better than the state-of-the-art, while consuming substantially lower computation time (five orders of magnitude reduction) and space complexities. Unlike previous methods, which require at least linear dependence on the size of the source distribution for each detection, rendering them inapplicable to \"Google-Scale\" datasets, our approach eliminates this dependence, making it suitable for real-world applications.", "output": "Window-Based Distribution Shift Detection for Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised visual representation methods are closing the gap with supervised learning performance. These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations. This can be seen as a task that encourages embeddings to leave out factors modified by these augmentations, i.e. to be invariant to them. However, this only considers one side of the trade-off in the choice of the augmentations: they need to strongly modify the images to avoid simple solution shortcut learning (e.g. using only color histograms), but on the other hand, augmentations-related information may be lacking in the representations for some downstream tasks (e.g. color is important for birds and flower classification). Few recent works proposed to mitigate the problem of using only an invariance task by exploring some form of equivariance to augmentations. This has been performed by learning additional embeddings space(s), where some augmentation(s) cause embeddings to differ, yet in a non-controlled way. In this work, we introduce EquiMod, a generic equivariance module that structures the learned latent space, in the sense that our module learns to predict the displacement in the embedding space caused by the augmentations. 
We show that applying that module to state-of-the-art invariance models, such as SimCLR and BYOL, increases the performances on CIFAR10 and ImageNet datasets. Moreover, while our model could collapse to a trivial equivariance, i.e. invariance, we observe that it instead automatically learns to keep some augmentations-related information beneficial to the representations.", "output": "EquiMod: An Equivariance Module to Improve Self-Supervised Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Physical simulations produce excellent predictions of weather effects. Neural radiance fields produce SOTA scene models. We describe a novel NeRF-editing procedure that can fuse physical simulations with NeRF models of scenes, producing realistic movies of physical phenomena in those scenes. Our application -- Climate NeRF -- allows people to visualize what climate change outcomes will do to them. ClimateNeRF allows us to render realistic weather effects, including smog, snow, and flood. Results can be controlled with physically meaningful variables like water level. Qualitative and quantitative studies show that our simulated results are significantly more realistic than those from SOTA 2D image editing and SOTA 3D NeRF stylization.", "output": "ClimateNeRF: Extreme Weather Synthesis in Neural Radiance Field."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-object interaction is one of the most important visual cues and we propose a novel way to represent human-object interactions for egocentric action anticipation. We propose a novel transformer variant to model interactions by computing the change in the appearance of objects and human hands due to the execution of the actions and use those changes to refine the video representation. Specifically, we model interactions between hands and objects using Spatial Cross-Attention (SCA) and further infuse contextual information using Trajectory Cross-Attention to obtain environment-refined interaction tokens. Using these tokens, we construct an interaction-centric video representation for action anticipation. We term our model InAViT which achieves state-of-the-art action anticipation performance on large-scale egocentric datasets EPICKITCHENS100 (EK100) and EGTEA Gaze+. InAViT outperforms other visual transformer-based methods including object-centric video representation. On the EK100 evaluation server, InAViT is the top-performing method on the public leaderboard (at the time of submission) where it outperforms the second-best model by 3.3% on mean-top5 recall.", "output": "Interaction Visual Transformer for Egocentric Action Anticipation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The fidelity of Generative Adversarial Networks (GAN) inversion is impeded by Out-Of-Domain (OOD) areas (e.g., background, accessories) in the image. Detecting the OOD areas beyond the generation ability of the pre-trained model and blending these regions with the input image can enhance fidelity. The \"invertibility mask\" figures out these OOD areas, and existing methods predict the mask with the reconstruction error. 
However, the estimated mask is usually inaccurate due to the influence of the reconstruction error in the In-Domain (ID) area. In this paper, we propose a novel framework that enhances the fidelity of human face inversion by designing a new module to decompose the input images into ID and OOD partitions with invertibility masks. Unlike previous works, our invertibility detector is simultaneously learned with a spatial alignment module. We iteratively align the generated features to the input geometry and reduce the reconstruction error in the ID regions. Thus, the OOD areas are more distinguishable and can be precisely predicted. Then, we improve the fidelity of our results by blending the OOD areas from the input image with the ID GAN inversion results. Our method produces photo-realistic results for real-world human face image inversion and manipulation. Extensive experiments demonstrate our method's superiority over existing methods in the quality of GAN inversion and attribute manipulation.", "output": "Out-of-domain GAN inversion via Invertibility Decomposition for Photo-Realistic Human Face Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D reconstruction and novel view rendering can greatly benefit from geometric priors when the input views are not sufficient in terms of coverage and inter-view baselines. Deep learning of geometric priors from 2D images often requires each image to be represented in a $2D$ canonical frame and the prior to be learned in a given or learned $3D$ canonical frame. In this paper, given only the relative poses of the cameras, we show how to learn priors from multiple views equivariant to coordinate frame transformations by proposing an $SE(3)$-equivariant convolution and transformer in the space of rays in 3D. This enables the creation of a light field that remains equivariant to the choice of coordinate frame. The light field, as defined in our work, refers both to the radiance field and the feature field defined on the ray space. We model the ray space, the domain of the light field, as a homogeneous space of $SE(3)$ and introduce the $SE(3)$-equivariant convolution in ray space. Depending on the output domain of the convolution, we present convolution-based $SE(3)$-equivariant maps from ray space to ray space and to $\\mathbb{R}^3$. Our mathematical framework allows us to go beyond convolution to $SE(3)$-equivariant attention in the ray space. We demonstrate how to tailor and adapt the equivariant convolution and transformer in the tasks of equivariant neural rendering and $3D$ reconstruction from multiple views. We demonstrate $SE(3)$-equivariance by obtaining robust results in roto-translated datasets without performing transformation augmentation.", "output": "Equivariant Light Field Convolution and Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video-language embeddings are a promising avenue for injecting semantics into visual representations, but existing methods capture only short-term associations between seconds-long video clips and their accompanying text. We propose HierVL, a novel hierarchical video-language embedding that simultaneously accounts for both long-term and short-term associations.
As training data, we take videos accompanied by timestamped text descriptions of human actions, together with a high-level text summary of the activity throughout the long video (as are available in Ego4D). We introduce a hierarchical contrastive training objective that encourages text-visual alignment at both the clip level and video level. While the clip-level constraints use the step-by-step descriptions to capture what is happening in that instant, the video-level constraints use the summary text to capture why it is happening, i.e., the broader context for the activity and the intent of the actor. Our hierarchical scheme yields a clip representation that outperforms its single-level counterpart as well as a long-term video representation that achieves SotA results on tasks requiring long-term video modeling. HierVL successfully transfers to multiple challenging downstream tasks (in EPIC-KITCHENS-100, Charades-Ego, HowTo100M) in both zero-shot and fine-tuned settings.", "output": "HierVL: Learning Hierarchical Video-Language Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Face morphing attacks seek to deceive a Face Recognition (FR) system by presenting a morphed image consisting of the biometric qualities from two different identities with the aim of triggering a false acceptance with one of the two identities, thereby presenting a significant threat to biometric systems. The success of a morphing attack is dependent on the ability of the morphed image to represent the biometric characteristics of both identities that were used to create the image. We present a novel morphing attack that uses a Diffusion-based architecture to improve the visual fidelity of the image and the ability of the morphing attack to represent characteristics from both identities. We demonstrate the effectiveness of the proposed attack by evaluating its visual fidelity via the Frechet Inception Distance (FID). Also, extensive experiments are conducted to measure the vulnerability of FR systems to the proposed attack. The ability of a morphing attack detector to detect the proposed attack is measured and compared against two state-of-the-art GAN-based morphing attacks along with two Landmark-based attacks. Additionally, a novel metric to measure the relative strength between different morphing attacks is introduced and evaluated.", "output": "Leveraging Diffusion For Strong and High Quality Face Morphing Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. Further, HCCV datasets constructed through nonconsensual web scraping lack the necessary metadata for comprehensive fairness and robustness evaluations. Current remedies address issues post hoc, lack persuasive justification for adoption, or fail to provide proper contextualization for appropriate application. Our research focuses on proactive, domain-specific recommendations for curating HCCV datasets, addressing privacy and bias.
We adopt an ante hoc reflective perspective and draw from current practices and guidelines, guided by the ethical framework of principlism.", "output": "Principlism Guided Responsible Data Curation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model hinder the efficient adoption of diffusion models. Although post-training quantization (PTQ) is considered a go-to compression method for other tasks, it does not work out-of-the-box on diffusion models. We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of the diffusion models, which compresses the noise estimation network to accelerate the generation process. We identify the key difficulty of diffusion model quantization as the changing output distributions of noise estimation networks over multiple time steps and the bimodal activation distribution of the shortcut layers within the noise estimation network. We tackle these challenges with timestep-aware calibration and split shortcut quantization in this work. Experimental results show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance (small FID change of at most 2.34 compared to >100 for traditional PTQ) in a training-free manner. Our approach can also be applied to text-guided image generation, where we can run Stable Diffusion in 4-bit weights with high generation quality for the first time.", "output": "Q-Diffusion: Quantizing Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Designing a real-time framework for the spatio-temporal action detection task is still a challenge. In this paper, we propose a novel real-time action detection framework, YOWOv2. In this new framework, YOWOv2 takes advantage of both the 3D backbone and 2D backbone for accurate action detection. A multi-level detection pipeline is designed to detect action instances of different scales. To achieve this goal, we carefully build a simple and efficient 2D backbone with a feature pyramid network to extract different levels of classification features and regression features. For the 3D backbone, we adopt the existing efficient 3D CNN to save development time. By combining 3D backbones and 2D backbones of different sizes, we design a YOWOv2 family including YOWOv2-Tiny, YOWOv2-Medium, and YOWOv2-Large. We also introduce the popular dynamic label assignment strategy and anchor-free mechanism to make YOWOv2 consistent with the advanced model architecture design. With our improvement, YOWOv2 is significantly superior to YOWO, and can still maintain real-time detection. Without any bells and whistles, YOWOv2 achieves 87.0% frame mAP and 52.8% video mAP with over 20 FPS on UCF101-24. On AVA, YOWOv2 achieves 21.7% frame mAP with over 20 FPS.
Our code is available on", "output": "YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The imbalance problem is widespread in the field of machine learning and also exists in multimodal learning, caused by the intrinsic discrepancy between the modalities of samples. Recent works have attempted to solve the modality imbalance problem from the algorithm perspective; however, they do not fully analyze the influence of modality bias in datasets. Concretely, existing multimodal datasets are usually collected under specific tasks, where one modality tends to perform better than other ones in most conditions. In this work, to comprehensively explore the influence of modality bias, we first split existing datasets into different subsets by estimating sample-wise modality discrepancy. We surprisingly find that the multimodal models with existing imbalance algorithms consistently perform worse than the unimodal one on specific subsets, in accordance with the modality bias. To further explore the influence of modality bias and analyze the effectiveness of existing imbalance algorithms, we build a balanced audiovisual dataset, with uniformly distributed modality discrepancy over the whole dataset. We then conduct extensive experiments to re-evaluate existing imbalance algorithms and draw some interesting findings: existing algorithms only provide a compromise between modalities and suffer from the large modality discrepancy of samples. We hope that these findings could facilitate future research on the modality imbalance problem.", "output": "Balanced Audiovisual Dataset for Imbalance Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. In particular, we investigate why certain types of composition fail using current techniques and present a number of solutions. We conclude that the sampler (not the model) is responsible for this failure and propose new samplers, inspired by MCMC, which enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers.
Intriguingly, we find these samplers lead to notable improvements in compositional generation across a wide set of problems such as classifier-guided ImageNet modeling and compositional text-to-image generation.", "output": "Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Photorealistic object appearance modeling from 2D images is a constant topic in vision and graphics. While neural implicit methods (such as Neural Radiance Fields) have shown high-fidelity view synthesis results, they cannot relight the captured objects. More recent neural inverse rendering approaches have enabled object relighting, but they represent surface properties as simple BRDFs, and therefore cannot handle translucent objects. We propose Object-Centric Neural Scattering Functions (OSFs) for learning to reconstruct object appearance from only images. OSFs not only support free-viewpoint object relighting, but can also model both opaque and translucent objects. While accurately modeling subsurface light transport for translucent objects can be highly complex and even intractable for neural methods, OSFs learn to approximate the radiance transfer from a distant light to an outgoing direction at any spatial location. This approximation avoids explicitly modeling complex subsurface scattering, making learning a neural implicit model tractable. Experiments on real and synthetic data show that OSFs accurately reconstruct appearances for both opaque and translucent objects, allowing faithful free-viewpoint relighting as well as scene composition.", "output": "Learning Object-Centric Neural Scattering Functions for Free-Viewpoint Relighting and Scene Composition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper addresses fine-grained object detection in scenarios with limited computing resources, such as edge computing. In particular, we focus on a scenario where a single image contains objects of the same category but varying sizes, and we desire an algorithm that can not only recognize the physical class of objects but also detect their size. Deep learning (DL), particularly through the use of deep neural networks (DNNs), has become the primary approach to object detection. However, obtaining accurate fine-grained detection requires a large DNN model and a significant amount of annotated data, presenting a challenge to solve our problem, particularly for resource-constrained scenarios. To this end, we propose an approach that utilizes commonsense knowledge to assist a coarse-grained object detector in achieving accurate size-related fine-grained detection results. Specifically, we introduce a commonsense knowledge inference module (CKIM) that processes the coarse-grained labels produced by a benchmark coarse-grained DL detector to generate size-related fine-grained labels. Our CKIM explores both crisp-rule and fuzzy-rule based inference methods, with the latter being employed to handle ambiguity in the target size-related labels. We implement our method based on two modern DL detectors, including MobileNet-SSD and YOLOv7-tiny. Experimental results demonstrate that our approach achieves accurate fine-grained detections with a reduced amount of annotated data and smaller model size.
Our code is available at ", "output": "Commonsense Knowledge Assisted Deep Learning with Application to Size-Related Fine-Grained Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As one of the most fundamental techniques in multimodal learning, cross-modal matching aims to project various sensory modalities into a shared feature space. To achieve this, massive and correctly aligned data pairs are required for model training. However, unlike unimodal datasets, multimodal datasets are much harder to collect and annotate precisely. As an alternative, the co-occurred data pairs (e.g., image-text pairs) collected from the Internet have been widely exploited in the area. Unfortunately, the cheaply collected dataset unavoidably contains many mismatched data pairs, which have been proven to be harmful to the model's performance. To address this, we propose a general framework called BiCro (Bidirectional Cross-modal similarity consistency), which can be easily integrated into existing cross-modal matching models and improve their robustness against noisy data. Specifically, BiCro aims to estimate soft labels for noisy data pairs to reflect their true correspondence degree. The basic idea of BiCro is that -- taking image-text matching as an example -- similar images should have similar textual descriptions and vice versa. Then the consistency of these two similarities can be recast as the estimated soft labels to train the matching model. The experiments on three popular cross-modal matching datasets demonstrate that our method significantly improves the noise-robustness of various matching models, and surpasses the state-of-the-art by a clear margin.", "output": "BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Skin and subcutaneous diseases rank high among the leading contributors to the global burden of nonfatal diseases, impacting a considerable portion of the population. Nonetheless, the field of dermatology diagnosis faces three significant hurdles. Firstly, there is a shortage of dermatologists accessible to diagnose patients, particularly in rural regions. Secondly, accurately interpreting skin disease images poses a considerable challenge. Lastly, generating patient-friendly diagnostic reports is usually a time-consuming and labor-intensive task for dermatologists. To tackle these challenges, we present SkinGPT-4, which is the world's first interactive dermatology diagnostic system powered by an advanced visual large language model. SkinGPT-4 leverages a fine-tuned version of MiniGPT-4, trained on an extensive collection of skin disease images (comprising 52,929 publicly available and proprietary images) along with clinical concepts and doctors' notes. We designed a two-step training process to allow SkinGPT-4 to express medical features in skin disease images with natural language and make accurate diagnoses of the types of skin diseases. With SkinGPT-4, users could upload their own skin photos for diagnosis, and the system could autonomously evaluate the images, identify the characteristics and categories of the skin conditions, perform in-depth analysis, and provide interactive treatment recommendations.
Meanwhile, SkinGPT-4's local deployment capability and commitment to user privacy also render it an appealing choice for patients in search of a dependable and precise diagnosis of their skin ailments. To demonstrate the robustness of SkinGPT-4, we conducted quantitative evaluations on 150 real-life cases, which were independently reviewed by certified dermatologists, and showed that SkinGPT-4 could provide accurate diagnoses of skin diseases.", "output": "SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-view Clustering (MVC) has achieved significant progress, with many efforts dedicated to learning knowledge from multiple views. However, most existing methods are either not applicable or require additional steps for incomplete MVC. Such a limitation results in poor-quality clustering performance and poor missing view adaptation. Besides, noise or outliers might significantly degrade the overall clustering performance, which are not handled well by most existing methods. In this paper, we propose a novel unified framework for incomplete and complete MVC named self-learning symmetric multi-view probabilistic clustering (SLS-MPC). SLS-MPC proposes a novel symmetric multi-view probability estimation and equivalently transforms multi-view pairwise posterior matching probability into a composition of each view's individual distribution, which tolerates data missing and might extend to any number of views. Then, SLS-MPC proposes a novel self-learning probability function without any prior knowledge and hyper-parameters to learn each view's individual distribution. Next, graph-context-aware refinement with path propagation and co-neighbor propagation is used to refine pairwise probability, which alleviates the impact of noise and outliers. Finally, SLS-MPC proposes a probabilistic clustering algorithm to adjust clustering assignments by maximizing the joint probability iteratively without category information. Extensive experiments on multiple benchmarks show that SLS-MPC outperforms previous state-of-the-art methods.", "output": "Self-Learning Symmetric Multi-view Probabilistic Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their efficacy in text-related visual tasks remains less explored. We conducted a comprehensive study of existing publicly available multimodal models, evaluating their performance in text recognition (document text, artistic text, handwritten text, scene text), text-based visual question answering (document text, scene text, and bilingual text), key information extraction (receipts, documents, and nutrition facts) and handwritten mathematical expression recognition. Our findings reveal strengths and weaknesses in these models, which primarily rely on semantic understanding for word recognition and exhibit inferior perception of individual character shapes. They also display indifference towards text length and have limited capabilities in detecting fine-grained features in images.
Consequently, these results demonstrate that even the current most powerful large multimodal models cannot match domain-specific methods in traditional text tasks and face greater challenges in more complex tasks. Most importantly, the baseline results showcased in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal techniques. The evaluation pipeline is available at ", "output": "On the Hidden Mystery of OCR in Large Multimodal Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have recently dominated image synthesis and other related generative tasks. However, the iterative denoising process is computationally expensive at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization of diffusion models can significantly reduce the model size and accelerate the sampling process without requiring any re-training. Nonetheless, applying existing post-training quantization methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, for each denoising step, quantization noise leads to deviations in the estimated mean and mismatches with the predetermined variance schedule. Moreover, as the sampling process proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) in late denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and diffusion perturbed noise in the quantized denoising process. We first disentangle the quantization noise into its correlated and residual uncorrelated parts regarding its full-precision counterpart. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we propose a mixed-precision scheme to choose the optimal bitwidth for each denoising step, which prefers low bits to accelerate the early denoising steps while high bits maintain the high SNR for the late steps. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models in generating high-quality samples, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations.", "output": "PTQD: Accurate Post-Training Quantization for Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Controllable scene synthesis aims to create interactive environments for various industrial use cases. Scene graphs provide a highly suitable interface to facilitate these applications by abstracting the scene context in a compact manner. Existing methods, reliant on retrieval from extensive databases or pre-trained shape embeddings, often overlook scene-object and object-object relationships, leading to inconsistent results due to their limited generation capacity. To address this issue, we present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes, which are semantically realistic and conform to commonsense.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes via latent diffusion, capturing global scene-object and local inter-object relationships while preserving shape diversity. The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model. Due to lacking a scene graph dataset offering high-quality object-level meshes with relations, we also construct SG-FRONT, enriching the off-the-shelf indoor dataset 3D-FRONT with additional scene graph labels. Extensive experiments are conducted on SG-FRONT, where CommonScenes shows clear advantages over other methods regarding generation consistency, quality, and diversity. Codes and the dataset will be released upon acceptance.", "output": "CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Building generalizable AI models is one of the primary challenges in the healthcare domain. While radiologists rely on generalizable descriptive rules of abnormality, Neural Network (NN) models suffer even with a slight shift in input distribution (e.g., scanner type). Fine-tuning a model to transfer knowledge from one domain to another requires a significant amount of labeled data in the target domain. In this paper, we develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. We assume the interpretable component of NN to be approximately domain-invariant. However, interpretable models typically underperform compared to their Blackbox (BB) variants. We start with a BB in the source domain and distill it into a \\emph{mixture} of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of data, a mixture of interpretable models achieves comparable performance to the BB. Further, we use the pseudo-labeling technique from semi-supervised learning (SSL) to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models in the target domain. We evaluate our model using a real-life large-scale chest-X-ray (CXR) classification dataset. The code is available at: \\url{", "output": "Distilling BlackBox to Interpretable models for Efficient Transfer Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention within the deep learning literature, current train-time regularization methods for Out-of-Distribution (OOD) detection are yet to fully exploit this potential. Indeed, the naive incorporation of feature normalization within neural networks does not guarantee substantial improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach to transforming features to hyperspherical space during training, while employing non-transformed space for OOD-scoring purposes. This method yields a surprising enhancement in OOD detection capabilities without compromising model accuracy in in-distribution (ID).
Our investigation demonstrates that the proposed technique substantially diminishes the norm of the features of all samples, more so in the case of out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.", "output": "T2FNorm: Extremely Simple Scaled Train-time Feature Normalization for OOD Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. This issue is critical in practical supervised learning settings, such as the ones in which a pre-trained model computes projections toward a latent space where different task predictors are sequentially learned over time. As a matter of fact, incrementally fine-tuning the whole model to better adapt to new tasks usually results in catastrophic forgetting, with decreasing performance over the past experiences and losing valuable knowledge from the pre-training stage. In this paper, we propose a novel strategy to make the fine-tuning procedure more effective, by avoiding updating the pre-trained part of the network and learning not only the usual classification head, but also a set of newly-introduced learnable parameters that are responsible for transforming the input data. This process allows the network to effectively leverage the pre-training knowledge and find a good trade-off between plasticity and stability with modest computational efforts, thus being especially suitable for on-the-edge settings. Our experiments on four image classification problems in a continual learning setting confirm the quality of the proposed approach when compared to several fine-tuning procedures and to popular continual learning methods.", "output": "Continual Learning with Pretrained Backbones by Tuning in the Input Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video instance segmentation (VIS) is a critical task with diverse applications, including autonomous driving and video editing. Existing methods often underperform on complex and long videos in the real world, primarily due to two factors. Firstly, offline methods are limited by the tightly-coupled modeling paradigm, which treats all frames equally and disregards the interdependencies between adjacent frames. Consequently, this leads to the introduction of excessive noise during long-term temporal alignment. Secondly, online methods suffer from inadequate utilization of temporal information. To tackle these challenges, we propose a decoupling strategy for VIS by dividing it into three independent sub-tasks: segmentation, tracking, and refinement. The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame association during tracking, and 2) the effective utilization of temporal information predicated on the aforementioned accurate alignment outcomes during refinement. We introduce a novel referring tracker and temporal refiner to construct the \\textbf{D}ecoupled \\textbf{VIS} framework (\\textbf{DVIS}).
DVIS achieves new SOTA performance in both VIS and VPS, surpassing the current SOTA methods by 7.3 AP and 9.6 VPQ on the OVIS and VIPSeg datasets, which are the most challenging and realistic benchmarks. Moreover, thanks to the decoupling strategy, the referring tracker and temporal refiner are super light-weight (only 1.69% of the segmenter FLOPs), allowing for efficient training and inference on a single GPU with 11G memory. The code is available at \\href{", "output": "DVIS: Decoupled Video Instance Segmentation Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising the identity. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in the pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images and generalising to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. More results can be found on our project page:", "output": "DreamSparse: Escaping from Plato's Cave with 2D Diffusion Model Given Sparse Views."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces an approach, named DFormer, for universal image segmentation. The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model. DFormer first adds various levels of Gaussian noise to ground-truth masks, and then learns a model to predict denoising masks from corrupted masks. Specifically, we take deep pixel-level features along with the noisy masks as inputs to generate mask features and attention masks, employing a diffusion-based decoder to perform mask prediction gradually. At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly-generated masks. Extensive experiments reveal the merits of our proposed contributions on different image segmentation tasks: panoptic segmentation, instance segmentation, and semantic segmentation.
Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val2017 set. Further, DFormer achieves promising semantic segmentation performance, outperforming the recent diffusion-based method by 2.2% on the ADE20K val set. Our source code and models will be publicly available on ", "output": "DFormer: Diffusion-guided Transformer for Universal Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video panoptic segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. We believe that the decoupling strategy proposed by DVIS enables more effective utilization of temporal information for both \"thing\" and \"stuff\" objects. In this report, we successfully validated the effectiveness of the decoupling strategy in video panoptic segmentation. Finally, our method achieved a VPQ score of 51.4 and 53.7 in the development and test phases, respectively, and ultimately ranked 1st in the VPS track of the 2nd PVUW Challenge. The code is available at ", "output": "1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial lights commonly leave strong lens flare artifacts on the images captured at night, degrading both the visual quality and performance of vision algorithms. Existing flare removal approaches mainly focus on removing daytime flares and fail in nighttime cases. Nighttime flare removal is challenging due to the unique luminance and spectrum of artificial lights, as well as the diverse patterns and image degradation of the flares. The scarcity of nighttime flare removal datasets constrains the research on this crucial task. In this paper, we introduce Flare7K++, the first comprehensive nighttime flare removal dataset, consisting of 962 real-captured flare images (Flare-R) and 7,000 synthetic flares (Flare7K). Compared to Flare7K, Flare7K++ is particularly effective in eliminating complicated degradation around the light source, which is intractable by using synthetic flares alone. Besides, the previous flare removal pipeline relies on manual threshold and blur kernel settings to extract light sources, which may fail when the light sources are tiny or not overexposed. To address this issue, we additionally provide annotations of light sources in Flare7K++ and propose a new end-to-end pipeline to preserve the light source while removing lens flares. Our dataset and pipeline offer a valuable foundation and benchmark for future investigations into nighttime flare removal. Extensive experiments demonstrate that Flare7K++ supplements the diversity of existing flare datasets and pushes the frontier of nighttime flare removal towards real-world scenarios.", "output": "Flare7K++: Mixing Synthetic and Real Datasets for Nighttime Flare Removal and Beyond."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most self-supervised learning (SSL) methods often work on curated datasets where the object-centric assumption holds. This assumption breaks down in uncurated images.
Existing scene image SSL methods try to find the two views from original scene images that are well matched or dense, which is both complex and computationally heavy. This paper proposes a conceptually different pipeline: first find regions that are coarse objects (with adequate objectness), crop them out as pseudo object-centric images, then any SSL method can be directly applied as in a real object-centric dataset. That is, coarse crops benefit scene image SSL. A novel cropping strategy that produces coarse object boxes is proposed. The new pipeline and cropping strategy successfully learn quality features from uncurated datasets without ImageNet. Experiments show that our pipeline outperforms existing SSL methods (MoCo-v2, DenseCL and MAE) on classification, detection and segmentation tasks. We further conduct extensive ablations to verify that: 1) the pipeline does not rely on pretrained models; 2) the cropping strategy is better than existing object discovery methods; 3) our method is not sensitive to hyperparameters and data augmentations.", "output": "Coarse Is Better? A New Pipeline Towards Self-Supervised Learning with Uncurated Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Parking guidance systems have recently become a popular trend as a part of the smart cities' paradigm of development. The crucial part of such systems is the algorithm allowing drivers to search for available parking lots across regions of interest. The classic approach to this task is based on the application of neural network classifiers to camera records. However, existing systems demonstrate a lack of generalization ability and appropriate testing regarding specific visual conditions. In this study, we extensively evaluate state-of-the-art parking lot occupancy detection algorithms, compare their prediction quality with the recently emerged vision transformers, and propose a new pipeline based on the EfficientNet architecture. Computational experiments demonstrate a performance increase in the case of our model, which was evaluated on 5 different datasets.", "output": "Revising deep learning methods in parking lot occupancy detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Instruction tuning has significantly advanced large language models (LLMs) such as ChatGPT, enabling them to align with human instructions across diverse tasks. However, progress in open vision-language models (VLMs) has been limited due to the scarcity of high-quality instruction datasets. To tackle this challenge and promote research in the vision-language field, we introduce the Multi-Modal, Multilingual Instruction Tuning (M$^3$IT) dataset, designed to optimize VLM alignment with human instructions. Our M$^3$IT dataset comprises 40 carefully curated datasets, including 2.4 million instances and 400 manually written task instructions, reformatted into a vision-to-text structure. Key tasks are translated into 80 languages with an advanced translation system, ensuring broader accessibility. M$^3$IT surpasses previous datasets regarding task coverage, instruction number and instance scale.
Moreover, we develop Ying-VLM, a VLM trained on our M$^3$IT dataset, showcasing its potential to answer complex questions requiring world knowledge, generalize to unseen video tasks, and comprehend unseen instructions in Chinese. We have open-sourced the dataset to encourage further research.", "output": "M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have attracted significant attention due to their remarkable ability to create content and generate data for tasks such as image classification. However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only the image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower the pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion is able to encode not only bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate that GeoDiffusion outperforms previous L2I methods while being 4x faster to train. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors.", "output": "Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Generative Diffusion Models (GDMs) have showcased their remarkable capabilities in learning and generating images. A large community of GDMs has naturally emerged, further promoting the diversified applications of GDMs in various fields. However, this unrestricted proliferation has raised serious concerns about copyright protection. For example, artists including painters and photographers are becoming increasingly concerned that GDMs could effortlessly replicate their unique creative works without authorization. In response to these challenges, we introduce a novel watermarking scheme, DiffusionShield, tailored for GDMs. DiffusionShield protects images from copyright infringement by GDMs through encoding the ownership information into an imperceptible watermark and injecting it into the images. Its watermark can be easily learned by GDMs and will be reproduced in their generated images. By detecting the watermark from generated images, copyright infringement can be exposed with evidence. Benefiting from the uniformity of the watermarks and the joint optimization method, DiffusionShield ensures low distortion of the original image, high watermark detection performance, and the ability to embed lengthy messages. We conduct rigorous and comprehensive experiments to show the effectiveness of DiffusionShield in defending against infringement by GDMs and its superiority over traditional watermarking methods.", "output": "DiffusionShield: A Watermark for Copyright Protection against Generative Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Learning, and in particular Deep Neural Networks (DNNs), are nowadays widely used in many scenarios, including safety-critical applications such as autonomous driving. In this context, besides energy efficiency and performance, reliability plays a crucial role since a system failure can jeopardize human life. As with any other device, the reliability of hardware architectures running DNNs has to be evaluated, usually through costly fault injection campaigns. This paper explores the approximation and fault resiliency of DNN accelerators. We propose to use approximate (AxC) arithmetic circuits to agilely emulate errors in hardware without performing fault injection on the DNN. To allow fast evaluation of AxC DNNs, we developed an efficient GPU-based simulation framework.
Further, we propose a fine-grained analysis of fault resiliency by examining fault propagation and masking in networks.", "output": "Special Session: Approximation and Fault Resiliency of DNN Accelerators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate yield forecasting is essential for making informed policies and long-term decisions for food security. Earth Observation (EO) data and machine learning algorithms play a key role in providing a comprehensive and timely view of crop conditions from field to national scales. However, machine learning algorithms' prediction accuracy is often harmed by spatial heterogeneity caused by exogenous factors not reflected in remote sensing data, such as differences in crop management strategies. In this paper, we propose and investigate a simple technique called state-wise additive bias to explicitly address the cross-region yield heterogeneity in Kazakhstan. Compared to baseline machine learning models (Random Forest, CatBoost, XGBoost), our method reduces the overall RMSE by 8.9% and the highest state-wise RMSE by 28.37%. The effectiveness of state-wise additive bias indicates that machine learning performance can be significantly improved by explicitly addressing the spatial heterogeneity, motivating future work on spatial-aware machine learning algorithms for yield forecasts as well as for general geospatial forecasting problems.", "output": "Improve State-Level Wheat Yield Forecasts in Kazakhstan on GEOGLAM's EO Data by Leveraging A Simple Spatial-Aware Technique."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the Compressed Sensing (CS) problem, which is the problem of finding the most sparse vector that satisfies a set of linear measurements up to some numerical tolerance. CS is a central problem in Statistics, Operations Research and Machine Learning which arises in applications such as signal processing, data compression and image reconstruction. We introduce an $\\ell_2$ regularized formulation of CS which we reformulate as a mixed integer second order cone program. We derive a second order cone relaxation of this problem and show that under mild conditions on the regularization parameter, the resulting relaxation is equivalent to the well studied basis pursuit denoising problem. We present a semidefinite relaxation that strengthens the second order cone relaxation and develop a custom branch-and-bound algorithm that leverages our second order cone relaxation to solve instances of CS to certifiable optimality. Our numerical results show that our approach produces solutions that are on average $6.22\\%$ more sparse than solutions returned by state-of-the-art benchmark methods on synthetic data in minutes.
On real-world ECG data, for a given $\\ell_2$ reconstruction error our approach produces solutions that are on average $9.95\\%$ more sparse than benchmark methods, while for a given sparsity level our approach produces solutions that have on average $10.77\\%$ lower reconstruction error than benchmark methods in minutes.", "output": "Compressed Sensing: A Discrete Optimization Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We address the problem of making Conformal Prediction (CP) intervals locally adaptive. Most existing methods focus on approximating the object-conditional validity of the intervals by partitioning or re-weighting the calibration set. Our strategy is new and conceptually different. Instead of re-weighting the calibration data, we redefine the conformity measure through a trainable change of variables, $A \\to \\phi_X(A)$, that depends explicitly on the object attributes, $X$. Under certain conditions and if $\\phi_X$ is monotonic in $A$ for any $X$, the transformations produce prediction intervals that are guaranteed to be marginally valid and have $X$-dependent sizes. We describe how to parameterize and train $\\phi_X$ to maximize the interval efficiency. Contrary to other CP-aware training methods, the objective function is smooth and can be minimized through standard gradient methods without approximations.", "output": "On training locally adaptive CP."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The emergence of smart cities demands harnessing advanced technologies like the Internet of Things (IoT) and Artificial Intelligence (AI) and promises to unlock cities' potential to become more sustainable, efficient, and ultimately livable for their inhabitants. This work introduces an intelligent city management system that provides a data-driven approach to three use cases: (i) analyze traffic information to reduce the risk of traffic collisions and improve driver and pedestrian safety, (ii) identify when and where energy consumption can be reduced to improve cost savings, and (iii) detect maintenance issues like potholes in the city's roads and sidewalks, as well as the beginning of hazards like floods and fires. A case study in Aveiro City demonstrates the system's effectiveness in generating actionable insights that enhance security, energy efficiency, and sustainability, while highlighting the potential of AI and IoT-driven solutions for smart city development.", "output": "From Data to Action: Exploring AI and IoT-driven Solutions for Smarter Cities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modulation classification is an essential step of signal processing and has been regularly applied in the field of telecommunication. Since variations of frequency with respect to time remain a vital distinction among radio signals having different modulation formats, these variations can be used for feature extraction by converting 1-D radio signals into the frequency domain. In this paper, we propose a scheme for Automatic Modulation Classification (AMC) using modern architectures of Convolutional Neural Networks (CNN), through generating spectrum images of eleven different modulation types.
Additionally, we perform resolution transformation of spectrograms that results up to 99.61% of computational load reduction and 8x faster conversion from the received I/Q data. This proposed AMC is implemented on CPU and GPU, to recognize digital as well as analogue signal modulation schemes on signals. The performance is evaluated on existing CNN models including SqueezeNet, Resnet-50, InceptionResnet-V2, Inception-V3, VGG-16 and Densenet-201. Best results of 91.2% are achieved in presence of AWGN and other noise impairments in the signals, stating that the transformed spectrogram-based AMC has good classification accuracy as the spectral features are highly discriminant, and CNN based models have capability to extract these high-dimensional features. The spectrograms were created under different SNRs ranging from 5 to 30db with a step size of 5db to observe the experimental results at various SNR levels. The proposed methodology is efficient to be applied in wireless communication networks for real-time applications.", "output": "Modulation Classification Through Deep Learning Using Resolution Transformed Spectrograms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Directed evolution is a molecular biology technique that is transforming protein engineering by creating proteins with desirable properties and functions. However, it is experimentally impossible to perform the deep mutational scanning of the entire protein library due to the enormous mutational space, which scales as $20^N$, where N is the number of amino acids. This has led to the rapid growth of AI-assisted directed evolution (AIDE) or AI-assisted protein engineering (AIPE) as an emerging research field. Aided with advanced natural language processing (NLP) techniques, including long short-term memory, autoencoder, and transformer, sequence-based embeddings have been dominant approaches in AIDE and AIPE. Persistent Laplacians, an emerging technique in topological data analysis (TDA), have made structure-based embeddings a superb option in AIDE and AIPE. We argue that a class of persistent topological Laplacians (PTLs), including persistent Laplacians, persistent path Laplacians, persistent sheaf Laplacians, persistent hypergraph Laplacians, persistent hyperdigraph Laplacians, and evolutionary de Rham-Hodge theory, can effectively overcome the limitations of the current TDA and offer a new generation of more powerful TDA approaches. In the general framework of topological deep learning, mathematics-assisted directed evolution (MADE) has a great potential for future protein engineering.", "output": "Mathematics-assisted directed evolution and protein engineering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Green Light Optimal Speed Advisory (GLOSA) system suggests speeds to vehicles to assist them in passing through intersections during green intervals, thus reducing traffic congestion and fuel consumption by minimizing the number of stops and idle times at intersections. However, previous research has focused on optimizing the GLOSA algorithm, neglecting the frequency of speed advisory by the GLOSA system.
Specifically, some studies provide speed advisory profile at each decision step, resulting in redundant advisory, while others calculate the optimal speed for the vehicle only once, which cannot adapt to dynamic traffic. In this paper, we propose an Adaptive Frequency GLOSA (AF-GLOSA) model based on Hybrid Proximal Policy Optimization (H-PPO), which employs an actor-critic architecture with a hybrid actor network. The hybrid actor network consists of a discrete actor that outputs advisory frequency and a continuous actor that outputs acceleration profiles. Additionally, we design a novel reward function that considers both travel efficiency and fuel consumption. The AF-GLOSA model is evaluated in comparison to traditional GLOSA and learning-based GLOSA methods in a three-lane intersection with a traffic signal in SUMO, under three different levels of traffic density. The results demonstrate that the AF-GLOSA model performs best in reducing average stop times, fuel consumption and CO2 emissions.", "output": "Adaptive Frequency Green Light Optimal Speed Advisory based on Hybrid Actor-Critic Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Researchers are constantly leveraging new forms of data with the goal of understanding how people perceive the built environment and build the collective place identity of cities. Latest advancements in generative artificial intelligence (AI) models have enabled the production of realistic representations learned from vast amounts of data. In this study, we aim to test the potential of generative AI as the source of textual and visual information in capturing the place identity of cities assessed by filtered descriptions and images. We asked questions on the place identity of a set of 31 global cities to two generative AI models, ChatGPT and DALL-E2. Since generative AI has raised ethical concerns regarding its trustworthiness, we performed cross-validation to examine whether the results show similar patterns to real urban settings. In particular, we compared the outputs with Wikipedia data for text and images searched from Google for image. Our results indicate that generative AI models have the potential to capture the collective image of cities that can make them distinguishable. This study is among the first attempts to explore the capabilities of generative AI in understanding human perceptions of the built environment. It contributes to urban design literature by discussing future research opportunities and potential limitations.", "output": "Understanding Place Identity with Generative AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As machine learning becomes increasingly prevalent in critical fields such as healthcare, ensuring the safety and reliability of machine learning systems becomes paramount. A key component of reliability is the ability to estimate uncertainty, which enables the identification of areas of high and low confidence and helps to minimize the risk of error. In this study, we propose a machine learning pipeline called U-PASS tailored for clinical applications that incorporates uncertainty estimation at every stage of the process, including data acquisition, training, and model deployment.
The training process is divided into a supervised pre-training step and a semi-supervised finetuning step. We apply our uncertainty-guided deep learning pipeline to the challenging problem of sleep staging and demonstrate that it systematically improves performance at every stage. By optimizing the training dataset, actively seeking informative samples, and deferring the most uncertain samples to an expert, we achieve an expert-level accuracy of 85% on a challenging clinical dataset of elderly sleep apnea patients, representing a significant improvement over the baseline accuracy of 75%. U-PASS represents a promising approach to incorporating uncertainty estimation into machine learning pipelines, thereby improving their reliability and unlocking their potential in clinical settings.", "output": "U-PASS: an Uncertainty-guided deep learning Pipeline for Automated Sleep Staging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Positron emission tomography (PET) is an important functional medical imaging technique often used in the evaluation of certain brain disorders, whose reconstruction problem is ill-posed. The vast majority of reconstruction methods in PET imaging, both iterative and deep learning, return a single estimate without quantifying the associated uncertainty. Due to ill-posedness and noise, a single solution can be misleading or inaccurate. Thus, providing a measure of uncertainty in PET image reconstruction can help medical practitioners in making critical decisions. This paper proposes a deep learning-based method for uncertainty quantification in PET image reconstruction via posterior sampling. The method is based on training a conditional generative adversarial network whose generator approximates sampling from the posterior in Bayesian inversion. The generator is conditioned on reconstruction from a low-dose PET scan obtained by a conventional reconstruction method and a high-quality magnetic resonance image and learned to estimate a corresponding standard-dose PET scan reconstruction. We show that the proposed model generates high-quality posterior samples and yields physically-meaningful uncertainty estimates.", "output": "Estimating Uncertainty in PET Image Reconstruction via Deep Posterior Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Proteins perform much of the work in living organisms, and consequently the development of efficient computational methods for protein representation is essential for advancing large-scale biological research. Most current approaches struggle to efficiently integrate the wealth of information contained in the protein sequence and structure. In this paper, we propose a novel framework for embedding protein graphs in geometric vector spaces, by learning an encoder function that preserves the structural distance between protein graphs. Utilizing Graph Neural Networks (GNNs) and Large Language Models (LLMs), the proposed framework generates structure- and sequence-aware protein representations. We demonstrate that our embeddings are successful in the task of comparing protein structures, while providing a significant speed-up compared to traditional approaches based on structural alignment.
Our framework achieves remarkable results in the task of protein structure classification; in particular, when compared to other work, the proposed method shows an average F1-Score improvement of 26% on out-of-distribution (OOD) samples and of 32% when tested on samples coming from the same distribution as the training data. Our approach finds applications in areas such as drug prioritization, drug re-purposing, disease sub-type analysis and elsewhere.", "output": "Neural Embeddings for Protein Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose to formulate point cloud extraction from ultrasound volumes as an image segmentation problem. Through this convenient formulation, a quick prototype exploring various variants of the U-Net architecture was developed and evaluated. This report documents the experimental results compiled using a training dataset of 5 labelled ultrasound volumes and 84 unlabelled volumes that got completed in a two-week period as part of a challenge submission to an open challenge entitled ``Deep Learning in Ultrasound Image Analysis''. Source code is shared with the research community at this GitHub URL \\url{", "output": "SMRVIS: Point cloud extraction from 3-D ultrasound for non-destructive testing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We systematically study a wide variety of image-based generative models spanning semantically-diverse datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing to 16 modern metrics for evaluating the overall performance, fidelity, diversity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self-supervised feature extractors, find that the semantic information encoded by individual networks strongly depends on their training procedure, and show that DINOv2-ViT-L/14 allows for much richer evaluation of generative models. Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet. However, our experiments show that current metrics do not properly detect memorization; none in the literature is able to separate memorization from other phenomena such as underfitting or mode shrinkage.
To facilitate further development of generative models and their evaluation we release all generated image datasets, human evaluation data, and a modular library to compute 16 common metrics for 8 different encoders at", "output": "Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose Multiscale Flow, a generative Normalizing Flow that creates samples and models the field-level likelihood of two-dimensional cosmological data such as weak lensing. Multiscale Flow uses hierarchical decomposition of cosmological fields via a wavelet basis, and then models different wavelet components separately as Normalizing Flows. The log-likelihood of the original cosmological field can be recovered by summing over the log-likelihood of each wavelet term. This decomposition allows us to separate the information from different scales and identify distribution shifts in the data such as unknown scale-dependent systematics. The resulting likelihood analysis can not only identify these types of systematics, but can also be made optimal, in the sense that the Multiscale Flow can learn the full likelihood at the field without any dimensionality reduction. We apply Multiscale Flow to weak lensing mock datasets for cosmological inference, and show that it significantly outperforms traditional summary statistics such as power spectrum and peak counts, as well as novel Machine Learning based summary statistics such as scattering transform and convolutional neural networks. We further show that Multiscale Flow is able to identify distribution shifts not in the training data such as baryonic effects. Finally, we demonstrate that Multiscale Flow can be used to generate realistic samples of weak lensing data.", "output": "Multiscale Flow for Robust and Optimal Cosmological Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability to understand visual concepts and replicate and compose these concepts from images is a central goal for computer vision. Recent advances in text-to-image (T2I) models have led to high definition and realistic image quality generation by learning from large databases of images and their descriptions. However, the evaluation of T2I models has focused on photorealism and limited qualitative measures of visual understanding. To quantify the ability of T2I models in learning and synthesizing novel visual concepts, we introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts, 5K unique concept compositions, and 33K composite text prompts. Along with the dataset, we propose an evaluation metric, Concept Confidence Deviation (CCD), that uses the confidence of oracle concept classifiers to measure the alignment between concepts generated by T2I generators and concepts contained in ground truth images. We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions. Our human study shows that CCD is highly correlated with human understanding of concepts.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.", "output": "ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point cloud registration is a fundamental problem in computer vision that aims to estimate the transformation between corresponding sets of points. Non-rigid registration, in particular, involves addressing challenges including various levels of deformation, noise, outliers, and data incompleteness. This paper introduces Robust-DefReg, a robust non-rigid point cloud registration method based on graph convolutional networks (GCNNs). Robust-DefReg is a coarse-to-fine registration approach within an end-to-end pipeline, leveraging the advantages of both coarse and fine methods. The method learns global features to find correspondences between source and target point clouds, to enable appropriate initial alignment, and subsequently fine registration. The simultaneous achievement of high accuracy and robustness across all challenges is reported less frequently in existing studies, making it a key objective of the Robust-DefReg method. The proposed method achieves high accuracy in large deformations while maintaining computational efficiency. This method possesses three primary attributes: high accuracy, robustness to different challenges, and computational efficiency. The experimental results show that the proposed Robust-DefReg holds significant potential as a foundational architecture for future investigations in non-rigid point cloud registration. The source code of Robust-DefReg is available.", "output": "Robust-DefReg: A Robust Deformable Point Cloud Registration Method based on Graph Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nested pairwise frames is a method for relative benchmarking of cell or tissue digital pathology models against manual pathologist annotations on a set of sampled patches. At a high level, the method compares agreement between a candidate model and pathologist annotations with agreement among pathologists' annotations. This evaluation framework addresses fundamental issues of data size and annotator variability in using manual pathologist annotations as a source of ground truth for model validation. We implemented nested pairwise frames evaluation for tissue classification, cell classification, and cell count prediction tasks and show results for cell and tissue models deployed on an H&E-stained melanoma dataset.", "output": "Improved statistical benchmarking of digital pathology models using pairwise frames evaluation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Earth mover's distance (EMD) is a useful metric for image recognition and classification, but its usual implementations are not differentiable or too slow to be used as a loss function for training other algorithms via gradient descent.
In this paper, we train a convolutional neural network (CNN) to learn a differentiable, fast approximation of the EMD and demonstrate that it can be used as a substitute for computing-intensive EMD implementations. We apply this differentiable approximation in the training of an autoencoder-inspired neural network (encoder NN) for data compression at the high-luminosity LHC at CERN. The goal of this encoder NN is to compress the data while preserving the information related to the distribution of energy deposits in particle detectors. We demonstrate that the performance of our encoder NN trained using the differentiable EMD CNN surpasses that of training with loss functions based on mean squared error.", "output": "Differentiable Earth Mover's Distance for Data Compression at the High-Luminosity LHC."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Symbolic regression (SR) is a powerful technique for discovering the analytical mathematical expression from data, finding various applications in natural sciences due to its good interpretability of results. However, existing methods face scalability issues when dealing with complex equations involving multiple variables. To address this challenge, we propose SRCV, a novel neural symbolic regression method that leverages control variables to enhance both accuracy and scalability. The core idea is to decompose multi-variable symbolic regression into a set of single-variable SR problems, which are then combined in a bottom-up manner. The proposed method involves a four-step process. First, we learn a data generator from observed data using deep neural networks (DNNs). Second, the data generator is used to generate samples for a certain variable by controlling the input variables. Thirdly, single-variable symbolic regression is applied to estimate the corresponding mathematical expression. Lastly, we repeat steps 2 and 3 by gradually adding variables one by one until completion. We evaluate the performance of our method on multiple benchmark datasets. Experimental results demonstrate that the proposed SRCV significantly outperforms state-of-the-art baselines in discovering mathematical expressions with multiple variables. Moreover, it can substantially reduce the search space for symbolic regression. The source code will be made publicly available upon publication.", "output": "Neural Symbolic Regression using Control Variables."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to \"explain\" how neural networks process natural images.
We underpin this empirical finding by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.", "output": "Don't trust your eyes: on the (un)reliability of feature visualizations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and discusses convergence guarantees for stochastic natural thresholding algorithms by extending the NT from the deterministic version with linear measurements to the stochastic version with a general objective function. We also conduct various numerical experiments on linear and nonlinear measurements to demonstrate the performance of StoNT.", "output": "Stochastic Natural Thresholding Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompting large language models has gained immense popularity in recent years due to the advantage of producing good results even without the need for labelled data. However, this requires prompt tuning to get optimal prompts that lead to better model performances. In this paper, we explore the use of soft-prompt tuning on sentiment classification task to quantify the biases of large language models (LLMs) such as Open Pre-trained Transformers (OPT) and Galactica language model. Since these models are trained on real-world data that could be prone to bias toward certain groups of populations, it is important to identify these underlying issues. Using soft-prompts to evaluate bias gives us the extra advantage of avoiding the human-bias injection that can be caused by manually designed prompts. We check the model biases on different sensitive attributes using the group fairness (bias) and find interesting bias patterns. Since LLMs have been used in the industry in various applications, it is crucial to identify the biases before deploying these models in practice. We open-source our pipeline and encourage industry researchers to adapt our work to their use cases.", "output": "Soft-prompt Tuning for Large Language Models to Evaluate Bias."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Skeletal muscle atrophy is a common occurrence in critically ill patients in the intensive care unit (ICU) who spend long periods in bed. Muscle mass must be recovered through physiotherapy before patient discharge and ultrasound imaging is frequently used to assess the recovery process by measuring the muscle size over time. However, these manual measurements are subject to large variability, particularly since the scans are typically acquired on different days and potentially by different operators.
In this paper, we propose a self-supervised contrastive learning approach to automatically retrieve similar ultrasound muscle views at different scan times. Three different models were compared using data from 67 patients acquired in the ICU. Results indicate that our contrastive model outperformed a supervised baseline model in the task of view retrieval with an AUC of 73.52% and when combined with an automatic segmentation model achieved 5.7%+/-0.24% error in cross-sectional area. Furthermore, a user study survey confirmed the efficacy of our model for muscle view retrieval.", "output": "Automatic retrieval of corresponding US views in longitudinal examinations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In computational social science (CSS), researchers analyze documents to explain social and political phenomena. In most scenarios, CSS researchers first obtain labels for documents and then explain labels using interpretable regression analyses in the second step. The recent advancements in large language models (LLMs) can lower costs for CSS research by annotating documents cheaply at scale, but such surrogate labels are often imperfect and biased. We present a new algorithm for using outputs from LLMs for downstream statistical analyses while guaranteeing statistical properties -- like asymptotic unbiasedness and proper uncertainty quantification -- which are fundamental to CSS research. We show that direct use of LLM-predicted surrogate labels in downstream statistical analyses leads to substantial bias and invalid confidence intervals, even with high surrogate accuracy of 80--90%. To address this, we build on debiased machine learning to propose the design-based semi-supervised learning (DSL) estimator. DSL employs a doubly-robust procedure to combine surrogate labels with a smaller number of gold-standard labels. Our approach guarantees valid inference for downstream statistical analyses, even when surrogates are arbitrarily biased, without requiring stringent assumptions, by controlling the probability of sampling documents for gold-standard labeling. Both our theoretical analysis and experimental results show that DSL provides valid statistical inference while achieving root mean squared errors comparable to existing alternatives that focus only on prediction without statistical guarantees.", "output": "Using Large Language Model Annotations for Valid Downstream Statistical Inference in Social Science: Design-Based Semi-Supervised Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Parkinson's disease (PD) is a prevalent neurodegenerative disorder with varying patient trajectories, yet little is understood about the underlying causes and symptom progression. The Parkinson's Progression Markers Initiative (PPMI) has collected comprehensive longitudinal data from diverse patient cohorts to identify biomarkers and aid in the development of interventions. Despite over 110 machine learning studies using the PPMI database, the majority have focused on supervised models for diagnosis prediction, which has limited impact on understanding patient variability and progression. This paper addresses this gap by combining supervised and unsupervised machine learning methods to identify subtypes that accurately predict disease progression in Parkinson's patients.
Building upon previous work, we replicate and extend the study by integrating unsupervised patient clustering and prediction of present and future symptoms using 5 additional years of longitudinal data from the Progressive Parkinson's Markers Initiative (PPMI) database. Our findings demonstrate accurate prediction of disease trajectories and symptoms at baseline, offering valuable insights into patient heterogeneity and the potential for personalized interventions. The integration of supervised and unsupervised models presents a promising avenue for uncovering latent subgroups and understanding the complexity of Parkinson's disease progression.", "output": "Analysis, Identification and Prediction of Parkinson's disease sub-types and progression through Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computational modeling of Multiresolution-Fractional Brownian motion (fBm) has been effective in stochastic multiscale fractal texture feature extraction and machine learning of abnormal brain tissue segmentation. Further, deep multiresolution methods have been used for pixel-wise brain tissue segmentation. Robust tissue segmentation and volumetric measurement may provide more objective quantification of disease burden and offer improved tracking of treatment response for the disease. However, we posit that computational modeling of deep multiresolution fractal texture features may offer elegant feature learning. Consequently, this work proposes novel modeling of Multiresolution Fractal Deep Neural Network (MFDNN) and its computational implementation that mathematically combines a multiresolution fBm model and deep multiresolution analysis. The proposed full 3D MFDNN model offers the desirable properties of estimating multiresolution stochastic texture features by analyzing large amount of raw MRI image data for brain tumor segmentation. We apply the proposed MFDNN to estimate stochastic deep multiresolution fractal texture features for tumor tissues in brain MRI images. The MFDNN model is evaluated using 1251 patient cases for brain tumor segmentation using the most recent BRATS 2021 Challenges dataset. The evaluation of the proposed model using Dice overlap score, Hausdorff distance and associated uncertainty estimation offers either better or comparable performances in abnormal brain tissue segmentation when compared to the state-of-the-art methods in the literature. Index Terms: Computational Modeling, Multiresolution Fractional Brownian Motion (fBm), Deep Multiresolution Analysis, Fractal Dimension (FD), Texture Features, Brain tumor segmentation, Deep Learning.", "output": "Computational Modeling of Deep Multiresolution-Fractal Texture and Its Application to Abnormal Brain Tissue Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An important aspect of developing reliable deep learning systems is devising strategies that make these systems robust to adversarial attacks. There is a long line of work that focuses on developing defenses against these attacks, but recently, researchers have begun to study ways to reverse engineer the attack process. This allows us to not only defend against several attack models, but also classify the threat model. However, there is still a lack of theoretical guarantees for the reverse engineering process.
Current approaches that give any guarantees are based on the assumption that the data lies in a union of linear subspaces, which is not a valid assumption for more complex datasets. In this paper, we build on prior work and propose a novel framework for reverse engineering of deceptions which supposes that the clean data lies in the range of a GAN. To classify the signal and attack, we jointly solve a GAN inversion problem and a block-sparse recovery problem. For the first time in the literature, we provide deterministic linear convergence guarantees for this problem. We also empirically demonstrate the merits of the proposed approach on several nonlinear datasets as compared to state-of-the-art methods.", "output": "A Linearly Convergent GAN Inversion-based Algorithm for Reverse Engineering of Deceptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Presenting whole slide images (WSIs) as graph will enable a more efficient and accurate learning framework for cancer diagnosis. Due to the fact that a single WSI consists of billions of pixels and there is a lack of vast annotated datasets required for computational pathology, the problem of learning from WSIs using typical deep learning approaches such as convolutional neural network (CNN) is challenging. Additionally, WSIs down-sampling may lead to the loss of data that is essential for cancer detection. A novel two-stage learning technique is presented in this work. Since context, such as topological features in the tumor surroundings, may hold important information for cancer grading and diagnosis, a graph representation capturing all dependencies among regions in the WSI is very intuitive. Graph convolutional network (GCN) is deployed to include context from the tumor and adjacent tissues, and self-supervised learning is used to enhance training through unlabeled data. More specifically, the entire slide is presented as a graph, where the nodes correspond to the patches from the WSI. The proposed framework is then tested using WSIs from prostate and kidney cancers. To assess the performance improvement through self-supervised mechanism, the proposed context-aware model is tested with and without use of pre-trained self-supervised layer. The overall model is also compared with multi-instance learning (MIL) based and other existing approaches.", "output": "Context-Aware Self-Supervised Learning of Whole Slide Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning models exhibit strong performance on datasets with abundant labeled samples. However, for tabular datasets with extremely high $d$-dimensional features but limited $n$ samples (i.e. $d \\gg n$), machine learning models struggle to achieve strong performance due to the risk of overfitting. Here, our key insight is that there is often abundant, auxiliary domain information describing input features which can be structured as a heterogeneous knowledge graph (KG). We propose PLATO, a method that achieves strong performance on tabular data with $d \\gg n$ by using an auxiliary KG describing input features to regularize a multilayer perceptron (MLP). In PLATO, each input feature corresponds to a node in the auxiliary KG.
In the MLP's first layer, each input feature also corresponds to a weight vector. PLATO is based on the inductive bias that two input features corresponding to similar nodes in the auxiliary KG should have similar weight vectors in the MLP's first layer. PLATO captures this inductive bias by inferring the weight vector for each input feature from its corresponding node in the KG via a trainable message-passing function. Across 6 $d \\gg n$ datasets, PLATO outperforms 13 state-of-the-art baselines by up to 10.19%.", "output": "Enabling tabular deep learning when $d \\gg n$ with an auxiliary knowledge graph."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider a variant of matrix completion where entries are revealed in a biased manner, adopting a model akin to that introduced by Ma and Chen. Instead of treating this observation bias as a disadvantage, as is typically the case, our goal is to exploit the shared information between the bias and the outcome of interest to improve predictions. Towards this, we propose a simple two-stage algorithm: (i) interpreting the observation pattern as a fully observed noisy matrix, we apply traditional matrix completion methods to the observation pattern to estimate the distances between the latent factors; (ii) we apply supervised learning on the recovered features to impute missing observations. We establish finite-sample error rates that are competitive with the corresponding supervised learning parametric rates, suggesting that our learning performance is comparable to having access to the unobserved covariates. Empirical evaluation using a real-world dataset reflects similar performance gains, with our algorithm's estimates having 30x smaller mean squared error compared to traditional matrix completion methods.", "output": "Exploiting Observation Bias to Improve Matrix Completion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time.", "output": "Invariant Causal Set Covering Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Behavioral game theorists all use experimental data to evaluate predictive models of human behavior. However, they differ greatly in their choice of loss function for these evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and L2 error all being common choices. We attempt to offer a principled answer to the question of which loss functions make sense for this task, formalizing desiderata that we argue loss functions should satisfy.
We construct a family of loss functions, which we dub \"diagonal bounded Bregman divergences\", that satisfy all of these axioms and includes the squared L2 error. In fact, the squared L2 error is the only acceptable loss that is relatively commonly used in practice; we thus recommend its continued use to behavioral game theorists.", "output": "Loss Functions for Behavioral Game Theory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Aerial operation in turbulent environments is a challenging problem due to the chaotic behavior of the flow. This problem is made even more complex when a team of aerial robots is trying to achieve coordinated motion in turbulent wind conditions. In this paper, we present a novel multi-robot controller to navigate in turbulent flows, decoupling the trajectory-tracking control from the turbulence compensation via a nested control architecture. Unlike previous works, our method does not learn to compensate for the air-flow at a specific time and space. Instead, our method learns to compensate for the flow based on its effect on the team. This is made possible via a deep reinforcement learning approach, implemented via a Graph Convolutional Neural Network (GCNN)-based architecture, which enables robots to achieve better wind compensation by processing the spatial-temporal correlation of wind flows across the team. Our approach scales well to large robot teams -- as each robot only uses information from its nearest neighbors --, and generalizes well to robot teams larger than seen in training. Simulated experiments demonstrate how information sharing improves turbulence compensation in a team of aerial robots and demonstrate the flexibility of our method over different team configurations.", "output": "Learning to Navigate in Turbulent Flows with Aerial Robot Swarms: A Cooperative Deep Reinforcement Learning Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Clustering is a fundamental learning task widely used as a first step in data analysis. For example, biologists often use cluster assignments to analyze genome sequences, medical records, or images. Since downstream analysis is typically performed at the cluster level, practitioners seek reliable and interpretable clustering models. We propose a new deep-learning framework that predicts interpretable cluster assignments at the instance and cluster levels. First, we present a self-supervised procedure to identify a subset of informative features from each data point. Then, we design a model that predicts cluster assignments and a gate matrix that leads to cluster-level feature selection. We show that the proposed method can reliably predict cluster assignments using synthetic and real data. Furthermore, we verify that our model leads to interpretable results at a sample and cluster level.", "output": "Interpretable Deep Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-document summarization (MDS) refers to the task of summarizing the text in multiple documents into a concise summary. The generated summary can save the time of reading many documents by providing the important content in the form of a few sentences.
Abstractive MDS aims to generate a coherent and fluent summary for multiple documents using natural language generation techniques. In this paper, we consider the unsupervised abstractive MDS setting where there are only documents with no ground truth summaries provided, and we propose Absformer, a new Transformer-based method for unsupervised abstractive summary generation. Our method consists of a first step where we pretrain a Transformer-based encoder using the masked language modeling (MLM) objective as the pretraining task in order to cluster the documents into semantically similar groups; and a second step where we train a Transformer-based decoder to generate abstractive summaries for the clusters of documents. To our knowledge, we are the first to successfully incorporate a Transformer-based model to solve the unsupervised abstractive MDS task. We evaluate our approach using three real-world datasets from different domains, and we demonstrate both substantial improvements in terms of evaluation metrics over state-of-the-art abstractive-based methods, and generalization to datasets from different domains.", "output": "Absformer: Transformer-based Model for Unsupervised Multi-Document Abstractive Summarization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Progress in graph neural networks has grown rapidly in recent years, with many new developments in drug discovery, medical diagnosis, and recommender systems. While this progress is significant, many networks are `black boxes' with little understanding of the `what' exactly the network is learning. Many high-stakes applications, such as drug discovery, require human-intelligible explanations from the models so that users can recognize errors and discover new knowledge. Therefore, the development of explainable AI algorithms is essential for us to reap the benefits of AI. We propose an explainability algorithm for GNNs called eXplainable Insight (XInsight) that generates a distribution of model explanations using GFlowNets. Since GFlowNets generate objects with probabilities proportional to a reward, XInsight can generate a diverse set of explanations, compared to previous methods that only learn the maximum reward sample. We demonstrate XInsight by generating explanations for GNNs trained on two graph classification tasks: classifying mutagenic compounds with the MUTAG dataset and classifying acyclic graphs with a synthetic dataset that we have open-sourced. We show the utility of XInsight's explanations by analyzing the generated compounds using QSAR modeling, and we find that XInsight generates compounds that cluster by lipophilicity, a known correlate of mutagenicity. Our results show that XInsight generates a distribution of explanations that uncovers the underlying relationships demonstrated by the model.
They also highlight the importance of generating a diverse set of explanations, as it enables us to discover hidden relationships in the model and provides valuable guidance for further analysis.", "output": "XInsight: Revealing Model Insights for GNNs with Flow-based Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning features from data is one of the defining characteristics of deep learning, but our theoretical understanding of the role features play in deep learning is still rudimentary. To address this gap, we introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features. With the interaction tensor, we make several key observations about how features are distributed in data and how models with different random seeds learn different features. Based on these observations, we propose a conceptual framework for feature learning. Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form. We demonstrate that the proposed framework can explain empirically observed phenomena, including the recently discovered Generalization Disagreement Equality (GDE) that allows for estimating the generalization error with only unlabeled data. Further, our theory also provides explicit construction of natural data distributions that break the GDE. Thus, we believe this work provides valuable new insight into our understanding of feature learning.", "output": "On the Joint Interaction of Models, Data, and Features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a novel nonlinear model, Sparse Adaptive Bottleneck Centroid-Encoder (SABCE), for determining the features that discriminate between two or more classes. The algorithm aims to extract discriminatory features in groups while reconstructing the class centroids in the ambient space and simultaneously use additional penalty terms in the bottleneck layer to decrease within-class scatter and increase the separation of different class centroids. The model has a sparsity-promoting layer (SPL) with a one-to-one connection to the input layer. Along with the primary objective, we minimize the $l_{2,1}$-norm of the sparse layer, which filters out unnecessary features from input data. During training, we update class centroids by taking the Hadamard product of the centroids and weights of the sparse layer, thus ignoring the irrelevant features from the target. Therefore the proposed method learns to reconstruct the critical components of class centroids rather than the whole centroids. The algorithm is applied to various real-world data sets, including high-dimensional biological, image, speech, and accelerometer sensor data. We compared our method to different state-of-the-art feature selection techniques, including supervised Concrete Autoencoders (SCAE), Feature Selection Networks (FsNet), Stochastic Gates (STG), and LassoNet.
We empirically showed that SABCE features often produced better classification accuracy than other methods on the sequestered test sets, setting new state-of-the-art results.", "output": "Feature Selection using Sparse Adaptive Bottleneck Centroid-Encoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Healthcare knowledge graphs (HKGs) have emerged as a promising tool for organizing medical knowledge in a structured and interpretable way, which provides a comprehensive view of medical concepts and their relationships. However, challenges such as data heterogeneity and limited coverage remain, emphasizing the need for further research in the field of HKGs. This survey paper serves as the first comprehensive overview of HKGs. We summarize the pipeline and key techniques for HKG construction (i.e., from scratch and through integration), as well as the common utilization approaches (i.e., model-free and model-based). To provide researchers with valuable resources, we organize existing HKGs (The resource is available at based on the data types they capture and application domains, supplemented with pertinent statistical information. In the application section, we delve into the transformative impact of HKGs across various healthcare domains, spanning from fine-grained basic science research to high-level clinical decision support. Lastly, we shed light on the opportunities for creating comprehensive and accurate HKGs in the era of large language models, presenting the potential to revolutionize healthcare delivery and enhance the interpretability and reliability of clinical prediction.", "output": "A Survey on Knowledge Graphs for Healthcare: Resources, Applications, and Promises."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Privately generating synthetic data from a table is an important brick of a privacy-first world. We propose and investigate a simple approach of treating each row in a table as a sentence and training a language model with differential privacy. We show this approach obtains competitive results in modelling tabular data across multiple datasets, even at small scales that favor alternative methods based on marginal distributions.", "output": "Privately generating tabular data using language models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The backpropagation algorithm has experienced remarkable success in training large-scale artificial neural networks, however, its biological-plausibility is disputed, and it remains an open question whether the brain employs supervised learning mechanisms akin to it. Here, we propose correlative information maximization between layer activations as an alternative normative approach to describe the signal propagation in biological neural networks in both forward and backward directions. This new framework addresses many concerns about the biological-plausibility of conventional artificial neural networks and the backpropagation algorithm.
The coordinate descent-based optimization of the corresponding objective, combined with the mean square error loss function for fitting labeled supervision data, gives rise to a neural network structure that emulates a more biologically realistic network of multi-compartment pyramidal neurons with dendritic processing and lateral inhibitory neurons. Furthermore, our approach provides a natural resolution to the weight symmetry problem between forward and backward signal propagation paths, a significant critique against the plausibility of the conventional backpropagation algorithm. This is achieved by leveraging two alternative, yet equivalent forms of the correlative mutual information objective. These alternatives intrinsically lead to forward and backward prediction networks without weight symmetry issues, providing a compelling solution to this long-standing challenge.", "output": "Correlative Information Maximization: A Biologically Plausible Approach to Supervised Deep Neural Networks without Weight Symmetry."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we first present an explanation regarding the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD). We provide evidence that the spikes in the training loss of SGD are \"catapults\", an optimization phenomenon originally observed in GD with large learning rates in [Lewkowycz et al. 2020]. We empirically show that these catapults occur in a low-dimensional subspace spanned by the top eigenvectors of the tangent kernel, for both GD and SGD. Second, we posit an explanation for how catapults lead to better generalization by demonstrating that catapults promote feature learning by increasing alignment with the Average Gradient Outer Product (AGOP) of the true predictor. Furthermore, we demonstrate that a smaller batch size in SGD induces a larger number of catapults, thereby improving AGOP alignment and test performance.", "output": "Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretable methods for extracting meaningful building blocks (BBs) underlying multi-dimensional time series are vital for discovering valuable insights in complex systems. Existing techniques, however, encounter limitations that restrict their applicability to real-world systems, like reliance on orthogonality assumptions, inadequate incorporation of inter- and intra-state variability, and incapability to handle sessions of varying duration. Here, we present a framework for Similarity-driven Building Block Inference using Graphs across States (SiBBlInGS). SiBBlInGS employs a graph-based dictionary learning approach for BB discovery, simultaneously considers both inter- and intra-state relationships in the data, can extract non-orthogonal components, and allows for variations in session counts and duration across states. Additionally, SiBBlInGS allows for cross-state variations in BB structure and per-trial temporal variability, can identify state-specific vs state-invariant BBs, and offers both supervised and data-driven approaches for controlling the level of BB similarity between states.
We demonstrate SiBBlInGS on synthetic and real-world data to highlight its ability to provide insights into the underlying mechanisms of complex phenomena and its applicability to data in various fields.", "output": "SiBBlInGS: Similarity-driven Building-Block Inference using Graphs across States."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel feature selection technique, Sparse Linear Centroid-Encoder (SLCE). The algorithm uses a linear transformation to reconstruct a point as its class centroid and, at the same time, uses the $\\ell_1$-norm penalty to filter out unnecessary features from the input data. The original formulation of the optimization problem is nonconvex, but we propose a two-step approach, where each step is convex. In the first step, we solve the linear Centroid-Encoder, a convex optimization problem over a matrix $A$. In the second step, we only search for a sparse solution over a diagonal matrix $B$ while keeping $A$ fixed. Unlike other linear methods, e.g., Sparse Support Vector Machines and Lasso, Sparse Linear Centroid-Encoder uses a single model for multi-class data. We present an in-depth empirical analysis of the proposed model and show that it promotes sparsity on various data sets, including high-dimensional biological data. Our experimental results show that SLCE has a performance advantage over some state-of-the-art neural network-based feature selection techniques.", "output": "Sparse Linear Centroid-Encoder: A Convex Method for Feature Selection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a new effective and scalable framework for training GNNs in supervised node classification tasks, given graph-structured data. Our approach increasingly refines the weight update operations on a sequence of path graphs obtained by linearizing random spanning trees extracted from the input network. The path graphs are designed to retain essential topological and node information of the original graph. At the same time, the sparsity of path graphs enables a much lighter GNN training which, besides scalability, helps in mitigating classical training issues, like over-squashing and over-smoothing. We carry out an extensive experimental investigation on a number of real-world graph benchmarks, where we apply our framework to graph convolutional networks, showing simultaneous improvement of both training speed and test accuracy, as compared to well-known baselines.", "output": "Fast and Effective GNN Training with Linearized Random Spanning Trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss.
This loss encodes temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.", "output": "Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detection of artificial objects from underwater imagery gathered by Autonomous Underwater Vehicles (AUVs) is a key requirement for many subsea applications. Real-world AUV image datasets tend to be very large and unlabelled. Furthermore, such datasets are typically imbalanced, containing few instances of objects of interest, particularly when searching for unusual objects in a scene. It is therefore difficult to fit models capable of reliably detecting these objects. Given these factors, we propose to treat artificial objects as anomalies and detect them through a semi-supervised framework based on Variational Autoencoders (VAEs). We develop a method which clusters image data in a learned low-dimensional latent space and extracts images that are likely to contain anomalous features. We also devise an anomaly score based on extracting poorly reconstructed regions of an image. We demonstrate that by applying both methods on large image datasets, human operators can be shown candidate anomalous samples with a low false positive rate to identify objects of interest. We apply our approach to real seafloor imagery gathered by an AUV and evaluate its sensitivity to the dimensionality of the latent representation used by the VAE. We evaluate the precision-recall tradeoff and demonstrate that by choosing an appropriate latent dimensionality and threshold, we are able to achieve an average precision of 0.64 on unlabelled datasets.", "output": "A Semi-supervised Object Detection Algorithm for Underwater Imagery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have various practical applications, such as drug discovery, recommendation engines, and chip design. However, GNNs lack transparency as they cannot provide understandable explanations for their predictions. To address this issue, counterfactual reasoning is used. The main goal is to make minimal changes to the input graph of a GNN in order to alter its prediction. While several algorithms have been proposed for counterfactual explanations of GNNs, most of them have two main drawbacks. Firstly, they only consider edge deletions as perturbations. Secondly, the counterfactual explanation models are transductive, meaning they do not generalize to unseen data. In this study, we introduce an inductive algorithm called INDUCE, which overcomes these limitations. By conducting extensive experiments on several datasets, we demonstrate that incorporating edge additions leads to better counterfactual results compared to the existing methods. Moreover, the inductive modeling approach allows INDUCE to directly predict counterfactual perturbations without requiring instance-specific training.
This results in significant computational speed improvements compared to baseline methods and enables scalable counterfactual analysis for GNNs.", "output": "Empowering Counterfactual Reasoning over Graph Neural Networks through Inductivity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel $K$-nearest neighbor resampling procedure for estimating the performance of a policy from historical data containing realized episodes of a decision process generated under a different policy. We focus on feedback policies that depend deterministically on the current state in environments with continuous state-action spaces and system-inherent stochasticity effected by chosen actions. Such settings are common in a wide range of high-stakes applications and are actively investigated in the context of stochastic control. Our procedure exploits that similar state/action pairs (in a metric sense) are associated with similar rewards and state transitions. This enables our resampling procedure to tackle the counterfactual estimation problem underlying off-policy evaluation (OPE) by simulating trajectories similarly to Monte Carlo methods. Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics. These properties make the proposed resampling algorithm particularly useful for stochastic control environments. We prove that our method is statistically consistent in estimating the performance of a policy in the OPE setting under weak assumptions and for data sets containing entire episodes rather than independent transitions. To establish the consistency, we generalize Stone's Theorem, a well-known result in nonparametric statistics on local averaging, to include episodic data and the counterfactual estimation underlying OPE. Numerical experiments demonstrate the effectiveness of the algorithm in a variety of stochastic control settings including a linear quadratic regulator, trade execution in limit order books, and online stochastic bin packing.", "output": "$K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantum data access and quantum processing can make certain classically intractable learning tasks feasible. However, quantum capabilities will only be available to a select few in the near future. Thus, reliable schemes that allow classical clients to delegate learning to untrusted quantum servers are required to facilitate widespread access to quantum learning advantages. Building on a recently introduced framework of interactive proof systems for classical machine learning, we develop a framework for classical verification of quantum learning. We exhibit learning problems that a classical learner cannot efficiently solve on their own, but that they can efficiently and reliably solve when interacting with an untrusted quantum prover. Concretely, we consider the problems of agnostic learning parities and Fourier-sparse functions with respect to distributions with uniform input marginal.
We propose a new quantum data access model that we call \"mixture-of-superpositions\" quantum examples, based on which we give efficient quantum learning algorithms for these tasks. Moreover, we prove that agnostic quantum parity and Fourier-sparse learning can be efficiently verified by a classical verifier with only random example or statistical query access. Finally, we showcase two general scenarios in learning and verification in which quantum mixture-of-superpositions examples do not lead to sample complexity improvements over classical data. Our results demonstrate that the potential power of quantum data for learning tasks, while not unlimited, can be utilized by classical agents through interaction with untrusted quantum entities.", "output": "Classical Verification of Quantum Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new approach to constructing a neural network for predicting expectations of stochastic differential equations. The proposed method does not need data sets of inputs and outputs; instead, the information obtained from the time-evolution equations, i.e., the corresponding dual process, is directly compared with the weights in the neural network. As a demonstration, we construct neural networks for the Ornstein-Uhlenbeck process and the noisy van der Pol system. The remarkable feature of networks learned with the proposed method is the accuracy for inputs near the origin. Hence, it would be possible to avoid the overfitting problem because the learned network does not depend on training data sets.", "output": "Embedding stochastic differential equations into neural networks via dual processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Denoising is intuitively related to projection. Indeed, under the manifold hypothesis, adding random noise is approximately equivalent to orthogonal perturbation. Hence, learning to denoise is approximately learning to project. In this paper, we use this observation to reinterpret denoising diffusion models as approximate gradient descent applied to the Euclidean distance function. We then provide a straightforward convergence analysis of the DDIM sampler under simple assumptions on the projection-error of the denoiser. Finally, we propose a new sampler based on two simple modifications to DDIM using insights from our theoretical results. In as few as 5-10 function evaluations, our sampler achieves state-of-the-art FID scores on pretrained CIFAR-10 and CelebA models and can generate high-quality samples on latent diffusion models.", "output": "Interpreting and Improving Diffusion Models Using the Euclidean Distance Function."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we describe and analyze an island-based random dynamic voltage scaling (iRDVS) approach to thwart power side-channel attacks. We first analyze the impact of the number of independent voltage islands on the resulting signal-to-noise ratio and trace misalignment.
As part of our analysis of misalignment, we propose a novel unsupervised machine learning (ML)-based attack that is effective on systems with three or fewer independent voltages. Our results show that iRDVS with four voltage islands, however, cannot be broken with 200k encryption traces, suggesting that iRDVS can be effective. We finish the talk by describing an iRDVS test chip in a 12nm FinFET process that incorporates three variants of an AES-256 accelerator, all originating from the same RTL. This included a synchronous core, an asynchronous core with no protection, and a core employing the iRDVS technique using asynchronous logic. Lab measurements from the chips indicated that both unprotected variants failed the test vector leakage assessment (TVLA) security metric test, while the iRDVS was proven secure in a variety of configurations.", "output": "Island-based Random Dynamic Voltage Scaling vs ML-Enhanced Power Side-Channel Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With rising concerns about privacy within machine learning, federated learning (FL) was invented in 2017, in which the clients, such as mobile devices, train a model and send the update to the centralized server. Choosing clients randomly for FL can harm learning performance for various reasons. Many studies have proposed approaches to address the challenges of client selection in FL. However, no systematic literature review (SLR) on this topic existed. This SLR investigates the state of the art of client selection in FL and describes the challenges, solutions, and metrics used to evaluate the solutions. We systematically reviewed 47 primary studies. The main challenges found in client selection are heterogeneity, resource allocation, communication costs, and fairness. The client selection schemes aim to improve the original random selection algorithm by focusing on one or several of the aforementioned challenges. The most common metric used is testing accuracy versus communication rounds, as testing accuracy measures the success of the learning, preferably in as few communication rounds as possible, as they are very expensive. Although several possible improvements can be made to the current state of client selection, the most beneficial ones are evaluating the impact of unsuccessful clients and gaining a more theoretical understanding of the impact of fairness in FL.", "output": "A Systematic Literature Review on Client Selection in Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Origin-Destination (OD) matrix provides an estimate of the number of individuals traveling between regions, i.e., mobility flow in the city, which is widely used in urban planning, transportation, etc. Given various city characteristics of urban regions, generating the city-wide OD matrix without using historical flow information has become increasingly appealing to both researchers and practitioners. However, existing works are limited to generating each element, i.e., flow, in the OD matrix independently, overlooking the element relations within the matrix that can be well formulated as a network.
In this paper, we instead propose to generate the city-wide OD matrix from the network perspective, and design a graph denoising diffusion method to learn the conditional joint probability distribution of all elements in the OD matrix given city characteristics at the region level. To overcome the learning difficulty of the city-wide OD matrix covering over thousands of regions, we decompose the original one-shot generative modeling of the diffusion model into two cascaded stages, corresponding to the generation of network topology and mobility flow, respectively. To further reproduce important network properties contained in city-wide OD matrices, we design an elaborated graph denoising network structure including a node property augmentation module and a graph transformer backbone. Empirical experiments on data collected in two large US cities have verified that our method can generate OD matrices for new cities with network statistics remarkably similar to the ground truth, and that it outperforms competitive baselines in terms of generation realism.", "output": "City-wide Origin-Destination Matrix Generation via Graph Denoising Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies have revealed that NLP predictive models are vulnerable to adversarial attacks. Most existing studies focused on designing attacks to evaluate the robustness of NLP models in the English language alone. The literature has seen an increasing need for NLP solutions for other languages. We therefore ask one natural question: whether state-of-the-art (SOTA) attack methods generalize to other languages. This paper investigates how to adapt SOTA adversarial attack algorithms in English to the Chinese language. Our experiments show that attack methods previously applied to English NLP can generate high-quality adversarial examples in Chinese when combined with proper text segmentation and linguistic constraints. In addition, we demonstrate that the generated adversarial examples can achieve high fluency and semantic consistency by focusing on the Chinese language's morphology and phonology, which in turn can be used to improve the adversarial robustness of Chinese NLP models.", "output": "Expanding Scope: Adapting English Adversarial Attacks to Chinese."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent works have shown the potential of diffusion models in computer vision and natural language processing. Apart from the classical supervised learning fields, diffusion models have also shown strong competitiveness in reinforcement learning (RL) by formulating decision-making as sequential generation. However, incorporating temporal information of sequential data and utilizing it to guide diffusion models to perform better generation is still an open challenge. In this paper, we take one step forward to investigate controllable generation with temporal conditions that are refined from temporal information. We observe the importance of temporal conditions in sequential generation in sufficiently explorative scenarios and provide a comprehensive discussion and comparison of different temporal conditions.
Based on the observations, we propose an effective temporally-conditional diffusion model coined Temporally-Composable Diffuser (TCD), which extracts temporal information from interaction sequences and explicitly guides generation with temporal conditions. Specifically, we separate the sequences into three parts according to time expansion and identify historical, immediate, and prospective conditions accordingly. Each condition preserves non-overlapping temporal information of sequences, enabling more controllable generation when we jointly use them to guide the diffuser. Finally, we conduct extensive experiments and analysis to reveal the favorable applicability of TCD in offline RL tasks, where our method reaches or matches the best performance compared with prior SOTA baselines.", "output": "Instructed Diffuser with Temporal Condition Guidance for Offline Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the unavailability or large size of data, and the high computational and human labor costs of training machine learning models, it is a common practice to rely on open-source pre-trained models whenever possible. However, this practice is worrisome from the security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model such that the model's behavior can be controlled by the attacker when the trigger is present in the input. In this paper, we present our preliminary work on a novel method for Trojan model detection. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. Our method achieves state-of-the-art performance on two public datasets.", "output": "Trojan Model Detection Using Activation Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Efficiently serving neural network models with low latency is becoming more challenging due to increasing model complexity and parameter count. Model quantization offers a solution which simultaneously reduces memory footprint and compute requirements. However, aggressive quantization may lead to an unacceptable loss in model accuracy owing to differences in sensitivity to numerical imperfection across different layers in the model. To address this challenge, we propose a mixed-precision post-training quantization (PTQ) approach that assigns different numerical precisions to tensors in a network based on their specific needs, for a reduced memory footprint and improved latency while preserving model accuracy. Previous works rely on layer-wise Hessian information to determine numerical precision, but as we demonstrate, Hessian estimation is typically insufficient in determining an effective ordering of layer sensitivities. We address this by augmenting the estimated Hessian with additional information to capture inter-layer dependencies. We demonstrate that this consistently improves PTQ performance along the accuracy-latency Pareto frontier across multiple models. Our method combines second-order information and inter-layer dependencies to guide a bisection search, finding quantization configurations within a user-configurable model accuracy degradation range.
We evaluate the effectiveness of our method on the ResNet50, MobileNetV2, and BERT models. Our experiments demonstrate latency reductions of $25.48\\%$, $21.69\\%$, and $33.28\\%$, respectively, compared to a 16-bit baseline, while maintaining model accuracy to within $99.99\\%$ of the baseline model.", "output": "Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph clustering is a fundamental task in network analysis where the goal is to detect sets of nodes that are well-connected to each other but sparsely connected to the rest of the graph. We present faster approximation algorithms for an NP-hard parameterized clustering framework called LambdaCC, which is governed by a tunable resolution parameter and generalizes many other clustering objectives such as modularity, sparsest cut, and cluster deletion. Previous LambdaCC algorithms are either heuristics with no approximation guarantees, or computationally expensive approximation algorithms. We provide fast new approximation algorithms that can be made purely combinatorial. These rely on a new parameterized edge labeling problem we introduce that generalizes previous edge labeling problems that are based on the principle of strong triadic closure and are of independent interest in social network analysis. Our methods are orders of magnitude more scalable than previous approximation algorithms and our lower bounds allow us to obtain a posteriori approximation guarantees for previous heuristics that have no approximation guarantees of their own.", "output": "Faster Approximation Algorithms for Parameterized Graph Clustering and Edge Labeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Protein-ligand binding affinity (PLBA) prediction is the fundamental task in drug discovery. Recently, various deep learning-based models predict binding affinity by incorporating the three-dimensional structure of protein-ligand complexes as input, achieving astounding progress. However, due to the scarcity of high-quality training data, the generalization ability of current models is still limited. In addition, different bioassays use varying affinity measurement labels (i.e., IC50, Ki, Kd), and different experimental conditions inevitably introduce systematic noise, which poses a significant challenge to constructing high-precision affinity prediction models. To address these issues, we (1) propose Multi-task Bioassay Pre-training (MBP), a pre-training framework for structure-based PLBA prediction; (2) construct a pre-training dataset called ChEMBL-Dock with more than 300k experimentally measured affinity labels and about 2.8M docked three-dimensional structures. By introducing multi-task pre-training to treat the prediction of different affinity labels as different tasks and classifying relative rankings between samples from the same bioassay, MBP learns robust and transferable structural knowledge from our new ChEMBL-Dock dataset with varied and noisy labels. Experiments substantiate the capability of MBP as a general framework that can improve and be tailored to mainstream structure-based PLBA prediction tasks.
To the best of our knowledge, MBP is the first affinity pre-training model and shows great potential for future development.", "output": "Multi-task Bioassay Pre-training for Protein-ligand Binding Affinity Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ShaDDR, an example-based deep generative neural network which produces a high-resolution textured 3D shape through geometry detailization and conditional texture generation applied to an input coarse voxel shape. Trained on a small set of detailed and textured exemplar shapes, our method learns to detailize the geometry via multi-resolution voxel upsampling and generate textures on voxel surfaces via differentiable rendering against exemplar texture images from a few views. The generation is real-time, taking less than 1 second to produce a 3D model with voxel resolutions up to $512^3$. The generated shape preserves the overall structure of the input coarse voxel model, while the style of the generated geometric details and textures can be manipulated through learned latent codes. In the experiments, we show that our method can generate higher-resolution shapes with plausible and improved geometric details and clean textures compared to prior works. Furthermore, we showcase the ability of our method to learn geometric details and textures from shapes reconstructed from real-world photos. In addition, we have developed an interactive modeling application to demonstrate the generalizability of our method to various user inputs and the controllability it offers, allowing users to interactively sculpt a coarse voxel shape to define the overall structure of the detailized 3D shape.", "output": "ShaDDR: Real-Time Example-Based Geometry and Texture Generation via 3D Shape Detailization and Differentiable Rendering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In-context learning is one of the surprising and useful features of large language models. How it works is an active area of research. Recently, stylized meta-learning-like setups have been devised that train these models on a sequence of input-output pairs $(x, f(x))$ from a function class using the language modeling loss and observe generalization to unseen functions from the same class. One of the main discoveries in this line of research has been that for several problems such as linear regression, trained transformers learn algorithms for learning functions in context. However, the inductive biases of these models resulting in this behavior are not clearly understood. A model with unlimited training data and compute is a Bayesian predictor: it learns the pretraining distribution. It has been shown that high-capacity transformers mimic the Bayesian predictor for linear regression. In this paper, we show empirical evidence of transformers exhibiting the behavior of this ideal learner across different linear and non-linear function classes. We also extend the previous setups to work in the multitask setting and verify that transformers can do in-context learning in this setup as well, and that the Bayesian perspective sheds light on this setting too.
Finally, via the example of learning Fourier series, we study the inductive bias for in-context learning. We find that in-context learning may or may not have a simplicity bias depending on the pretraining data distribution.", "output": "In-Context Learning through the Bayesian Prism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The discovery of partial differential equations (PDEs) is a challenging task that involves both theoretical and empirical methods. Machine learning approaches have been developed and used to solve this problem; however, it is important to note that existing methods often struggle to identify the underlying equation accurately in the presence of noise. In this study, we present a new approach to discovering PDEs by combining variational Bayes and sparse linear regression. The problem of PDE discovery is posed as one of learning a relevant basis from a predefined dictionary of basis functions. To accelerate the overall process, a variational Bayes-based approach for discovering partial differential equations is proposed. To ensure sparsity, we employ a spike-and-slab prior. We illustrate the efficacy of our strategy in several examples, including the Burgers, Korteweg-de Vries, and Kuramoto-Sivashinsky equations, the wave equation, and the heat equation (1D as well as 2D). Our method offers a promising avenue for discovering PDEs from data and has potential applications in fields such as physics, engineering, and biology.", "output": "A Bayesian Framework for learning governing Partial Differential Equation from Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The solution of probabilistic inverse problems for which the corresponding forward problem is constrained by physical principles is challenging. This is especially true if the dimension of the inferred vector is large and the prior information about it is in the form of a collection of samples. In this work, a novel deep learning-based approach is developed and applied to solving these types of problems. The approach utilizes samples of the inferred vector drawn from the prior distribution and a physics-based forward model to generate training data for a conditional Wasserstein generative adversarial network (cWGAN). The cWGAN learns the probability distribution for the inferred vector conditioned on the measurement and produces samples from this distribution. The cWGAN developed in this work differs from earlier versions in that its critic is required to be 1-Lipschitz with respect to both the inferred and the measurement vectors and not just the former. This leads to a loss term with the full (and not partial) gradient penalty. It is shown that this rather simple change leads to a stronger notion of convergence for the conditional density learned by the cWGAN and a more robust and accurate sampling strategy. Through numerical examples it is shown that this change also translates to better accuracy when solving inverse problems.
The numerical examples considered include illustrative problems where the true distribution and/or statistics are known, and a more complex inverse problem motivated by applications in biomechanics.", "output": "Solution of physics-based inverse problems using conditional generative adversarial networks with full gradient penalty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. We formulate the underlying data-generating process as a hierarchical latent variable model and show that under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model, explaining why MAE can extract high-level information from pixels. Further, we show how key hyperparameters in MAE (the masking ratio and the patch size) determine which true latent variables are to be recovered, therefore influencing the level of semantic information in the representation. Specifically, extremely large or small masking ratios inevitably lead to low-level representations. Our theory offers coherent explanations of existing empirical observations and provides insights for potential empirical improvements and fundamental limitations of the masking-reconstruction paradigm. We conduct extensive experiments to validate our theoretical insights.", "output": "Understanding Masked Autoencoders via Hierarchical Latent Variable Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning is a useful technique for achieving improved performance and reducing training costs by leveraging the knowledge gained from source tasks and applying it to target tasks. Assessing the effectiveness of transfer learning relies on understanding the similarity between the ground truth of the source and target tasks. In real-world applications, tasks often exhibit partial similarity, where certain aspects are similar while others are different or irrelevant. To investigate the impact of partial similarity on transfer learning performance, we focus on a linear regression model with two distinct sets of features: a common part shared across tasks and a task-specific part. Our study explores various types of transfer learning, encompassing two options for parameter transfer. By establishing a theoretical characterization of the error of the learned model, we compare these transfer learning options, particularly examining how generalization performance changes with the number of features/parameters in both underparameterized and overparameterized regimes. Furthermore, we provide practical guidelines for determining the number of features in the common and task-specific parts for improved generalization performance. For example, when the total number of features in the source task's learning model is fixed, we show that it is more advantageous to allocate a greater number of redundant features to the task-specific part rather than the common part.
Moreover, in specific scenarios, particularly those characterized by high noise levels and small true parameters, sacrificing certain true features in the common part in favor of employing more redundant features in the task-specific part can yield notable benefits.", "output": "Generalization Performance of Transfer Learning: Overparameterized and Underparameterized Regimes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given a traversal algorithm, cover time is the expected number of steps needed to visit all nodes in a given graph. A smaller cover time means a higher exploration efficiency of the traversal algorithm. Although random walk algorithms have been studied extensively in the existing literature, there has been no cover time result for any non-Markovian method. In this work, we take a theoretical perspective and show that the negative feedback strategy (a count-based exploration method) is better than the naive random walk search. In particular, the former strategy can locally improve the search efficiency for an arbitrary graph. It also achieves smaller cover times for special but important graphs, including clique graphs, tree graphs, etc. Moreover, we make connections between our results and the reinforcement learning literature to give new insights into why classical UCB and MCTS algorithms are so useful. Various numerical results corroborate our theoretical findings.", "output": "A Cover Time Study of a non-Markovian Algorithm."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Physics and equality constrained artificial neural networks (PECANN) are grounded in methods of constrained optimization to properly constrain the solution of partial differential equations (PDEs) with their boundary and initial conditions and any high-fidelity data that may be available. To this end, adoption of the augmented Lagrangian method (ALM) within the PECANN framework is paramount for learning the solution of PDEs without manually balancing the individual loss terms in the objective function used for determining the parameters of the neural network. Generally speaking, ALM combines the merits of the penalty and Lagrange multiplier methods while avoiding the ill-conditioning and convergence issues associated with each of these methods individually. In the present work, we apply our PECANN framework to solve forward and inverse problems that have an expanded and diverse set of constraints. We show that ALM with its conventional formulation to update its penalty parameter and Lagrange multipliers stalls for such challenging problems. To address this issue, we propose an adaptive ALM in which each constraint is assigned a unique penalty parameter that evolves adaptively according to a rule inspired by the adaptive subgradient method. Additionally, we revise our PECANN formulation for improved computational efficiency and savings, which allows for mini-batch training.
We demonstrate the efficacy of our proposed approach by solving several forward and PDE-constrained inverse problems with noisy data, including simulation of incompressible fluid flows with a primitive-variables formulation of the Navier-Stokes equations up to a Reynolds number of 1000.", "output": "An adaptive augmented Lagrangian method for training physics and equality constrained artificial neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven soft sensors are essential for achieving accurate perception through reliable state inference. However, developing representative soft sensor models is challenged by issues such as missing labels, domain adaptability, and temporal coherence in data. To address these challenges, we propose a deep Particle Flow Bayes (DPFB) framework for cross-domain soft sensor modeling in the absence of target state labels. In particular, a sequential Bayes objective is first formulated to perform the maximum likelihood estimation underlying the cross-domain soft sensing problem. At the core of the framework, we incorporate a physics-inspired particle flow that optimizes the sequential Bayes objective to perform an exact Bayes update of the model-extracted latent and hidden features. As a result, these contributions enable the proposed framework to learn a cohesive approximate posterior feature representation capable of characterizing complex cross-domain system dynamics and performing effective time series unsupervised domain adaptation (UDA). Finally, we validate the framework on a complex industrial multiphase flow process system with complex dynamics and multiple operating conditions. The results demonstrate that the DPFB framework achieves superior unsupervised cross-domain soft sensing performance, outperforming state-of-the-art deep UDA and normalizing flow approaches.", "output": "Unsupervised Cross-Domain Soft Sensor Modelling via A Deep Bayesian Particle Flow Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the prediction of the Hamiltonian matrix, which finds use in quantum chemistry and condensed matter physics. Efficiency and equivariance are two important but conflicting factors. In this work, we propose an SE(3)-equivariant network, named QHNet, that achieves efficiency and equivariance. Our key advance lies in the innovative design of the QHNet architecture, which not only obeys the underlying symmetries, but also enables a 92% reduction in the number of tensor products. In addition, QHNet prevents the exponential growth of channel dimension when more atom types are involved. We perform experiments on MD17 datasets, including four molecular systems. Experimental results show that our QHNet can achieve comparable performance to state-of-the-art methods at a significantly faster speed. Besides, our QHNet consumes 50% less memory due to its streamlined architecture.
Our code is publicly available as part of the AIRS library (\\url{", "output": "Efficient and Equivariant Graph Networks for Predicting Quantum Hamiltonian."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Algorithms for online learning typically require one or more boundedness assumptions: that the domain is bounded, that the losses are Lipschitz, or both. In this paper, we develop a new setting for online learning with unbounded domains and non-Lipschitz losses. For this setting we provide an algorithm which guarantees $R_{T}(u)\\le \\tilde{O}(G\\|u\\|\\sqrt{T}+L\\|u\\|^{2}\\sqrt{T})$ regret on any problem where the subgradients satisfy $\\|g_{t}\\|\\le G+L\\|w_{t}\\|$, and show that this bound is unimprovable without further assumptions. We leverage this algorithm to develop new saddle-point optimization algorithms that converge in duality gap in unbounded domains, even in the absence of meaningful curvature. Finally, we provide the first algorithm achieving non-trivial dynamic regret in an unbounded domain for non-Lipschitz losses, as well as a matching lower bound. The regret of our dynamic regret algorithm automatically improves to a novel $L^{*}$ bound when the losses are smooth.", "output": "Unconstrained Online Learning with Unbounded Losses."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the mean estimation problem under communication and local differential privacy constraints. While previous work has proposed \\emph{order}-optimal algorithms for the same problem (i.e., asymptotically optimal as we spend more bits), \\emph{exact} optimality (in the non-asymptotic setting) still has not been achieved. In this work, we take a step towards characterizing the \\emph{exact}-optimal approach in the presence of shared randomness (a random variable shared between the server and the user) and identify several necessary conditions for \\emph{exact} optimality. We prove that one of the necessary conditions is to utilize a rotationally symmetric shared random codebook. Based on this, we propose a randomization mechanism where the codebook is a randomly rotated simplex -- satisfying the necessary properties of the \\emph{exact}-optimal codebook. The proposed mechanism is based on a $k$-closest encoding which we prove to be \\emph{exact}-optimal for the randomly rotated simplex codebook.", "output": "Exact Optimality of Communication-Privacy-Utility Tradeoffs in Distributed Mean Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of largely human-annotated benchmarks has driven the success of deep neural networks in various NLP tasks. To enhance the effectiveness of existing benchmarks, collecting new additional input-output pairs is often too costly and challenging, particularly considering their marginal impact on improving the current model accuracy. Instead, additional or complementary annotations on the existing input texts in the benchmarks can be preferable as an efficient way to pay the additional human cost. In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
From 'pair-wise' comparisons with respect to the task, the auxiliary preference learning enables the model to learn an additional informative training signal that cannot be captured with 'instance-wise' task labels. To this end, we propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences. Here, we provide three different ways to collect preference signals in practice: (a) implicitly extracting from annotation records (for free, but often unavailable), (b) collecting explicitly from crowd workers (high paid), or (c) pre-trained large language models such as GPT-3 (low paid). Given existing classification NLP benchmarks, we demonstrate that the proposed auxiliary preference learning via P2C on them is effective in improving text classifiers. Our codes are publicly available.", "output": "Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The COVID-19 pandemic led to 1.1 million deaths in the United States, despite the explosion of coronavirus research. These new findings are slow to translate to clinical interventions, leading to poorer patient outcomes and unnecessary deaths. One reason is that clinicians, overwhelmed by patients, struggle to keep pace with the rate of new coronavirus literature. A potential solution is developing a tool for evaluating coronavirus literature using large language models (LLMs) -- neural networks that are deployed for natural language processing. LLMs can be used to summarize and extract user-specified information. The greater availability and advancement of LLMs and pre-processed coronavirus literature databases provide the opportunity to assist clinicians in evaluating coronavirus literature through a coronavirus literature-specific LLM (covLLM), a tool that directly takes an inputted research article and a user query to return an answer. Using the COVID-19 Open Research Dataset (CORD-19), we produced two datasets: (1) synCovid, which uses a combination of handwritten prompts and synthetic prompts generated using OpenAI, and (2) real abstracts, which contains abstract and title pairs. covLLM was trained with LLaMA 7B as a baseline model to produce three models trained on (1) the Alpaca and synCovid datasets, (2) the synCovid dataset, and (3) the synCovid and real abstract datasets. These models were evaluated by two human evaluators and ChatGPT. Results demonstrate that training covLLM on the synCovid and abstract pairs datasets performs competitively with ChatGPT and outperforms covLLM trained primarily using the Alpaca dataset.", "output": "covLLM: Large Language Models for COVID-19 Biomedical Literature."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI-powered code-recommendation systems, such as Copilot and CodeWhisperer, provide code suggestions inside a programmer's environment (e.g., an IDE) with the aim of improving their productivity. Since programmers accept and reject suggestions in these scenarios, such a system should ideally use this feedback in furtherance of this goal. In this work we leverage prior data of programmers interacting with Copilot to develop interventions that can save programmer time.
We propose a utility theory framework, which models this interaction with programmers and decides when and which suggestions to display. Our framework, Conditional suggestion Display from Human Feedback (CDHF), is based on predictive models of programmer actions. Using data from 535 programmers, we build models that predict the likelihood of suggestion acceptance. In a retrospective evaluation on real-world programming tasks solved with AI-assisted programming, we find that CDHF can achieve favorable tradeoffs. Our findings show the promise of integrating human feedback to improve interaction with large language models in scenarios such as programming and possibly writing tasks.", "output": "When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Soft prompt tuning achieves superior performance across a wide range of few-shot tasks. However, the performance of prompt tuning can be highly sensitive to the initialization of the prompts. We also empirically observe that conventional prompt tuning methods cannot encode and learn sufficient task-relevant information from prompt tokens. In this work, we develop an information-theoretic framework that formulates soft prompt tuning as maximizing the mutual information between prompts and other model parameters (or encoded representations). This novel view helps us to develop a more efficient, accurate and robust soft prompt tuning method, InfoPrompt. With this framework, we develop two novel mutual-information-based loss functions, to (i) discover proper prompt initialization for the downstream tasks and learn sufficient task-relevant information from prompt tokens and (ii) encourage the output representation from the pretrained language model to be more aware of the task-relevant information captured in the learnt prompt. Extensive experiments validate that InfoPrompt can significantly accelerate the convergence of prompt tuning and outperform traditional prompt tuning methods. Finally, we provide a formal theoretical result showing that gradient descent type algorithms can be used to train our mutual information loss.", "output": "InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we propose a novel activation mechanism aimed at establishing layer-level activation (LayerAct) functions. These functions are designed to be more noise-robust compared to traditional element-level activation functions by reducing the layer-level fluctuation of the activation outputs due to shifts in inputs. Moreover, the LayerAct functions achieve a zero-like mean activation output without restricting the activation output space.
We present an analysis and experiments demonstrating that LayerAct functions exhibit superior noise-robustness compared to element-level activation functions, and empirically show that these functions have a zero-like mean activation. Experimental results on three benchmark image classification tasks show that LayerAct functions excel in handling noisy image datasets, outperforming element-level activation functions, while the performance on clean datasets is also superior in most cases.", "output": "Layer-level activation mechanism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a modified neural model for topic detection from a corpus and proposes a new metric to evaluate the detected topics. The new model builds upon the embedded topic model, incorporating some modifications such as document clustering. Numerical experiments suggest that the new model performs favourably regardless of the document's length. The new metric, which can be computed more efficiently than widely-used metrics such as topic coherence, provides valuable information regarding the understandability of the detected topics.", "output": "A modified model for topic detection from a corpus and a new metric evaluating the understandability of topics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the recent progress in sports analytics, deep learning approaches have demonstrated the effectiveness of mining insights into players' tactics for improving performance quality and fan engagement. This is attributed to the availability of public ground-truth datasets. While there are a few available datasets for turn-based sports for action detection, these datasets severely lack structured source data and stroke-level records since these require high-cost labeling efforts from domain experts and are hard to detect using automatic techniques. Consequently, the development of artificial intelligence approaches is significantly hindered when existing models are applied to more challenging structured turn-based sequences. In this paper, we present ShuttleSet, the largest publicly available badminton singles dataset with annotated stroke-level records. It contains 104 sets, 3,685 rallies, and 36,492 strokes in 44 matches between 2018 and 2021 with 27 top-ranking men's singles and women's singles players. ShuttleSet is manually annotated with a computer-aided labeling tool to increase the labeling efficiency and effectiveness of selecting the shot type with a choice of 18 distinct classes, the corresponding hitting locations, and the locations of both players at each stroke. In the experiments, we provide multiple benchmarks (i.e., stroke influence, stroke forecasting, and movement forecasting) with baselines to illustrate the practicability of using ShuttleSet for turn-based analytics, which is expected to stimulate both academic and sports communities.
Over the past two years, a visualization platform has been deployed to illustrate the variability of analysis cases from ShuttleSet for coaches to delve into players' tactical preferences with human-interactive interfaces, which was also used by national badminton teams during multiple international high-ranking matches.", "output": "ShuttleSet: A Human-Annotated Stroke-Level Singles Dataset for Badminton Tactical Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep learning models have shown remarkable performance in various tasks, they are susceptible to learning non-generalizable spurious features rather than the core features that are genuinely correlated to the true label. In this paper, beyond existing analyses of linear models, we theoretically examine the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features. Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process. In light of this, we propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance. PDE begins with a group-balanced subset of training data and progressively expands it to facilitate the learning of the core features. Experiments on synthetic and real-world benchmark datasets confirm the superior performance of our method on models such as ResNets and Transformers. On average, our method achieves a 2.8% improvement in worst-group accuracy compared with the state-of-the-art method, while enjoying up to 10x faster training efficiency.", "output": "Robust Learning with Progressive Data Expansion Against Spurious Correlation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Efficiently sampling from un-normalized target distributions is a fundamental problem in scientific computing and machine learning. Traditional approaches like Markov Chain Monte Carlo (MCMC) guarantee asymptotically unbiased samples from such distributions but suffer from computational inefficiency, particularly when dealing with high-dimensional targets, as they require numerous iterations to generate a batch of samples. In this paper, we propose an efficient and scalable neural implicit sampler that overcomes these limitations. Our sampler can generate large batches of samples with low computational costs by leveraging a neural transformation that directly maps easily sampled latent vectors to target samples without the need for iterative procedures. To train the neural implicit sampler, we introduce two novel methods: the KL training method and the Fisher training method. The former minimizes the Kullback-Leibler divergence, while the latter minimizes the Fisher divergence. By employing these training methods, we effectively optimize the neural implicit sampler to capture the desired target distribution. To demonstrate the effectiveness, efficiency, and scalability of our proposed samplers, we evaluate them on three sampling benchmarks with different scales. These benchmarks include sampling from 2D targets, Bayesian inference, and sampling from high-dimensional energy-based models (EBMs). 
Notably, in the experiment involving high-dimensional EBMs, our sampler produces samples that are comparable to those generated by MCMC-based methods while being more than 100 times more efficient, showcasing the efficiency of our neural sampler. We believe that the theoretical and empirical contributions presented in this work will stimulate further research on developing efficient samplers for various applications beyond the ones explored in this study.", "output": "Entropy-based Training Methods for Scalable Neural Implicit Sampler."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is well-known that modern computer vision systems often exhibit behaviors misaligned with those of humans: from adversarial attacks to image corruptions, deep learning vision models suffer in a variety of settings that humans capably handle. In light of these phenomena, here we introduce another, orthogonal perspective studying the human-machine vision gap. We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision. Specifically, we study the performance and behavior of neural networks on the seemingly simple task of classifying regular polygons at varying orders of degradation along their perimeters. To this end, we implement the Automated Shape Recoverability Test for rapidly generating large-scale datasets of perimeter-degraded regular polygons, modernizing the historically manual creation of image recoverability experiments. We then investigate the capacity of neural networks to recognize and recover such degraded shapes when initialized with different priors. Ultimately, we find that neural networks' behavior on this simple task conflicts with human behavior, raising a fundamental question of the robustness and learning capabilities of modern computer vision models.", "output": "Degraded Polygons Raise Fundamental Questions of Neural Network Perception."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid advancement of spoofing algorithms necessitates the development of robust detection methods capable of accurately identifying emerging fake audio. Traditional approaches, such as finetuning on new datasets containing these novel spoofing algorithms, are computationally intensive and pose a risk of impairing the acquired knowledge of known fake audio types. To address these challenges, this paper proposes an innovative approach that mitigates the limitations associated with finetuning. We introduce the concept of training low-rank adaptation matrices tailored specifically to the newly emerging fake audio types. During the inference stage, these adaptation matrices are combined with the existing model to generate the final prediction output. Extensive experimentation is conducted to evaluate the efficacy of the proposed method. The results demonstrate that our approach effectively preserves the prediction accuracy of the existing model for known fake audio types. 
Furthermore, our approach offers several advantages, including reduced storage memory requirements and lower equal error rates compared to conventional finetuning methods, particularly on specific spoofing algorithms.", "output": "Adaptive Fake Audio Detection with Low-Rank Model Squeezing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new algorithm for the problem of recovering data that adheres to multiple, heterogeneous low-dimensional structures from linear observations. Focusing on data matrices that are simultaneously row-sparse and low-rank, we propose and analyze an iteratively reweighted least squares (IRLS) algorithm that is able to leverage both structures. In particular, it optimizes a combination of non-convex surrogates for row-sparsity and rank, a balancing of which is built into the algorithm. We prove locally quadratic convergence of the iterates to a simultaneously structured data matrix in a regime of minimal sample complexity (up to constants and a logarithmic factor), which is known to be impossible for a combination of convex surrogates. In experiments, we show that the IRLS method exhibits favorable empirical convergence, identifying simultaneously row-sparse and low-rank matrices from fewer measurements than state-of-the-art methods.", "output": "Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Temporal graph clustering (TGC) is a crucial task in temporal graph learning. Its focus is on node clustering on temporal graphs, and it offers greater flexibility for large-scale graph structures due to the mechanism of temporal graph methods. However, the development of TGC is currently constrained by a significant problem: the lack of suitable and reliable large-scale temporal graph datasets to evaluate clustering performance. In other words, most existing temporal graph datasets are small in size, and even large-scale datasets contain only a limited number of available node labels. This makes evaluating models for large-scale temporal graph clustering challenging. To address this challenge, we build arXiv4TGC, a set of novel academic datasets (including arXivAI, arXivCS, arXivMath, arXivPhy, and arXivLarge) for large-scale temporal graph clustering. In particular, the largest dataset, arXivLarge, contains 1.3 million labeled available nodes and 10 million temporal edges. We further compare the clustering performance with typical temporal graph learning models on both previous classic temporal graph datasets and the new datasets proposed in this paper. The clustering performance on arXiv4TGC can be more apparent for evaluating different models, resulting in higher clustering confidence and making the datasets more suitable for large-scale temporal graph clustering. The arXiv4TGC datasets are publicly available at:", "output": "arXiv4TGC: Large-Scale Datasets for Temporal Graph Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The usage of more than one language in the same text is referred to as Code-Mixed. 
It is evident that there is a growing adoption of code-mixed data, especially English with a regional language, on social media platforms. Existing deep-learning models do not take advantage of the implicit language information in the code-mixed text. Our study aims to improve the performance of BERT-based models on low-resource Code-Mixed Hindi-English datasets by experimenting with language augmentation approaches. We propose a pipeline to improve code-mixed systems that comprises data preprocessing, word-level language identification, language augmentation, and model training on downstream tasks like sentiment analysis. For language augmentation in BERT models, we explore word-level interleaving and post-sentence placement of language information. We have examined the performance of vanilla BERT-based models and their code-mixed HingBERT counterparts on respective benchmark datasets, comparing their results with and without using word-level language information. The models were evaluated using metrics such as accuracy, precision, recall, and F1 score. Our findings show that the proposed language augmentation approaches work well across different BERT models. We demonstrate the importance of augmenting code-mixed text with language information on five different code-mixed Hindi-English downstream datasets based on sentiment analysis, hate speech detection, and emotion detection.", "output": "Leveraging Language Identification to Enhance Code-Mixed Text Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We survey eight recent works by our group, involving the successful blending of evolutionary algorithms with machine learning and deep learning: 1. Binary and Multinomial Classification through Evolutionary Symbolic Regression, 2. Classy Ensemble: A Novel Ensemble Algorithm for Classification, 3. EC-KitY: Evolutionary Computation Tool Kit in Python, 4. Evolution of Activation Functions for Deep Learning-Based Image Classification, 5. Adaptive Combination of a Genetic Algorithm and Novelty Search for Deep Neuroevolution, 6. An Evolutionary, Gradient-Free, Query-Efficient, Black-Box Algorithm for Generating Adversarial Instances in Deep Networks, 7. Foiling Explanations in Deep Neural Networks, 8. Patch of Invisibility: Naturalistic Black-Box Adversarial Attacks on Object Detectors.", "output": "A Melting Pot of Evolution and Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Errors of machine learning models are costly, especially in safety-critical domains such as healthcare, where such mistakes can prevent the deployment of machine learning altogether. In these settings, conservative models -- models which can defer to human judgment when they are likely to make an error -- may offer a solution. However, detecting unusual or difficult examples is notably challenging, as it is impossible to anticipate all potential inputs at test time. To address this issue, prior work has proposed to minimize the model's confidence on an auxiliary pseudo-OOD dataset. We theoretically analyze the effect of confidence minimization and show that the choice of auxiliary dataset is critical. 
Specifically, if the auxiliary dataset includes samples from the OOD region of interest, confidence minimization provably separates ID and OOD inputs by predictive confidence. Taking inspiration from this result, we present data-driven confidence minimization (DCM), which minimizes confidence on an uncertainty dataset containing examples that the model is likely to misclassify at test time. Our experiments show that DCM consistently outperforms state-of-the-art OOD detection methods on 8 ID-OOD dataset pairs, reducing FPR (at TPR 95%) by 6.3% and 58.1% on CIFAR-10 and CIFAR-100, and outperforms existing selective classification approaches on 4 datasets in conditions of distribution shift.", "output": "Conservative Prediction via Data-Driven Confidence Minimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although graph neural networks (GNNs) have achieved impressive results in graph classification, they often need abundant task-specific labels, which can be extremely costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient exploration of graph topology and the significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners. Besides, we incorporate coupled branches into a holistic multi-view contrastive learning framework, which not only incorporates graph representations learned from complementary views for enhanced understanding, but also encourages the similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on various popular datasets show that CoCo outperforms competing baselines by 5.7% to 21.0% generally.", "output": "CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As a collaborative paradigm, Federated Learning (FL) empowers clients to engage in collective model training without exchanging their respective local data. Nevertheless, FL remains vulnerable to backdoor attacks in which an attacker compromises malicious clients, and injects poisoned model weights into the aggregation process to yield attacker-chosen predictions for particular samples. Existing countermeasures, mainly based on anomaly detection, may erroneously reject legitimate weights while accepting malicious ones, which is due to inadequacies in quantifying client model similarities. Other defense mechanisms prove effective exclusively when confronted with a restricted number of malicious clients, e.g., less than 10%. To address these vulnerabilities, we present G$^2$uardFL, a protective framework that reframes the detection of malicious clients as an attributed graph clustering problem, thereby safeguarding FL systems. 
This framework employs a client graph clustering technique to identify malicious clients and incorporates an adaptive method to amplify the disparity between the aggregated model and poisoned client models, thereby eliminating previously embedded backdoors. A theoretical analysis of convergence is also performed to demonstrate that the global model closely approximates the model untouched by any backdoor. Through empirical evaluation compared to cutting-edge defenses and against various backdoor attacks, our experimental results indicate that G$^2$uardFL considerably undermines the effectiveness of backdoor attacks while maintaining a negligible impact on the benign sample performance.", "output": "G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks through Attributed Client Graph Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Research has shown that deep networks tend to be overly optimistic about their predictions, leading to an underestimation of prediction errors. Due to the limited nature of data, existing studies have proposed various methods based on model prediction probabilities to bin the data and evaluate calibration error. We propose a more generalized definition of calibration error called Partitioned Calibration Error (PCE), revealing that the key difference among these calibration error metrics lies in how the data space is partitioned. We put forth an intuitive proposition that an accurate model should be calibrated across any partition, suggesting that the input space partitioning can extend beyond just the partitioning of prediction probabilities, and include partitions directly related to the input. Through semantic-related partitioning functions, we demonstrate that the relationship between model accuracy and calibration lies in the granularity of the partitioning function. This highlights the importance of partitioning criteria for training a calibrated and accurate model. To validate the aforementioned analysis, we propose a method that involves jointly learning a semantic aware grouping function based on deep model features and logits to partition the data space into subsets. Subsequently, a separate calibration function is learned for each subset. Experimental results demonstrate that our approach achieves significant performance improvements across multiple datasets and network architectures, thus highlighting the importance of the partitioning function for calibration.", "output": "Beyond Probability Partitions: Calibrating Neural Networks with Semantic Aware Grouping."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Minimizing response times is crucial for emergency medical services to reduce patients' waiting times and to increase their survival rates. Many models exist to optimize operational tasks such as ambulance allocation and dispatching. Including accurate demand forecasts in such models can improve operational decision-making. Against this background, we present a novel convolutional neural network (CNN) architecture that transforms time series data into heatmaps to predict ambulance demand. Applying such predictions requires incorporating external features that influence ambulance demands. 
We contribute to the existing literature by providing a flexible, generic CNN architecture, allowing for the inclusion of external features with varying dimensions. Additionally, we provide a feature selection and hyperparameter optimization framework utilizing Bayesian optimization. We integrate historical ambulance demand and external information such as weather, events, holidays, and time. To show the superiority of the developed CNN architecture over existing approaches, we conduct a case study for Seattle's 911 call data and include external information. We show that the developed CNN architecture outperforms existing state-of-the-art methods and industry practice by more than 9%.", "output": "Ambulance Demand Prediction via Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of the multi-media internet, visual characteristics have become an important factor affecting user interests. Thus, incorporating visual features is a promising direction for further performance improvements in click-through rate (CTR) prediction. However, we found that simply injecting the image embeddings trained with established pre-training methods only yields marginal improvements. We attribute the failure to two reasons: First, the pre-training methods are designed for well-defined computer vision tasks concentrating on semantic features, and they cannot learn personalized interest in recommendations. Secondly, pre-trained image embeddings containing only semantic information have little information gain, considering we already have semantic features such as categories and item titles as inputs in the CTR prediction task. We argue that a pre-training method tailored for recommendation is necessary for further improvements. To this end, we propose a recommendation-aware image pre-training method that can learn visual features from user click histories. Specifically, we propose a user interest reconstruction module to mine visual features related to user interests from behavior histories. We further propose a contrastive training method to avoid collapsing of embedding vectors. We conduct extensive experiments to verify that our method can learn users' visual interests, and our method achieves 0.46% improvement in offline AUC and 0.88% improvement in Taobao online GMV with p-value < 0.01.", "output": "COURIER: Contrastive User Intention Reconstruction for Large-Scale Pre-Train of Image Features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ranking models play an essential role in e-commerce search and recommendation. An effective ranking model should give a personalized ranking list for each user according to the user's preference. Existing algorithms usually extract a user representation vector from the user behavior sequence, then feed the vector into a feed-forward network (FFN) together with other features for feature interactions, and finally produce a personalized ranking score. Despite tremendous progress in the past, there is still room for improvement. Firstly, the personalized patterns of feature interactions for different users are not explicitly modeled. Secondly, most existing algorithms have poor personalized ranking results for long-tail users with few historical behaviors due to the data sparsity. 
To overcome the two challenges, we propose Attention Weighted Mixture of Experts (AW-MoE) with contrastive learning for personalized ranking. Firstly, AW-MoE leverages the MoE framework to capture personalized feature interactions for different users. To model the user preference, the user behavior sequence is simultaneously fed into expert networks and the gate network. Within the gate network, one gate unit and one activation unit are designed to adaptively learn the fine-grained activation vector for experts using an attention mechanism. Secondly, a random masking strategy is applied to the user behavior sequence to simulate long-tail users, and an auxiliary contrastive loss is imposed on the output of the gate network to improve the model generalization for these users. This is validated by a higher performance gain on the long-tail user test set. Experiment results on a JD real production dataset and a public dataset demonstrate the effectiveness of AW-MoE, which significantly outperforms state-of-the-art methods. Notably, AW-MoE has been successfully deployed in the JD e-commerce search engine, ...", "output": "Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a novel Sequence-to-Sequence (Seq2Seq) model based on a transformer-based attention mechanism and temporal pooling for Non-Intrusive Load Monitoring (NILM) of smart buildings. The paper aims to improve the accuracy of NILM by using a deep learning-based method. The proposed method uses a Seq2Seq model with a transformer-based attention mechanism to capture the long-term dependencies of NILM data. Additionally, temporal pooling is used to improve the model's accuracy by capturing both the steady-state and transient behavior of appliances. The paper evaluates the proposed method on a publicly available dataset and compares the results with other state-of-the-art NILM techniques. The results demonstrate that the proposed method outperforms the existing methods in terms of both accuracy and computational efficiency.", "output": "Sequence-to-Sequence Model with Transformer-based Attention Mechanism and Temporal Pooling for Non-Intrusive Load Monitoring."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is growing interest in discovering interpretable, closed-form equations for subgrid-scale (SGS) closures/parameterizations of complex processes in the Earth system. Here, we apply a common equation-discovery technique with expansive libraries to learn closures from filtered direct numerical simulations of 2D forced turbulence and Rayleigh-Bénard convection (RBC). Across common filters, we robustly discover closures of the same form for momentum and heat fluxes. These closures depend on nonlinear combinations of gradients of filtered variables (velocity, temperature), with constants that are independent of the fluid/flow properties and only depend on filter type/size. We show that these closures are the nonlinear gradient model (NGM), which is derivable analytically using Taylor-series expansions. In fact, we suggest that with common (physics-free) equation-discovery algorithms, regardless of the system/physics, discovered closures are always consistent with the Taylor series. 
Like previous studies, we find that large-eddy simulations with NGM closures are unstable, despite significant similarities between the true and NGM-predicted fluxes (pattern correlations $> 0.95$). We identify two shortcomings as reasons for these instabilities: in 2D, NGM produces zero kinetic energy transfer between resolved and subgrid scales, lacking both diffusion and backscattering. In RBC, backscattering of potential energy is poorly predicted. Moreover, we show that SGS fluxes diagnosed from data, presumed the \"truth\" for discovery, depend on filtering procedures and are not unique. Accordingly, to learn accurate, stable closures from high-fidelity data in future work, we propose several ideas around using physics-informed libraries, loss functions, and metrics. These findings are relevant beyond turbulence to closure modeling of any multi-scale system.", "output": "Learning Closed-form Equations for Subgrid-scale Closures from High-fidelity Data: Promises and Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Demand-side management now encompasses more residential loads. To efficiently apply demand response strategies, it's essential to periodically observe the contribution of various domestic appliances to total energy consumption. Non-intrusive load monitoring (NILM), also known as load disaggregation, is a method for decomposing the total energy consumption profile into individual appliance load profiles within the household. It has multiple applications in demand-side management, energy consumption monitoring, and analysis. Various methods, including machine learning and deep learning, have been used to implement and improve NILM algorithms. This paper reviews some recent NILM methods based on deep learning and introduces the most accurate methods for residential loads. It summarizes public databases for NILM evaluation and compares methods using standard performance metrics.", "output": "Non-Intrusive Load Monitoring (NILM) using Deep Neural Networks: A Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural network designs are quite diverse, from VGG-style to ResNet-style, and from Convolutional Neural Networks to Transformers. Towards the design of efficient accelerators, many works have adopted a dataflow-based, inter-layer pipelined architecture, with customised hardware for each layer, achieving ultra-high throughput and low latency. The deployment of neural networks to such dataflow architecture accelerators is usually hindered by the available on-chip memory, as it is desirable to preload the weights of neural networks on-chip to maximise the system performance. To address this, networks are usually compressed before the deployment through methods such as pruning, quantization and tensor decomposition. In this paper, a framework for mapping CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD is proposed. 
The proposed method applies layer-specific Singular Value Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed manner, achieving 1.73x to 10.29x throughput per DSP relative to state-of-the-art CNNs. Our work is open-sourced: ", "output": "Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The posterior collapse phenomenon in variational autoencoders (VAEs), where the variational posterior distribution closely matches the prior distribution, can hinder the quality of the learned latent variables. As a consequence of posterior collapse, the latent variables extracted by the encoder in VAEs preserve less information from the input data and thus fail to produce meaningful representations as input to the reconstruction process in the decoder. While this phenomenon has been an actively addressed topic related to VAEs' performance, the theory for posterior collapse remains underdeveloped, especially beyond the standard VAEs. In this work, we advance the theoretical understanding of posterior collapse to two important and prevalent yet less studied classes of VAEs: conditional VAEs and hierarchical VAEs. Specifically, via a non-trivial theoretical analysis of linear conditional VAEs and hierarchical VAEs with two levels of latent variables, we prove that the causes of posterior collapse in these models include the correlation between the input and output of the conditional VAEs and the effect of learnable encoder variance in the hierarchical VAEs. We empirically validate our theoretical findings for linear conditional and hierarchical VAEs and demonstrate that these results are also predictive for non-linear cases.", "output": "Posterior Collapse in Linear Conditional and Hierarchical Variational Autoencoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Whole slide image (WSI) refers to a type of high-resolution scanned tissue image, which is extensively employed in computer-assisted diagnosis (CAD). The extremely high resolution and limited availability of region-level annotations make it challenging to employ deep learning methods for WSI-based digital diagnosis. Multiple instance learning (MIL) is a powerful tool to address the weak annotation problem, while Transformer has shown great success in the field of visual tasks. The combination of both should provide new insights for deep learning based image diagnosis. However, due to the limitations of single-level MIL and the attention mechanism's constraints on sequence length, directly applying Transformer to WSI-based MIL tasks is not practical. To tackle this issue, we propose a Multi-level MIL with Transformer (MMIL-Transformer) approach. By introducing a hierarchical structure to MIL, this approach enables efficient handling of MIL tasks that involve a large number of instances. To validate its effectiveness, we conducted a set of experiments on the WSI classification task, where MMIL-Transformer demonstrates superior performance compared to existing state-of-the-art methods. Our proposed approach achieves a test AUC of 94.74% and a test accuracy of 93.41% on the CAMELYON16 dataset, and a test AUC of 99.04% and a test accuracy of 94.37% on the TCGA-NSCLC dataset, respectively. 
All code and pre-trained models are available at: ", "output": "Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent neural architecture search (NAS) frameworks have been successful in finding optimal architectures for given conditions (e.g., performance or latency). However, they search for optimal architectures in terms of their performance on clean images only, while robustness against various types of perturbations or corruptions is crucial in practice. Although there exist several robust NAS frameworks that tackle this issue by integrating adversarial training into one-shot NAS, they are limited in that they only consider robustness against adversarial attacks and require significant computational resources to discover optimal architectures for a single task, which makes them impractical in real-world scenarios. To address these challenges, we propose a novel lightweight robust zero-cost proxy that considers the consistency across features, parameters, and gradients of both clean and perturbed images at the initialization state. Our approach facilitates an efficient and rapid search for neural architectures capable of learning generalizable features that exhibit robustness across diverse perturbations. The experimental results demonstrate that our proxy can rapidly and efficiently search for neural architectures that are consistently robust against various perturbations on multiple benchmark datasets and diverse search spaces, largely outperforming existing clean zero-shot NAS and robust NAS with reduced search cost.", "output": "Generalizable Lightweight Proxy for Robust NAS against Diverse Perturbations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "System logs play a critical role in maintaining the reliability of software systems. Fruitful studies have explored automatic log-based anomaly detection and achieved notable accuracy on benchmark datasets. However, when applied to large-scale cloud systems, these solutions face limitations due to high resource consumption and lack of adaptability to evolving logs. In this paper, we present an accurate, lightweight, and adaptive log-based anomaly detection framework, referred to as SeaLog. Our method introduces a Trie-based Detection Agent (TDA) that employs a lightweight, dynamically-growing trie structure for real-time anomaly detection. To enhance TDA's accuracy in response to evolving log data, we enable it to receive feedback from experts. Interestingly, our findings suggest that contemporary large language models, such as ChatGPT, can provide feedback with a level of consistency comparable to human experts, which can potentially reduce manual verification efforts. We extensively evaluate SeaLog on two public datasets and an industrial dataset. 
The results show that SeaLog outperforms all baseline methods in terms of effectiveness, runs 2X to 10X faster, and consumes only 5% to 41% of the memory resources.", "output": "Scalable and Adaptive Log-based Anomaly Detection with Expert in the Loop."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As Transformer-based models have achieved impressive performance on various time series tasks, Long-Term Series Forecasting (LTSF) tasks have also received extensive attention in recent years. However, due to the inherent computational complexity and long-sequence demands of Transformer-based methods, their application to LTSF tasks still has two major issues that need to be further investigated: 1) whether the sparse attention mechanism designed by these methods actually reduces the running time on real devices; 2) whether these models need extra long input sequences to guarantee their performance. The answers given in this paper are negative. Therefore, to better cope with these two issues, we design a lightweight Period-Attention mechanism (Periodformer), which renovates the aggregation of long-term subseries via explicit periodicity and short-term subseries via built-in proximity. Meanwhile, a gating mechanism is embedded into Periodformer to regulate the influence of the attention module on the prediction results. Furthermore, to take full advantage of GPUs for fast hyperparameter optimization (e.g., finding the suitable input length), a Multi-GPU Asynchronous parallel algorithm based on Bayesian Optimization (MABO) is presented. MABO allocates a process to each GPU via a queue mechanism, and then creates multiple trials at a time for asynchronous parallel search, which greatly reduces the search time. Compared with the state-of-the-art methods, the prediction error of Periodformer is reduced by 13% and 26% for multivariate and univariate forecasting, respectively. In addition, MABO reduces the average search time by 46% while finding better hyperparameters. In conclusion, this paper indicates that LTSF may not need complex attention and extra long input sequences. The source code will be open-sourced on GitHub.", "output": "Does Long-Term Series Forecasting Need Complex Attention and Extra Long Inputs?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce a novel semantic generative communication (SGC) framework, where generative users leverage text-to-image (T2I) generators to create images locally from downloaded text prompts, while non-generative users directly download images from a base station (BS). Although generative users help reduce downlink transmission energy at the BS, they consume additional energy for image generation and for uploading their generator state information (GSI). 
We formulate the problem of minimizing the total energy consumption of the BS and the users, and devise a generative user selection algorithm. Simulation results corroborate that our proposed algorithm reduces total energy by up to 54% compared to a baseline with all non-generative users.", "output": "Energy-Efficient Downlink Semantic Generative Communication with Text-to-Image Generators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, denoising diffusion models have led to significant breakthroughs in the generation of images, audio and text. However, it is still an open question how to adapt their strong modeling ability to model time series. In this paper, we propose TimeDiff, a non-autoregressive diffusion model that achieves high-quality time series prediction with the introduction of two novel conditioning mechanisms: future mixup and autoregressive initialization. Similar to teacher forcing, future mixup allows parts of the ground-truth future predictions for conditioning, while autoregressive initialization helps better initialize the model with basic time series patterns such as short-term trends. Extensive experiments are performed on nine real-world datasets. Results show that TimeDiff consistently outperforms existing time series diffusion models, and also achieves the best overall performance across a variety of the existing strong baselines (including transformers and FiLM).", "output": "Non-autoregressive Conditional Diffusion Models for Time Series Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning with noisy labels is an important topic for scalable training in many real-world scenarios. However, little previous research considers this problem in the online setting, where the arrival of data is streaming. In this paper, we propose a novel gradient-based approach to enable the detection of noisy labels for the online learning of model parameters, named Online Gradient-based Robust Selection (OGRS). In contrast to previous sample selection approaches for offline training, which require estimating the clean ratio of the dataset before each epoch of training, OGRS can automatically select clean samples by steps of gradient update from datasets with varying clean ratios without changing the parameter setting. During the training process, the OGRS method selects clean samples at each iteration and feeds the selected samples to incrementally update the model parameters. We provide a detailed theoretical analysis to demonstrate that the data selection process converges to the low-loss region of the sample space, by introducing and proving the sub-linear local Lagrangian regret of the non-convex constrained optimization problem. Experimental results show that it outperforms state-of-the-art methods in different settings.", "output": "A Gradient-based Approach for Online Robust Deep Neural Network Training with Noisy Labels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tabular data is often hidden in text, particularly in medical diagnostic reports. Traditional machine learning (ML) models designed to work with tabular data cannot effectively process information in such form. 
On the other hand, large language models (LLMs), which excel at textual tasks, are probably not the best tool for modeling tabular data. Therefore, we propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM. Drawing upon the reasoning capabilities of LLMs, TEMED-LLM goes beyond traditional extraction techniques, accurately inferring tabular features, even when their names are not explicitly mentioned in the text. This is achieved by combining domain-specific reasoning guidelines with a proposed data validation and reasoning correction feedback loop. By applying interpretable ML models such as decision trees and logistic regression over the extracted and validated data, we obtain end-to-end interpretable predictions. We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics. Given its predictive performance, simplicity, and interpretability, TEMED-LLM underscores the potential of leveraging LLMs to improve the performance and trustworthiness of ML models in medical applications.", "output": "Interpretable Medical Diagnostics with Structured Data Extraction by Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing pruning methods utilize the importance of each weight based on specified criteria only when searching for a sparse structure but do not utilize it during training. In this work, we propose a novel approach - the Magnitude Attention-based Dynamic Pruning (MAP) method - which applies the importance of weights throughout both the forward and backward paths to explore sparse model structures dynamically. Magnitude attention is defined based on the magnitude of weights as continuous real-valued numbers, enabling a seamless transition from a redundant to an effective sparse network by promoting efficient exploration. Additionally, the attention mechanism ensures more effective updates for important layers within the sparse network. In later stages of training, our approach shifts from exploration to exploitation, exclusively updating the sparse model composed of crucial weights based on the explored structure, resulting in pruned models that not only achieve performance comparable to dense models but also outperform previous pruning methods on CIFAR-10/100 and ImageNet.", "output": "Magnitude Attention-based Dynamic Pruning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning models are a standard solution for sensor-based Human Activity Recognition (HAR), but their deployment is often limited by labeled data scarcity and models' opacity. Neuro-Symbolic AI (NeSy) provides an interesting research direction to mitigate these issues by infusing knowledge about context information into HAR deep learning classifiers. However, existing NeSy methods for context-aware HAR require computationally expensive symbolic reasoners during classification, making them less suitable for deployment on resource-constrained devices (e.g., mobile devices). Additionally, NeSy approaches for context-aware HAR have never been evaluated on in-the-wild datasets, and their generalization capabilities in real-world scenarios are questionable. 
In this work, we propose a novel approach based on a semantic loss function that infuses knowledge constraints into the HAR model during the training phase, avoiding symbolic reasoning during classification. Our results on scripted and in-the-wild datasets show the impact of different semantic loss functions in outperforming a purely data-driven model. We also compare our solution with existing NeSy methods and analyze each approach's strengths and weaknesses. Our semantic loss remains the only NeSy solution that can be deployed as a single DNN without the need for symbolic reasoning modules, reaching recognition rates close to (and in some cases better than) existing approaches.", "output": "Neuro-Symbolic Approaches for Context-Aware Human Activity Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since the rise of fair machine learning as a critical field of inquiry, many different notions of how to quantify and measure discrimination have been proposed in the literature. Some of these notions, however, were shown to be mutually incompatible. Such findings make it appear that numerous different kinds of fairness exist, thereby making a consensus on the appropriate measure of fairness harder to reach, hindering the applications of these tools in practice. In this paper, we investigate one of these key impossibility results that relates the notions of statistical and predictive parity. Specifically, we derive a new causal decomposition formula for the fairness measures associated with predictive parity, and obtain a novel insight into how this criterion is related to statistical parity through the legal doctrines of disparate treatment, disparate impact, and the notion of business necessity. Our results show that, through a more careful causal analysis, the notions of statistical and predictive parity are not really mutually exclusive, but complementary, spanning a spectrum of fairness notions through the concept of business necessity. Finally, we demonstrate the importance of our findings on a real-world example.", "output": "Reconciling Predictive and Statistical Parity: A Causal Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The need to execute Deep Neural Networks (DNNs) at low latency and low power at the edge has spurred the development of new heterogeneous Systems-on-Chips (SoCs) encapsulating a diverse set of hardware accelerators. How to optimally map a DNN onto such multi-accelerator systems is an open problem. We propose ODiMO, a hardware-aware tool that performs a fine-grain mapping across different accelerators on-chip, splitting individual layers and executing them in parallel, to reduce inference energy consumption or latency, while taking into account each accelerator's quantization precision to maintain accuracy. Pareto-optimal networks in the accuracy vs. energy or latency space are pursued for three popular dataset/DNN pairs, and deployed on the DIANA heterogeneous ultra-low power edge AI SoC. 
We show that ODiMO reduces energy/latency by up to 33%/31% with limited accuracy drops (-0.53%/-0.32%) compared to manual heuristic mappings.", "output": "Precision-aware Latency and Energy Balancing on Multi-Accelerator Platforms for DNN Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As society transitions towards an AI-based decision-making infrastructure, an ever-increasing number of decisions once under the control of humans are now delegated to automated systems. Even though such developments make various parts of society more efficient, a large body of evidence suggests that a great deal of care needs to be taken to make such automated decision-making systems fair and equitable, namely, taking into account sensitive attributes such as gender, race, and religion. In this paper, we study a specific decision-making task called outcome control, in which an automated system aims to optimize an outcome variable $Y$ while being fair and equitable. The interest in such a setting ranges from interventions related to criminal justice and welfare, all the way to clinical decision-making and public health. We first analyze, through causal lenses, the notion of benefit, which captures how much a specific individual would benefit from a positive decision, counterfactually speaking, when contrasted with an alternative, negative one. We introduce the notion of benefit fairness, which can be seen as the minimal fairness requirement in decision-making, and develop an algorithm for satisfying it. We then note that the benefit itself may be influenced by the protected attribute, and propose causal tools which can be used to analyze this. Finally, if some of the variations of the protected attribute in the benefit are considered as discriminatory, the notion of benefit fairness may need to be strengthened, which leads us to articulate a notion of causal benefit fairness. Using this notion, we develop a new optimization procedure capable of maximizing $Y$ while ascertaining causal fairness in the decision process.", "output": "Causal Fairness for Outcome Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual Prompt Tuning (VPT) is an effective tuning method for adapting pretrained Vision Transformers (ViTs) to downstream tasks. It leverages extra learnable tokens, known as prompts, which steer the frozen pretrained ViTs. Although VPT has demonstrated its applicability with supervised vision transformers, it often underperforms with self-supervised ones. Through empirical observations, we deduce that the effectiveness of VPT hinges largely on the ViT blocks with which the prompt tokens interact. Specifically, VPT shows improved performance on image classification tasks for MAE and MoCo v3 when the prompt tokens are inserted into later blocks rather than the first block. These observations suggest that there exists an optimal location of blocks for the insertion of prompt tokens. Unfortunately, identifying the optimal blocks for prompts within each self-supervised ViT for diverse future scenarios is a costly process. To mitigate this problem, we propose a simple yet effective method that learns a gate for each ViT block to adjust its intervention into the prompt tokens. 
With our method, prompt tokens are selectively influenced by blocks that require steering for task adaptation. Our method outperforms VPT variants in FGVC and VTAB image classification and ADE20K semantic segmentation. The code is available at", "output": "Improving Visual Prompt Tuning for Self-supervised Vision Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately measuring discrimination is crucial to faithfully assessing the fairness of trained machine learning (ML) models. Any bias in measuring discrimination leads to either amplification or underestimation of the existing disparity. Several sources of bias exist, and it is assumed that bias resulting from machine learning is borne equally by different groups (e.g. females vs males, whites vs blacks, etc.). If, however, bias is borne differently by different groups, it may exacerbate discrimination against specific sub-populations. Sampling bias is inconsistently used in the literature to describe bias due to the sampling procedure. In this paper, we attempt to disambiguate this term by introducing clearly defined variants of sampling bias, namely, sample size bias (SSB) and underrepresentation bias (URB). We also show how discrimination can be decomposed into variance, bias, and noise. Finally, we challenge the commonly accepted mitigation approach that discrimination can be addressed by collecting more samples of the underrepresented group.", "output": "Shedding light on underrepresentation and Sampling Bias in machine learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the fundamental challenges found throughout the data sciences is to explain why things happen in specific ways, or through which mechanisms a certain variable $X$ exerts influence over another variable $Y$. In statistics and machine learning, significant efforts have been put into developing machinery to estimate correlations across variables efficiently. In causal inference, a large body of literature is concerned with the decomposition of causal effects under the rubric of mediation analysis. However, many variations are spurious in nature, including different phenomena throughout the applied sciences. Despite the statistical power to estimate correlations and the identification power to decompose causal effects, there is still little understanding of the properties of spurious associations and how they can be decomposed in terms of the underlying causal mechanisms. In this manuscript, we develop formal tools for decomposing spurious variations in both Markovian and Semi-Markovian models. We prove the first results that allow a non-parametric decomposition of spurious effects and provide sufficient conditions for the identification of such decompositions. The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine, and we empirically demonstrate its use on a real-world dataset.", "output": "A Causal Framework for Decomposing Spurious Variations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The integration of language models for neural machine translation has been extensively studied in the past. 
It has been shown that an external language model, trained on additional target-side monolingual data, can help improve translation quality. However, there has always been the assumption that the translation model also learns an implicit target-side language model during training, which interferes with the external language model at decoding time. Recently, some works on automatic speech recognition have demonstrated that, if the implicit language model is neutralized in decoding, further improvements can be gained when integrating an external language model. In this work, we transfer this concept to the task of machine translation and compare with the most prominent way of including additional monolingual data - namely back-translation. We find that accounting for the implicit language model significantly boosts the performance of language model fusion, although this approach is still outperformed by back-translation.", "output": "Improving Language Model Integration for Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we present a method to add perturbations to the code descriptions, i.e., new inputs in natural language (NL) from well-intentioned developers, in the context of security-oriented code, and analyze how and to what extent perturbations affect the performance of AI offensive code generators. Our experiments show that the performance of the code generators is highly affected by perturbations in the NL descriptions. To enhance the robustness of the code generators, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.", "output": "Enhancing Robustness of AI Offensive Code Generators via Data Augmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The application of Algorithmic Recourse in decision-making is a promising field that offers practical solutions to reverse unfavorable decisions. However, the inability of these methods to consider potential dependencies among variables poses a significant challenge due to the assumption of feature independence. Recent advancements have incorporated knowledge of causal dependencies, thereby enhancing the quality of the recommended recourse actions. Despite these improvements, the inability to incorporate the temporal dimension remains a significant limitation of these approaches. This is particularly problematic as identifying and addressing the root causes of undesired outcomes requires understanding time-dependent relationships between variables. In this work, we motivate the need to integrate the temporal dimension into causal algorithmic recourse methods to enhance recommendations' plausibility and reliability. The experimental evaluation highlights the significance of the role of time in this field.", "output": "The Importance of Time in Causal Algorithmic Recourse."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Phonetic convergence describes the automatic and unconscious speech adaptation of two interlocutors in a conversation.
This paper proposes a Siamese recurrent neural network (RNN) architecture to measure the convergence of the holistic spectral characteristics of speech sounds in an L2-L2 interaction. We extend an alternating reading task (the ART) dataset by adding 20 native Slovak L2 English speakers. We train and test the Siamese RNN model to measure phonetic convergence of L2 English speech from three different native language groups: Italian (9 dyads), French (10 dyads) and Slovak (10 dyads). Our results indicate that the Siamese RNN model effectively captures the dynamics of phonetic convergence and the speaker's imitation ability. Moreover, this text-independent model is scalable and capable of handling L1-induced speaker variability.", "output": "The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning models have been shown to leak sensitive information about their training datasets. As models are being increasingly used, on devices, to automate tasks and power new applications, there have been concerns that such white-box access to its parameters, as opposed to the black-box setting which only provides query access to the model, increases the attack surface. Directly extending the shadow modelling technique from the black-box to the white-box setting has been shown, in general, not to perform better than black-box only attacks. A key reason is misalignment, a known characteristic of deep neural networks. We here present the first systematic analysis of the causes of misalignment in shadow models and show the use of a different weight initialisation to be the main cause of shadow model misalignment. Second, we extend several re-alignment techniques, previously developed in the model fusion literature, to the shadow modelling context, where the goal is to re-align the layers of a shadow model to those of the target model. We show re-alignment techniques to significantly reduce the measured misalignment between the target and shadow models. Finally, we perform a comprehensive evaluation of white-box membership inference attacks (MIA). Our analysis reveals that (1) MIAs suffer from misalignment between shadow models, but that (2) re-aligning the shadow models improves, sometimes significantly, MIA performance. On the CIFAR10 dataset with a false positive rate of 1%, white-box MIA using re-aligned shadow models improves the true positive rate by 4.5%. Taken together, our results highlight that on-device deployment increases the attack surface and that the newly available information can be used by an attacker.", "output": "Re-aligning Shadow Models can Improve White-box Membership Inference Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Distributed and federated learning algorithms and techniques are associated primarily with minimization problems. However, with the increase of minimax optimization and variational inequality problems in machine learning, the necessity of designing efficient distributed/federated learning approaches for these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs).
Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-accent algorithms with provable improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems.", "output": "Communication-Efficient Gradient Descent-Accent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a novel and general loss function, called Symmetric Contrastive (Sy-CON) loss, for effective continual self-supervised learning (CSSL). We first argue that the conventional loss form of continual learning which consists of a single task-specific loss (for plasticity) and a regularizer (for stability) may not be ideal for contrastive loss based CSSL that focuses on representation learning. Our reasoning is that, in contrastive learning based methods, the task-specific loss would suffer from decreasing diversity of negative samples and the regularizer may hinder learning new distinctive representations. To that end, we propose Sy-CON that consists of two losses (one for plasticity and the other for stability) with symmetric dependence on current and past models' negative sample embeddings. We argue our model can naturally find a good trade-off between the plasticity and stability without any explicit hyperparameter tuning. We validate the effectiveness of our approach through extensive experiments, demonstrating that MoCo-based implementation of Sy-CON loss achieves superior performance compared to other state-of-the-art CSSL methods.", "output": "Sy-CON: Symmetric Contrastive Loss for Continual Self-Supervised Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graphs are widely used to encapsulate a variety of data formats, but real-world networks often involve complex node relations beyond only being pairwise. While hypergraphs and hierarchical graphs have been developed and employed to account for the complex node relations, they cannot fully represent these complexities in practice. Additionally, though many Graph Neural Networks (GNNs) have been proposed for representation learning on higher-order graphs, they are usually only evaluated on simple graph datasets. Therefore, there is a need for a unified modelling of higher-order graphs, and a collection of comprehensive datasets with an accessible evaluation framework to fully understand the performance of these algorithms on complex graphs. In this paper, we introduce the concept of hybrid graphs, a unified definition for higher-order graphs, and present the Hybrid Graph Benchmark (HGB). HGB contains 23 real-world hybrid graph datasets across various domains such as biology, social media, and e-commerce.
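As a rough illustration of the hybrid-graph notion above, combining pairwise edges and higher-order hyperedges in one container, here is a minimal sketch; the class and method names are invented for illustration and are unrelated to the HGB codebase.

```python
from dataclasses import dataclass, field

@dataclass
class HybridGraph:
    """Toy container mixing pairwise edges and hyperedges: a hybrid graph,
    as described above, unifies simple graphs with higher-order relations
    (hyperedges that join any number of nodes)."""
    num_nodes: int
    edges: list[tuple[int, int]] = field(default_factory=list)
    hyperedges: list[frozenset[int]] = field(default_factory=list)

    def add_edge(self, u: int, v: int) -> None:
        self.edges.append((u, v))

    def add_hyperedge(self, nodes: set[int]) -> None:
        if len(nodes) < 2:
            raise ValueError("a hyperedge should join at least two nodes")
        self.hyperedges.append(frozenset(nodes))

    def node_degree(self, u: int) -> int:
        # count every edge or hyperedge the node participates in
        return sum(u in e for e in self.edges) + sum(u in h for h in self.hyperedges)

g = HybridGraph(num_nodes=5)
g.add_edge(0, 1)
g.add_hyperedge({1, 2, 3})
print(g.node_degree(1))  # 2: one pairwise edge plus one hyperedge
```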
Furthermore, we provide an extensible evaluation framework and a supporting codebase to facilitate the training and evaluation of GNNs on HGB. Our empirical study of existing GNNs on HGB reveals various research opportunities and gaps, including (1) evaluating the actual performance improvement of hypergraph GNNs over simple graph GNNs; (2) comparing the impact of different sampling strategies on hybrid graph learning methods; and (3) exploring ways to integrate simple graph and hypergraph information. We make our source code and full datasets publicly available at", "output": "Hybrid Graph: A Unified Graph Representation with Datasets and Benchmarks for Complex Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical applications of machine learning (ML) have experienced a surge in popularity in recent years. The intensive care unit (ICU) is a natural habitat for ML given the abundance of available data from electronic health records. Models have been proposed to address numerous ICU prediction tasks like the early detection of complications. While authors frequently report state-of-the-art performance, it is challenging to verify claims of superiority. Datasets and code are not always published, and cohort definitions, preprocessing pipelines, and training setups are difficult to reproduce. This work introduces Yet Another ICU Benchmark (YAIB), a modular framework that allows researchers to define reproducible and comparable clinical ML experiments; we offer an end-to-end solution from cohort definition to model evaluation. The framework natively supports most open-access ICU datasets (MIMIC III/IV, eICU, HiRID, AUMCdb) and is easily adaptable to future ICU datasets. Combined with a transparent preprocessing pipeline and extensible training code for multiple ML and deep learning models, YAIB enables unified model development. Our benchmark comes with five predefined established prediction tasks (mortality, acute kidney injury, sepsis, kidney function, and length of stay) developed in collaboration with clinicians. Adding further tasks is straightforward by design. Using YAIB, we demonstrate that the choice of dataset, cohort definition, and preprocessing have a major impact on the prediction performance - often more so than model class - indicating an urgent need for YAIB as a holistic benchmarking tool. We provide our work to the clinical ML community to accelerate method development and enable real-world clinical implementations. Software Repository:", "output": "Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compared to sentence-level systems, document-level neural machine translation (NMT) models produce a more consistent output across a document and are able to better resolve ambiguities within the input. There are many works on document-level NMT, mostly focusing on modifying the model architecture or training strategy to better accommodate the additional context-input. On the other hand, in most works, the question on how to perform search with the trained model is scarcely discussed, sometimes not mentioned at all. In this work, we aim to answer the question how to best utilize a context-aware translation model in decoding.
We start with the most popular document-level NMT approach and compare different decoding schemes, some from the literature and others proposed by us. In the comparison, we are using both standard automatic metrics, as well as specific linguistic phenomena on three standard document-level translation benchmarks. We find that most commonly used decoding strategies perform similarly to each other and that higher quality context information has the potential to further improve the translation.", "output": "On Search Strategies for Document-Level Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative design is an increasingly important tool in the industrial world. It allows the designers and engineers to easily explore vast ranges of design options, providing a cheaper and faster alternative to the trial and failure approaches. Thanks to the flexibility they offer, Deep Generative Models are gaining popularity amongst Generative Design technologies. However, developing and evaluating these models can be challenging. The field lacks accessible benchmarks, in order to evaluate and compare objectively different Deep Generative Models architectures. Moreover, vanilla Deep Generative Models appear to be unable to accurately generate multi-component industrial systems that are controlled by latent design constraints. To address these challenges, we propose an industry-inspired use case that incorporates actual industrial system characteristics. This use case can be quickly generated and used as a benchmark. We propose a Meta-VAE capable of producing multi-component industrial systems and showcase its application on the proposed use case.", "output": "A Meta-Generation framework for Industrial System Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Learning (FL) is a machine learning framework where many clients collaboratively train models while keeping the training data decentralized. Despite recent advances in FL, the uncertainty quantification topic (UQ) remains partially addressed. Among UQ methods, conformal prediction (CP) approaches provide distribution-free guarantees under minimal assumptions. We develop a new federated conformal prediction method based on quantile regression and take into account privacy constraints. This method takes advantage of importance weighting to effectively address the label shift between agents and provides theoretical guarantees for both valid coverage of the prediction sets and differential privacy. Extensive experimental studies demonstrate that this method outperforms current competitors.", "output": "Conformal Prediction for Federated Uncertainty Quantification Under Label Shift."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given the increasing volume and quality of genomics data, extracting new insights requires interpretable machine-learning models. This work presents Genomic Interpreter: a novel architecture for genomic assay prediction. This model outperforms the state-of-the-art models for genomic assay prediction tasks. Our model can identify hierarchical dependencies in genomic sites.
This is achieved through the integration of 1D-Swin, a novel Transformer-based block designed by us for modelling long-range hierarchical data. Evaluated on a dataset containing 38,171 DNA segments of 17K base pairs, Genomic Interpreter demonstrates superior performance in chromatin accessibility and gene expression prediction and unmasks the underlying 'syntax' of gene regulation.", "output": "Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce Mesogeos, a large-scale multi-purpose dataset for wildfire modeling in the Mediterranean. Mesogeos integrates variables representing wildfire drivers (meteorology, vegetation, human activity) and historical records of wildfire ignitions and burned areas for 17 years (2006-2022). It is designed as a cloud-friendly spatio-temporal dataset, namely a datacube, harmonizing all variables in a grid of 1km x 1km x 1-day resolution. The datacube structure offers opportunities to assess machine learning (ML) usage in various wildfire modeling tasks. We extract two ML-ready datasets that establish distinct tracks to demonstrate this potential: (1) short-term wildfire danger forecasting and (2) final burned area estimation given the point of ignition. We define appropriate metrics and baselines to evaluate the performance of models in each track. By publishing the datacube, along with the code to create the ML datasets and models, we encourage the community to foster the implementation of additional tracks for mitigating the increasing threat of wildfires in the Mediterranean.", "output": "Mesogeos: A multi-purpose dataset for data-driven wildfire modeling in the Mediterranean."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of optimizing a grey-box objective function, i.e., a nested function composed of both black-box and white-box functions. A general formulation for such grey-box problems is given, which covers the existing grey-box optimization formulations as special cases. We then design an optimism-driven algorithm to solve it. Under certain regularity assumptions, our algorithm achieves a similar regret bound as that for the standard black-box Bayesian optimization algorithm, up to a constant multiplicative term depending on the Lipschitz constants of the functions considered. We further extend our method to the constrained case and discuss several special cases. For the commonly used kernel functions, the regret bounds allow us to derive a convergence rate to the optimal solution. Experimental results show that our grey-box optimization method empirically improves the speed of finding the global optimal solution significantly, as compared to the standard black-box optimization algorithm.", "output": "Bayesian Optimization of Expensive Nested Grey-Box Functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, sequence learning methods have been applied to the problem of off-policy Reinforcement Learning, including the seminal work on Decision Transformers, which employs transformers for this task.
Since transformers are parameter-heavy, cannot benefit from history longer than a fixed window size, and are not computed using recurrence, we set out to investigate the suitability of the S4 family of models, which are based on state-space layers and have been shown to outperform transformers, especially in modeling long-range dependencies. In this work we present two main algorithms: (i) an off-policy training procedure that works with trajectories, while still maintaining the training efficiency of the S4 model; (ii) an on-policy training procedure that is trained in a recurrent manner, benefits from long-range dependencies, and is based on a novel stable actor-critic mechanism. Our results indicate that our method outperforms multiple variants of decision transformers, as well as the other baseline methods on most tasks, while reducing the latency, number of parameters, and training time by several orders of magnitude, making our approach more suitable for real-world RL.", "output": "Decision S4: Efficient Sequence-Based RL via State Spaces Layers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated Machine Learning (FL) has received considerable attention in recent years. FL benchmarks are predominantly explored in either simulated systems or data center environments, neglecting the setups of real-world systems, which are often closely linked to edge computing. We close this research gap by introducing FLEdge, a benchmark targeting FL workloads in edge computing systems. We systematically study hardware heterogeneity, energy efficiency during training, and the effect of various differential privacy levels on training in FL systems. To make this benchmark applicable to real-world scenarios, we evaluate the impact of client dropouts on state-of-the-art FL strategies with failure rates as high as 50%. FLEdge provides new insights, such as that training state-of-the-art FL workloads on older GPU-accelerated embedded devices is up to 3x more energy efficient than on modern server-grade GPUs.", "output": "FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The state of the art of many learning tasks, e.g., image classification, is advanced by collecting larger datasets and then training larger models on them. As the outcome, the increasing computational cost is becoming unaffordable. In this paper, we investigate how to prune the large-scale datasets, and thus produce an informative subset for training sophisticated deep models with negligible performance drop. We propose a simple yet effective dataset pruning method by exploring both the prediction uncertainty and training dynamics. To our knowledge, this is the first work to study dataset pruning on large-scale datasets, i.e., ImageNet-1K and ImageNet-21K, and advanced models, i.e., Swin Transformer and ConvNeXt. Extensive experimental results indicate that our method outperforms the state of the art and achieves 75% lossless compression ratio on both ImageNet-1K and ImageNet-21K.
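One plausible reading of pruning by "prediction uncertainty and training dynamics" is to score each example by how much its predicted probability fluctuates across training checkpoints and keep the most uncertain ones. The sketch below implements that reading under stated assumptions; the paper's exact criterion may differ.

```python
import numpy as np

def uncertainty_scores(probs_over_ckpts: np.ndarray) -> np.ndarray:
    """Score examples by the std-dev of their target-class probability
    across training checkpoints; input shape is (checkpoints, examples)."""
    return probs_over_ckpts.std(axis=0)

def prune(scores: np.ndarray, keep_frac: float) -> np.ndarray:
    """Return the indices of the `keep_frac` most uncertain examples."""
    k = int(len(scores) * keep_frac)
    return np.argsort(scores)[-k:]

rng = np.random.default_rng(0)
probs = rng.uniform(size=(10, 1000))   # 10 checkpoints x 1000 examples (toy data)
kept = prune(uncertainty_scores(probs), keep_frac=0.25)
print(kept.shape)                      # (250,)
```

Easy, consistently-predicted examples get low scores and are dropped first, which matches the intuition that uncertain examples carry the most training signal.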
The code and pruned datasets are available at ", "output": "Large-scale Dataset Pruning with Dynamic Uncertainty."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe, helping fashion designers in real-time visualization by giving them a basic customized structure of how a specific design preference would look in real life and what further improvements can be made for enhanced customer satisfaction. Moreover, users can alone interact and generate fashionable images by just giving a few simple prompts. Recently, diffusion models have gained popularity as generative models owing to their flexibility and generation of realistic images from Gaussian noise. Latent diffusion models are a type of generative model that use diffusion processes to model the generation of complex data, such as images, audio, or text. They are called \"latent\" because they learn a hidden representation, or latent variable, of the data that captures its underlying structure. We propose a method exploiting the equivalence between diffusion models and energy-based models (EBMs) and suggesting ways to compose multiple probability distributions. We describe a pipeline on how our method can be used specifically for new fashionable outfit generation and virtual try-on using LLM-guided text-to-image generation. Our results indicate that using an LLM to refine the prompts to the latent diffusion model assists in generating globally creative and culturally diversified fashion styles and reducing bias.", "output": "Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Document-level context for neural machine translation (NMT) is crucial to improve the translation consistency and cohesion, the translation of ambiguous inputs, as well as several other linguistic phenomena. Many works have been published on the topic of document-level NMT, but most restrict the system to only local context, typically including just the one or two preceding sentences as additional information. This might be enough to resolve some ambiguous inputs, but it is probably not sufficient to capture some document-level information like the topic or style of a conversation. When increasing the context size beyond just the local context, there are two challenges: (i) the memory usage increases exponentially, (ii) the translation performance starts to degrade. We argue that the widely-used attention mechanism is responsible for both issues. Therefore, we propose a constrained attention variant that focuses the attention on the most relevant parts of the sequence, while simultaneously reducing the memory consumption. For evaluation, we utilize targeted test sets in combination with novel evaluation techniques to analyze the translations in regards to specific discourse-related phenomena.
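A constrained attention variant of the kind the preceding abstract describes can be approximated by keeping only each query's top-k attention scores and masking the rest; the following is a minimal sketch of that general idea, not the paper's exact formulation.

```python
import torch

def topk_attention(q, k, v, k_keep: int):
    """Toy constrained attention: each query attends only to its k_keep
    highest-scoring keys; all other positions are masked out before the
    softmax. Shapes: q, k, v are (batch, seq, dim)."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, Tq, Tk)
    kth = scores.topk(k_keep, dim=-1).values[..., -1:]     # k-th largest per query
    masked = scores.masked_fill(scores < kth, float("-inf"))
    return torch.softmax(masked, dim=-1) @ v

q = k = v = torch.randn(1, 8, 16)
out = topk_attention(q, k, v, k_keep=3)
print(out.shape)  # torch.Size([1, 8, 16])
```

Besides focusing each query on its most relevant context, a sparse mask like this is what enables the memory savings the abstract mentions, since the discarded positions need not be materialised in an optimised implementation.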
We find that our approach is a good compromise between sentence-level NMT and attending to the full context, especially in low resource scenarios.", "output": "Improving Long Context Document-Level Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study an infinite-dimensional optimization problem that aims to identify the Nemytskii operator in the nonlinear part of a prototypical semilinear elliptic partial differential equation (PDE) which minimizes the distance between the PDE-solution and a given desired state. In contrast to previous works, we consider this identification problem in a low-regularity regime in which the function inducing the Nemytskii operator is a-priori only known to be an element of $H^1_{loc}(\\mathbb{R})$. This makes the studied problem class a suitable point of departure for the rigorous analysis of training problems for learning-informed PDEs in which an unknown superposition operator is approximated by means of a neural network with nonsmooth activation functions (ReLU, leaky-ReLU, etc.). We establish that, despite the low regularity of the controls, it is possible to derive a classical stationarity system for local minimizers and to solve the considered problem by means of a gradient projection method. The convergence of the resulting algorithm is proven in the function space setting. It is also shown that the established first-order necessary optimality conditions imply that locally optimal superposition operators share various characteristic properties with commonly used activation functions: They are always sigmoidal, continuously differentiable away from the origin, and typically possess a distinct kink at zero. The paper concludes with numerical experiments which confirm the theoretical findings.", "output": "On the Identification and Optimization of Nonsmooth Superposition Operators in Semilinear Elliptic PDEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot meta-learning presents a challenge for gradient descent optimization due to the limited number of training samples per task. To address this issue, we propose an episodic memory optimization for meta-learning, we call \\emph{EMO}, which is inspired by the human ability to recall past learning experiences from the brain's memory. EMO retains the gradient history of past experienced tasks in external memory, enabling few-shot learning in a memory-augmented way. By learning to retain and recall the learning process of past training tasks, EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative. We prove theoretically that our algorithm converges for smooth, strongly convex objectives. EMO is generic, flexible, and model-agnostic, making it a simple plug-and-play optimizer that can be seamlessly embedded into existing optimization-based few-shot meta-learning approaches.
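The episodic-memory idea in the EMO abstract can be sketched as an optimizer that blends the current gradient with the average of remembered past-task gradients; the buffer size and mixing rule below are illustrative assumptions, not the published algorithm.

```python
import numpy as np

class EpisodicGradientMemory:
    """Toy episodic memory in the spirit of EMO: remember gradients from
    past tasks and use their average direction to steady updates computed
    from only a few examples."""

    def __init__(self, capacity: int = 100, mix: float = 0.5):
        self.buffer: list[np.ndarray] = []
        self.capacity = capacity
        self.mix = mix  # weight on the remembered direction (assumed rule)

    def update(self, params: np.ndarray, grad: np.ndarray, lr: float) -> np.ndarray:
        blended = grad
        if self.buffer:
            memory_dir = np.mean(self.buffer, axis=0)
            blended = (1 - self.mix) * grad + self.mix * memory_dir
        self.buffer.append(grad.copy())        # store the raw task gradient
        if len(self.buffer) > self.capacity:
            self.buffer.pop(0)                 # drop the oldest memory
        return params - lr * blended

mem = EpisodicGradientMemory()
w = np.zeros(3)
for g in (np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])):
    w = mem.update(w, g, lr=0.1)
print(w)
```

When a few-shot gradient is noisy, the memory term pulls the update toward directions that helped on earlier tasks, which is the stabilising effect the abstract describes.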
Empirical results show that EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods, resulting in accelerated convergence.", "output": "EMO: Episodic Memory Optimization for Few-Shot Meta-Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have been remarkably successful in data synthesis. Such successes have also driven diffusion models to apply to sensitive data, such as human face data, but this might bring about severe privacy concerns. In this work, we systematically present the first privacy study about property inference attacks against diffusion models, in which adversaries aim to extract sensitive global properties of the training set from a diffusion model, such as the proportion of the training data for certain sensitive properties. Specifically, we consider the most practical attack scenario: adversaries are only allowed to obtain synthetic data. Under this realistic scenario, we evaluate the property inference attacks on different types of samplers and diffusion models. A broad range of evaluations shows that various diffusion models and their samplers are all vulnerable to property inference attacks. Furthermore, one case study on off-the-shelf pre-trained diffusion models also demonstrates the effectiveness of the attack in practice. Finally, we propose a new model-agnostic plug-in method PriSampler to mitigate the property inference of diffusion models. PriSampler can be directly applied to well-trained diffusion models and supports both stochastic and deterministic sampling. Extensive experiments illustrate the effectiveness of our defense and it makes adversaries infer the proportion of properties as close as random guesses. PriSampler also shows its significantly superior performance to diffusion models trained with differential privacy on both model utility and defense performance.", "output": "PriSampler: Mitigating Property Inference of Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding the characteristics of neural networks is important but difficult due to their complex structures and behaviors. Some previous work proposes to transform neural networks into equivalent Boolean expressions and apply verification techniques for characteristics of interest. This approach is promising since rich results of verification techniques for circuits and other Boolean expressions can be readily applied. The bottleneck is the time complexity of the transformation. More precisely, (i) each neuron of the network, i.e., a linear threshold function, is converted to a Binary Decision Diagram (BDD), and (ii) they are further combined into some final form, such as Boolean circuits. For a linear threshold function with $n$ variables, an existing method takes $O(n 2^{\\frac{n}{2}})$ time to construct an ordered BDD of size $O(2^{\\frac{n}{2}})$ consistent with some variable ordering. However, it is non-trivial to choose a variable ordering producing a small BDD among $n!$ candidates. We propose a method to convert a linear threshold function to a specific form of a BDD based on the boosting approach in the machine learning literature.
Our method takes $O(2^n \\text{poly}(1/\\rho))$ time and outputs a BDD of size $O(\\frac{n^2}{\\rho^4}\\ln{\\frac{1}{\\rho}})$, where $\\rho$ is the margin of some consistent linear threshold function. Our method does not need to search for good variable orderings and produces a smaller expression when the margin of the linear threshold function is large. More precisely, our method is based on our new boosting algorithm, which is of independent interest. We also propose a method to combine them into the final Boolean expression representing the neural network.", "output": "Boosting-based Construction of BDDs for Linear Threshold Functions and Its Application to Verification of Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer-based attack adopts the adversarial examples generated on the surrogate model to attack various models, making it applicable in the physical world and attracting increasing interest. Recently, various adversarial attacks have emerged to boost adversarial transferability from different perspectives. In this work, inspired by the fact that flat local minima are correlated with good generalization, we assume and empirically validate that adversarial examples at a flat local region tend to have good transferability by introducing a penalized gradient norm to the original loss function. Since directly optimizing the gradient regularization norm is computationally expensive and intractable for generating adversarial examples, we propose an approximation optimization method to simplify the gradient update of the objective function. Specifically, we randomly sample an example and adopt the first-order gradient to approximate the second-order Hessian matrix, which makes computing more efficient by interpolating two Jacobian matrices. Meanwhile, in order to obtain a more stable gradient direction, we randomly sample multiple examples and average the gradients of these examples to reduce the variance due to random sampling during the iterative process. Extensive experimental results on the ImageNet-compatible dataset show that the proposed method can generate adversarial examples at flat local regions, and significantly improve the adversarial transferability on either normally trained models or adversarially trained models than the state-of-the-art attacks.", "output": "Boosting Adversarial Transferability by Achieving Flat Local Maxima."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative adversarial networks (GANs) have shown remarkable success in image synthesis, making GAN models themselves commercially valuable to legitimate model owners. Therefore, it is critical to technically protect the intellectual property of GANs. Prior works need to tamper with the training set or training process, and they are not robust to emerging model extraction attacks. In this paper, we propose a new ownership protection method based on the common characteristics of a target model and its stolen models. Our method is directly applicable to all well-trained GANs as it does not require retraining target models. Extensive experimental results show that our new method can achieve the best protection performance, compared to the state-of-the-art methods.
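A toy version of ownership verification via "common characteristics", as in the preceding abstract, might compare a suspect model's outputs with the target's on fixed probe inputs. Everything below (the fingerprint, the cosine score, the toy models) is a hypothetical illustration rather than the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def fingerprint(model, probes):
    """Hypothetical fingerprint: stacked outputs on fixed probe inputs."""
    return np.stack([model(z) for z in probes])

def ownership_score(target, suspect, probes):
    """Mean cosine similarity of fingerprints; a high score hints that the
    suspect shares the target's characteristics (e.g. an extracted copy)."""
    a, b = fingerprint(target, probes), fingerprint(suspect, probes)
    cos = (a * b).sum(1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return float(cos.mean())

W = rng.normal(size=(8, 4))                      # "target" generator weights
W_stolen = W + 0.01 * rng.normal(size=W.shape)   # near-copy, extraction-like
W_other = rng.normal(size=(8, 4))                # independently trained model

target = lambda z: np.tanh(W @ z)
stolen = lambda z: np.tanh(W_stolen @ z)
other = lambda z: np.tanh(W_other @ z)

probes = [rng.normal(size=4) for _ in range(64)]
print(ownership_score(target, stolen, probes))   # close to 1.0
print(ownership_score(target, other, probes))    # clearly lower
```

Because the check needs only model outputs, no retraining of the target is required, which is the practical property the abstract emphasises.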
Finally, we demonstrate the effectiveness of our method with respect to the number of generations of model extraction attacks, the number of generated samples, different datasets, as well as adaptive attacks.", "output": "Ownership Protection of Generative Adversarial Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Using audio and text embeddings jointly for Keyword Spotting (KWS) has shown high-quality results, but the key challenge of how to semantically align two embeddings for multi-word keywords of different sequence lengths remains largely unsolved. In this paper, we propose an audio-text-based end-to-end model architecture for flexible keyword spotting (KWS), which builds upon learned audio and text embeddings. Our architecture uses a novel dynamic programming-based algorithm, Dynamic Sequence Partitioning (DSP), to optimally partition the audio sequence into the same length as the word-based text sequence using the monotonic alignment of spoken content. Our proposed model consists of an encoder block to get audio and text embeddings, a projector block to project individual embeddings to a common latent space, and an audio-text aligner containing a novel DSP algorithm, which aligns the audio and text embeddings to determine if the spoken content is the same as the text. Experimental results show that our DSP is more effective than other partitioning schemes, and the proposed architecture outperformed the state-of-the-art results on the public dataset in terms of Area Under the ROC Curve (AUC) and Equal-Error-Rate (EER) by 14.4% and 28.9%, respectively.", "output": "Matching Latent Encoding for Audio-Text based Keyword Spotting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning head models (MLHMs) are developed to estimate brain deformation for early detection of traumatic brain injury (TBI). However, the overfitting to simulated impacts and the lack of generalizability caused by distributional shift of different head impact datasets hinder the broad clinical applications of current MLHMs. We propose brain deformation estimators that integrate unsupervised domain adaptation with a deep neural network to predict whole-brain maximum principal strain (MPS) and MPS rate (MPSR). With 12,780 simulated head impacts, we performed unsupervised domain adaptation on on-field head impacts from 302 college football (CF) impacts and 457 mixed martial arts (MMA) impacts using domain regularized component analysis (DRCA) and cycle-GAN-based methods. The new model improved the MPS/MPSR estimation accuracy, with the DRCA method significantly outperforming other domain adaptation methods in prediction accuracy (p<0.001): MPS RMSE: 0.027 (CF) and 0.037 (MMA); MPSR RMSE: 7.159 (CF) and 13.022 (MMA). On another two hold-out test sets with 195 college football impacts and 260 boxing impacts, the DRCA model significantly outperformed the baseline model without domain adaptation in MPS and MPSR estimation accuracy (p<0.001).
The DRCA domain adaptation reduces the MPS/MPSR estimation error to be well below TBI thresholds, enabling accurate brain deformation estimation to detect TBI in future clinical applications.", "output": "Toward more accurate and generalizable brain deformation estimators for traumatic brain injury detection with unsupervised domain adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Variational Autoencoder (VAE) is a seminal approach in deep generative modeling with latent variables. Interpreting its reconstruction process as a nonlinear transformation of samples from the latent posterior distribution, we apply the Unscented Transform (UT) -- a well-known distribution approximation used in the Unscented Kalman Filter (UKF) from the field of filtering. A finite set of statistics called sigma points, sampled deterministically, provides a more informative and lower-variance posterior representation than the ubiquitous noise-scaling of the reparameterization trick, while ensuring higher-quality reconstruction. We further boost the performance by replacing the Kullback-Leibler (KL) divergence with the Wasserstein distribution metric that allows for a sharper posterior. Inspired by the two components, we derive a novel, deterministic-sampling flavor of the VAE, the Unscented Autoencoder (UAE), trained purely with regularization-like terms on the per-sample posterior. We empirically show competitive performance in Fréchet Inception Distance (FID) scores over closely-related models, in addition to a lower training variance than the VAE.", "output": "Unscented Autoencoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug-drug interactions (DDIs). DDIs refer to a change in the effect of one drug to the presence of another drug in the human body, which plays an essential role in drug discovery and clinical research. DDIs prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply the advanced AI and deep learning, the developer and user meet various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, NLP based and hybrid methods, providing an updated and accessible guide to the broad researchers and development community with different domain knowledge. We introduce widely-used molecular representation and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments.
We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDIs prediction.", "output": "Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Crystallographic groups describe the symmetries of crystals and other repetitive structures encountered in nature and the sciences. These groups include the wallpaper and space groups. We derive linear and nonlinear representations of functions that are (1) smooth and (2) invariant under such a group. The linear representation generalizes the Fourier basis to crystallographically invariant basis functions. We show that such a basis exists for each crystallographic group, that it is orthonormal in the relevant $L_2$ space, and recover the standard Fourier basis as a special case for pure shift groups. The nonlinear representation embeds the orbit space of the group into a finite-dimensional Euclidean space. We show that such an embedding exists for every crystallographic group, and that it factors functions through a generalization of a manifold called an orbifold. We describe algorithms that, given a standardized description of the group, compute the Fourier basis and an embedding map. As examples, we construct crystallographically invariant neural networks, kernel machines, and Gaussian processes.", "output": "Representing and Learning Functions Invariant Under Crystallographic Groups."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient for downstream tasks. However, in many real-world settings, task-relevant information is also contained in modality-unique regions: information that is only present in one modality but still relevant to the task. How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy. FactorCL is built from three new contributions: (1) factorizing task-relevant information into shared and unique representations, (2) capturing task-relevant information via maximizing MI lower bounds and removing task-irrelevant information via minimizing MI upper bounds, and (3) multimodal data augmentations to approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.", "output": "Factorized Contrastive Learning: Going Beyond Multi-view Redundancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing.
However, clustering unlabeled images, as a fundamental and classic machine learning problem, still lacks an effective solution, particularly for large-scale datasets. In this paper, we propose a novel image clustering pipeline that leverages the powerful feature representation of large pre-trained models such as CLIP and clusters images effectively and efficiently at scale. We show that the pre-trained features are significantly more structured by further optimizing the rate reduction objective. The resulting features may significantly improve the clustering accuracy, e.g., from 57% to 66% on ImageNet-1k. Furthermore, by leveraging CLIP's image-text binding, we show how the new clustering method leads to a simple yet effective self-labeling algorithm that successfully works on unlabeled large datasets such as MS-COCO and LAION-Aesthetics. We will release the code in ", "output": "Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies federated linear contextual bandits under the notion of user-level differential privacy (DP). We first introduce a unified federated bandits framework that can accommodate various definitions of DP in the sequential decision-making setting. We then formally introduce user-level central DP (CDP) and local DP (LDP) in the federated bandits framework, and investigate the fundamental trade-offs between the learning regrets and the corresponding DP guarantees in a federated linear contextual bandits model. For CDP, we propose a federated algorithm termed as robin and show that it is near-optimal in terms of the number of clients $M$ and the privacy budget $\\varepsilon$ by deriving nearly-matching upper and lower regret bounds when user-level DP is satisfied. For LDP, we obtain several lower bounds, indicating that learning under user-level $(\\varepsilon,\\delta)$-LDP must suffer a regret blow-up factor at least $\\min\\{1/\\varepsilon, M\\}$ or $\\min\\{1/\\sqrt{\\varepsilon}, \\sqrt{M}\\}$ under different conditions.", "output": "Federated Linear Contextual Bandits with User-level Differential Privacy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A heart murmur is an atypical sound produced by the flow of blood through the heart. It can be a sign of a serious heart condition, so detecting heart murmurs is critical for identifying and managing cardiovascular diseases. However, current methods for identifying murmurous heart sounds do not fully utilize the valuable insights that can be gained by exploring intrinsic properties of heart sound signals. To address this issue, this study proposes a new discriminatory set of multiscale features based on the self-similarity and complexity properties of heart sounds, as derived in the wavelet domain. Self-similarity is characterized by assessing fractal behaviors, while complexity is explored by calculating wavelet entropy. We evaluated the diagnostic performance of these proposed features for detecting murmurs using a set of standard classifiers. When applied to a publicly available heart sound dataset, our proposed wavelet-based multiscale features achieved comparable performance to existing methods with fewer features.
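To make the murmur-feature idea concrete, a toy "complexity" feature can be computed as the Shannon entropy of normalised Haar wavelet sub-band energies; this is only in the spirit of the wavelet-entropy features described above, with a hand-rolled Haar transform to stay self-contained.

```python
import numpy as np

def haar_subband_energies(x: np.ndarray, levels: int) -> np.ndarray:
    """Energies of the detail sub-bands of a toy Haar wavelet decomposition."""
    energies = []
    approx = x.astype(float)
    for _ in range(levels):
        approx = approx[: len(approx) // 2 * 2]            # force even length
        detail = (approx[0::2] - approx[1::2]) / np.sqrt(2)
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2)
        energies.append(np.sum(detail ** 2))
    return np.array(energies)

def wavelet_entropy(x: np.ndarray, levels: int = 5) -> float:
    """Shannon entropy of the normalised sub-band energy distribution,
    one simple 'complexity' feature in the spirit of the abstract."""
    e = haar_subband_energies(x, levels)
    p = e / e.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

t = np.linspace(0, 1, 1024, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.5 * np.random.default_rng(0).normal(size=t.size)
print(wavelet_entropy(clean), wavelet_entropy(noisy))  # noisy is typically higher
```

A feature like this, concatenated with fractal-style self-similarity measures, would then feed the standard classifiers the abstract mentions.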
This suggests that self-similarity and complexity properties in heart sounds could be potential biomarkers for improving the accuracy of murmur detection.", "output": "A Method for Detecting Murmurous Heart Sounds based on Self-similar Properties."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We tackle the task of conditional music generation. We introduce MusicGen, a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens. Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models, e.g., hierarchically or upsampling. Following this approach, we demonstrate how MusicGen can generate high-quality samples, while being conditioned on textual description or melodic features, allowing better controls over the generated output. We conduct extensive empirical evaluation, considering both automatic and human studies, showing the proposed approach is superior to the evaluated baselines on a standard text-to-music benchmark. Through ablation studies, we shed light on the importance of each of the components comprising MusicGen. Music samples, code, and models are available at", "output": "Simple and Controllable Music Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recognizing human activities from sensor data is a vital task in various domains, but obtaining diverse and labeled sensor data remains challenging and costly. In this paper, we propose an unsupervised statistical feature-guided diffusion model for sensor-based human activity recognition. The proposed method aims to generate synthetic time-series sensor data without relying on labeled data, addressing the scarcity and annotation difficulties associated with real-world sensor data. By conditioning the diffusion model on statistical information such as mean, standard deviation, Z-score, and skewness, we generate diverse and representative synthetic sensor data. We conducted experiments on public human activity recognition datasets and compared the proposed method to conventional oversampling methods and state-of-the-art generative adversarial network methods. The experimental results demonstrate that the proposed method can improve the performance of human activity recognition and outperform existing techniques.", "output": "Unsupervised Statistical Feature-Guided Diffusion Model for Sensor-based Human Activity Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The decoding of brain neural networks has been an intriguing topic in neuroscience for a well-rounded understanding of different types of brain disorders and cognitive stimuli. Integrating different types of connectivity, e.g., Functional Connectivity (FC) and Structural Connectivity (SC), from multi-modal imaging techniques can take their complementary information into account and therefore have the potential to get better decoding capability. However, traditional approaches for integrating FC and SC overlook the dynamical variations, which stand a great chance to over-generalize the brain neural network.
In this paper, we propose a Joint kernel Graph Attention Network (JGAT), which is a new multi-modal temporal graph attention network framework. It integrates the data from functional Magnetic Resonance Images (fMRI) and Diffusion Weighted Imaging (DWI) while preserving the dynamic information at the same time. We conduct brain-decoding tasks with our JGAT on four independent datasets: three of 7T fMRI datasets from the Human Connectome Project (HCP) and one from animal neural recordings. Furthermore, with Attention Scores (AS) and Frame Scores (FS) computed and learned from the model, we can locate several informative temporal segments and build meaningful dynamical pathways along the temporal domain for the HCP datasets. The URL to the code of the JGAT model: ", "output": "JGAT: a joint spatio-temporal graph attention model for brain decoding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work presents a novel and promising approach to the clinical management of acute stroke. Using machine learning techniques, our research has succeeded in developing accurate diagnosis and prediction real-time models from hemodynamic data. These models are able to diagnose stroke subtype with 30 minutes of monitoring, to predict the exitus during the first 3 hours of monitoring, and to predict the stroke recurrence in just 15 minutes of monitoring. Patients with difficult access to a CT scan, and all patients that arrive at the stroke unit of a specialized hospital, will benefit from these positive results. The results obtained from the real-time developed models are the following: stroke diagnosis with around 98% precision (97.8% Sensitivity, 99.5% Specificity), exitus prediction with 99.8% precision (99.8% Sens., 99.9% Spec.), and 98% precision predicting stroke recurrence (98% Sens., 99% Spec.).", "output": "Predictive and diagnosis models of stroke from hemodynamic signal monitoring."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concentration of drivers on traffic is a vital safety issue; thus, monitoring a driver being on the road becomes an essential requirement. The key purpose of supervision is to detect abnormal behaviours of the driver and promptly send warnings to him or her for avoiding incidents related to traffic accidents. In this paper, to meet the requirement, based on radar sensor applications, the authors first use a small-sized millimetre wave radar installed at the steering wheel of the vehicle to collect signals from different head movements of the driver. The received signals consist of the reflection patterns that change in response to the head movements of the driver. Then, in order to distinguish these different movements, a classifier based on the measured signal of the radar sensor is designed. However, since the collected data set is not large, in this paper, the authors propose one-shot learning to classify four cases of the driver's head movements. The experimental results indicate that the proposed method can classify the four types of cases according to the various head movements of the driver with a high accuracy reaching up to 100%.
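The one-shot setting in this abstract, with one labelled example per head-movement class, can be illustrated with a nearest-prototype classifier over embeddings; the class names and embeddings below are synthetic placeholders, not the radar features from the paper.

```python
import numpy as np

def one_shot_classify(query: np.ndarray, support: np.ndarray, labels: list) -> str:
    """Nearest-prototype one-shot classification: one labelled example
    ('shot') per class; the query takes the label of the closest one."""
    dists = np.linalg.norm(support - query, axis=1)
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(1)
classes = ["nod", "shake", "lean-left", "lean-right"]  # four head movements
prototypes = rng.normal(size=(4, 16))                  # one embedding per class
query = prototypes[2] + 0.05 * rng.normal(size=16)     # noisy 'lean-left' sample
print(one_shot_classify(query, prototypes, classes))   # lean-left
```

In practice the embeddings would come from a learned network (e.g. a Siamese encoder trained on the radar reflection patterns), with the nearest-prototype rule applied on top exactly as here.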
In addition, the classification performance of the proposed method is significantly better than that of the convolutional neural network model.", "output": "One shot learning based drivers head movement identification using a millimetre wave radar sensor."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Excellent tail performance is crucial for modern machine learning tasks, such as algorithmic fairness, class imbalance, and risk-sensitive decision making, as it ensures the effective handling of challenging samples within a dataset. Tail performance is also a vital determinant of success for personalised recommender systems to reduce the risk of losing users with low satisfaction. This study introduces a \"safe\" collaborative filtering method that prioritises recommendation quality for less-satisfied users rather than focusing on the average performance. Our approach minimises the conditional value at risk (CVaR), which represents the average risk over the tails of users' loss. To overcome computational challenges for web-scale recommender systems, we develop a robust yet practical algorithm that extends the most scalable method, implicit alternating least squares (iALS). Empirical evaluation on real-world datasets demonstrates the excellent tail performance of our approach while maintaining competitive computational efficiency.", "output": "Safe Collaborative Filtering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we address the problem of Received Signal Strength map reconstruction based on location-dependent radio measurements and utilizing side knowledge about the local region; for example, city plan, terrain height, gateway position. Depending on the quantity of such prior side information, we employ Neural Architecture Search to find an optimized Neural Network model with the best architecture for each of the supposed settings. We demonstrate that using additional side information enhances the final accuracy of the Received Signal Strength map reconstruction on three datasets that correspond to three major cities, particularly in sub-areas near the gateways, where larger variations of the average received signal power are typically observed.", "output": "Deep Learning with Partially Labeled Data for Radio Map Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Stochastic gradient descent (SGD) has become a cornerstone of neural network optimization, yet the noise introduced by SGD is often assumed to be uncorrelated over time, despite the ubiquity of epoch-based training. In this work, we challenge this assumption and investigate the effects of epoch-based noise correlations on the stationary distribution of discrete-time SGD with momentum, limited to a quadratic loss. Our main contributions are twofold: first, we calculate the exact autocorrelation of the noise for training in epochs under the assumption that the noise is independent of small fluctuations in the weight vector; second, we explore the influence of correlations introduced by the epoch-based learning scheme on SGD dynamics.
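The epoch-based setup in the SGD abstract above can be probed qualitatively in a few lines: run discrete-time momentum SGD on a one-dimensional quadratic loss, once with epoch-wise reshuffling (each sample visited exactly once per epoch) and once with i.i.d. sampling, and compare the stationary weight variance. A toy illustration under those assumptions only, not the paper's exact calculation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, lr, beta, steps = 64, 0.05, 0.9, 20000
a = rng.normal(size=n)  # per-sample targets; loss_i(w) = 0.5 * (w - a_i)^2

def stationary_variance(epoch_based: bool) -> float:
    w, v, tail = 0.0, 0.0, []
    order = np.arange(n)
    for t in range(steps):
        if epoch_based:
            if t % n == 0:
                rng.shuffle(order)          # each sample once per epoch
            i = order[t % n]                # correlated, epoch-based noise
        else:
            i = rng.integers(n)             # i.i.d. sampling: uncorrelated noise
        g = w - a[i]                        # gradient of 0.5 * (w - a_i)^2
        v = beta * v + g                    # heavy-ball momentum
        w -= lr * v
        if t > steps // 2:                  # discard burn-in
            tail.append(w)
    return float(np.var(tail))

print("epoch-based:", stationary_variance(True))
print("i.i.d.     :", stationary_variance(False))
```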
We find that for directions with a curvature greater than a hyperparameter-dependent crossover value, the results for uncorrelated noise are recovered. However, for relatively flat directions, the weight variance is significantly reduced. We provide an intuitive explanation for these results based on a crossover between correlation times, contributing to a deeper understanding of the dynamics of SGD in the presence of epoch-based noise correlations.", "output": "Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasing availability of graph-structured data motivates the task of optimising over functions defined on the node set of graphs. Traditional graph search algorithms can be applied in this case, but they may be sample-inefficient and do not make use of information about the function values; on the other hand, Bayesian optimisation is a class of promising black-box solvers with superior sample efficiency, but it has scarcely been applied to such novel setups. To fill this gap, we propose a novel Bayesian optimisation framework that optimises over functions defined on generic, large-scale and potentially unknown graphs. Through the learning of suitable kernels on graphs, our framework has the advantage of adapting to the behaviour of the target function. The local modelling approach further guarantees the efficiency of our method. Extensive experiments on both synthetic and real-world graphs demonstrate the effectiveness of the proposed optimisation framework.", "output": "Bayesian Optimisation of Functions on Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents novel experiments shedding light on the shortcomings of current metrics for assessing the gender discrimination biases of machine learning algorithms on textual data. We focus on the Bios dataset, and our learning task is to predict the occupation of individuals based on their biography. Such prediction tasks are common in commercial Natural Language Processing (NLP) applications such as automatic job recommendations. We address an important limitation of theoretical discussions dealing with group-wise fairness metrics: they focus on large datasets, although the norm in many industrial NLP applications is to use small to reasonably large linguistic datasets for which the main practical constraint is to get a good prediction accuracy. We then question how reliable different popular measures of bias are when the size of the training set is just sufficient to learn reasonably accurate predictions. Our experiments sample the Bios dataset and learn more than 200 models on different sample sizes. This allows us to statistically study our results and to confirm that common gender bias indices provide diverging and sometimes unreliable results when applied to relatively small training and test samples.
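The instability of bias measures on small samples, discussed just above, is easy to expose with a bootstrap: recompute a fairness metric over resampled test sets and report its spread. A sketch with a synthetic predictor and the demographic-parity gap standing in for the paper's bias indices; the data here are hypothetical, not the Bios dataset:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical binary predictions and group labels for a small test set.
N = 300
group = rng.integers(0, 2, size=N)
yhat = (rng.random(N) < np.where(group == 1, 0.55, 0.45)).astype(int)

def dp_gap(yhat, group):
    # Demographic-parity gap: |P(yhat=1 | g=1) - P(yhat=1 | g=0)|
    return abs(yhat[group == 1].mean() - yhat[group == 0].mean())

# Bootstrap the metric to expose its sampling variance at this sample size.
gaps = []
for _ in range(1000):
    idx = rng.integers(0, N, size=N)
    gaps.append(dp_gap(yhat[idx], group[idx]))

print(f"DP gap = {dp_gap(yhat, group):.3f} "
      f"+/- {np.std(gaps):.3f} (bootstrap std over 1000 resamples)")
```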
This highlights the crucial importance of variance calculations for providing sound results in this field.", "output": "Are fairness metric scores enough to assess discrimination biases in machine learning?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While Deep Reinforcement Learning has been widely researched in medical imaging, the training and deployment of these models usually require powerful GPUs. Since imaging environments evolve rapidly and can be generated by edge devices, the algorithm is required to continually learn and adapt to changing environments, and adjust to low-compute devices. To this end, we developed three image coreset algorithms to compress and denoise medical images for selective experience replay-based lifelong reinforcement learning. We implemented neighborhood averaging coreset, neighborhood sensitivity-based sampling coreset, and maximum entropy coreset on full-body DIXON water and DIXON fat MRI images. All three coresets produced 27x compression with excellent performance in localizing five anatomical landmarks: left knee, right trochanter, left kidney, spleen, and lung, across both imaging environments. Maximum entropy coreset obtained the best performance of $11.97\pm 12.02$ average distance error, compared to the conventional lifelong learning framework's $19.24\pm 50.77$.", "output": "A framework for dynamically training and adapting deep reinforcement learning models to different, low-compute, and continuously changing radiology deployment environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the Global Navigation Satellite System (GNSS) context, the growing number of available satellites has led to many challenges when it comes to choosing the most accurate pseudorange contributions, given the strong impact of biased measurements on positioning accuracy, particularly in single-epoch scenarios. This work leverages the potential of machine learning in predicting link-wise measurement quality factors and, hence, optimizing measurement weighting. For this purpose, we use a customized matrix composed of heterogeneous features such as conditional pseudorange residuals and per-link satellite metrics (e.g., carrier-to-noise power density ratio and its empirical statistics, satellite elevation, carrier phase lock time). This matrix is then fed as input to a recurrent neural network (RNN) (i.e., a long short-term memory (LSTM) network). Our experimental results on real data, obtained from extensive field measurements, demonstrate the high potential of our proposed solution, which is able to outperform traditional measurement weighting and selection strategies from the state-of-the-art.", "output": "RNN-Based GNSS Positioning using Satellite Measurement Features and Pseudorange Residuals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cardiac digital twins provide a physics- and physiology-informed framework to deliver predictive and personalized medicine. However, high-fidelity multi-scale cardiac models remain a barrier to adoption due to their extensive computational costs and the high number of model evaluations needed for patient-specific personalization.
Artificial Intelligence-based methods can make the creation of fast and accurate whole-heart digital twins feasible. In this work, we use Latent Neural Ordinary Differential Equations (LNODEs) to learn the temporal pressure-volume dynamics of a heart failure patient. Our surrogate model based on LNODEs is trained on 400 3D-0D whole-heart closed-loop electromechanical simulations while accounting for 43 model parameters, describing single-cell through to whole-organ and cardiovascular hemodynamics. The trained LNODEs provide a compact and efficient representation of the 3D-0D model in a latent space by means of a feedforward fully-connected Artificial Neural Network that retains 3 hidden layers with 13 neurons per layer and allows for 300x real-time numerical simulations of cardiac function on a single processor of a standard laptop. This surrogate model is employed to perform global sensitivity analysis and robust parameter estimation with uncertainty quantification in 3 hours of computation, still on a single processor. We match pressure and volume time traces unseen by the LNODEs during the training phase, and we calibrate 4 to 11 model parameters while also providing their posterior distribution. This paper introduces the most advanced surrogate model of cardiac function available in the literature and opens important new avenues for parameter calibration in cardiac digital twins.", "output": "Real-time whole-heart electromechanical simulations using Latent Neural Ordinary Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The introduction of computerized medical records in hospitals has reduced burdensome operations like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting them from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation, using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Large Language Model for this task. Moreover, we conducted several experiments with three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77%, Precision 83.16%, Recall 86.44%. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a \"few-shot\" approach.
This allowed us to establish methodological guidelines that pave the way for future implementations in this field and allow Italian hospitals to tap into important research opportunities.", "output": "Advancing Italian Biomedical Information Extraction with Large Language Models: Methodological Insights and Multicenter Practical Application."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper addresses intra-client and inter-client covariate shifts in federated learning (FL) with a focus on the overall generalization performance. To handle covariate shifts, we formulate a new global model training paradigm and propose Federated Importance-Weighted Empirical Risk Minimization (FTW-ERM), along with improved density ratio matching methods that do not require perfect knowledge of the supremum over true ratios. We also propose the communication-efficient variant FITW-ERM, with the same level of privacy guarantees as those of classical ERM in FL. We theoretically show that FTW-ERM achieves smaller generalization error than classical ERM under certain settings. Experimental results demonstrate the superiority of FTW-ERM over existing FL baselines in challenging imbalanced federated settings in terms of data distribution shifts across clients.", "output": "Federated Learning under Covariate Shifts with Generalization Guarantees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Crystal property prediction is a crucial aspect of developing novel materials. However, two technical challenges must be addressed to speed up the investigation of crystals. First, labeling crystal properties is intrinsically difficult due to the high cost and time involved in physical simulations or lab experiments. Second, crystals adhere to a specific quantum chemical principle known as periodic invariance, which is often not captured by existing machine learning methods. To overcome these challenges, we propose a crystal-specific pre-training framework for learning crystal representations with self-supervision. The framework designs a mutex mask strategy for enhancing representation learning, so as to alleviate the limited labels available for crystal property prediction. Moreover, we take into account the specific periodic invariance in crystal structures by developing a periodic invariance multi-graph module and periodic attribute learning within our framework. This framework has been tested on eight different tasks. The experimental results on these tasks show that the framework achieves promising prediction performance and is able to outperform recent strong baselines.", "output": "A Crystal-Specific Pre-Training Framework for Crystal Material Property Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of different images, can we discover the generative concepts that represent each image?
We present an unsupervised approach to discover generative concepts from a collection of images, disentangling different art styles in paintings, objects and lighting from kitchen scenes, and discovering image classes given ImageNet images. We show how such generative concepts can accurately represent the content of images, be recombined and composed to generate new artistic and hybrid images, and be further used as a representation for downstream classification tasks.", "output": "Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There are increasing concerns about malicious attacks on autonomous vehicles. In particular, inaudible voice command attacks pose a significant threat as voice commands become available in autonomous driving systems. How to empirically defend against these inaudible attacks remains an open question. Previous research investigates utilizing deep learning-based multimodal fusion for defense, without considering model uncertainty in trustworthiness. As deep learning has been applied to increasingly sensitive tasks, uncertainty measurement is crucial in helping improve model robustness, especially in mission-critical scenarios. In this paper, we propose the Multimodal Fusion Framework (MFF) as an intelligent security system to defend against inaudible voice command attacks. MFF fuses heterogeneous audio-vision modalities using VGG-family neural networks and achieves a detection accuracy of 92.25% in the comparative fusion method empirical study. Additionally, extensive experiments on audio-vision tasks reveal the model's uncertainty. We measure calibration errors using Expected Calibration Errors and apply Monte-Carlo Dropout to estimate the predictive distribution for the proposed models. Our findings show empirically how to train robust multimodal models, improve standard accuracy, and provide a further step toward interpretability. Finally, we discuss the pros and cons of our approach and its applicability for Advanced Driver Assistance Systems.", "output": "Trustworthy Sensor Fusion against Inaudible Command Attacks in Advanced Driver-Assistance System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of great interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received a great deal of attention. Two interesting questions are (a) how to combine the strengths of UDL and other approaches, and (b) how these approaches compare to one another. We combine Variational Auto-Encoder (VAE), a popular UDL approach, with the recent idea of Influential Feature PCA (IF-PCA), and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on $10$ gene microarray data sets and $8$ single-cell RNA-seq data sets. We find that IF-VAE significantly improves over VAE, but still underperforms IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the $8$ single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis.
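For readers unfamiliar with IF-PCA, mentioned in the clustering abstract above, the pipeline is roughly: score each feature marginally, keep only the "influential" ones, run PCA on the retained block, and cluster the leading components. A rough sketch on synthetic data, with a plain variance score standing in for the Kolmogorov-Smirnov-based influence score used in the actual method; all sizes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: 200 "cells" x 500 "genes", two groups separated on 20 features.
X = rng.normal(size=(200, 500))
X[:100, :20] += 1.5

# 1) Influence scores: rank features by a simple marginal statistic.
scores = X.var(axis=0)
keep = np.argsort(scores)[-50:]            # retain the most "influential" features

# 2) PCA on the retained features via SVD.
Z = X[:, keep] - X[:, keep].mean(axis=0)
U, S, _ = np.linalg.svd(Z, full_matrices=False)
pcs = U[:, :2] * S[:2]                     # leading principal component scores

# 3) Lloyd's k-means (k=2) on the PC scores.
centers = pcs[rng.choice(len(pcs), 2, replace=False)]
for _ in range(50):
    labels = np.argmin(((pcs[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([pcs[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(2)])

print("cluster sizes:", np.bincount(labels))  # expect roughly 100 / 100
```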
We demonstrate that IF-PCA is capable of achieving the phase transition in a Rare/Weak model. Comparatively, Seurat and SC3 are more complex and theoretically difficult to analyze (for these reasons, their optimality remains unclear).", "output": "Subject clustering by IF-PCA and several recent methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A two-player symmetric zero-sum game is transitive if for any pure strategies $x$, $y$, $z$, if $x$ is better than $y$, and $y$ is better than $z$, then $x$ is better than $z$. It was recently observed that the Elo rating fails at preserving transitive relations among strategies and therefore cannot correctly extract the transitive component of a game. Our first contribution is to show that the Elo rating actually does preserve transitivity when computed in the right space. Precisely, using a suitable invertible mapping $\varphi$, we first apply $\varphi$ to the game, then compute Elo ratings, then go back to the original space by applying $\varphi^{-1}$. We provide a characterization of transitive games as a weak variant of ordinal potential games with additively separable potential functions. Leveraging this insight, we introduce the concept of transitivity order, the minimum number of invertible mappings required to transform the payoff of a transitive game into (differences of) its potential function. The transitivity order is a tool to classify transitive games, with Elo games being an example of transitive games of order one. Most real-world games have both transitive and non-transitive (cyclic) components, and we use our analysis of transitivity to extract the transitive (potential) component of an arbitrary game. We link transitivity to the known concept of sign-rank: transitive games have sign-rank two; arbitrary games may have higher sign-rank. Using a neural network-based architecture, we learn a decomposition of an arbitrary game into transitive and cyclic components that prioritises capturing the sign pattern of the game. In particular, a transitive game always has just one component in its decomposition, the potential component. We provide a comprehensive evaluation of our methodology using both toy examples and empirical data from real-world games.", "output": "Ordinal Potential-based Player Rating."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As the most public component of the Supreme Court's decision-making process, oral argument receives an out-sized share of attention in the popular media. Despite its prominence, however, the basic function and operation of oral argument as an institution remains poorly understood, as political scientists and legal scholars continue to debate even the most fundamental questions about its role. Past study of oral argument has tended to focus on discrete, quantifiable attributes of oral argument, such as the number of questions asked to each advocate, the party of the Justices' appointing president, or the ideological implications of the case on appeal. Such studies allow broad generalizations about oral argument and judicial decision making: Justices tend to vote in accordance with their ideological preferences, and they tend to ask more questions when they are skeptical of a party's position.
But they tell us little about the actual goings-on at oral argument -- the running dialog between Justice and advocate that is the heart of the institution. This Article fills that void, using machine learning techniques to construct, for the first time, predictive models of judicial decision making based not on oral argument's superficial features or on factors external to oral argument, such as where the case falls on a liberal-conservative spectrum, but on the actual content of the oral argument itself -- the Justices' questions to each side. The resultant models offer an important new window into aspects of oral argument that have long resisted empirical study, including the Justices' individual questioning styles, how each expresses skepticism, and which of the Justices' questions are most central to oral argument dialog.", "output": "A Computational Analysis of Oral Argument in the Supreme Court."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vulnerability identification constitutes a task of high importance for cyber security. It is quite helpful for locating and fixing vulnerable functions in large applications. However, this task is rather challenging owing to the absence of reliable and adequately managed datasets and learning models. Existing solutions typically rely on human expertise to annotate datasets or specify features, which is prone to error. In addition, the learning models have a high rate of false positives. To bridge this gap, in this paper, we present a properly curated C/C++ source code vulnerability dataset, denoted as CVEFunctionGraphEmbeddings (CVEFGE), to aid in developing models. CVEFGE is automatically crawled from the CVE database, which contains authentic and publicly disclosed source code vulnerabilities. We also propose a learning framework based on graph neural networks, denoted SEquential Graph Neural Network (SEGNN), for learning a large number of code semantic representations. SEGNN consists of a sequential learning module, graph convolution, pooling, and fully connected layers. Our evaluations on two datasets and four baseline methods in a graph classification setting demonstrate state-of-the-art results.", "output": "Sequential Graph Neural Networks for Source Code Vulnerability Identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The definition of anomaly detection is the identification of an unexpected event. Real-time detection of extreme events such as wildfires, cyclones, or floods using satellite data has become crucial for disaster management. Although several earth-observing satellites provide information about disasters, satellites in geostationary orbit provide data at intervals as frequent as every minute, effectively creating a video from space. Many techniques have been proposed to identify anomalies in surveillance videos; however, the available datasets do not have dynamic behavior, so we discuss an anomaly framework that can work on very high-frequency datasets to find very fast-moving anomalies.
In this work, we present a diffusion model which does not need any motion component to capture fast-moving anomalies and outperforms the other baseline methods.", "output": "Anomaly Detection in Satellite Videos using Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor-quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.", "output": "MALTS: Matching After Learning to Stretch."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present a simple yet effective provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning. Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch. The individual-level weight of sampled data is systematically proportional to the exponential of a scaled loss value of the data, where the scaling factor is interpreted as the regularization parameter in the framework of distributionally robust optimization (DRO). Depending on whether the scaling factor is positive or negative, ABSGD is guaranteed to converge to a stationary point of an information-regularized min-max or min-min DRO problem, respectively. Compared with existing class-level weighting schemes, our method can capture the diversity between individual examples within each class. Compared with existing individual-level weighting methods using meta-learning that require three backward propagations for computing mini-batch stochastic gradients, our method is more efficient, with only one backward propagation at each iteration as in standard deep learning methods. ABSGD is flexible enough to combine with other robust losses without any additional cost. Our empirical studies on several benchmark datasets demonstrate the effectiveness of the proposed method.\footnote{Code is available at: \url{", "output": "Attentional-Biased Stochastic Gradient Descent."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many complex multi-agent systems, such as robot swarm control and autonomous vehicle coordination, can be modeled as Multi-Agent Reinforcement Learning (MARL) tasks. QMIX, a widely popular MARL algorithm, has been used as a baseline for benchmark environments, e.g., the StarCraft Multi-Agent Challenge (SMAC) and Difficulty-Enhanced Predator-Prey (DEPP). Recent variants of QMIX target relaxing the monotonicity constraint of QMIX, allowing for performance improvement in SMAC.
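The monotonicity constraint that the QMIX abstract above refers to is typically enforced by forcing non-negative mixing weights, so that the joint value is non-decreasing in every per-agent value. A minimal numpy sketch with random stand-ins for the learned hypernetworks; all sizes here are hypothetical, and the real QMIX uses an ELU activation and trained networks:

```python
import numpy as np

rng = np.random.default_rng(4)
N_AGENTS, HIDDEN, STATE_DIM = 3, 8, 5        # hypothetical sizes

state = rng.normal(size=STATE_DIM)           # global state
q_agents = rng.normal(size=N_AGENTS)         # per-agent utilities Q_i

def hyper(shape):
    # Stand-in for a learned hypernetwork: global state -> mixing parameters.
    A = rng.normal(size=(STATE_DIM, int(np.prod(shape))))
    return (state @ A).reshape(shape)

# abs() on the mixing weights is what enforces dQ_tot/dQ_i >= 0,
# i.e. the monotonicity constraint discussed in the abstract above.
w1, b1 = np.abs(hyper((N_AGENTS, HIDDEN))), hyper((HIDDEN,))
w2, b2 = np.abs(hyper((HIDDEN, 1))), hyper((1,))

h = np.maximum(q_agents @ w1 + b1, 0.0)      # monotone (ReLU) activation
q_tot = float(h @ w2 + b2)
print("Q_tot:", q_tot)
```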
In this paper, we investigate the code-level optimizations of these variants and the monotonicity constraint. (1) We find that the improvements of the variants are significantly affected by various code-level optimizations. (2) The experimental results show that QMIX with normalized optimizations outperforms other works in SMAC; (3) beyond the common wisdom from these works, the monotonicity constraint can improve sample efficiency in SMAC and DEPP. We also discuss, with a theoretical analysis, why monotonicity constraints work well in purely cooperative tasks. We open-source the code at \url{", "output": "Rethinking the Implementation Tricks and Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cyclic block coordinate methods are a fundamental class of optimization methods widely used in practice and implemented as part of standard software packages for statistical learning. Nevertheless, their convergence is generally not well understood, and so far their good practical performance has not been explained by existing convergence analyses. In this work, we introduce a new block coordinate method that applies to the general class of variational inequality (VI) problems with monotone operators. This class includes composite convex optimization problems and convex-concave min-max optimization problems as special cases and has not been addressed by existing work. The resulting convergence bounds match the optimal convergence bounds of full gradient methods, but are provided in terms of a novel gradient Lipschitz condition w.r.t. a Mahalanobis norm. For $m$ coordinate blocks, the resulting gradient Lipschitz constant in our bounds is never larger than a factor $\sqrt{m}$ compared to the traditional Euclidean Lipschitz constant, while it is possible for it to be much smaller. Further, for the case when the operator in the VI has finite-sum structure, we propose a variance-reduced variant of our method which further decreases the per-iteration cost and has better convergence rates in certain regimes. To obtain these results, we use a gradient extrapolation strategy that allows us to view a cyclic collection of block coordinate-wise gradients as one implicit gradient.", "output": "Cyclic Coordinate Dual Averaging with Extrapolation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the intense attention and considerable investment into clinical machine learning research, relatively few applications have been deployed at large scale in a real-world clinical environment. While research is important in advancing the state-of-the-art, translation is equally important in bringing these techniques and technologies into a position to ultimately impact healthcare. We believe a lack of appreciation for several considerations is a major cause of this discrepancy between expectation and reality. To better characterize a holistic perspective among researchers and practitioners, we survey several practitioners with commercial experience in developing CML for clinical deployment. Using these insights, we identify several main categories of challenges in order to better design and develop clinical machine learning applications.", "output": "Deploying clinical machine learning?
Consider the following...."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Linear dynamical systems that obey stochastic differential equations are canonical models. While optimal control of known systems has a rich literature, the problem is technically hard under model uncertainty, and there are hardly any results. We initiate the study of this problem and aim to learn (and simultaneously deploy) optimal actions for minimizing a quadratic cost function. Indeed, this work is the first that comprehensively addresses the crucial challenge of balancing exploration versus exploitation in continuous-time systems. We present online policies that learn optimal actions fast by carefully randomizing the parameter estimates, and establish their performance guarantees: a regret bound that grows with the square root of time, multiplied by the number of parameters. Implementation of the policy for a flight-control task demonstrates its efficacy. Further, we prove sharp stability results for inexact system dynamics and tightly specify the infinitesimal regret caused by sub-optimal actions. To obtain the results, we conduct a novel eigenvalue-sensitivity analysis for matrix perturbation, establish upper bounds for comparative ratios of stochastic integrals, and introduce the new method of policy differentiation. Our analysis sheds light on fundamental challenges in continuous-time reinforcement learning and suggests a useful cornerstone for similar problems.", "output": "Reinforcement Learning Policies in Continuous-Time Linear Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Policy gradient algorithms have been widely applied to Markov decision processes and reinforcement learning problems in recent years. Regularization with various entropy functions is often used to encourage exploration and improve stability. This paper proposes an approximate Newton method for the policy gradient algorithm with entropy regularization. In the case of Shannon entropy, the resulting algorithm reproduces the natural policy gradient algorithm. For other entropy functions, this method results in brand-new policy gradient algorithms. We prove that all these algorithms enjoy Newton-type quadratic convergence and that the corresponding gradient flow converges globally to the optimal solution. We use synthetic and industrial-scale examples to demonstrate that the proposed approximate Newton method typically converges in single-digit iterations, often orders of magnitude faster than other state-of-the-art algorithms.", "output": "Approximate Newton policy gradient algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite recent success on various tasks, deep learning techniques still perform poorly on adversarial examples with small perturbations. While optimization-based methods for adversarial attacks are well-explored in the field of computer vision, it is impractical to directly apply them in natural language processing due to the discrete nature of text. To address the problem, we propose a unified framework extending the existing optimization-based adversarial attack methods in the vision domain to craft textual adversarial samples.
In this framework, continuously optimized perturbations are added to the embedding layer and amplified in the forward propagation process. Then the final perturbed latent representations are decoded with a masked language model head to obtain potential adversarial samples. In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD). We find our algorithm effective even when using proxy gradient information. Therefore, we perform the more challenging transfer black-box attack and conduct comprehensive experiments to evaluate our attack algorithm with several models on three benchmark datasets. Experimental results demonstrate that our method achieves overall better performance and produces more fluent and grammatical adversarial samples compared to strong baseline methods. The code and data are available at \url{", "output": "Bridge the Gap Between CV and NLP! A Gradient-based Textual Adversarial Attack Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider mixtures of $k\geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated, i.e., distinct components have statistical overlap at most $k^{-C}$ for a large enough constant $C\ge 1$. Previous statistical-query [DKS17] and lattice-based [BRST21, GVV22] lower bounds give formal evidence that even distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in $k$). We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small, and that for polynomially lower-bounded mixing weights non-trivial algorithmic guarantees are possible in quasi-polynomial time. Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of $k\ge 2$ well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates a pair of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component. For the special case of collinear means, our algorithm outputs a $k$-clustering of the input sample that is approximately consistent with the components of the mixture. We obtain similar clustering guarantees also for the case that the overlap between any two mixture components is lower bounded quasi-polynomially in $k$ (in addition to being upper bounded polynomially in $k$). A key technical ingredient is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders logarithmic in the minimum mixing weight.", "output": "Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Decision-makers often simultaneously face many related but heterogeneous learning problems.
For instance, a large retailer may wish to learn product demand at different stores to solve pricing or inventory problems, making it desirable to learn jointly for stores serving similar customers; alternatively, a hospital network may wish to learn patient risk at different providers to allocate personalized interventions, making it desirable to learn jointly for hospitals serving similar patient populations. Motivated by real datasets, we study a natural setting where the unknown parameter in each learning instance can be decomposed into a shared global parameter plus a sparse instance-specific term. We propose a novel two-stage multitask learning estimator that exploits this structure in a sample-efficient way, using a unique combination of robust statistics (to learn across similar instances) and LASSO regression (to debias the results). Our estimator yields improved sample complexity bounds in the feature dimension $d$ relative to commonly employed estimators; this improvement is exponential for \"data-poor\" instances, which benefit the most from multitask learning. We illustrate the utility of these results for online learning by embedding our multitask estimator within simultaneous contextual bandit algorithms. We specify a dynamic calibration of our estimator to appropriately balance the bias-variance tradeoff over time, improving the resulting regret bounds in the context dimension $d$. Finally, we illustrate the value of our approach on synthetic and real datasets.", "output": "Multitask Learning and Bandits via Robust Statistics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Value function factorization via centralized training and decentralized execution is promising for solving cooperative multi-agent reinforcement learning tasks. One of the approaches in this area, QMIX, has become state-of-the-art and achieved the best performance on the StarCraft II micromanagement benchmark. However, the monotonic mixing of per-agent estimates in QMIX is known to restrict the joint action Q-values it can represent, and the global state information available for single-agent value function estimation is often insufficient, resulting in suboptimality. To this end, we present LSF-SAC, a novel framework that features a variational inference-based information-sharing mechanism as extra state information to assist individual agents in the value function factorization. We demonstrate that such latent individual state information sharing can significantly expand the power of value function factorization, while fully decentralized execution can still be maintained in LSF-SAC through a soft actor-critic design. We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks. We further set up extensive ablation studies to locate the key factors accounting for its performance improvements. We believe that this new insight can lead to new local value estimation methods and variational deep learning algorithms.
A demo video and implementation code can be found at", "output": "Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies have revealed that GNNs are vulnerable to adversarial attacks. Most existing robust graph learning methods measure model robustness based on label information, rendering them infeasible when label information is not available. A straightforward direction is to employ the widely used Infomax technique from typical Unsupervised Graph Representation Learning (UGRL) to learn robust unsupervised representations. Nonetheless, directly transplanting the Infomax technique from typical UGRL to robust UGRL may involve a biased assumption. In light of the limitation of Infomax, we propose a novel unbiased robust UGRL method called Robust Graph Information Bottleneck (RGIB), which is grounded in the Information Bottleneck (IB) principle. Our RGIB attempts to learn robust node representations against adversarial perturbations by preserving the original information in the benign graph while eliminating the adversarial information in the adversarial graph. There are mainly two challenges in optimizing RGIB: 1) the high complexity of adversarial attacks that perturb node features and graph structure jointly in the training procedure; 2) mutual information estimation upon adversarially attacked graphs. To tackle these problems, we further propose an efficient adversarial training strategy with only feature perturbations and an effective mutual information estimator with subgraph-level summary. Moreover, we theoretically establish a connection between our proposed RGIB and the robustness of downstream classifiers, revealing that RGIB can provide a lower bound on the adversarial risk of downstream classifiers. Extensive experiments over several benchmarks and downstream tasks demonstrate the effectiveness and superiority of our proposed method.", "output": "Toward Enhanced Robustness in Unsupervised Graph Representation Learning: A Graph Information Bottleneck Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans and animals can learn new skills after practicing for a few hours, while current reinforcement learning algorithms require a large amount of data to achieve good performance. Recent model-based approaches show promising results by reducing the number of necessary interactions with the environment to learn a desirable policy. However, these methods require biologically implausible ingredients, such as the detailed storage of older experiences and long periods of offline learning. The optimal way to learn and exploit world-models is still an open question. Taking inspiration from biology, we suggest that dreaming might be an efficient expedient to use an inner model. We propose a two-module (agent and model) spiking neural network in which \"dreaming\" (living new experiences in a model-based simulated environment) significantly boosts learning. We also explore \"planning\", an online alternative to dreaming, that shows comparable performance. Importantly, our model does not require the detailed storage of experiences, and learns the world-model and the policy online.
Moreover, we stress that our network is composed of spiking neurons, further increasing its biological plausibility and implementability in neuromorphic hardware.", "output": "Towards biologically plausible Dreaming and Planning in recurrent spiking networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their \"width\" approaches infinity. The width of these general networks is characterized by the minimum in-degree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.", "output": "Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models. In this work we show that recently proposed Neural Quadratic Models can exhibit the \"catapult phase\" [Lewkowycz et al. 2020] that arises when training such models with large learning rates. We then empirically show that the behaviour of neural quadratic models parallels that of neural networks in generalization, especially in the catapult phase regime. Our analysis further demonstrates that quadratic models can be an effective tool for the analysis of neural networks.", "output": "Quadratic models for understanding neural network dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a variety of downstream tasks. This broad applicability complicates their evaluation. Addressing this challenge, we present \"Molecular Graph Representation Evaluation\" (MOLGRAPHEVAL), generating detailed profiles of molecular graph embeddings with interpretable and diversified attributes. MOLGRAPHEVAL offers a suite of probing tasks grouped into three categories: (i) generic graph properties, (ii) molecular substructure properties, and (iii) embedding space properties. By leveraging MOLGRAPHEVAL to benchmark existing GSSL methods against both current downstream datasets and our suite of tasks, we uncover significant inconsistencies between inferences drawn solely from existing datasets and those derived from more nuanced probing.
These findings suggest that current evaluation methodologies fail to capture the entirety of the landscape.", "output": "Evaluating Self-Supervised Learning for Molecular Graph Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the ranking of individuals, teams, or objects, based on pairwise comparisons between them, using the Bradley-Terry model. Estimates of rankings within this model are commonly made using a simple iterative algorithm first introduced by Zermelo almost a century ago. Here we describe an alternative and similarly simple iteration that provably returns identical results but does so much faster -- over a hundred times faster in some cases. We demonstrate this algorithm with applications to a range of example data sets and derive a number of results regarding its convergence.", "output": "Efficient computation of rankings from pairwise comparisons."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the promising results achieved, state-of-the-art interactive reinforcement learning schemes rely on passively receiving supervision signals from advisor experts, in the form of either continuous monitoring or pre-defined rules, which inevitably results in a cumbersome and expensive learning process. In this paper, we introduce a novel initiative advisor-in-the-loop actor-critic framework, termed Ask-AC, that replaces the unilateral advisor-guidance mechanism with a bidirectional learner-initiative one, and thereby enables a customized and efficacious message exchange between learner and advisor. At the heart of Ask-AC are two complementary components, namely the action requester and the adaptive state selector, which can be readily incorporated into various discrete actor-critic architectures. The former component allows the agent to initiatively seek advisor intervention in the presence of uncertain states, while the latter identifies the unstable states potentially missed by the former, especially when the environment changes, and then learns to promote the ask action on such states. Experimental results on both stationary and non-stationary environments and across different actor-critic backbones demonstrate that the proposed framework significantly improves the learning efficiency of the agent, and achieves performance on par with that obtained by continuous advisor monitoring.", "output": "Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of physics-informed neural networks (PINNs) in approximating partial differential equations (PDEs), PINNs can sometimes fail to converge to the correct solution in problems involving complicated PDEs. This is reflected in several recent studies on characterizing the \"failure modes\" of PINNs, although a thorough understanding of the connection between PINN failure modes and sampling strategies is missing. In this paper, we provide a novel perspective on the failure modes of PINNs by hypothesizing that training PINNs relies on successful \"propagation\" of the solution from initial and/or boundary condition points to interior points.
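Residual-guided collocation sampling of the kind this PINN abstract builds toward can be sketched in a few lines: keep the collocation points whose PDE residual is above average and redraw the rest uniformly, so that points accumulate where the residual field is large. An illustrative step with a synthetic residual function, not the authors' exact R3 procedure:

```python
import numpy as np

rng = np.random.default_rng(5)

def pde_residual(x):
    # Stand-in for |PDE residual| of a partially trained PINN; peaked
    # where the network has not yet "propagated" the solution.
    return np.exp(-50 * (x - 0.7) ** 2) + 0.05

pool = rng.random(1024)                 # current collocation points in [0, 1]

# One retain/resample step in the spirit of residual-guided sampling:
r = pde_residual(pool)
keep = pool[r > r.mean()]               # retain points with high residual
fresh = rng.random(1024 - keep.size)    # release the rest, resample uniformly
pool = np.concatenate([keep, fresh])

print(f"kept {keep.size} high-residual points; "
      f"mass near x=0.7: {np.mean(np.abs(pool - 0.7) < 0.1):.2f}")
```

Iterating this step concentrates collocation points in high-residual regions while the uniform redraw keeps exploring the rest of the domain.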
We show that PINNs with poor sampling strategies can get stuck at trivial solutions if there are propagation failures, characterized by highly imbalanced PDE residual fields. To mitigate propagation failures, we propose a novel Retain-Resample-Release sampling (R3) algorithm that can incrementally accumulate collocation points in regions of high PDE residuals with little to no computational overhead. We provide an extension of R3 sampling to respect the principle of causality while solving time-dependent PDEs. We theoretically analyze the behavior of R3 sampling and empirically demonstrate its efficacy and efficiency in comparison with baselines on a variety of PDE problems.", "output": "Mitigating Propagation Failures in Physics-informed Neural Networks using Retain-Resample-Release (R3) Sampling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The proliferation of automated data collection schemes and the advances in sensorics are increasing the amount of data we are able to monitor in real time. However, given the high annotation costs and the time required by quality inspections, data is often available in an unlabeled form. This is fostering the use of active learning for the development of soft sensors and predictive models. In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data. Several query strategy frameworks for regression have been proposed in the literature, but most of the focus has been dedicated to the static pool-based scenario. In this work, we propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner, which must instantaneously decide whether to perform the quality check to obtain the label or discard the instance. The approach is inspired by optimal experimental design theory, and the iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points. The proposed approach is evaluated using numerical simulations and the Tennessee Eastman Process simulator. The results confirm that selecting the examples suggested by the proposed algorithm allows for a faster reduction in the prediction error.", "output": "Stream-based active learning with linear models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In healthcare, detecting stress and enabling individuals to monitor their mental health and wellbeing is challenging. Advancements in wearable technology now enable continuous physiological data collection. This data can provide insights into mental health and behavioural states through psychophysiological analysis. However, automated analysis is required to provide timely results due to the quantity of data collected. Machine learning has shown efficacy in providing an automated classification of physiological data for health applications in controlled laboratory environments. Ambulatory uncontrolled environments, however, provide additional challenges requiring further modelling to overcome. This work empirically assesses several approaches utilising machine learning classifiers to detect stress using physiological data recorded in an ambulatory setting with self-reported stress annotations.
A subset of the training portion of the SMILE dataset enables the evaluation of approaches before submission. The optimal stress detection approach achieves 90.77% classification accuracy, 91.24 F1-Score, 90.42 Sensitivity and 91.08 Specificity, utilising an ExtraTrees classifier and feature imputation methods. Meanwhile, accuracy on the challenge data is much lower at 59.23% (submission #54 from BEaTS-MTU, username ZacDair). The cause of the performance disparity is explored in this work.", "output": "Classification of Stress via Ambulatory ECG and GSR Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vertical federated learning (VFL), a variant of Federated Learning (FL), has recently drawn increasing attention as the VFL matches the enterprises' demands of leveraging more valuable features to achieve better model performance. However, conventional VFL methods may run into data deficiency as they exploit only aligned and labeled samples (belonging to different parties), leaving often the majority of unaligned and unlabeled samples unused. The data deficiency hampers the effort of the federation. In this work, we propose a Federated Hybrid Self-Supervised Learning framework, named FedHSSL, that utilizes cross-party views (i.e., dispersed features) of samples aligned among parties and local views (i.e., augmentation) of unaligned samples within each party to improve the representation learning capability of the VFL joint model. FedHSSL further exploits invariant features across parties to boost the performance of the joint model through partial model aggregation. FedHSSL, as a framework, can work with various representative SSL methods. We empirically demonstrate that FedHSSL methods outperform baselines by large margins. We provide an in-depth analysis of FedHSSL regarding label leakage, which is rarely investigated in existing self-supervised VFL works. The experimental results show that, with proper protection, FedHSSL achieves the best privacy-utility trade-off against the state-of-the-art label inference attack compared with baselines. Code is available at url{", "output": "A Hybrid Self-Supervised Learning Framework for Vertical Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm which gives an unbiased approximation of the full set gradient and has a constant memory overhead for any set size for both train- and test-time. 
We conduct extensive experiments including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images to verify the efficiency and efficacy of our scalable set encoding framework. Our code is available at github.com/jeffwillette/umbc", "output": "Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The COVID-19 pandemic has posed a heavy burden to the healthcare system worldwide and caused huge social disruption and economic loss. Many deep learning models have been proposed to conduct clinical predictive tasks such as mortality prediction for COVID-19 patients in intensive care units using Electronic Health Record (EHR) data. Despite their initial success in certain clinical applications, there is currently a lack of benchmarking results to achieve a fair comparison so that we can select the optimal model for clinical use. Furthermore, there is a discrepancy between the formulation of traditional prediction tasks and real-world clinical practice in intensive care. To fill these gaps, we propose two clinical prediction tasks, Outcome-specific length-of-stay prediction and Early mortality prediction for COVID-19 patients in intensive care units. The two tasks are adapted from the naive length-of-stay and mortality prediction tasks to accommodate the clinical practice for COVID-19 patients. We propose fair, detailed, open-source data-preprocessing pipelines and evaluate 17 state-of-the-art predictive models on two tasks, including 5 machine learning models, 6 basic deep learning models and 6 deep learning predictive models specifically designed for EHR data. We provide benchmarking results using data from two real-world COVID-19 EHR datasets. One dataset is publicly available without needing any inquiry and another dataset can be accessed on request. We provide fair, reproducible benchmarking results for two tasks. We deploy all experiment results and models on an online platform. We also allow clinicians and researchers to upload their data to the platform and get quick prediction results using our trained models. We hope our efforts can further facilitate deep learning and machine learning research for COVID-19 predictive modeling.", "output": "A Comprehensive Benchmark for COVID-19 Predictive Modeling Using Electronic Health Records in Intensive Care."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in terms of prediction quality does not however find a counterpart in the security of such models and corresponding predictions, where perturbations of fractions of the training set (poisoning) can seriously undermine the model accuracy. 
Research on poisoning attacks and defenses received increasing attention in the last decade, leading to several promising solutions aiming to increase the robustness of machine learning. Among them, ensemble-based defenses, where different models are trained on portions of the training set and their predictions are then aggregated, provide strong theoretical guarantees at the price of a linear overhead. Surprisingly, ensemble-based defenses, which do not pose any restrictions on the base model, have not been applied to increase the robustness of random forest models. The work in this paper aims to fill in this gap by designing and implementing a novel hash-based ensemble approach that protects random forest against untargeted, random poisoning attacks. An extensive experimental evaluation measures the performance of our approach against a variety of attacks, as well as its sustainability in terms of resource consumption and performance, and compares it with a traditional monolithic model based on random forest. A final discussion presents our main findings and compares our approach with existing poisoning defenses targeting random forests.", "output": "On the Robustness of Random Forest Against Untargeted Data Poisoning: An Ensemble-Based Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Saddle points constitute a crucial challenge for first-order gradient descent algorithms. In notions of classical machine learning, they are avoided for example by means of stochastic gradient descent methods. In this work, we provide evidence that the saddle points problem can be naturally avoided in variational quantum algorithms by exploiting the presence of stochasticity. We prove convergence guarantees and present practical examples in numerical simulations and on quantum hardware. We argue that the natural stochasticity of variational algorithms can be beneficial for avoiding strict saddle points, i.e., those saddle points with at least one negative Hessian eigenvalue. This insight that some levels of shot noise could help is expected to add a new perspective to notions of near-term variational quantum algorithms.", "output": "Stochastic noise can be helpful for variational quantum algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fair representation learning (FRL) is a popular class of methods aiming to produce fair classifiers via data preprocessing. Recent regulatory directives stress the need for FRL methods that provide practical certificates, i.e., provable upper bounds on the unfairness of any downstream classifier trained on preprocessed data, which directly provides assurance in a practical scenario. Creating such FRL methods is an important challenge that remains unsolved. In this work, we address that challenge and introduce FARE (Fairness with Restricted Encoders), the first FRL method with practical fairness certificates. FARE is based on our key insight that restricting the representation space of the encoder enables the derivation of practical guarantees, while still permitting favorable accuracy-fairness tradeoffs for suitable instantiations, such as one we propose based on fair trees. 
To produce a practical certificate, we develop and apply a statistical procedure that computes a finite sample high-confidence upper bound on the unfairness of any downstream classifier trained on FARE embeddings. In our comprehensive experimental evaluation, we demonstrate that FARE produces practical certificates that are tight and often even comparable with purely empirical results obtained by prior methods, which establishes the practical value of our approach.", "output": "FARE: Provably Fair Representation Learning with Practical Certificates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mixture-of-Expert (MoE) models have obtained state-of-the-art performance in Neural Machine Translation (NMT) tasks. Existing works in MoE mostly consider a homogeneous design where the same number of experts of the same size are placed uniformly throughout the network. Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design. To this end, we develop AutoMoE -- a framework for designing heterogeneous MoE's under computational constraints. AutoMoE leverages Neural Architecture Search (NAS) to obtain efficient sparse MoE sub-transformers with 4x inference speedup (CPU) and FLOPs reduction over manually designed Transformers, with parity in BLEU score over dense Transformer and within 1 BLEU point of MoE SwitchTransformer, on aggregate over benchmark datasets for NMT. Heterogeneous search space with dense and sparsely activated Transformer modules (e.g., how many experts? where to place them? what should be their sizes?) allows for adaptive compute -- where different amounts of computations are used for different tokens in the input. Adaptivity comes naturally from routing decisions which send tokens to experts of different sizes. AutoMoE code, data, and trained models are available at ", "output": "AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We have recently witnessed a number of impressive results on hard mathematical reasoning problems with language models. At the same time, the robustness of these models has also been called into question; recent works have shown that models can rely on shallow patterns in the problem description when generating a solution. Building on the idea of behavioral testing, we propose a novel framework, which pins down the causal effect of various factors in the input, e.g., the surface form of the problem text, the operands, and math operators on the output solution. By grounding the behavioral analysis in a causal graph describing an intuitive reasoning process, we study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space. We apply our framework on a test bed of math word problems. 
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.", "output": "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multi-agent learning has gained increasing attention to tackle distributed machine learning scenarios under constrictions of data exchanging. However, existing multi-agent learning models usually consider data fusion under fixed and compulsory collaborative relations among agents, which is not as flexible and autonomous as human collaboration. To fill this gap, we propose a distributed multi-agent learning model inspired by human collaboration, in which the agents can autonomously detect suitable collaborators and refer to collaborators' model for better performance. To implement such adaptive collaboration, we use a collaboration graph to indicate the pairwise collaborative relation. The collaboration graph can be obtained by graph learning techniques based on model similarity between different agents. Since model similarity can not be formulated by a fixed graphical optimization, we design a graph learning network by unrolling, which can learn underlying similar features among potential collaborators. By testing on both regression and classification tasks, we validate that our proposed collaboration model can figure out accurate collaborative relationship and greatly improve agents' learning performance.", "output": "Unrolled Graph Learning for Multi-Agent Collaboration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised visual representation methods are closing the gap with supervised learning performance. These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations. This can be seen as a task that encourages embeddings to leave out factors modified by these augmentations, i.e. to be invariant to them. However, this only considers one side of the trade-off in the choice of the augmentations: they need to strongly modify the images to avoid simple solution shortcut learning (e.g. using only color histograms), but on the other hand, augmentations-related information may be lacking in the representations for some downstream tasks (e.g. color is important for birds and flower classification). Few recent works proposed to mitigate the problem of using only an invariance task by exploring some form of equivariance to augmentations. This has been performed by learning additional embeddings space(s), where some augmentation(s) cause embeddings to differ, yet in a non-controlled way. In this work, we introduce EquiMod, a generic equivariance module that structures the learned latent space, in the sense that our module learns to predict the displacement in the embedding space caused by the augmentations. We show that applying that module to state-of-the-art invariance models, such as SimCLR and BYOL, increases the performances on CIFAR10 and ImageNet datasets. Moreover, while our model could collapse to a trivial equivariance, i.e. 
invariance, we observe that it instead automatically learns to keep some augmentations-related information beneficial to the representations.", "output": "EquiMod: An Equivariance Module to Improve Self-Supervised Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Identifying change points (CPs) in a time series is crucial to guide better decision making across various fields like finance and healthcare and facilitating timely responses to potential risks or opportunities. Existing Change Point Detection (CPD) methods have a limitation in tracking changes in the joint distribution of multidimensional features. In addition, they fail to generalize effectively within the same time series as different types of CPs may require different detection methods. As the volume of multidimensional time series continues to grow, capturing various types of complex CPs such as changes in the correlation structure of the time-series features has become essential. To overcome the limitations of existing methods, we propose TiVaCPD, an approach that uses a Time-Varying Graphical Lasso (TVGL) to identify changes in correlation patterns between multidimensional features over time, and combines that with an aggregate Kernel Maximum Mean Discrepancy (MMD) test to identify changes in the underlying statistical distributions of dynamic time windows with varying length. The MMD and TVGL scores are combined using a novel ensemble method based on similarity measures leveraging the power of both statistical tests. We evaluate the performance of TiVaCPD in identifying and characterizing various types of CPs and show that our method outperforms current state-of-the-art methods in real-world CPD datasets. We further demonstrate that TiVaCPD scores characterize the type of CPs and facilitate interpretation of change dynamics, offering insights into real-life applications.", "output": "Dynamic Interpretable Change Point Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classical artificial neural networks have witnessed widespread successes in machine-learning applications. Here, we propose fermion neural networks (FNNs) whose physical properties, such as local density of states or conditional conductance, serve as outputs, once the inputs are incorporated as an initial layer. Comparable to back-propagation, we establish an efficient optimization, which entitles FNNs to competitive performance on challenging machine-learning benchmarks. FNNs also directly apply to quantum systems, including hard ones with interactions, and offer in-situ analysis without preprocessing or presumption. Following machine learning, FNNs precisely determine topological phases and emergent charge orders. 
Their quantum nature also brings various advantages: quantum correlation entitles more general network connectivity and insight into the vanishing gradient problem, quantum entanglement opens up novel avenues for interpretable machine learning, etc.", "output": "A fermion neural network with efficient optimization and quantum applicability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the demand for pervasive smart services and applications has increased rapidly. Device-free human detection through sensors or cameras has been widely adopted, but it comes with privacy issues as well as misdetection for motionless people. To address these drawbacks, channel state information (CSI) captured from commercialized Wi-Fi devices provides rich signal features for accurate detection. However, existing systems suffer from inaccurate classification under a non-line-of-sight (NLoS) and stationary scenario, such as when a person is standing still in a room corner. In this work, we propose a system called CRONOS (Colorization and Contrastive Learning Enhanced NLoS Human Presence Detection), which generates dynamic recurrence plots (RPs) and color-coded CSI ratios to distinguish mobile and stationary people from vacancy in a room, respectively. We also incorporate supervised contrastive learning to retrieve substantial representations, where consultation loss is formulated to differentiate the representative distances between dynamic and stationary cases. Furthermore, we propose a self-switched static feature enhanced classifier (S3FEC) to determine the utilization of either RPs or color-coded CSI ratios. Our comprehensive experimental results show that CRONOS outperforms existing systems that either apply machine learning or non-learning based methods, as well as non-CSI based features in open literature. CRONOS achieves the highest human presence detection accuracy in vacancy, mobility, line-of-sight (LoS), and NLoS scenarios.", "output": "CRONOS: Colorization and Contrastive Learning for Device-Free NLoS Human Presence Detection using Wi-Fi CSI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training. Prior works typically address this issue by using auxiliary information (e.g., public data) to boost the effectiveness of adaptive optimization. In this work, we explore techniques to estimate and efficiently adapt to gradient geometry in private adaptive optimization without auxiliary data. Motivated by the observation that adaptive methods can tolerate stale preconditioners, we propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to better realize the benefits of adaptivity. Theoretically, we provide convergence guarantees for our method for both convex and non-convex problems, and analyze trade-offs between delay and privacy noise reduction. 
Empirically, we explore DP^2 across several real-world datasets, demonstrating that it can improve convergence speed by as much as 4x relative to non-adaptive baselines and match the performance of state-of-the-art optimization methods that require auxiliary data.", "output": "Differentially Private Adaptive Optimization with Delayed Preconditioners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Which parts of a dataset will a given model find difficult? Recent work has shown that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class, or to rely upon harmful spurious correlations. Here, we show that the preference for \"easy\" runs far deeper: A model may prioritize any class or group of the dataset that it finds simple -- at the expense of what it finds complex -- as measured by performance difference on the test set. When subsets with different levels of complexity align with demographic groups, we term this difficulty disparity, a phenomenon that occurs even with balanced datasets that lack group/label associations. We show how difficulty disparity is a model-dependent quantity, and is further amplified in commonly-used models as selected by typical average performance scores. We quantify an amplification factor across a range of settings in order to compare disparity of different models on a fixed dataset. Finally, we present two real-world examples of difficulty amplification in action, resulting in worse-than-expected performance disparities between groups even when using a balanced dataset. The existence of such disparities in balanced datasets demonstrates that merely balancing sample sizes of groups is not sufficient to ensure unbiased performance. We hope this work presents a step towards measurable understanding of the role of model bias as it interacts with the structure of data, and call for additional model-dependent mitigation methods to be deployed alongside dataset audits.", "output": "Simplicity Bias Leads to Amplified Performance Disparities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, indoor human presence detection based on supervised learning (SL) and channel state information (CSI) has attracted much attention. However, existing studies that rely on spatial information of CSI are susceptible to environmental changes which degrade prediction accuracy. Moreover, SL-based methods require time-consuming data labeling for retraining models. Therefore, it is imperative to design a continuously monitored model using a semi-supervised learning (SSL) based scheme. In this paper, we conceive a bifold teacher-student (BTS) learning approach for indoor human presence detection in an adjoining two-room scenario. The proposed SSL-based primal-dual teacher-student network intelligently learns spatial and temporal features from labeled and unlabeled CSI datasets. Additionally, the enhanced penalized loss function leverages entropy and distance measures to distinguish drifted data, i.e., features of new datasets affected by time-varying effects and altered from the original distribution. Experimental results demonstrate that the proposed BTS system sustains asymptotic accuracy after retraining the model with unlabeled data. 
Furthermore, BTS outperforms existing SSL-based models in terms of the highest detection accuracy while achieving the asymptotic performance of SL-based methods.", "output": "BTS: Bifold Teacher-Student in Semi-Supervised Learning for Indoor Two-Room Presence Detection Under Time-Varying CSI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning, but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.", "output": "Learning to Maximize Mutual Information for Dynamic Feature Selection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning (DL) applied to breast tissue segmentation in magnetic resonance imaging (MRI) has received increased attention in the last decade, however, the domain shift which arises from different vendors, acquisition protocols, and biological heterogeneity, remains an important but challenging obstacle on the path towards clinical implementation. In this paper, we propose a novel Multi-level Semantic-guided Contrastive Domain Adaptation (MSCDA) framework to address this issue in an unsupervised manner. Our approach incorporates self-training with contrastive learning to align feature representations between domains. In particular, we extend the contrastive loss by incorporating pixel-to-pixel, pixel-to-centroid, and centroid-to-centroid contrasts to better exploit the underlying semantic information of the image at different levels. To resolve the data imbalance problem, we utilize a category-wise cross-domain sampling strategy to sample anchors from target images and build a hybrid memory bank to store samples from source images. We have validated MSCDA with a challenging task of cross-domain breast MRI segmentation between datasets of healthy volunteers and invasive breast cancer patients. Extensive experiments show that MSCDA effectively improves the model's feature alignment capabilities between domains, outperforming state-of-the-art methods. Furthermore, the framework is shown to be label-efficient, achieving good performance with a smaller source dataset. 
The code is publicly available at url{", "output": "MSCDA: Multi-level Semantic-guided Contrast Improves Unsupervised Domain Adaptation for Breast MRI Segmentation in Small Datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Face morphing attacks seek to deceive a Face Recognition (FR) system by presenting a morphed image consisting of the biometric qualities from two different identities with the aim of triggering a false acceptance with one of the two identities, thereby presenting a significant threat to biometric systems. The success of a morphing attack is dependent on the ability of the morphed image to represent the biometric characteristics of both identities that were used to create the image. We present a novel morphing attack that uses a Diffusion-based architecture to improve the visual fidelity of the image and the ability of the morphing attack to represent characteristics from both identities. We demonstrate the effectiveness of the proposed attack by evaluating its visual fidelity via the Frechet Inception Distance (FID). Also, extensive experiments are conducted to measure the vulnerability of FR systems to the proposed attack. The ability of a morphing attack detector to detect the proposed attack is measured and compared against two state-of-the-art GAN-based morphing attacks along with two Landmark-based attacks. Additionally, a novel metric to measure the relative strength between different morphing attacks is introduced and evaluated.", "output": "Leveraging Diffusion For Strong and High Quality Face Morphing Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nonhomogeneous partial differential equations (PDEs) are an applicable model in soft sensor modeling for describing spatiotemporal industrial systems with unmeasurable source terms, which cannot be well solved by existing physics-informed neural networks (PINNs). To this end, a coupled PINN (CPINN) with a recurrent prediction (RP) learning strategy (CPINN-RP) is proposed for soft sensor modeling in spatiotemporal industrial processes, such as vibration displacement. First, CPINN containing NetU and NetG is proposed. NetU is used to approximate the solutions to PDEs under study and NetG is used to regularize the training of NetU. The two networks are integrated into a data-physics-hybrid loss function. 
Then, we theoretically prove that the proposed CPINN has a satisfying approximation capacity to the PDEs solutions. Besides the theoretical aspects, we propose a hierarchical training strategy to optimize and couple the two networks to achieve the parameters of CPINN. Secondly, NetU-RP is achieved by NetU compensated by RP, the recurrently delayed output of CPINN, to further improve the soft sensor performance. Finally, simulations and experiment verify the effectiveness and practical applications of CPINN-RP.", "output": "Solving PDEs with Unmeasurable Source Terms Using Coupled Physics-Informed Neural Network with Recurrent Prediction in Soft Sensor Modeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning has been wildly successful in practice and most state-of-the-art machine learning methods are based on neural networks. Lacking, however, is a rigorous mathematical theory that adequately explains the amazing performance of deep neural networks. In this article, we present a relatively new mathematical framework that provides the beginning of a deeper understanding of deep learning. This framework precisely characterizes the functional properties of neural networks that are trained to fit to data. The key mathematical tools which support this framework include transform-domain sparse regularization, the Radon transform of computed tomography, and approximation theory, which are all techniques deeply rooted in signal processing. This framework explains the effect of weight decay regularization in neural network training, the use of skip connections and low-rank weight matrices in network architectures, the role of sparsity in neural networks, and explains why neural networks can perform well in high-dimensional problems.", "output": "Deep Learning Meets Sparse Regularization: A Signal Processing Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the causal bandit problem when the causal graph is unknown and develop an efficient algorithm for finding the parent node of the reward node using atomic interventions. We derive the exact equation for the expected number of interventions performed by the algorithm and show that under certain graphical conditions it could perform either logarithmically fast or, under more general assumptions, slower but still sublinearly in the number of variables. We formally show that our algorithm is optimal as it meets the universal lower bound we establish for any algorithm that performs atomic interventions. Finally, we extend our algorithm to the case when the reward node has multiple parents. Using this algorithm together with a standard algorithm from bandit literature leads to improved regret bounds.", "output": "Causal Bandits without Graph Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The optimized certainty equivalent (OCE) is a family of risk measures that cover important examples such as entropic risk, conditional value-at-risk and mean-variance models. In this paper, we propose a new episodic risk-sensitive reinforcement learning formulation based on tabular Markov decision processes with recursive OCEs. 
We design an efficient learning algorithm for this problem based on value iteration and upper confidence bound. We derive an upper bound on the regret of the proposed algorithm, and also establish a minimax lower bound. Our bounds show that the regret rate achieved by our proposed algorithm has optimal dependence on the number of episodes and the number of actions.", "output": "Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the recent advances in technology, a wide range of systems continue to collect a large amount of data over time and thus generate time series. Time-Series Anomaly Detection (TSAD) is an important task in various time-series applications such as e-commerce, cybersecurity, vehicle maintenance, and healthcare monitoring. However, this task is very challenging as it requires considering both the intra-variable dependency and the inter-variable dependency, where a variable can be defined as an observation in time series data. Recent graph-based approaches have made impressive progress in tackling the challenges of this field. In this survey, we conduct a comprehensive and up-to-date review of Graph-based TSAD (G-TSAD). First, we explore the significant potential of graph representation learning for time-series data. Then, we review state-of-the-art graph anomaly detection techniques in the context of time series and discuss their strengths and drawbacks. Finally, we discuss the technical challenges and potential future directions for possible improvements in this research field.", "output": "Graph-based Time-Series Anomaly Detection: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many industrial applications, obtaining labeled observations is not straightforward as it often requires the intervention of human experts or the use of expensive testing equipment. In these circumstances, active learning can be highly beneficial in suggesting the most informative data points to be used when fitting a model. Reducing the number of observations needed for model development alleviates both the computational burden required for training and the operational expenses related to labeling. Online active learning, in particular, is useful in high-volume production processes where the decision about the acquisition of the label for a data point needs to be taken within an extremely short time frame. However, despite the recent efforts to develop online active learning strategies, the behavior of these methods in the presence of outliers has not been thoroughly examined. In this work, we investigate the performance of online active linear regression in contaminated data streams. Our study shows that the currently available query strategies are prone to sample outliers, whose inclusion in the training set eventually degrades the predictive performance of the models. To address this issue, we propose a solution that bounds the search area of a conditional D-optimal algorithm and uses a robust estimator. 
Our approach strikes a balance between exploring unseen regions of the input space and protecting against outliers. Through numerical simulations, we show that the proposed method is effective in improving the performance of online active learning in the presence of outliers, thus expanding the potential applications of this powerful tool.", "output": "Robust online active learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In digital marketing, experimenting with new website content is one of the key levers to improve customer engagement. However, creating successful marketing content is a manual and time-consuming process that lacks clear guiding principles. This paper seeks to close the loop between content creation and online experimentation by offering marketers AI-driven actionable insights based on historical data to improve their creative process. We present a neural-network-based system that scores and extracts insights from a marketing content design, namely, a multimodal neural network predicts the attractiveness of marketing contents, and a post-hoc attribution method generates actionable insights for marketers to improve their content in specific marketing locations. Our insights not only point out the advantages and drawbacks of a given current content, but also provide design recommendations based on historical data. We show that our scoring model and insights work well both quantitatively and qualitatively.", "output": "Neural Insights for Digital Marketing Content Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the most critical problems in machine learning is HyperParameter Optimization (HPO), since choice of hyperparameters has a significant impact on final model performance. Although there are many HPO algorithms, they either have no theoretical guarantees or require strong assumptions. To this end, we introduce BLiE -- a Lipschitz-bandit-based algorithm for HPO that only assumes Lipschitz continuity of the objective function. BLiE exploits the landscape of the objective function to adaptively search over the hyperparameter space. Theoretically, we show that $(i)$ BLiE finds an $\\epsilon$-optimal hyperparameter with $\\mathcal{O}\\left( \\epsilon^{-(d_z + \\beta)} \\right)$ total budgets, where $d_z$ and $\\beta$ are problem intrinsic; $(ii)$ BLiE is highly parallelizable. Empirically, we demonstrate that BLiE outperforms the state-of-the-art HPO algorithms on benchmark tasks. We also apply BLiE to search for noise schedule of diffusion models. Comparison with the default schedule shows that BLiE schedule greatly improves the sampling speed.", "output": "A Lipschitz Bandits Approach for Continuous Hyperparameter Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning and adversarial training. Our target optimization framework uses the (expensive) gradient computation to construct surrogate functions in a \\emph{target space} (e.g. the logits output by a linear model for classification) that can be minimized efficiently. 
This allows for multiple parameter updates to the model, amortizing the cost of gradient computation. In the full-batch setting, we prove that our surrogate is a global upper-bound on the loss, and can be (locally) minimized using a black-box optimization algorithm. We prove that the resulting majorization-minimization algorithm ensures convergence to a stationary point of the loss. Next, we instantiate our framework in the stochastic setting and propose the $SSO$ algorithm, which can be viewed as projected stochastic gradient descent in the target space. This connection enables us to prove theoretical guarantees for $SSO$ when minimizing convex functions. Our framework allows the use of standard stochastic optimization algorithms to construct surrogates which can be minimized by any deterministic optimization method. To evaluate our framework, we consider a suite of supervised learning and imitation learning problems. Our experiments indicate the benefits of target optimization and the effectiveness of $SSO$.", "output": "Target-based Surrogates for Stochastic Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compositional reinforcement learning is a promising approach for training policies to perform complex long-horizon tasks. Typically, a high-level task is decomposed into a sequence of subtasks and a separate policy is trained to perform each subtask. In this paper, we focus on the problem of training subtask policies in a way that they can be used to perform any task; here, a task is given by a sequence of subtasks. We aim to maximize the worst-case performance over all tasks as opposed to the average-case performance. We formulate the problem as a two agent zero-sum game in which the adversary picks the sequence of subtasks. We propose two RL algorithms to solve this game: one is an adaptation of existing multi-agent RL algorithms to our setting and the other is an asynchronous version which enables parallel training of subtask policies. We evaluate our approach on two multi-task environments with continuous states and actions and demonstrate that our algorithms outperform state-of-the-art baselines.", "output": "Robust Subtask Learning for Compositional Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-centric computer vision (HCCV) data curation practices often neglect privacy and bias concerns, leading to dataset retractions and unfair models. Further, HCCV datasets constructed through nonconsensual web scraping lack the necessary metadata for comprehensive fairness and robustness evaluations. Current remedies address issues post hoc, lack persuasive justification for adoption, or fail to provide proper contextualization for appropriate application. Our research focuses on proactive, domain-specific recommendations for curating HCCV datasets, addressing privacy and bias. 
We adopt an ante hoc reflective perspective and draw from current practices and guidelines, guided by the ethical framework of principlism.", "output": "Principlism Guided Responsible Data Curation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the grand challenges of cell biology is inferring the gene regulatory network (GRN) which describes interactions between genes and their products that control gene expression and cellular function. We can treat this as a causal discovery problem but with two non-standard challenges: (1) regulatory networks are inherently cyclic so we should not model a GRN as a directed acyclic graph (DAG), and (2) observations have significant measurement noise, so for typical sample sizes there will always be a large equivalence class of graphs that are likely given the data, and we want methods that capture this uncertainty. Existing methods either focus on challenge (1), identifying cyclic structure from dynamics, or on challenge (2) learning complex Bayesian posteriors over DAGs, but not both. In this paper we leverage the fact that it is possible to estimate the \"velocity\" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges. Because we have access to velocity information, we can treat the Bayesian structure learning problem as a problem of sparse identification of a dynamical system, capturing cyclic feedback loops through time. Since our objective is to model uncertainty over discrete structures, we leverage Generative Flow Networks (GFlowNets) to estimate the posterior distribution over the combinatorial space of possible sparse dependencies. Our results indicate that our method learns posteriors that better encapsulate the distributions of cyclic structures compared to counterpart state-of-the-art Bayesian structure learning approaches.", "output": "DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Electron and scanning probe microscopy produce vast amounts of data in the form of images or hyperspectral data, such as EELS or 4D STEM, that contain information on a wide range of structural, physical, and chemical properties of materials. To extract valuable insights from these data, it is crucial to identify physically separate regions in the data, such as phases, ferroic variants, and boundaries between them. In order to derive an easily interpretable feature analysis, combining with well-defined boundaries in a principled and unsupervised manner, here we present a physics augmented machine learning method which combines the capability of Variational Autoencoders to disentangle factors of variability within the data and the physics driven loss function that seeks to minimize the total length of the discontinuities in images corresponding to latent representations. Our method is applied to various materials, including NiO-LSMO, BiFeO3, and graphene. The results demonstrate the effectiveness of our approach in extracting meaningful information from large volumes of imaging data. 
The full notebook containing the implementation of the code and analysis workflow is available at", "output": "Combining Variational Autoencoders and Physical Bias for Improved Microscopy Data Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model hinder the efficient adoption of diffusion models. Although post-training quantization (PTQ) is considered a go-to compression method for other tasks, it does not work out-of-the-box on diffusion models. We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of the diffusion models, which compresses the noise estimation network to accelerate the generation process. We identify the key difficulty of diffusion model quantization as the changing output distributions of noise estimation networks over multiple time steps and the bimodal activation distribution of the shortcut layers within the noise estimation network. We tackle these challenges with timestep-aware calibration and split shortcut quantization in this work. Experimental results show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance (small FID change of at most 2.34 compared to >100 for traditional PTQ) in a training-free manner. Our approach can also be applied to text-guided image generation, where we can run stable diffusion in 4-bit weights with high generation quality for the first time.", "output": "Q-Diffusion: Quantizing Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optimization problems involving mixed variables, i.e., variables of numerical and categorical nature, can be challenging to solve, especially in the presence of complex constraints. Moreover, when the objective function is the result of a complicated simulation or experiment, it may be expensive to evaluate. This paper proposes a novel surrogate-based global optimization algorithm to solve linearly constrained mixed-variable problems up to medium-large size (around 100 variables after encoding and 20 constraints) based on constructing a piecewise affine surrogate of the objective function over feasible samples. We introduce two types of exploration functions to efficiently search the feasible domain via mixed-integer linear programming solvers. We also provide a preference-based version of the algorithm, which can be used when only pairwise comparisons between samples can be acquired while the underlying objective function to minimize remains unquantified. The two algorithms are tested on mixed-variable benchmark problems with and without constraints. 
The results show that, within a small number of acquisitions, the proposed algorithms can often achieve better or comparable results than other existing methods.", "output": "Global and Preference-based Optimization with Mixed Variables using Piecewise Affine Surrogates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For infinite action contextual bandits, smoothed regret and reduction to regression results in state-of-the-art online performance with computational cost independent of the action set: unfortunately, the resulting data exhaust does not have well-defined importance-weights. This frustrates the execution of downstream data science processes such as offline model selection. In this paper we describe an online algorithm with an equivalent smoothed regret guarantee, but which generates well-defined importance weights: in exchange, the online computational cost increases, but only to order smoothness (i.e., still independent of the action set). This removes a key obstacle to adoption of smoothed regret in production scenarios.", "output": "Infinite Action Contextual Bandits with Reusable Data Exhaust."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The imbalance problem is widespread in the field of machine learning, which also exists in multimodal learning areas caused by the intrinsic discrepancy between modalities of samples. Recent works have attempted to solve the modality imbalance problem from algorithm perspective, however, they do not fully analyze the influence of modality bias in datasets. Concretely, existing multimodal datasets are usually collected under specific tasks, where one modality tends to perform better than other ones in most conditions. In this work, to comprehensively explore the influence of modality bias, we first split existing datasets into different subsets by estimating sample-wise modality discrepancy. We surprisingly find that: the multimodal models with existing imbalance algorithms consistently perform worse than the unimodal one on specific subsets, in accordance with the modality bias. To further explore the influence of modality bias and analyze the effectiveness of existing imbalance algorithms, we build a balanced audiovisual dataset, with uniformly distributed modality discrepancy over the whole dataset. We then conduct extensive experiments to re-evaluate existing imbalance algorithms and draw some interesting findings: existing algorithms only provide a compromise between modalities and suffer from the large modality discrepancy of samples. We hope that these findings could facilitate future research on the modality imbalance problem.", "output": "Balanced Audiovisual Dataset for Imbalance Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. 
This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. In particular, we investigate why certain types of composition fail using current techniques and present a number of solutions. We conclude that the sampler (not the model) is responsible for this failure and propose new samplers, inspired by MCMC, which enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers. Intriguingly, we find these samplers lead to notable improvements in compositional generation across a wide set of problems such as classifier-guided ImageNet modeling and compositional text-to-image generation.", "output": "Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Latent variable models are widely used to perform unsupervised segmentation of time series in different contexts such as robotics, speech recognition, and economics. One of the most widely used latent variable models is the Auto-Regressive Hidden Markov Model (ARHMM), which combines a latent mode governed by a Markov chain dynamics with a linear Auto-Regressive dynamics of the observed state. In this work, we propose two generalizations of the ARHMM. First, we propose a more general AR dynamics in Cartesian space, described as a linear combination of non-linear basis functions. Second, we propose a linear dynamics in unit quaternion space, in order to properly describe orientations. These extensions allow us to describe more complex dynamics of the observed state. Although this extension is proposed for the ARHMM, it can be easily extended to other latent variable models with AR dynamics in the observed space, such as Auto-Regressive Hidden semi-Markov Models.", "output": "Generalization of Auto-Regressive Hidden Markov Models to Non-Linear Dynamics and Unit Quaternion Observation Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Lack of explainability is a key factor limiting the practical adoption of high-performant Deep Reinforcement Learning (DRL) controllers. Explainable RL for networking hitherto used salient input features to interpret a controller's behavior. However, these feature-based solutions do not completely explain the controller's decision-making process. Often, operators are interested in understanding the impact of a controller's actions on performance in the future, which feature-based solutions cannot capture. In this paper, we present CrystalBox, a framework that explains a controller's behavior in terms of the future impact on key network performance metrics. CrystalBox employs a novel learning-based approach to generate succinct and expressive explanations. We use reward components of the DRL network controller, which are key performance metrics meaningful to operators, as the basis for explanations. 
CrystalBox is generalizable and can work across both discrete and continuous control environments without any changes to the controller or the DRL workflow. Using adaptive bitrate streaming and congestion control, we demonstrate CrystalBox's ability to generate high-fidelity future-based explanations. We additionally present three practical use cases of CrystalBox: cross-state explainability, guided reward design, and network observability.", "output": "CrystalBox: Future-Based Explanations for DRL Network Controllers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "When interacting with people, AI agents do not just influence the state of the world -- they also influence the actions people take in response to the agent, and even their underlying intentions and strategies. Accounting for and leveraging this influence has mostly been studied in settings where it is sufficient to assume that human behavior is near-optimal: competitive games, or general-sum settings like autonomous driving alongside human drivers. Instead, we focus on influence in settings where there is a need to capture human suboptimality. For instance, imagine a collaborative task in which, due either to cognitive biases or lack of information, people do not perform very well -- how could an agent influence them towards more optimal behavior? Assuming near-optimal human behavior will not work here, and so the agent needs to learn from real human data. But experimenting online with humans is potentially unsafe, and creating a high-fidelity simulator of the environment is often impractical. Hence, we focus on learning from an offline dataset of human-human interactions. Our observation is that offline reinforcement learning (RL) can learn to effectively influence suboptimal humans by extending and combining elements of observed human-human behavior. We demonstrate that offline RL can solve two challenges with effective influence. First, we show that by learning from a dataset of suboptimal human-human interaction on a variety of tasks -- none of which contains examples of successful influence -- an agent can learn influence strategies to steer humans towards better performance even on new tasks. Second, we show that by also modeling and conditioning on human behavior, offline RL can learn to affect not just the human's actions but also their underlying strategy, and adapt to changes in their strategy.", "output": "Learning to Influence Human Behavior with Offline Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. For this, we develop a technique to reduce the problem to its non-private counterpart. This allows us to privatize existing non-private algorithms in a blackbox manner, while incurring only a small overhead in the sample complexity and running time. As the main application of our framework, we develop an $(\\varepsilon, \\delta)$-differentially private algorithm to learn GMMs using the non-private algorithm of Moitra and Valiant [MV10] as a blackbox. Consequently, this gives the first sample complexity upper bound and first polynomial time algorithm for privately learning GMMs without any boundedness assumptions on the parameters.
As part of our analysis, we prove a tight (up to a constant factor) lower bound on the total variation distance of high-dimensional Gaussians, which can be of independent interest.", "output": "Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian inference provides a principled framework for learning from complex data and reasoning under uncertainty. It has been widely applied in machine learning tasks such as medical diagnosis, drug design, and policymaking. In these common applications, data can be highly sensitive. Differential privacy (DP) offers data analysis tools with powerful worst-case privacy guarantees and has been developed as the leading approach in privacy-preserving data analysis. In this paper, we study Metropolis-Hastings (MH), one of the most fundamental MCMC methods, for large-scale Bayesian inference under differential privacy. While most existing private MCMC algorithms sacrifice accuracy and efficiency to obtain privacy, we provide the first exact and fast DP MH algorithm, using only a minibatch of data in most iterations. We further reveal, for the first time, a three-way trade-off among privacy, scalability (i.e. the batch size), and efficiency (i.e. the convergence rate), theoretically characterizing how privacy affects the utility and computational cost in Bayesian inference. We empirically demonstrate the effectiveness and efficiency of our algorithm in various experiments.", "output": "DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for Large-Scale Bayesian Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The use of Shap scores has become widespread in Explainable AI. However, their computation is in general intractable, in particular when done with a black-box classifier, such as a neural network. Recent research has unveiled classes of open-box Boolean Circuit classifiers for which Shap can be computed efficiently. We show how to transform binary neural networks into those circuits for efficient Shap computation. We use logic-based knowledge compilation techniques. The performance gain is huge, as we show in the light of our experiments.", "output": "Efficient Computation of Shap Explanation Scores for Neural Network Classifiers via Knowledge Compilation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, graph pre-training has gained significant attention, focusing on acquiring transferable knowledge from unlabeled graph data to improve downstream performance. Despite these recent endeavors, the problem of negative transfer remains a major concern when applying graph pre-trained models to downstream tasks. Previous studies made great efforts on the issues of what to pre-train and how to pre-train by designing a variety of graph pre-training and fine-tuning strategies. However, there are cases where even the most advanced \"pre-train and fine-tune\" paradigms fail to yield distinct benefits.
This paper introduces a generic framework, W2PGNN, to answer the crucial question of when to pre-train (i.e., in what situations we could take advantage of graph pre-training) before performing effortful pre-training or fine-tuning. We start from a new perspective to explore the complex generative mechanisms from the pre-training data to the downstream data. In particular, W2PGNN first fits the pre-training data into graphon bases, where each element of a graphon basis (i.e., a graphon) identifies a fundamental transferable pattern shared by a collection of pre-training graphs. All convex combinations of graphon bases give rise to a generator space, from which the generated graphs form the solution space for those downstream data that can benefit from pre-training. In this manner, the feasibility of pre-training can be quantified as the generation probability of the downstream data from any generator in the generator space. W2PGNN offers three broad applications: providing the application scope of graph pre-trained models, quantifying the feasibility of pre-training, and assistance in selecting pre-training data to enhance downstream performance. We provide a theoretically sound solution for the first application and extensive empirical justifications for the latter two applications.", "output": "When to Pre-Train Graph Neural Networks? From Data Generation Perspective!."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper investigates the influence of incorporating spatiotemporal wind data on the performance of wind forecasting neural networks. While previous studies have shown that including spatial data enhances the accuracy of such models, limited research has explored the impact of different spatial and temporal scales of input wind data on the learnability of neural network models. In this study, convolutional neural networks (CNNs) are employed and trained using various scales of spatiotemporal wind data. The research demonstrates that using spatiotemporally correlated data from the surrounding area and past time steps for training a CNN favorably affects the predictive performance of the model. The study proposes correlation analyses, including autocorrelation and Pearson correlation analyses, to unveil the influence of spatiotemporal wind characteristics on the predictive performance of different CNN models. The spatiotemporal correlations and performances of CNN models are investigated in three regions: Korea, the USA, and the UK. The findings reveal that regions with smaller deviations of autocorrelation coefficients (ACC) are more favorable for CNNs to learn the regional and seasonal wind characteristics. Specifically, the regions of Korea, the USA, and the UK exhibit maximum standard deviations of ACCs of 0.100, 0.043, and 0.023, respectively. The CNNs' wind prediction performances follow the reverse order of the regions: UK, USA, and Korea.
This highlights the significant impact of regional and seasonal wind conditions on the performance of the prediction models.", "output": "How Regional Wind Characteristics Affect CNN-based wind predictions: Insights from Spatiotemporal Correlation Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial agents have traditionally been trained to maximize reward, which may incentivize power-seeking and deception, analogous to how next-token prediction in language models (LMs) may incentivize toxicity. So do agents naturally learn to be Machiavellian? And how do we measure these behaviors in general-purpose models such as GPT-4? Towards answering these questions, we introduce MACHIAVELLI, a benchmark of 134 Choose-Your-Own-Adventure games containing over half a million rich, diverse scenarios that center on social decision-making. Scenario labeling is automated with LMs, which are more performant than human annotators. We mathematize dozens of harmful behaviors and use our annotations to evaluate agents' tendencies to be power-seeking, cause disutility, and commit ethical violations. We observe some tension between maximizing reward and behaving ethically. To improve this trade-off, we investigate LM-based methods to steer agents towards less harmful behaviors. Our results show that agents can both act competently and morally, so concrete progress can currently be made in machine ethics -- designing agents that are Pareto improvements in both safety and capabilities.", "output": "Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the mixing time of the Metropolis-Adjusted Langevin Algorithm (MALA) for sampling a target density on $\\mathbb{R}^d$. We assume that the target density satisfies $\\psi_\\mu$-isoperimetry and that the operator norm and trace of its Hessian are bounded by $L$ and $\\Upsilon$ respectively. Our main result establishes that, from a warm start, to achieve $\\epsilon$-total variation distance to the target density, MALA mixes in $O\\left(\\frac{(L\\Upsilon)^{\\frac12}}{\\psi_\\mu^2}\\log\\left(\\frac{1}{\\epsilon}\\right)\\right)$ iterations. Notably, this result holds beyond the log-concave sampling setting and the mixing time depends on only $\\Upsilon$ rather than its upper bound $Ld$. In the $m$-strongly logconcave and $L$-log-smooth sampling setting, our bound recovers the previous minimax mixing bound of MALA~\\cite{wu2021minimax}.", "output": "A Simple Proof of the Mixing of Metropolis-Adjusted Langevin Algorithm under Smoothness and Isoperimetry."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While machine learning can myopically reinforce social inequalities, it may also be used to dynamically seek equitable outcomes. In this paper, we formalize long-term fairness in the context of online reinforcement learning. This formulation can accommodate dynamical control objectives, such as driving equity inherent in the state of a population, that cannot be incorporated into static formulations of fairness.
We demonstrate that this framing allows an algorithm to adapt to unknown dynamics by sacrificing short-term incentives to drive a classifier-population system towards more desirable equilibria. For the proposed setting, we develop an algorithm that adapts recent work in online learning. We prove that this algorithm achieves simultaneous probabilistic bounds on cumulative loss and cumulative violations of fairness (as statistical regularities between demographic groups). We compare our proposed algorithm to the repeated retraining of myopic classifiers, as a baseline, and to a deep reinforcement learning algorithm that lacks safety guarantees. Our experiments model human populations according to evolutionary game theory and integrate real-world datasets.", "output": "Long-Term Fairness with Unknown Dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Skin and subcutaneous diseases rank high among the leading contributors to the global burden of nonfatal diseases, impacting a considerable portion of the population. Nonetheless, the field of dermatology diagnosis faces three significant hurdles. Firstly, there is a shortage of dermatologists accessible to diagnose patients, particularly in rural regions. Secondly, accurately interpreting skin disease images poses a considerable challenge. Lastly, generating patient-friendly diagnostic reports is usually a time-consuming and labor-intensive task for dermatologists. To tackle these challenges, we present SkinGPT-4, which is the world's first interactive dermatology diagnostic system powered by an advanced visual large language model. SkinGPT-4 leverages a fine-tuned version of MiniGPT-4, trained on an extensive collection of skin disease images (comprising 52,929 publicly available and proprietary images) along with clinical concepts and doctors' notes. We designed a two-step training process to allow SkinGPT-4 to express medical features in skin disease images with natural language and make accurate diagnoses of the types of skin diseases. With SkinGPT-4, users could upload their own skin photos for diagnosis, and the system could autonomously evaluate the images, identify the characteristics and categories of the skin conditions, perform in-depth analysis, and provide interactive treatment recommendations. Meanwhile, SkinGPT-4's local deployment capability and commitment to user privacy also render it an appealing choice for patients in search of a dependable and precise diagnosis of their skin ailments. To demonstrate the robustness of SkinGPT-4, we conducted quantitative evaluations on 150 real-life cases, which were independently reviewed by certified dermatologists, and showed that SkinGPT-4 could provide accurate diagnoses of skin diseases.", "output": "SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores the connection between the learning trajectories of Deep Neural Networks (DNNs) and their generalization capabilities when optimized using (stochastic) gradient descent algorithms.
Instead of concentrating solely on the generalization error of the DNN post-training, we present a novel perspective for analyzing generalization error by investigating the contribution of each update step to the change in generalization error. This perspective allows for a more direct comprehension of how the learning trajectory influences generalization error. Building upon this analysis, we propose a new generalization bound that incorporates more extensive trajectory information. Our proposed generalization bound depends on the complexity of the learning trajectory and the ratio between the bias and diversity of the training set. Experimental findings reveal that our method effectively captures the generalization error throughout the training process. Furthermore, our approach can also track changes in generalization error when adjustments are made to learning rates and label noise levels. These results demonstrate that learning trajectory information is a valuable indicator of a model's generalization capabilities.", "output": "Learning Trajectories are Generalization Indicators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The design of codes for feedback-enabled communications has been a long-standing open problem. Recent research on non-linear, deep learning-based coding schemes has demonstrated significant improvements in communication reliability over linear codes, but these schemes are still vulnerable to the presence of forward and feedback noise over the channel. In this paper, we develop a new family of non-linear feedback codes that greatly enhance robustness to channel noise. Our autoencoder-based architecture is designed to learn codes based on consecutive blocks of bits, which obtains de-noising advantages over bit-by-bit processing to help overcome the physical separation between the encoder and decoder over a noisy channel. Moreover, we develop a power control layer at the encoder to explicitly incorporate hardware constraints into the learning optimization, and prove that the resulting average power constraint is satisfied asymptotically. Numerical experiments demonstrate that our scheme outperforms state-of-the-art feedback codes by wide margins over practical forward and feedback noise regimes, and provide information-theoretic insights on the behavior of our non-linear codes. Moreover, we observe that, in a long blocklength regime, canonical error correction codes are still preferable to feedback codes when the feedback noise becomes high.", "output": "Robust Non-Linear Feedback Coding via Power-Constrained Deep Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training. Nevertheless, it is notoriously difficult to control their generation to satisfy the various constraints required by different applications. In this work, we present InstructCTG, a controlled text generation framework that incorporates different constraints by conditioning on natural language descriptions and demonstrations of the constraints. In particular, we first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple heuristics.
We then verbalize the constraints into natural language instructions to form weakly supervised training data. By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints. Compared to existing search-based or score-based methods, InstructCTG is more flexible to different constraint types and has a much smaller impact on the generation quality and speed because it does not modify the decoding procedure. Additionally, InstructCTG allows the model to adapt to new constraints without re-training through the use of few-shot task generalization and in-context learning abilities of instruction-tuned language models.", "output": "Controlled Text Generation with Natural Language Instructions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of continual learning is to find a model that solves multiple learning tasks which are presented sequentially to the learner. A key challenge in this setting is that the learner may forget how to solve a previous task when learning a new task, a phenomenon known as catastrophic forgetting. To address this challenge, many practical methods have been proposed, including memory-based, regularization-based, and expansion-based methods. However, a rigorous theoretical understanding of these methods remains elusive. This paper aims to bridge this gap between theory and practice by proposing a new continual learning framework called Ideal Continual Learner (ICL), which is guaranteed to avoid catastrophic forgetting by construction. We show that ICL unifies multiple well-established continual learning methods and gives new theoretical insights into the strengths and weaknesses of these methods. We also derive generalization bounds for ICL which allow us to theoretically quantify how rehearsal affects generalization. Finally, we connect ICL to several classic subjects and research topics of modern interest, which allows us to make historical remarks and inspire future directions.", "output": "The Ideal Continual Learner: An Agent That Never Forgets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as \"fairness auditing,\" in terms of multiple hypothesis testing. We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups with statistical guarantees. Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately. Crucially, our audit is model-agnostic and applicable to nearly any performance metric or group fairness criterion. Our methods also accommodate extremely rich -- even infinite -- collections of subpopulations. Further, we generalize beyond subpopulations by showing how to assess performance over certain distribution shifts.
We test the proposed methods on benchmark datasets in predictive inference and algorithmic fairness and find that our audits can provide interpretable and trustworthy guarantees.", "output": "Statistical Inference for Fairness Auditing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The proximal policy optimization (PPO) algorithm stands as one of the most successful methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is unclear whether PPO or its optimistic variants can effectively solve linear Markov decision processes (MDPs), which are arguably the simplest models in RL with function approximation. To bridge this gap, we propose an optimistic variant of PPO for episodic adversarial linear MDPs with full-information feedback, and establish a $\\tilde{\\mathcal{O}}(d^{3/4}H^2K^{3/4})$ regret for it. Here $d$ is the ambient dimension of linear MDPs, $H$ is the length of each episode, and $K$ is the number of episodes. Compared with existing policy-based algorithms, we achieve the state-of-the-art regret bound in both stochastic linear MDPs and adversarial linear MDPs with full information. Additionally, our algorithm design features a novel multi-batched updating mechanism and the theoretical analysis utilizes a new covering number argument of value and policy classes, which might be of independent interest.", "output": "A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work addresses the problem of revenue maximization in a repeated, unlimited supply item-pricing auction while preserving buyer privacy. We present a novel algorithm that provides differential privacy with respect to the buyer's input pair: item selection and bid. Notably, our algorithm is the first to offer a sublinear $O(\\sqrt{T}\\log{T})$ regret with a privacy guarantee. Our method is based on an exponential weights meta-algorithm, and we mitigate the issue of discontinuities in revenue functions via small random perturbations. As a result of its structural similarity to the exponential mechanism, our method inherently secures differential privacy. We also extend our algorithm to accommodate scenarios where buyers strategically bid over successive rounds. The inherent differential privacy allows us to adapt our algorithm with minimal modification to ensure a sublinear regret in this setting.", "output": "Differentially Private Online Item Pricing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Identifying molecules that exhibit some pre-specified properties is a difficult problem to solve. In the last few years, deep generative models have been used for molecule generation. Deep Graph Variational Autoencoders are among the most powerful machine learning tools with which it is possible to address this problem. However, existing methods struggle in capturing the true data distribution and tend to be computationally expensive.
In this work, we propose RGCVAE, an efficient and effective Graph Variational Autoencoder based on: (i) an encoding network exploiting a new powerful Relational Graph Isomorphism Network; (ii) a novel probabilistic decoding component. Compared to several state-of-the-art VAE methods on two widely adopted datasets, RGCVAE shows state-of-the-art molecule generation performance while being significantly faster to train.", "output": "RGCVAE: Relational Graph Conditioned Variational Autoencoder for Molecule Design."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Conversion rate (CVR) prediction is one of the core components in online recommender systems, and various approaches have been proposed to obtain accurate and well-calibrated CVR estimation. However, we observe that a well-trained CVR prediction model often performs sub-optimally during sales promotions. This can be largely ascribed to the problem of the data distribution shift, in which the conventional methods no longer work. To this end, we seek to develop alternative modeling techniques for CVR prediction. Observing similar purchase patterns across different promotions, we propose reusing the historical promotion data to capture the promotional conversion patterns. Herein, we propose a novel \\textbf{H}istorical \\textbf{D}ata \\textbf{R}euse (\\textbf{HDR}) approach that first retrieves historically similar promotion data and then fine-tunes the CVR prediction model with the acquired data for better adaptation to the promotion mode. HDR consists of three components: an automated data retrieval module that seeks similar data from historical promotions, a distribution shift correction module that re-weights the retrieved data for better aligning with the target promotion, and a TransBlock module that quickly fine-tunes the original model for better adaptation to the promotion mode. Experiments conducted with real-world data demonstrate the effectiveness of HDR, as it improves both ranking and calibration metrics to a large extent. HDR has also been deployed on the display advertising system in Alibaba, bringing a lift of $9\\%$ RPM and $16\\%$ CVR during Double 11 Sales in 2022.", "output": "Capturing Conversion Rate Fluctuation during Sales Promotions: A Novel Historical Data Reuse Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) models use only the intrinsic structure of a given signal, independent of its acoustic domain, to extract essential information from the input to an embedding space. This implies that the utility of such representations is not limited to modeling human speech alone. Building on this understanding, this paper explores the cross-transferability of SSL neural representations learned from human speech to analyze bio-acoustic signals. We conduct a caller discrimination analysis and a caller detection study on Marmoset vocalizations using eleven SSL models pre-trained with various pretext tasks. The results show that the embedding spaces carry meaningful caller information and can successfully distinguish the individual identities of Marmoset callers without fine-tuning.
This demonstrates that representations pre-trained on human speech can be effectively applied to the bio-acoustics domain, providing valuable insights for future investigations in this field.", "output": "Can Self-Supervised Neural Representations Pre-Trained on Human Speech distinguish Animal Callers?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 16s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.", "output": "Parallel Sampling of Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Forecasting physical signals over a long time range is among the most challenging tasks in Partial Differential Equations (PDEs) research. To circumvent limitations of traditional solvers, many different Deep Learning methods have been proposed. They are all based on auto-regressive methods and exhibit stability issues. Drawing inspiration from the stability property of implicit numerical schemes, we introduce a stable auto-regressive implicit neural network. We develop a theory based on the stability definition of schemes to ensure the stability in forecasting of this network. It leads us to introduce hard constraints on its weights and propagate the dynamics in the latent space. Our experimental results validate our stability property, and show improved results at long-term forecasting for two transport PDEs.", "output": "Stability of implicit neural networks for long-term forecasting in dynamical systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Building generalizable AI models is one of the primary challenges in the healthcare domain. While radiologists rely on generalizable descriptive rules of abnormality, Neural Network (NN) models suffer even with a slight shift in input distribution (e.g., scanner type). Fine-tuning a model to transfer knowledge from one domain to another requires a significant amount of labeled data in the target domain.
In this paper, we develop an interpretable model that can be efficiently fine-tuned to an unseen target domain with minimal computational cost. We assume the interpretable component of the NN to be approximately domain-invariant. However, interpretable models typically underperform compared to their Blackbox (BB) variants. We start with a BB in the source domain and distill it into a \\emph{mixture} of shallow interpretable models using human-understandable concepts. As each interpretable model covers a subset of data, a mixture of interpretable models achieves performance comparable to the BB. Further, we use the pseudo-labeling technique from semi-supervised learning (SSL) to learn the concept classifier in the target domain, followed by fine-tuning the interpretable models in the target domain. We evaluate our model using a real-life large-scale chest X-ray (CXR) classification dataset. The code is available at: \\url{", "output": "Distilling BlackBox to Interpretable models for Efficient Transfer Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of large language models (LLMs) such as ChatGPT has attracted a lot of attention recently. However, their evaluation on benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative outputs produced by this model against the ground truth. In this paper, we aim to present a thorough evaluation of ChatGPT's performance on diverse academic datasets, covering tasks like question-answering, text summarization, code generation, commonsense reasoning, mathematical problem-solving, machine translation, bias detection, and ethical considerations. Specifically, we evaluate ChatGPT across 140 tasks and analyze 255K responses it generates in these datasets. This makes our work the largest evaluation of ChatGPT in NLP benchmarks. In short, our study aims to validate the strengths and weaknesses of ChatGPT in various tasks and provide insights for future research using LLMs. We also report a new emergent ability to follow multi-query instructions that we mostly found in ChatGPT and other instruction-tuned models. Our extensive evaluation shows that even though ChatGPT is capable of performing a wide variety of tasks, and may obtain impressive performance in several benchmark datasets, it is still far from achieving the ability to reliably solve many challenging tasks. By providing a thorough assessment of ChatGPT's performance across diverse NLP tasks, this paper sets the stage for a targeted deployment of ChatGPT-like LLMs in real-world applications.", "output": "A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In a sequential regression setting, a decision-maker may be primarily concerned with whether the future observation will increase or decrease compared to the current one, rather than with the actual value of the future observation. In this context, we introduce the notion of parity calibration, which captures the goal of calibrated forecasting for the increase-decrease (or \"parity\") event in a timeseries. Parity probabilities can be extracted from a forecasted distribution for the output, but we show that such a strategy leads to theoretical unpredictability and poor practical performance.
We then observe that although the original task was regression, parity calibration can be expressed as binary calibration. Drawing on this connection, we use an online binary calibration method to achieve parity calibration. We demonstrate the effectiveness of our approach on real-world case studies in epidemiology, weather forecasting, and model-based control in nuclear fusion.", "output": "Parity Calibration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We tackle the problem of graph out-of-distribution (OOD) generalization. Existing graph OOD algorithms either rely on restricted assumptions or fail to exploit environment information in training data. In this work, we propose to simultaneously incorporate label and environment causal independence (LECI) to fully make use of label and environment information, thereby addressing the challenges faced by prior methods in identifying causal and invariant subgraphs. We further develop an adversarial training strategy to jointly optimize these two properties for causal subgraph discovery with theoretical guarantees. Extensive experiments and analysis show that LECI significantly outperforms prior methods on both synthetic and real-world datasets, establishing LECI as a practical and effective solution for graph OOD generalization.", "output": "Joint Learning of Label and Environment Causal Independence for Graph Out-of-Distribution Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Integrating ethical practices into the development process of artificial intelligence (AI) is essential to ensure safe, fair, and responsible operation. AI ethics involves applying ethical principles to the entire life cycle of AI systems. This is essential to mitigate potential risks and harms associated with AI, such as algorithmic biases. To achieve this goal, responsible design patterns (RDPs) are critical for Machine Learning (ML) pipelines to guarantee ethical and fair outcomes. In this paper, we propose a comprehensive framework incorporating RDPs into ML pipelines to mitigate risks and ensure the ethical development of AI systems. Our framework comprises new responsible AI design patterns for ML pipelines identified through a survey of AI ethics and data management experts and validated through real-world scenarios with expert feedback. The framework guides AI developers, data scientists, and policy-makers to implement ethical practices in AI development and deploy responsible AI systems in production.", "output": "Responsible Design Patterns for Machine Learning Pipelines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A number of deep reinforcement-learning (RL) approaches propose to control traffic signals. In this work, we study the robustness of such methods along two axes. First, sensor failures and GPS occlusions create missing-data challenges, and we show that recent methods remain brittle in the face of these missing data. Second, we provide a more systematic study of the generalization ability of RL methods to new networks with different traffic regimes. Again, we identify the limitations of recent approaches.
We then propose using a combination of distributional and vanilla reinforcement learning through a policy ensemble. Building upon the state-of-the-art previous model, which uses a decentralized approach for large-scale traffic signal control with graph convolutional networks (GCNs), we first learn models using a distributional reinforcement learning (DisRL) approach. In particular, we use implicit quantile networks (IQN) to model the state-action return distribution with quantile regression. For traffic signal control problems, an ensemble of standard RL and DisRL yields superior performance across different scenarios, including different levels of missing sensor data and traffic flow patterns. Furthermore, the learning scheme of the resulting model can improve zero-shot transferability to different road network structures, including both synthetic networks and real-world networks (e.g., Luxembourg, Manhattan). We conduct extensive experiments to compare our approach to multi-agent reinforcement learning and traditional transportation approaches. Results show that the proposed method improves robustness and generalizability in the face of missing data, varying road networks, and traffic flows.", "output": "Improving the generalizability and robustness of large-scale traffic signal control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic speech recognition (ASR) models are frequently exposed to data distribution shifts in many real-world scenarios, leading to erroneous predictions. To tackle this issue, an existing test-time adaptation (TTA) method has recently been proposed to adapt the pre-trained ASR model on unlabeled test instances without source data. Despite a decent performance gain, this work relies solely on naive greedy decoding and performs adaptation across timesteps at a frame level, which may not be optimal given the sequential nature of the model output. Motivated by this, we propose a novel TTA framework, dubbed SGEM, for general ASR models. To treat the sequential output, SGEM first exploits beam search to explore candidate output logits and selects the most plausible one. Then, it utilizes generalized entropy minimization and negative sampling as unsupervised objectives to adapt the model. SGEM achieves state-of-the-art performance for three mainstream ASR models under various domain shifts.", "output": "SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we strive to develop an interpretable GNN inference paradigm, termed MSInterpreter, which can serve as a plug-and-play scheme readily applicable to various GNN baselines.
Unlike most existing explanation methods, MSInterpreter provides a Message-passing Selection scheme (MSScheme) to select the critical paths for GNNs' message aggregations, which aims at achieving self-explanation instead of post-hoc explanations. In detail, the elaborate MSScheme is designed to calculate the weight factors of message aggregation paths by considering the vanilla structure and node embedding components, where the structure base aims at weight factors among node-induced substructures; on the other hand, the node embedding base focuses on weight factors via node embeddings obtained by a one-layer GNN. Finally, we demonstrate the effectiveness of our approach on graph classification benchmarks.", "output": "Message-passing selection: Towards interpretable GNNs for graph classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. This issue is critical in practical supervised learning settings, such as the ones in which a pre-trained model computes projections toward a latent space where different task predictors are sequentially learned over time. As a matter of fact, incrementally fine-tuning the whole model to better adapt to new tasks usually results in catastrophic forgetting, with decreasing performance over the past experiences and losing valuable knowledge from the pre-training stage. In this paper, we propose a novel strategy to make the fine-tuning procedure more effective, by avoiding updating the pre-trained part of the network and learning not only the usual classification head, but also a set of newly-introduced learnable parameters that are responsible for transforming the input data. This process allows the network to effectively leverage the pre-training knowledge and find a good trade-off between plasticity and stability with modest computational efforts, making it especially suitable for on-the-edge settings. Our experiments on four image classification problems in a continual learning setting confirm the quality of the proposed approach when compared to several fine-tuning procedures and to popular continual learning methods.", "output": "Continual Learning with Pretrained Backbones by Tuning in the Input Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most of them ignore the emotions beneath the music. Meanwhile, the variability of harmonies generated by previous methods is insufficient. To solve these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization, while improving the quality of generated harmonies and capturing the abundant variability of chord progressions. Specifically, LHVAE incorporates latent variables and emotional conditions at different levels (piece- and bar-level) to model the global and local music properties. Additionally, we introduce an attention-based melody context vector at each step to better learn the correspondence between melodies and harmonies.
Experimental results of the objective evaluation show that our proposed model outperforms other LSTM-based models. Through subjective evaluation, we conclude that only altering the chords hardly changes the overall emotion of the music. The qualitative analysis demonstrates the ability of our model to generate variable harmonies.", "output": "Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms heavily depends on the scale and state-action space coverage of datasets. Real-world data collection is often expensive and uncontrollable, leading to small and narrowly covered datasets and posing significant challenges for practical deployments of offline RL. In this paper, we provide a new insight that leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance under small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for OOD samples based on compliance with the T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent space data augmentation procedure. Based on extensive experiments, we find TSRL achieves great performance on small benchmark datasets with as few as 1% of the original samples, which significantly outperforms the recent offline RL algorithms in terms of data efficiency and generalizability.", "output": "Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a detector of adversarial samples that is based on the view of neural networks as discrete dynamic systems. The detector tells clean inputs from abnormal ones by comparing the discrete vector fields they follow through the layers. We also show that regularizing this vector field during training makes the network more regular on the data distribution's support, thus making the activations of clean inputs more distinguishable from those of abnormal ones. Experimentally, we compare our detector favorably to other detectors on seen and unseen attacks, and show that the regularization of the network's dynamics improves the performance of adversarial detectors that use the internal embeddings as inputs, while also improving test accuracy.", "output": "Adversarial Sample Detection Through Neural Network Transport Dynamics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bayesian Optimization (BO) is a class of surrogate-based, sample-efficient algorithms for optimizing black-box problems with small evaluation budgets.
The BO pipeline itself is highly configurable with many different design choices regarding the initial design, surrogate model, and acquisition function (AF). Unfortunately, our understanding of how to select suitable components for a problem at hand is very limited. In this work, we focus on the definition of the AF, whose main purpose is to balance the trade-off between exploring regions with high uncertainty and those with high promise for good solutions. We propose Self-Adjusting Weighted Expected Improvement (SAWEI), where we let the exploration-exploitation trade-off self-adjust in a data-driven manner, based on a convergence criterion for BO. On the noise-free black-box BBOB functions of the COCO benchmarking platform, our method exhibits a favorable any-time performance compared to handcrafted baselines and serves as a robust default choice for any problem structure. The suitability of our method also transfers to HPOBench. With SAWEI, we are a step closer to on-the-fly, data-driven, and robust BO designs that automatically adjust their sampling behavior to the problem at hand.", "output": "Self-Adjusting Weighted Expected Improvement for Bayesian Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Parking guidance systems have recently become a popular trend as a part of the smart cities' paradigm of development. The crucial part of such systems is the algorithm allowing drivers to search for available parking lots across regions of interest. The classic approach to this task is based on the application of neural network classifiers to camera records. However, existing systems demonstrate a lack of generalization ability and appropriate testing regarding specific visual conditions. In this study, we extensively evaluate state-of-the-art parking lot occupancy detection algorithms, compare their prediction quality with the recently emerged vision transformers, and propose a new pipeline based on the EfficientNet architecture. The performed computational experiments demonstrated a performance increase for our model, which was evaluated on 5 different datasets.", "output": "Revising deep learning methods in parking lot occupancy detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Knowledge distillation (KD) has received much attention due to its success in compressing networks to allow for their deployment in resource-constrained systems. While the problem of adversarial robustness has been studied before in the KD setting, previous works overlook what we term the relative calibration of the student network with respect to its teacher in terms of soft confidences. In particular, we focus on two crucial questions with regard to a teacher-student pair: (i) do the teacher and student disagree at points close to correctly classified dataset examples, and (ii) is the distilled student as confident as the teacher around dataset examples? These are critical questions when considering the deployment of a smaller student network trained from a robust teacher within a safety-critical setting. To address these questions, we introduce a faithful imitation framework to discuss the relative calibration of confidences, as well as provide empirical and certified methods to evaluate the relative calibration of a student w.r.t. its teacher.
Further, to verifiably align the relative calibration incentives of the student to those of its teacher, we introduce faithful distillation. Our experiments on the MNIST and Fashion-MNIST datasets demonstrate the need for such an analysis and the advantages of the increased verifiability of faithful distillation over alternative adversarial distillation methods.", "output": "Faithful Knowledge Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Whenever a binary classifier is used to provide decision support, it typically provides both a label prediction and a confidence value. Then, the decision maker is supposed to use the confidence value to calibrate how much to trust the prediction. In this context, it has often been argued that the confidence value should correspond to a well-calibrated estimate of the probability that the predicted label matches the ground truth label. However, multiple lines of empirical evidence suggest that decision makers have difficulties at developing a good sense of when to trust a prediction using these confidence values. In this paper, our goal is first to understand why and then investigate how to construct more useful confidence values. We first argue that, for a broad class of utility functions, there exist data distributions for which a rational decision maker is, in general, unlikely to discover the optimal decision policy using the above confidence values -- an optimal decision maker would need to sometimes place more (less) trust on predictions with lower (higher) confidence values. However, we then show that, if the confidence values satisfy a natural alignment property with respect to the decision maker's confidence on her own predictions, there always exists an optimal decision policy under which the level of trust the decision maker would need to place on predictions is monotone in the confidence values, facilitating its discoverability. Further, we show that multicalibration with respect to the decision maker's confidence on her own predictions is a sufficient condition for alignment. Experiments on four different AI-assisted decision making tasks where a classifier provides decision support to real human experts validate our theoretical results and suggest that alignment may lead to better decisions.", "output": "Human-Aligned Calibration for AI-Assisted Decision Making."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Decision support systems for classification tasks are predominantly designed to predict the value of the ground truth labels. However, since their predictions are not perfect, these systems also need to make human experts understand when and how to use these predictions to update their own predictions. Unfortunately, this has proven challenging. In this context, it has been recently argued that an alternative type of decision support system may circumvent this challenge. Rather than providing a single label prediction, these systems provide a set of label prediction values constructed using a conformal predictor, namely a prediction set, and forcefully ask experts to predict a label value from the prediction set. However, the design and evaluation of these systems have so far relied on stylized expert models, questioning their promise.
In this paper, we revisit the design of this type of system from the perspective of online learning and develop a methodology that does not require, nor assume, an expert model. Our methodology leverages the nested structure of the prediction sets provided by any conformal predictor and a natural counterfactual monotonicity assumption on the experts' predictions over the prediction sets to achieve an exponential improvement in regret in comparison with vanilla bandit algorithms. We conduct a large-scale human subject study ($n = 2{,}751$) to verify our counterfactual monotonicity assumption and compare our methodology to several competitive baselines. The results suggest that decision support systems that limit experts' level of agency may be practical and may offer greater performance than those allowing experts to always exercise their own agency.", "output": "Designing Decision Support Systems Using Counterfactual Prediction Sets."}] \ No newline at end of file