diff --git a/LLMs-papers.md b/LLMs-papers.md
index a5024cd..86043ec 100644
--- a/LLMs-papers.md
+++ b/LLMs-papers.md
@@ -1,48 +1,48 @@
-1. [Cedille: A large autoregressive French language model](http://arxiv.org/abs/2202.03371v1)
-2. [How Good are Commercial Large Language Models on African Languages?](http://arxiv.org/abs/2305.06530v1)
-3. [A Precis of Language Models are not Models of Language](http://arxiv.org/abs/2205.07634v1)
-4. [Using large language models for (de-)formalization and natural
+1. [Lost in Translation: Large Language Models in Non-English Content
+  Analysis](http://arxiv.org/abs/2306.07377v1)
+2. [Cedille: A large autoregressive French language model](http://arxiv.org/abs/2202.03371v1)
+3. [How Good are Commercial Large Language Models on African Languages?](http://arxiv.org/abs/2305.06530v1)
+4. [A Precis of Language Models are not Models of Language](http://arxiv.org/abs/2205.07634v1)
+5. [Using large language models for (de-)formalization and natural
   argumentation exercises for beginner's students](http://arxiv.org/abs/2304.06186v1)
-5. [Should we Stop Training More Monolingual Models, and Simply Use Machine
+6. [Should we Stop Training More Monolingual Models, and Simply Use Machine
   Translation Instead?](http://arxiv.org/abs/2104.10441v1)
-6. [Beyond the limitations of any imaginable mechanism: large language
+7. [Beyond the limitations of any imaginable mechanism: large language
   models and psycholinguistics](http://arxiv.org/abs/2303.00077v1)
-7. [Enhance Reasoning Ability of Visual-Language Models via Large Language
+8. [Enhance Reasoning Ability of Visual-Language Models via Large Language
   Models](http://arxiv.org/abs/2305.13267v1)
-8. [Images in Language Space: Exploring the Suitability of Large Language
+9. [Images in Language Space: Exploring the Suitability of Large Language
   Models for Vision & Language Tasks](http://arxiv.org/abs/2305.13782v1)
-9. [When Being Unseen from mBERT is just the Beginning: Handling New
+10. [When Being Unseen from mBERT is just the Beginning: Handling New
   Languages With Multilingual Language Models](http://arxiv.org/abs/2010.12858v2)
-10. [Dynamic Fusion: Attentional Language Model for Neural Machine
+11. [Dynamic Fusion: Attentional Language Model for Neural Machine
   Translation](http://arxiv.org/abs/1909.04879v1)
-11. [Large Language Models are not Models of Natural Language: they are
+12. [Large Language Models are not Models of Natural Language: they are
   Corpus Models](http://arxiv.org/abs/2112.07055v2)
-12. [BigTrans: Augmenting Large Language Models with Multilingual Translation
+13. [BigTrans: Augmenting Large Language Models with Multilingual Translation
   Capability over 100 Languages](http://arxiv.org/abs/2305.18098v1)
-13. [Re-visiting Automated Topic Model Evaluation with Large Language Models](http://arxiv.org/abs/2305.12152v1)
-14. [Unsupervised Improvement of Factual Knowledge in Language Models](http://arxiv.org/abs/2304.01597v1)
-15. [UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment
+14. [Re-visiting Automated Topic Model Evaluation with Large Language Models](http://arxiv.org/abs/2305.12152v1)
+15. [Unsupervised Improvement of Factual Knowledge in Language Models](http://arxiv.org/abs/2304.01597v1)
+16. [UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment
   classification in low-resource languages](http://arxiv.org/abs/2304.14189v1)
-16. [Multilingual Text Classification for Dravidian Languages](http://arxiv.org/abs/2112.01705v1)
-17. [Can Large Language Models design a Robot?](http://arxiv.org/abs/2303.15324v1)
-18. [Jigsaw: Large Language Models meet Program Synthesis](http://arxiv.org/abs/2112.02969v1)
-19. [HinFlair: pre-trained contextual string embeddings for pos tagging and
+17. [Multilingual Text Classification for Dravidian Languages](http://arxiv.org/abs/2112.01705v1)
+18. [Can Large Language Models design a Robot?](http://arxiv.org/abs/2303.15324v1)
+19. [Jigsaw: Large Language Models meet Program Synthesis](http://arxiv.org/abs/2112.02969v1)
+20. [HinFlair: pre-trained contextual string embeddings for pos tagging and
   text classification in the Hindi language](http://arxiv.org/abs/2101.06949v1)
-20. [Larger-Scale Transformers for Multilingual Masked Language Modeling](http://arxiv.org/abs/2105.00572v1)
-21. [Exploring Cross-lingual Textual Style Transfer with Large Multilingual
+21. [Larger-Scale Transformers for Multilingual Masked Language Modeling](http://arxiv.org/abs/2105.00572v1)
+22. [Exploring Cross-lingual Textual Style Transfer with Large Multilingual
   Language Models](http://arxiv.org/abs/2206.02252v1)
-22. [Low-Resource Language Modelling of South African Languages](http://arxiv.org/abs/2104.00772v1)
-23. [In What Languages are Generative Language Models the Most Formal?
+23. [Low-Resource Language Modelling of South African Languages](http://arxiv.org/abs/2104.00772v1)
+24. [In What Languages are Generative Language Models the Most Formal?
   Analyzing Formality Distribution across Languages](http://arxiv.org/abs/2302.12299v1)
-24. [LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with
+25. [LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with
   Knowledge Distillation](http://arxiv.org/abs/2302.08387v1)
-25. [Reimagining Retrieval Augmented Language Models for Answering Queries](http://arxiv.org/abs/2306.01061v1)
-26. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
-27. [Can Character-based Language Models Improve Downstream Task Performance
+26. [Reimagining Retrieval Augmented Language Models for Answering Queries](http://arxiv.org/abs/2306.01061v1)
+27. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
+28. [Can Character-based Language Models Improve Downstream Task Performance
   in Low-Resource and Noisy Language Scenarios?](http://arxiv.org/abs/2110.13658v1)
-28. [Learning Translation Quality Evaluation on Low Resource Languages from
+29. [Learning Translation Quality Evaluation on Low Resource Languages from
   Large Language Models](http://arxiv.org/abs/2302.03491v1)
-29. [Investigating the Translation Performance of a Large Multilingual
+30. [Investigating the Translation Performance of a Large Multilingual
   Language Model: the Case of BLOOM](http://arxiv.org/abs/2303.01911v2)
-30. [Investigating Robustness of Dialog Models to Popular Figurative Language
-  Constructs](http://arxiv.org/abs/2110.00687v1)
diff --git a/data/2023-06-18.json b/data/2023-06-18.json
new file mode 100644
index 0000000..1700deb
--- /dev/null
+++ b/data/2023-06-18.json
@@ -0,0 +1 @@
+[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Anomaly detection is an important task in network management. However, deploying intelligent alert systems in real-world large-scale networking systems is challenging when we take into account (i) scalability, (ii) data heterogeneity, and (iii) generalizability and maintainability. In this paper, we propose a hybrid model for an alert system that combines statistical models with a whitelist mechanism to tackle these challenges and reduce false positive alerts. The statistical models take advantage of a large database to detect anomalies in time-series data, while the whitelist filters out persistently alerted nodes to further reduce false positives. Our model is validated using qualitative data from customer support cases. Future work includes more feature engineering and input data, as well as including human feedback in the model development process.", "output": "A Hybrid Approach for Smart Alert Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications such as e-Commerce and social networks where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic study on how to design secure defense strategies against visual attacks on VARS. 
In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method can detect adversarial examples with high accuracy.", "output": "Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status message of DERs. 
Thus, proposing and developing a trustworthy AI framework to facilitate the deployment of any AI algorithms for detecting potential cyber threats and analyzing root causes based on Shapley value interpretation while dynamically quantifying the risk of an attack based on Ward's minimum variance formula. The experiment with a state-of-the-art dataset establishes the proposed framework as a trustworthy AI by fulfilling the capabilities of reliability, fairness, explainability, transparency, reproducibility, and accountability.", "output": "Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Unsupervised text style transfer task aims to rewrite a text into target style while preserving its main content. Traditional methods rely on the use of a fixed-sized vector to regulate text style, which is difficult to accurately convey the style strength for each individual token. In fact, each token of a text contains different style intensity and makes different contribution to the overall style. Our proposed method addresses this issue by assigning individual style vector to each token in a text, allowing for fine-grained control and manipulation of the style strength. 
Additionally, an adversarial training framework integrated with teacher-student learning is introduced to enhance training stability and reduce the complexity of high-dimensional optimization. The results of our experiments demonstrate the efficacy of our method in terms of clearly improved style transfer accuracy and content preservation in both two-style transfer and multi-style transfer settings.", "output": "MSSRNet: Manipulating Sequential Style Representation for Unsupervised Text Style Transfer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, neural networks have spread into numerous fields including many safety-critical systems. Neural networks are built (and trained) by programming in frameworks such as TensorFlow and PyTorch. Developers apply a rich set of pre-defined layers to manually program neural networks or to automatically generate them (e.g., through AutoML). Composing neural networks with different layers is error-prone due to the non-trivial constraints that must be satisfied in order to use those layers. In this work, we propose an approach to automatically repair erroneous neural networks. The challenge is in identifying a minimal modification to the network so that it becomes valid. Modifying a layer might have cascading effects on subsequent layers and thus our approach must search recursively to identify a \"globally\" minimal modification. Our approach is based on an executable semantics of deep learning layers and focuses on four kinds of errors which are common in practice. We evaluate our approach for two usage scenarios, i.e., repairing automatically generated neural networks and manually written ones suffering from common model bugs. 
The results show that we are able to repair 100% of a set of randomly generated neural networks (which are produced with an existing AI framework testing approach) effectively and efficiently (with an average repair time of 21.08s) and 93.75% of a collection of real neural network bugs (with an average time of 3min 40s).", "output": "Semantic-Based Neural Network Repair."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we develop machine learning techniques to identify unknown printers in early modern (c. 1500–1800) English printed books. Specifically, we focus on matching uniquely damaged character type-imprints in anonymously printed books to works with known printers in order to provide evidence of their origins. Until now, this work has been limited to manual investigations by analytical bibliographers. We present a Contrastive Attention-based Metric Learning approach to identify similar damage across character image pairs, which is sensitive to very subtle differences in glyph shapes, yet robust to various confounding sources of noise associated with digitized historical books. To overcome the scarce amount of supervised data, we design a random data synthesis procedure that aims to simulate bends, fractures, and inking variations induced by the early printing process. Our method successfully improves downstream damaged type-imprint matching among printed works from this period, as validated by in-domain human experts. 
The results of our approach on two important philosophical works from the Early Modern period demonstrate potential to extend the extant historical research about the origins and content of these books.", "output": "Contrastive Attention Networks for Attribution of Early Modern Print."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ensuring diagnostic performance of AI models before clinical use is key to the safe and successful adoption of these technologies. Studies reporting AI applied to digital pathology images for diagnostic purposes have rapidly increased in number in recent years. The aim of this work is to provide an overview of the diagnostic accuracy of AI in digital pathology images from all areas of pathology. This systematic review and meta-analysis included diagnostic accuracy studies using any type of artificial intelligence applied to whole slide images (WSIs) in any disease type. The reference standard was diagnosis through histopathological assessment and / or immunohistochemistry. Searches were conducted in PubMed, EMBASE and CENTRAL in June 2022. We identified 2976 studies, of which 100 were included in the review and 48 in the full meta-analysis. Risk of bias and concerns of applicability were assessed using the QUADAS-2 tool. Data extraction was conducted by two investigators and meta-analysis was performed using a bivariate random effects model. 100 studies were identified for inclusion, equating to over 152,000 whole slide images (WSIs) and representing many disease types. Of these, 48 studies were included in the meta-analysis. These studies reported a mean sensitivity of 96.3% (CI 94.1-97.7) and mean specificity of 93.3% (CI 90.5-95.4) for AI. There was substantial heterogeneity in study design and all 100 studies identified for inclusion had at least one area at high or unclear risk of bias. 
This review provides a broad overview of AI performance across applications in whole slide imaging. However, there is huge variability in study design and available performance data, with details around the conduct of the study and make up of the datasets frequently missing. Overall, AI offers good accuracy when applied to WSIs but requires more rigorous evaluation of its performance.", "output": "Diagnostic test accuracy (DTA) of artificial intelligence in digital pathology: a systematic review, meta-analysis and quality assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Active learning algorithms have been an integral part of recent advances in artificial intelligence. However, the research in the field is widely varying and lacks an overall organizing lens. We outline a Markovian formalism for the field of active learning and survey the literature to demonstrate the organizing capability of our proposed formalism. Our formalism takes a partially observable Markovian system approach to the active learning process as a whole. We specifically outline how querying, dataset augmentation, reward updates, and other aspects of active learning can be viewed as a transition between meta-states in a Markovian system, and give direction into how other aspects of active learning can fit into our formalism.", "output": "A Markovian Formalism for Active Querying."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increase in the use of photovoltaic (PV) energy in the world has shown that the useful life and maintenance of a PV plant directly depend on the ability to quickly detect severe faults on a PV plant. 
To solve this problem of detection, data based approaches have been proposed in the literature. However, these previous solutions consider only specific behavior of one or few faults. Most of these approaches can be qualified as supervised, requiring an enormous labelling effort (fault types clearly identified in each technology). In addition, most of them are validated in PV cells or one PV module. That is hardly applicable in large-scale PV plants considering their complexity. Alternatively, some unsupervised well-known approaches based on data try to detect anomalies but are not able to identify precisely the type of fault. The most performant of these methods do manage to efficiently group healthy panels and separate them from faulty panels. In that way, this article presents an unsupervised approach called DTW K-means. This approach takes advantage of both the dynamic time warping (DTW) metric and the K-means clustering algorithm as a data-driven approach. The results of this mixed method in a PV string are compared to diagnostic labels established by visual inspection of the panels.", "output": "DTW k-means clustering for fault detection in photovoltaic modules."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diagnosis in PV systems aims to detect, locate and identify faults. Diagnosing these faults is vital to guarantee energy production and extend the useful life of PV power plants. In the literature, multiple machine learning approaches have been proposed for this purpose. However, few of these works have paid special attention to the detection of fine faults and the specialized process of extraction and selection of features for their classification. A fine fault is one whose characteristic signature is difficult to distinguish from that of a healthy panel. 
As a contribution to the detection of fine faults (especially of the snail trail type), this article proposes an innovative approach based on the Random Forest (RF) algorithm. This approach uses a complex feature extraction and selection method that improves the computational time of fault classification while maintaining high accuracy.", "output": "Detection and classification of faults aimed at preventive maintenance of PV systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Backdoor attacks have emerged as an urgent threat to Deep Neural Networks (DNNs), where victim DNNs are furtively implanted with malicious neurons that could be triggered by the adversary. To defend against backdoor attacks, many works establish a staged pipeline to remove backdoors from victim DNNs: inspecting, locating, and erasing. However, in a scenario where a few clean data can be accessible, such pipeline is fragile and cannot erase backdoors completely without sacrificing model accuracy. To address this issue, in this paper, we propose a novel data-free holistic backdoor erasing (DHBE) framework. Instead of the staged pipeline, the DHBE treats the backdoor erasing task as a unified adversarial procedure, which seeks equilibrium between two different competing processes: distillation and backdoor regularization. In distillation, the backdoored DNN is distilled into a proxy model, transferring its knowledge about clean data, yet backdoors are simultaneously transferred. In backdoor regularization, the proxy model is holistically regularized to prevent from infecting any possible backdoor transferred from distillation. These two processes jointly proceed with data-free adversarial optimization until a clean, high-accuracy proxy model is obtained. 
With the novel adversarial design, our framework demonstrates its superiority in three aspects: 1) minimal detriment to model accuracy, 2) high tolerance for hyperparameters, and 3) no demand for clean data. Extensive experiments on various backdoor attacks and datasets are performed to verify the effectiveness of the proposed framework. Code is available at url{", "output": "DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Domain shift is considered a challenge in machine learning as it causes significant degradation of model performance. In the Acoustic Scene Classification task (ASC), domain shift is mainly caused by different recording devices. Several studies have already targeted domain generalization to improve the performance of ASC models on unseen domains, such as new devices. Recently, the Controllable Gate Adapter ConGater has been proposed in Natural Language Processing to address the biased training data problem. ConGater allows controlling the debiasing process at inference time. ConGater's main advantage is the continuous and selective debiasing of a trained model, during inference. In this work, we adapt ConGater to the audio spectrogram transformer for an acoustic scene classification task. We show that ConGater can be used to selectively adapt the learned representations to be invariant to device domain shifts such as recording devices. Our analysis shows that ConGater can progressively remove device information from the learned representations and improve the model generalization, especially under domain shift conditions (e.g. unseen devices). We show that information removal can be extended to both device and location domain. 
Finally, we demonstrate ConGater's ability to enhance specific device performance without further training.", "output": "Domain Information Control at Inference Time for Acoustic Scene Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) naturally faces the problem of data heterogeneity in real-world scenarios, but this is often overlooked by studies on FL security and privacy. On the one hand, the effectiveness of backdoor attacks on FL may drop significantly under non-IID scenarios. On the other hand, malicious clients may steal private data through privacy inference attacks. Therefore, it is necessary to have a comprehensive perspective of data heterogeneity, backdoor, and privacy inference. In this paper, we propose a novel privacy inference-empowered stealthy backdoor attack (PI-SBA) scheme for FL under non-IID scenarios. Firstly, a diverse data reconstruction mechanism based on generative adversarial networks (GANs) is proposed to produce a supplementary dataset, which can improve the attacker's local data distribution and support more sophisticated strategies for backdoor attacks. Based on this, we design a source-specified backdoor learning (SSBL) strategy as a demonstration, allowing the adversary to arbitrarily specify which classes are susceptible to the backdoor trigger. Since the PI-SBA has an independent poisoned data synthesis process, it can be integrated into existing backdoor attacks to improve their effectiveness and stealthiness in non-IID scenarios. 
Extensive experiments based on MNIST, CIFAR10 and Youtube Aligned Face datasets demonstrate that the proposed PI-SBA scheme is effective in non-IID FL and stealthy against state-of-the-art defense methods.", "output": "Privacy Inference-Empowered Stealthy Backdoor Attack on Federated Learning under Non-IID Scenarios."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation has not been seriously discussed (and overlooked) even though the quality of the evaluation entirely depends on it. In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP&R strong for noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. 
To the best of our knowledge, this is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.", "output": "TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Free Energy Principle (FEP) is a theoretical framework for describing how (intelligent) systems self-organise into coherent, stable structures by minimising a free energy functional. Active Inference (AIF) is a corollary of the FEP that specifically details how systems that are able to plan for the future (agents) function by minimising particular free energy functionals that incorporate information seeking components. This paper is the first in a series of two where we derive a synthetic version of AIF on free form factor graphs. The present paper focuses on deriving a local version of the free energy functionals used for AIF. This enables us to construct a version of AIF which applies to arbitrary graphical models and interfaces with prior work on message passing algorithms. The resulting messages are derived in our companion paper. We also identify a gap in the graphical notation used for factor graphs. While factor graphs are great at expressing a generative model, they have so far been unable to specify the full optimisation problem including constraints. To solve this problem we develop Constrained Forney-style Factor Graph (CFFG) notation which permits a fully graphical description of variational inference objectives. We then proceed to show how CFFG's can be used to reconstruct prior algorithms for AIF as well as derive new ones. 
The latter is demonstrated by deriving an algorithm that permits direct policy inference for AIF agents, circumventing a long standing scaling issue that has so far hindered the application of AIF in industrial settings. We demonstrate our algorithm on the classic T-maze task and show that it reproduces the information seeking behaviour that is a hallmark feature of AIF.", "output": "Realising Synthetic Active Inference Agents, Part I: Epistemic Objectives and Graphical Specification Language."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions, each curated to enhance the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. 
Through extensive instruction tuning experiments on the representative LLM, we underscore the potency of Mol-Instructions to enhance the adaptability and cognitive acuity of large models within the complex sphere of biomolecular studies, thereby promoting advancements in the biomolecular research community. Mol-Instructions is made publicly accessible for future research endeavors and will be subjected to continual updates for enhanced applicability.", "output": "Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasing availability of digital collections of historical and contemporary literature presents a wealth of possibilities for new research in the humanities. The scale and diversity of such collections, however, present particular challenges in identifying and extracting relevant content. This paper presents Curatr, an online platform for the exploration and curation of literature with machine learning-supported semantic search, designed within the context of digital humanities scholarship. The platform provides a text mining workflow that combines neural word embeddings with expert domain knowledge to enable the generation of thematic lexicons, allowing researchers to curate relevant sub-corpora from a large corpus of 18th and 19th century digitised texts.", "output": "Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Finding optimal channel dimensions (i.e., the number of filters in DNN layers) is essential to design DNNs that perform well under computational resource constraints. 
Recent work in neural architecture search aims at automating the optimization of the DNN model implementation. However, existing neural architecture search methods for channel dimensions rely on fixed search spaces, which prevents achieving an efficient and fully automated solution. In this work, we propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm to enable a flexible search space for channel dimensions. We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency for the CIFAR-10 dataset with an improvement of $1.3-1.7\\times$ in GPU-hours and $1.5-1.7\\times$ in the memory requirements during the architecture search stage. Moreover, the proposed frameworks do not require a well-engineered search space a priori, which is an important step towards fully automated design of DNN architectures.", "output": "Flexible Channel Dimensions for Differentiable Architecture Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside it. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the inverse Nash set and the set of plausible games induced by data are polytopes in the Q function space. 
We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.", "output": "On Faking a Nash Equilibrium."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Natural language explanations have the potential to provide rich information that in principle guides model reasoning. Yet, recent work by Lampinen et al. (2022) has shown limited utility of natural language explanations in improving classification. To effectively learn from explanations, we present FLamE, a two-stage few-shot learning framework that first generates explanations using GPT-3, and then finetunes a smaller model (e.g., RoBERTa) with generated explanations. Our experiments on natural language inference demonstrate effectiveness over strong baselines, increasing accuracy by 17.6% over GPT-3 Babbage and 5.7% over GPT-3 Davinci in e-SNLI. Despite improving classification performance, human evaluation surprisingly reveals that the majority of generated explanations does not adequately justify classification decisions. Additional analyses point to the important role of label-specific cues (e.g., \"not know\" for the neutral label) in generated explanations.", "output": "FLamE: Few-shot Learning from Natural Language Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most medical treatment decisions are sequential in nature. Hence, there is substantial hope that reinforcement learning may make it possible to formulate precise data-driven treatment plans. 
However, a key challenge for most applications in this field is the sparse nature of primarily mortality-based reward functions, leading to decreased stability of offline estimates. In this work, we introduce a deep Q-learning approach able to obtain more reliable critical care policies. This method integrates relevant but noisy intermediate biomarker signals into the reward specification, without compromising the optimization of the main outcome of interest (e.g. patient survival). We achieve this by first pruning the action set based on all available rewards, and second training a final model based on the sparse main reward but with a restricted action set. By disentangling accurate and approximated rewards through action pruning, potential distortions of the main objective are minimized, all while enabling the extraction of valuable information from intermediate signals that can guide the learning process. We evaluate our method in both off-policy and offline settings using simulated environments and real health records of patients in intensive care units. Our empirical results indicate that pruning significantly reduces the size of the action space while staying mostly consistent with the actions taken by physicians, outperforming the current state-of-the-art offline reinforcement learning method conservative Q-learning. Our work is a step towards developing reliable policies by effectively harnessing the wealth of available information in data-intensive critical care environments.", "output": "Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. 
Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the \"black magic\" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models).", "output": "Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Distributed trust is a nebulous concept that has evolved from different perspectives in recent years. While one can attribute its current prominence to blockchain and cryptocurrency, the distributed trust concept has been cultivating progress in federated learning, trustworthy and responsible AI in an ecosystem setting, data sharing, privacy issues across organizational boundaries, and zero trust cybersecurity. This paper will survey the concept of distributed trust in multiple disciplines. 
It will take a system/software architecture point of view to look at trust redistribution/shift and the associated tradeoffs in systems and applications enabled by distributed trust technologies.", "output": "Distributed Trust Through the Lens of Software Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning symbolic expressions directly from experiment data is a vital step in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches are limited to learning simple expressions. Regressing expressions involving many independent variables still remains out of reach. Motivated by the control variable experiments widely utilized in science, we propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables. CVGP expedites symbolic expression discovery via customized experiment design, rather than learning from a fixed dataset collected a priori. CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments where other variables are held as constants. 
It then extends expressions learned in previous generations by adding new independent variables, using new control variable experiments in which these variables are allowed to vary. Theoretically, we show CVGP as an incremental building approach can yield an exponential reduction in the search space when learning a class of expressions. Experimentally, CVGP outperforms several baselines in learning symbolic expressions involving multiple independent variables.", "output": "Symbolic Regression via Control Variable Genetic Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on object-centric slot-based representations of scenes. 
On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.", "output": "DORSal: Diffusion for Object-centric Representations of Scenes $\\textit{et al.}$."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The surge in Reinforcement Learning (RL) applications in Intelligent Transportation Systems (ITS) has contributed to its growth as well as highlighted key challenges. However, defining objectives of RL agents in traffic control and management tasks, as well as aligning policies with these goals through an effective formulation of Markov Decision Process (MDP), can be challenging and often require domain experts in both RL and ITS. Recent advancements in Large Language Models (LLMs) such as GPT-4 highlight their broad general knowledge, reasoning capabilities, and commonsense priors across various domains. In this work, we conduct a large-scale user study involving 70 participants to investigate whether novices can leverage ChatGPT to solve complex mixed traffic control problems. Three environments are tested, including ring road, bottleneck, and intersection. We find ChatGPT has mixed results. For intersection and bottleneck, ChatGPT increases the number of successful policies by 150% and 136% compared to solely beginner capabilities, with some of them even outperforming experts. However, ChatGPT does not provide consistent improvements across all scenarios.", "output": "Can ChatGPT Enable ITS? 
The Case of Mixed Traffic Control via Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.", "output": "Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models without explicit persona descriptions. 
By representing each persona as a continuous vector, PersonaPKT learns implicit persona-specific features directly from a small number of dialogue samples produced by the same persona, adding less than 0.1% trainable parameters for each persona on top of the PLM backbone. Empirical results demonstrate that PersonaPKT effectively builds personalized DAs with high storage efficiency, outperforming various baselines in terms of persona consistency while maintaining good response generation quality. In addition, it enhances privacy protection by avoiding explicit persona descriptions. Overall, PersonaPKT is an effective solution for creating personalized DAs that respect user privacy.", "output": "PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external knowledge, such as \"What event is commemorated by the building depicted in this image?\", is a complex task. This task presents a combinatorial search space that demands a sequence of actions, including invoking APIs, analyzing their responses, and making informed decisions. We conduct a user study to collect a variety of instances of human decision-making when faced with this task. 
This data is then used to design a system comprised of three components: an LLM-powered planner that dynamically determines which tool to use next, an LLM-powered reasoner that analyzes and extracts key information from the tool outputs, and a working memory component that retains the acquired information throughout the process. The collected user behavior serves as a guide for our system in two key ways. First, we create a transition graph by analyzing the sequence of decisions made by users. This graph delineates distinct states and confines the set of actions available at each state. Second, we use examples of user decision-making to provide our LLM-powered planner and reasoner with relevant contextual instances, enhancing their capacity to make informed decisions. We show that AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.", "output": "AVIS: Autonomous Visual Information Seeking with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. 
In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to study the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.", "output": "ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models to study both trends by combining person-generic and person-specific parameters. 
Though linear mixed effect models are gaining popularity in machine learning by integrating them with neural networks, these integrations are currently limited to linear person-specific parameters: ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience at least moderate symptoms of depression. Furthermore, we evaluate NME for two model architectures, including for neural conditional random fields (CRF) to predict affective state sequences where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.", "output": "Neural Mixed Effects for Nonlinear Personalized Predictions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cryptocurrencies have gained popularity across various sectors, especially in finance and investment. The popularity is partly due to their unique specifications originating from blockchain-related characteristics such as privacy, decentralisation, and untraceability. Despite their growing popularity, cryptocurrencies remain a high-risk investment due to their price volatility and uncertainty. 
The inherent volatility in cryptocurrency prices, coupled with internal cryptocurrency-related factors and external influential global economic factors makes predicting their prices and price movement directions challenging. Nevertheless, the knowledge obtained from predicting the direction of cryptocurrency prices can provide valuable guidance for investors in making informed investment decisions. To address this issue, this paper proposes a dynamic Bayesian network (DBN) approach, which can model complex systems in multivariate settings, to predict the price movement direction of five popular altcoins (cryptocurrencies other than Bitcoin) in the next trading day. The efficacy of the proposed model in predicting cryptocurrency price directions is evaluated from two perspectives. Firstly, our proposed approach is compared to two baseline models, namely an auto-regressive integrated moving average and support vector regression. Secondly, from a feature engineering point of view, the impact of twenty-three different features, grouped into four categories, on the DBN's prediction performance is investigated. The experimental results demonstrate that the DBN significantly outperforms the baseline models. In addition, among the groups of features, technical indicators are found to be the most effective predictors of cryptocurrency price directions.", "output": "Causal Feature Engineering of Price Directions of Cryptocurrencies using Dynamic Bayesian Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks often learn unintended biases during training, which might have harmful effects when deployed in real-world settings. This paper surveys 209 papers on bias in NLP models, most of which address sociodemographic bias. 
To better understand the distinction between bias and real-world harm, we turn to ideas from psychology and behavioral economics to propose a definition for sociodemographic bias. We identify three main categories of NLP bias research: types of bias, quantifying bias, and debiasing. We conclude that current approaches on quantifying bias face reliability issues, that many of the bias metrics do not relate to real-world biases, and that current debiasing techniques are superficial and hide bias rather than removing it. Finally, we provide recommendations for future work.", "output": "Survey on Sociodemographic Bias in Natural Language Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications through natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of Large Language Models (LLMs) based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 Billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. 
An open-source strategy is needed to share AI benefits fairly, and H2O.ai will continue to democratize AI and LLMs.", "output": "h2oGPT: Democratizing Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization process. Our method reduces the memory requirements by up to 5.6 times, which enables fine-tuning a 7 billion parameter Large Language Model (LLM) on consumer laptops. At the same time, we propose a Low-Rank Error Correction (LREC) method that exploits the added LoRA layers to ameliorate the gap between the quantized model and its float point counterpart. Our error correction framework leads to a fully functional INT2 quantized LLM with the capacity to generate coherent English text. To the best of our knowledge, this is the first INT2 Large Language Model that has been able to reach such a performance. The overhead of our method is merely a 1.05 times increase in model size, which translates to an effective precision of INT2.1. Also, our method readily generalizes to other quantization standards, such as INT3, INT4, and INT8, restoring their lost performance, which marks a significant milestone in the field of model quantization. 
The strategies delineated in this paper hold promising implications for the future development and optimization of quantized models, marking a pivotal shift in the landscape of low-resource machine learning computations.", "output": "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Proteolysis-Targeting Chimeras (PROTACs) represent a novel class of small molecules which are designed to act as a bridge between an E3 ligase and a disease-relevant protein, thereby promoting its subsequent degradation. PROTACs are composed of two protein binding \"active\" domains, linked by a \"linker\" domain. The design of the linker domain is challenging due to geometric and chemical constraints given by its interactions, and the need to maximize drug-likeness. To tackle these challenges, we introduce ShapeLinker, a method for de novo design of linkers. It performs fragment-linking using reinforcement learning on an autoregressive SMILES generator. The method optimizes for a composite score combining relevant physicochemical properties and a novel, attention-based point cloud alignment score. This new method successfully generates linkers that satisfy both relevant 2D and 3D requirements, and achieves state-of-the-art results in producing novel linkers assuming a target linker conformation. This allows for more rational and efficient PROTAC design and optimization. 
Code and data are available at", "output": "Reinforcement Learning-Driven Linker Design via Fast Attention-based Point Cloud Alignment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the integration of a novel dynamic contextual carry-over mechanism in a state-of-the-art (SOTA) unified ASR system. Our proposed dynamic context Conformer (DCTX-Conformer) utilizes a non-overlapping contextual carry-over mechanism that takes into account both the left context of a chunk and one or more preceding context embeddings. We outperform the SOTA by a relative 25.0% word error rate, with a negligible latency impact due to the additional context embeddings.", "output": "DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite its centrality in the philosophy of cognitive science, there has been little prior philosophical work engaging with the notion of representation in contemporary NLP practice. 
This paper attempts to fill that lacuna: drawing on ideas from cognitive science, I introduce a framework for evaluating the representational claims made about components of neural NLP models, proposing three criteria with which to evaluate whether a component of a model represents a property and operationalising these criteria using probing classifiers, a popular analysis technique in NLP (and deep learning more broadly). The project of operationalising a philosophically-informed notion of representation should be of interest to both philosophers of science and NLP practitioners. It affords philosophers a novel testing-ground for claims about the nature of representation, and helps NLPers organise the large literature on probing experiments, suggesting novel avenues for empirical research.", "output": "Operationalising Representation in Natural Language Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Node classification on graphs is a significant task with a wide range of applications, including social analysis and anomaly detection. Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations. Since this regularization term cannot utilize label information, it can enhance the robustness of node representations to label noise. 
Moreover, to detectnoisy labels on the graph, we present a sample selection technique based on thehomophily assumption, which identifies noisy nodes by measuring the consistencybetween the labels with their neighbors. Finally, we purify these confidentnoisy labels to permit efficient semantic graph learning. Extensive experimentson three well-known benchmark datasets demonstrate the superiority of our CGNNover competing approaches.", "output": "Learning on Graphs under Label Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the pursuit of artificial general intelligence (AGI), we tackleAbstraction and Reasoning Corpus (ARC) tasks using a novel two-prongedapproach. We employ the Decision Transformer in an imitation learning paradigmto model human problem-solving, and introduce an object detection algorithm,the Push and Pull clustering method. This dual strategy enhances AI's ARCproblem-solving skills and provides insights for AGI progression. Yet, our workreveals the need for advanced data collection tools, robust training datasets,and refined model structures. This study highlights potential improvements forDecision Transformers and propels future AGI research.", "output": "Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As artificial intelligence spreads out to numerous fields, the application ofAI to sports analytics is also in the spotlight. However, one of the majorchallenges is the difficulty of automated acquisition of continuous movementdata during sports matches. In particular, it is a conundrum to reliably tracka tiny ball on a wide soccer pitch with obstacles such as occlusion andimitations. 
Tackling the problem, this paper proposes an inference framework ofball trajectory from player trajectories as a cost-efficient alternative toball tracking. We combine Set Transformers to get permutation-invariant andequivariant representations of the multi-agent contexts with a hierarchicalarchitecture that intermediately predicts the player ball possession to supportthe final trajectory inference. Also, we introduce the reality loss term andpostprocessing to secure the estimated trajectories to be physically realistic.The experimental results show that our model provides natural and accuratetrajectories as well as admissible player ball possession at the same time.Lastly, we suggest several practical applications of our framework includingmissing trajectory imputation, semi-automated pass annotation, automatedzoom-in for match broadcasting, and calculating possession-wise runningperformance metrics.", "output": "Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image segmentation based on deep learning often fails when deployedon images from a different domain. The domain adaptation methods aim to solvedomain-shift challenges, but still face some problems. The transfer learningmethods require annotation on the target domain, and the generativeunsupervised domain adaptation (UDA) models ignore domain-specificrepresentations, whose generated quality highly restricts segmentationperformance. In this study, we propose a novel Structure-Modal Constrained(SMC) UDA framework based on a discriminative paradigm and introduce edgestructure as a bridge between domains. The proposed multi-modal learningbackbone distills structure information from image texture to distinguishdomain-invariant edge structure. 
With the structure-constrained self-learningand progressive ROI, our methods segment the kidney by locating the 3D spatialstructure of the edge. We evaluated SMC-UDA on public renal segmentationdatasets, adapting from the labeled source domain (CT) to the unlabeled targetdomain (CT/MRI). The experiments show that our proposed SMC-UDA has a stronggeneralization and outperforms generative UDA methods.", "output": "SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inverse Reinforcement Learning (IRL) aims to reconstruct the reward functionfrom expert demonstrations to facilitate policy learning, and has demonstratedits remarkable success in imitation learning. To promote expert-like behavior,existing IRL methods mainly focus on learning global reward functions tominimize the trajectory difference between the imitator and the expert.However, these global designs are still limited by the redundant noise anderror propagation problems, leading to the unsuitable reward assignment andthus downgrading the agent capability in complex multi-stage tasks. In thispaper, we propose a novel Curricular Subgoal-based Inverse ReinforcementLearning (CSIRL) framework, that explicitly disentangles one task with severallocal subgoals to guide agent imitation. Specifically, CSIRL firstly introducesdecision uncertainty of the trained agent over expert trajectories todynamically select subgoals, which directly determines the exploration boundaryof different task stages. To further acquire local reward functions for eachstage, we customize a meta-imitation objective based on these curricularsubgoals to train an intrinsic reward generator. 
Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts, as well as better interpretability. Our code is available at ", "output": "Curricular Subgoals for Inverse Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although the prevention of AI vulnerabilities is critical to preserve the safety and privacy of users and businesses, educational tools for robust AI are still underdeveloped worldwide. We present the design, implementation, and assessment of Maestro. Maestro is an effective open-source game-based platform that contributes to the advancement of robust AI education. Maestro provides goal-based scenarios where college students are exposed to challenging life-inspired assignments in a competitive programming environment. We assessed Maestro's influence on students' engagement, motivation, and learning success in robust AI. This work also provides insights into the design features of online learning tools that promote active learning opportunities in the robust AI domain. We analyzed the reflection responses (measured with Likert scales) of 147 undergraduate students using Maestro in two quarterly college courses in AI. According to the results, students who felt the acquisition of new skills in robust AI tended to highly appreciate Maestro and scored highly on material consolidation, curiosity, and mastery in robust AI. Moreover, the leaderboard, our key gamification element in Maestro, has effectively contributed to students' engagement and learning. 
Results also indicate that Maestro can be effectively adapted to any course length and depth without losing its educational quality.", "output": "Maestro: A Gamified Platform for Teaching AI Robustness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cell recognition is a fundamental task in digital histopathology image analysis. Point-based cell recognition (PCR) methods normally require a vast number of annotations, which is extremely costly, time-consuming and labor-intensive. Semi-supervised learning (SSL) can provide a shortcut to make full use of cell information in gigapixel whole slide images without exhaustive labeling. However, research into semi-supervised point-based cell recognition (SSPCR) remains largely overlooked. Previous SSPCR works are all built on density map-based PCR models, which suffer from unsatisfactory accuracy, slow inference speed and high sensitivity to hyper-parameters. To address these issues, end-to-end PCR models are proposed recently. In this paper, we develop a SSPCR framework suitable for the end-to-end PCR models for the first time. Overall, we use the current models to generate pseudo labels for unlabeled images, which are in turn utilized to supervise the models training. Besides, we introduce a co-teaching strategy to overcome the confirmation bias problem that generally exists in self-training. A distribution alignment technique is also incorporated to produce high-quality, unbiased pseudo labels for unlabeled data. Experimental results on four histopathology datasets concerning different types of staining styles show the effectiveness and versatility of the proposed framework. 
Code is available at", "output": "Semi-supervised Cell Recognition under Point Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph and has gained interest due to recent developments in text-to-image synthesis and the ubiquity of smart-phone cameras and photo-sharing apps. Prior work on rendering bokeh effects have focused on post hoc image manipulation to produce similar blurring effects in existing photographs using classical computer graphics or neural rendering techniques, but have either depth discontinuity artifacts or are restricted to reproducing bokeh effects that are present in the training data. More recent diffusion based models can synthesize images with an artistic style, but either require the generation of high-dimensional masks, expensive fine-tuning, or affect global image characteristics. In this paper, we present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style. Motivated by how image synthesis occurs progressively in diffusion models, our approach combines latent diffusion models with a 2-stage conditioning algorithm to render bokeh effects on semantically defined objects. Since we can focus the effect on objects, this semantic bokeh effect is more versatile than classical rendering techniques. We evaluate GBSD both quantitatively and qualitatively and demonstrate its ability to be applied in both text-to-image and image-to-image settings.", "output": "GBSD: Generative Bokeh with Stage Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modelling spatio-temporal processes on road networks is a task of growing importance. 
While significant progress has been made on developingspatio-temporal graph neural networks (Gnns), existing works are built uponthree assumptions that are not practical on real-world road networks. First,they assume sensing on every node of a road network. In reality, due tobudget-constraints or sensor failures, all locations (nodes) may not beequipped with sensors. Second, they assume that sensing history is available atall installed sensors. This is unrealistic as well due to sensor failures, lossof packets during communication, etc. Finally, there is an assumption of staticroad networks. Connectivity within networks change due to road closures,constructions of new roads, etc. In this work, we develop FRIGATE to addressall these shortcomings. FRIGATE is powered by a spatio-temporal Gnn thatintegrates positional, topological, and temporal information into richinductive node representations. The joint fusion of this diverse information ismade feasible through a novel combination of gated Lipschitz embeddings withLstms. We prove that the proposed Gnn architecture is provably more expressivethan message-passing Gnns used in state-of-the-art algorithms. The higherexpressivity of FRIGATE naturally translates to superior empirical performanceconducted on real-world network-constrained traffic data. In addition, FRIGATEis robust to frugal sensor deployment, changes in road network connectivity,and temporal irregularity in sensing.", "output": "FRIGATE: Frugal Spatio-temporal Forecasting on Road Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is very challenging for various visual tasks such as image fusion,pedestrian detection and image-to-image translation in low light conditions dueto the loss of effective target areas. 
In this case, infrared and visibleimages can be used together to provide both rich detail information andeffective target areas. In this paper, we present LLVIP, a visible-infraredpaired dataset for low-light vision. This dataset contains 30976 images, or15488 pairs, most of which were taken at very dark scenes, and all of theimages are strictly aligned in time and space. Pedestrians in the dataset arelabeled. We compare the dataset with other visible-infrared datasets andevaluate the performance of some popular visual algorithms including imagefusion, pedestrian detection and image-to-image translation on the dataset. Theexperimental results demonstrate the complementary effect of fusion on imageinformation, and find the deficiency of existing algorithms of the three visualtasks in very low-light conditions. We believe the LLVIP dataset willcontribute to the community of computer vision by promoting image fusion,pedestrian detection and image-to-image translation in very low-lightapplications. The dataset is being released in Raw data is also provided for furtherresearch such as image registration.", "output": "LLVIP: A Visible-infrared Paired Dataset for Low-light Vision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Parameterized Quantum Circuits (PQC) are promising towards quantum advantageon near-term quantum hardware. However, due to the large quantum noises(errors), the performance of PQC models has a severe degradation on realquantum devices. Take Quantum Neural Network (QNN) as an example, the accuracygap between noise-free simulation and noisy results on IBMQ-Yorktown forMNIST-4 classification is over 60%. Existing noise mitigation methods aregeneral ones without leveraging unique characteristics of PQC; on the otherhand, existing PQC work does not consider noise effect. 
To this end, we presentQuantumNAT, a PQC-specific framework to perform noise-aware optimizations inboth training and inference stages to improve robustness. We experimentallyobserve that the effect of quantum noise to PQC measurement outcome is a linearmap from noise-free outcome with a scaling and a shift factor. Motivated bythat, we propose post-measurement normalization to mitigate the featuredistribution differences between noise-free and noisy scenarios. Furthermore,to improve the robustness against noise, we propose noise injection to thetraining process by inserting quantum error gates to PQC according to realisticnoise models of quantum hardware. Finally, post-measurement quantization isintroduced to quantize the measurement outcomes to discrete values, achievingthe denoising effect. Extensive experiments on 8 classification tasks using 6quantum devices demonstrate that QuantumNAT improves accuracy by up to 43%, andachieves over 94% 2-class, 80% 4-class, and 34% 10-class classificationaccuracy measured on real quantum computers. The code for construction andnoise-aware training of PQC is available in the TorchQuantum library.", "output": "QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Malware authors apply different techniques of control flow obfuscation, inorder to create new malware variants to avoid detection. Existing Siameseneural network (SNN)-based malware detection methods fail to correctly classifydifferent malware families when such obfuscated malware samples are present inthe training dataset, resulting in high false-positive rates. 
To address this issue, we propose a novel task-aware few-shot-learning-based Siamese Neural Network that is resilient against the presence of malware variants affected by such control flow obfuscation techniques. Using the average entropy features of each malware family as inputs, in addition to the image features, our model generates the parameters for the feature layers, to more accurately adjust the feature embedding for different malware families, each of which has obfuscated malware variants. In addition, our proposed method can classify malware classes, even if there are only one or a few training samples available. Our model utilizes few-shot learning with the extracted features of a pre-trained network (e.g., VGG-16), to avoid the bias typically associated with a model trained with a limited number of training samples. Our proposed approach is highly effective in recognizing unique malware signatures, thus correctly classifying malware samples that belong to the same malware family, even in the presence of obfuscated malware variants. Our experimental results, validated by N-way on N-shot learning, show that our model is highly effective in classification accuracy, exceeding a rate > 91%, compared to other similar methods.", "output": "Task-Aware Meta Learning-based Siamese Neural Network for Classifying Obfuscated Malware."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Cognitive Ledger Project is an effort to develop a modular system for turning users' personal data into structured information and machine learning models based on a blockchain-based infrastructure. In this work-in-progress paper, we propose a cognitive architecture for cognitive digital twins. The suggested design embraces a cognitive blockchain (Cognitive ledger) at its core. 
The architecture includes several modules that turn users' activities in the digital environment into reusable knowledge objects and artificial intelligence that one day can work together to form the cognitive digital twin of users.", "output": "Cognitive Ledger Project: Towards Building Personal Digital Twins Through Cognitive Blockchain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous number of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift. (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in-between them for resource efficiency improvements, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. 
resource efficiency tradeoff, which we show is an NP-hardsignomial programming problem. We finally solve this problem through proposinga general optimization solver. Our numerical results reveal new findings on theinterdependencies between the idle times in-between the global aggregations,model/concept drift, and D2D cooperation configuration.", "output": "Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "State-of-the-art deep Q-learning methods update Q-values using statetransition tuples sampled from the experience replay buffer. This strategyoften uniformly and randomly samples or prioritizes data sampling based onmeasures such as the temporal difference (TD) error. Such sampling strategiescan be inefficient at learning Q-function because a state's Q-value depends onthe Q-value of successor states. If the data sampling strategy ignores theprecision of the Q-value estimate of the next state, it can lead to useless andoften incorrect updates to the Q-values. To mitigate this issue, we organizethe agent's experience into a graph that explicitly tracks the dependencybetween Q-values of states. Each edge in the graph represents a transitionbetween two states by executing a single action. We perform value backups via abreadth-first search starting from that expands vertices in the graph startingfrom the set of terminal states and successively moving backward. Weempirically show that our method is substantially more data-efficient thanseveral baselines on a diverse range of goal-reaching tasks. 
Notably, the proposed method also outperforms baselines that consume more batches of training experience and operates from high-dimensional observational data such as images.", "output": "Topological Experience Replay."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The dominant framework for off-policy multi-goal reinforcement learning involves estimating goal conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s; whereas the second component, φ(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency, and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.", "output": "Bilinear value networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. 
Most existing approaches traina separate model for each item/question, which is suitable for scenarios suchas essay scoring where items can be quite different from one another. However,these approaches have two limitations: 1) they fail to leverage item linkagefor scenarios such as reading comprehension where multiple items may share areading passage; 2) they are not scalable since storing one model per itembecomes difficult when models have a large number of parameters. In this paper,we report our (grand prize-winning) solution to the National Assessment ofEducation Progress (NAEP) automated scoring challenge for readingcomprehension. Our approach, in-context BERT fine-tuning, produces a singleshared scoring model for all items with a carefully-designed input structure toprovide contextual information on each item. We demonstrate the effectivenessof our approach via local evaluations using the training dataset provided bythe challenge. We also discuss the biases, common error types, and limitationsof our approach.", "output": "Automated Scoring for Reading Comprehension via In-context BERT Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the popularity of Graph Neural Networks (GNNs), various GNN-basedmethods have been designed to reason on knowledge graphs (KGs). An importantdesign component of GNN-based KG reasoning methods is called the propagationpath, which contains a set of involved entities in each propagation step.Existing methods use hand-designed propagation paths, ignoring the correlationbetween the entities and the query relation. In addition, the number ofinvolved entities will explosively grow at larger propagation steps. In thiswork, we are motivated to learn an adaptive propagation path in order to filterout irrelevant entities while preserving promising targets. 
First, we design an incremental sampling mechanism where the nearby targets and layer-wise connections can be preserved with linear complexity. Second, we design a learning-based sampling distribution to identify the semantically related entities. Extensive experiments show that our method is powerful, efficient, and semantic-aware. The code is available at", "output": "AdaProp: Learning Adaptive Propagation for Graph Neural Network based Knowledge Graph Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a convolution structure for learning SE(3)-equivariant features from 3D point clouds. It can be viewed as an equivariant version of kernel point convolutions (KPConv), a widely used convolution form to process point cloud data. Compared with existing equivariant networks, our design is simple, lightweight, fast, and easy to be integrated with existing task-specific point cloud learning pipelines. We achieve these desirable properties by combining group convolutions and quotient representations. Specifically, we discretize SO(3) to finite groups for their simplicity while using SO(2) as the stabilizer subgroup to form spherical quotient feature fields to save computations. We also propose a permutation layer to recover SO(3) features from spherical features to preserve the capacity to distinguish rotations. Experiments show that our method achieves comparable or superior performance in various tasks, including object classification, pose estimation, and keypoint-matching, while consuming much less memory and running faster than existing work. 
The proposed method can foster the development of equivariant models for real-world applications based on point clouds.", "output": "E2PN: Efficient SE(3)-Equivariant Point Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the significant advances achieved in Artificial Neural Networks (ANNs), their design process remains notoriously tedious, depending primarily on intuition, experience and trial-and-error. This human-dependent process is often time-consuming and prone to errors. Furthermore, the models are generally bound to their training contexts, with no considerations to their surrounding environments. Continual adaptiveness and automation of neural networks is of paramount importance to several domains where model accessibility is limited after deployment (e.g., IoT devices, self-driving vehicles, etc.). Additionally, even accessible models require frequent maintenance post-deployment to overcome issues such as Concept/Data Drift, which can be cumbersome and restrictive. By leveraging and combining approaches from Neural Architecture Search (NAS) and Continual Learning (CL), more robust and adaptive agents can be developed. This study conducts the first extensive review on the intersection between NAS and CL, formalizing the prospective Continually-Adaptive Neural Networks (CANNs) paradigm and outlining research directions for lifelong autonomous ANNs.", "output": "Exploring the Intersection between Neural Architecture Search and Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In reinforcement learning (RL), adversarial policies can be developed by training an adversarial agent to minimize a target agent's rewards. 
Prior workhas studied black-box versions of these attacks where the adversary onlyobserves the world state and treats the target agent as any other part of theenvironment. However, this does not take into account additional structure inthe problem. In this work, we take inspiration from the literature on white-boxattacks to train more effective adversarial policies. We study white-boxadversarial policies and show that having access to a target agent's internalstate can be useful for identifying its vulnerabilities. We make twocontributions. (1) We introduce white-box adversarial policies where anattacker observes both a target's internal state and the world state at eachtimestep. We formulate ways of using these policies to attack agents in2-player games and text-generating language models. (2) We demonstrate thatthese policies can achieve higher initial and asymptotic performance against atarget agent than black-box controls. Code is available at", "output": "White-Box Adversarial Policies in Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "described by multiple instances (e.g., image patches) and simultaneouslyassociated with multiple labels. Existing MIML methods are useful in manyapplications but most of which suffer from relatively low accuracy and trainingefficiency due to several issues: i) the inter-label correlations(i.e., theprobabilistic correlations between the multiple labels corresponding to anobject) are neglected; ii) the inter-instance correlations (i.e., theprobabilistic correlations of different instances in predicting the objectlabel) cannot be learned directly (or jointly) with other types of correlationsdue to the missing instance labels; iii) diverse inter-correlations (e.g.,inter-label correlations, inter-instance correlations) can only be learned inmultiple stages. 
To resolve these issues, a new single-stage framework called broad multi-instance multi-label learning (BMIML) is proposed. In BMIML, there are three innovative modules: i) an auto-weighted label enhancement learning (AWLEL) based on broad learning system (BLS) is designed, which simultaneously and efficiently captures the inter-label correlations while traditional BLS cannot; ii) A specific MIML neural network called scalable multi-instance probabilistic regression (SMIPR) is constructed to effectively estimate the inter-instance correlations using the object label only, which can provide additional probabilistic information for learning; iii) Finally, an interactive decision optimization (IDO) is designed to combine and optimize the results from AWLEL and SMIPR and form a single-stage framework. Experiments show that BMIML is highly competitive to (or even better than) existing methods in accuracy and much faster than most MIML methods even for large medical image data sets (> 90K images).", "output": "Single-Stage Broad Multi-Instance Multi-Label Learning (BMIML) with Diverse Inter-Correlations and its application to medical image classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that both the standard and balanced error rates are special cases of the EC. 
Further, we show its relation with F-score and MCC and argue that EC is superior to these traditional metrics, being more elegant, general, and intuitive, as well as being based on basic principles from statistics. The metrics above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics and argue that they are the most principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute the system's calibration loss and compare this metric with the standard expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for a variety of reasons.", "output": "Analysis and Comparison of Classification Metrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities, including phoneme-unit and hidden-unit tokenizers, which can be trained using a small amount of paired speech-text data. 
Based on the trained tokenizers, we convert the unlabeled speech and text data into tokens of phoneme units or hidden units. The pre-training objective is designed to unify the speech and the text into the same discrete semantic space with a unified Transformer network. We evaluate SpeechLM on various spoken language processing tasks including speech recognition, speech translation, and universal representation evaluation framework SUPERB, demonstrating significant improvements on content-related tasks. Code and models are available at", "output": "SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.", "output": "Topological Singularity Detection at Multiple Scales."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. 
However, these policies tend to be overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term - often derived from information theory - in the objective function optimized by RL. However these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely-used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. Through an extensive empirical evaluation comparing eight state-of-the-art algorithms (four flagship algorithms from each line of work) on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning; QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. 
As no single method is found to provide near-optimal performance across all environments, there is a rich scope for further research which we support by proposing future directions and providing optimized open-source implementations.", "output": "Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite the impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable as it improves the likelihood and the degree of conservativity. 
Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.", "output": "FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple developing experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate methods implementation into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking commonly employed D4RL datasets providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.", "output": "CORL: Research-oriented Deep Offline Reinforcement Learning Library."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. 
The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.", "output": "PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. 
how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures & for each of various tasks within a large & diverse set of upstream & downstream tasks, in zero-shot, prompted, & finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, \"emergent phase transitions\", arithmetic, supervised learning, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. 
Code is available at", "output": "Broken Neural Scaling Laws."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. In order to incorporate these novel technologies into ATC (low-resource domain), large-scale annotated datasets are required to develop the data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with gold annotations for named-entity recognition (callsign, command, value). 2) The ATCO2-PL-set corpus consists of 5281 hours of unlabeled ATC data enriched with automatic transcripts from an in-domain speech recognizer, contextual information, speaker turn information, signal-to-noise ratio estimate and English language detection score per sample. 
Both available for purchase through ELDA at http URL 3) The ATCO2-test-set-1h corpus is a one-hour subset from the original test set corpus, that we are offering for free at  We expect the ATCO2 corpus will foster research on robust ASR and NLU not only in the field of ATC communications but also in the general research community.", "output": "ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The traditional Neural Network-development process requires substantial expert knowledge and relies heavily on intuition and trial-and-error. Neural Architecture Search (NAS) frameworks were introduced to robustly search for network topologies, as well as facilitate the automated development of Neural Networks. While some optimization approaches -- such as Genetic Algorithms -- have been extensively explored in the NAS context, other Metaheuristic Optimization algorithms have not yet been investigated. In this study, we evaluate the viability of Artificial Bee Colony optimization for Neural Architecture Search. 
Our proposed framework, HiveNAS, outperforms existing state-of-the-art Swarm Intelligence-based NAS frameworks in a fraction of the time.", "output": "HiveNAS: Neural Architecture Search using Artificial Bee Colony Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the victim's parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim's observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, influence underlying environment dynamics or reward signals, introduce non-stationarity, add stochasticity, see the Victim's actions, or access their parameters. Additionally, we present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT still significantly influences the Victim's training and testing performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages during train-time to directly and arbitrarily control the Victim at test-time. 
Project video and code are available at", "output": "Adversarial Cheap Talk."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual Learning (CL) is the process of learning ceaselessly a sequence of tasks. Most existing CL methods deal with independent data (e.g., images and text) for which many benchmark frameworks and results under standard experimental settings are available. However, CL methods for graph data (graph CL) are surprisingly underexplored because of (a) the lack of standard experimental settings, especially regarding how to deal with the dependency between instances, (b) the lack of benchmark datasets and scenarios, and (c) high complexity in implementation and evaluation due to the dependency. In this paper, regarding (a), we define four standard incremental settings (task-, class-, domain-, and time-incremental) for graph data, which are naturally applied to many node-, link-, and graph-level problems. Regarding (b), we provide 25 benchmark scenarios based on 15 real-world graphs. Regarding (c), we develop BeGin, an easy and fool-proof framework for graph CL. BeGin is easily extended since it is modularized with reusable modules for data processing, algorithm design, and evaluation. Especially, the evaluation module is completely separated from user code to eliminate potential mistakes. Using all the above, we report extensive benchmark results of 10 graph CL methods. Compared to the latest benchmark for graph CL, using BeGin, we cover 3x more combinations of incremental settings and levels of problems. 
All assets for the benchmark framework are available at ", "output": "BeGin: Extensive Benchmark Scenarios and An Easy-to-use Framework for Graph Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal effect estimation from observational data is a fundamental task in empirical sciences. It becomes particularly challenging when unobserved confounders are involved in a system. This paper focuses on front-door adjustment -- a classic technique which, using observed mediators allows to identify causal effects even in the presence of unobserved confounding. While the statistical properties of the front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Barenboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given directed acyclic graph (DAG), with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the causal graph. In our work, we give the first linear-time, i.e., $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity. This result implies an $O(n(n+m))$ delay enumeration algorithm of all front-door adjustment sets, again improving previous work by Jeong et al. by a factor of $n^3$. Moreover, we provide the first linear-time algorithm for finding a minimal front-door adjustment set. 
We offer implementations of our algorithms in multiple programming languages to facilitate practical usage and empirically validate their feasibility, even for large graphs.", "output": "Linear-Time Algorithms for Front-Door Adjustment in Causal Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce $\\textit{PCFTL (Probabilistic CounterFactual Temporal Logic)}$, a new probabilistic temporal logic for the verification of Markov Decision Processes (MDP). PCFTL is the first to include operators for causal reasoning, allowing us to express interventional and counterfactual queries. Given a path formula $\\phi$, an interventional property is concerned with the satisfaction probability of $\\phi$ if we apply a particular change $I$ to the MDP (e.g., switching to a different policy); a counterfactual allows us to compute, given an observed MDP path $\\tau$, what the outcome of $\\phi$ would have been had we applied $I$ in the past. For its ability to reason about \\textit{what-if} scenarios involving different configurations of the MDP, our approach represents a departure from existing probabilistic temporal logics that can only reason about a fixed system configuration. From a syntactic viewpoint, we introduce a generalized counterfactual operator that subsumes both interventional and counterfactual probabilities as well as the traditional probabilistic operator found in e.g., PCTL. From a semantics viewpoint, our logic is interpreted over a structural causal model translation of the MDP, which gives us a representation amenable to counterfactual reasoning. 
We evaluate PCFTL in the context of safe reinforcement learning using a benchmark of grid-world models.", "output": "Causal Temporal Reasoning for Markov Decision Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose the Detailed Outline Control (DOC) framework for improving long-range plot coherence when automatically generating several-thousand-word-long stories. DOC consists of two complementary components: a detailed outliner and a detailed controller. The detailed outliner creates a more detailed, hierarchically structured outline, shifting creative burden from the main drafting procedure to the planning stage. The detailed controller ensures the more detailed outline is still respected during generation by controlling story passages to align with outline details. In human evaluations of automatically generated stories, DOC substantially outperforms a strong Re3 baseline (Yang et al., 2022) on plot coherence (22.5% absolute gain), outline relevance (28.2%), and interestingness (20.7%). Humans also judged DOC to be much more controllable in an interactive generation setting.", "output": "DOC: Improving Long Story Coherence With Detailed Outline Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. 
We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation looks convincing to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and it provides new insight into common explanation methods that we analyze as case studies.", "output": "A Theoretical Framework for AI Models Explainability with Application in Biomedicine."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on a subset of findings that can be simultaneously present in a patient, making it challenging to train models that aggregate multiple datasets together. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically useful models with a complete representation of abnormalities that may occur within the thorax. To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial annotations. We evaluate surgical aggregation across synthetic and real-world heterogeneous datasets with partial annotations. 
Our results indicate that surgical aggregation outperforms current strategies, generalizes better, and has the potential to facilitate the development of clinically useful models even when using datasets with heterogeneous disease labels.", "output": "Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI explanations are often mentioned as a way to improve human-AI decision-making, but empirical studies have not found consistent evidence of explanations' effectiveness and, on the contrary, suggest that they can increase overreliance when the AI system is wrong. While many factors may affect reliance on AI support, one important factor is how decision-makers reconcile their own intuition -- beliefs or heuristics, based on prior knowledge, experience, or pattern recognition, used to make judgments -- with the information provided by the AI system to determine when to override AI predictions. We conduct a think-aloud, mixed-methods study with two explanation types (feature- and example-based) for two prediction tasks to explore how decision-makers' intuition affects their use of AI predictions and explanations, and ultimately their choice of when to rely on AI. Our results identify three types of intuition involved in reasoning about AI predictions and explanations: intuition about the task outcome, features, and AI limitations. Building on these, we summarize three observed pathways for decision-makers to apply their own intuition and override AI predictions. 
We use these pathways to explain why (1) the feature-based explanations we used did not improve participants' decision outcomes and increased their overreliance on AI, and (2) the example-based explanations we used improved decision-makers' performance over feature-based explanations and helped achieve complementary human-AI performance. Overall, our work identifies directions for further development of AI decision-support systems and explanation methods that help decision-makers effectively apply their intuition to achieve appropriate reliance on AI.", "output": "Understanding the Role of Human Intuition on Reliance in Human-AI Decision-Making with Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight experience replay (HER), using a VLM to retroactively generate language describing the agent's behavior. Simple prompting allows us to control the supervision signal, teaching an agent to interact with novel objects based on their names (e.g., planes) or their features (e.g., colors) in a 3D rendered environment. Fewshot prompting lets us teach abstract category membership, including pre-existing categories (food vs toys) and ad-hoc ones (arbitrary preferences over objects). 
Our work outlines a new and effective way to use internet-scale VLMs, repurposing the generic language grounding acquired by such models to teach task-relevant groundings to embodied agents.", "output": "Distilling Internet-Scale Vision-Language Models into Embodied Agents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. Our method leverages the abilities of language models learnt from large scale text-only pretraining, such as in-context learning and free-form text generation. We keep the language model frozen, and finetune input and output linear layers to enable cross-modality interactions. This allows our model to process arbitrarily interleaved image-and-text inputs, and generate free-form text interleaved with retrieved images. We achieve strong zero-shot performance on grounded tasks such as contextual image retrieval and multimodal dialogue, and showcase compelling interactive abilities. Our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pretrained language models in visually grounded settings.", "output": "Grounding Language Models to Images for Multimodal Inputs and Outputs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training complex machine learning (ML) architectures requires a compute and time consuming process of selecting the right optimizer and tuning its hyper-parameters. A new paradigm of learning optimizers from data has emerged as a better alternative to hand-designed ML optimizers. 
We propose Mnemosyne optimizer, that uses Performers: implicit low-rank attention Transformers. It can learn to train entire neural network architectures including other Transformers without any task-specific optimizer tuning. We show that Mnemosyne: (a) generalizes better than popular LSTM optimizer, (b) in particular can successfully train Vision Transformers (ViTs) while meta-trained on standard MLPs and (c) can initialize optimizers for faster convergence in Robotics applications. We believe that these results open the possibility of using Transformers to build foundational optimization models that can address the challenges of regular Transformer training. We complement our results with an extensive theoretical analysis of the compact associative memory used by Mnemosyne.", "output": "Mnemosyne: Learning to Train Transformers with Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning methods can detect Android malware with very high accuracy. However, these classifiers have an Achilles heel, concept drift: they rapidly become out of date and ineffective, due to the evolution of malware apps and benign apps. Our research finds that, after training an Android malware classifier on one year's worth of data, the F1 score quickly dropped from 0.99 to 0.76 after 6 months of deployment on new test samples. In this paper, we propose new methods to combat the concept drift problem of Android malware classifiers. Since machine learning technique needs to be continuously deployed, we use active learning: we select new samples for analysts to label, and then add the labeled samples to the training set to retrain the classifier. Our key idea is, similarity-based uncertainty is more robust against concept drift. Therefore, we combine contrastive learning with active learning. 
We propose a new hierarchical contrastive learning scheme, and a new sample selection technique to continuously train the Android malware classifier. Our evaluation shows that this leads to significant improvements, compared to previously published methods for active learning. Our approach reduces the false negative rate from 14% (for the best baseline) to 9%, while also reducing the false positive rate (from 0.86% to 0.48%). Also, our approach maintains more consistent performance across a seven-year time period than past methods.", "output": "Continuous Learning for Android Malware Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold. For example, while prior work has suggested that the globally optimal VAE solution can learn the correct manifold dimension, a necessary (but not sufficient) condition for producing samples from the true data distribution, this has never been rigorously proven. Moreover, it remains unclear how such considerations would change when various types of conditioning variables are introduced, or when the data support is extended to a union of manifolds (e.g., as is likely the case for MNIST digits and related). In this work, we address these points by first proving that VAE global minima are indeed capable of recovering the correct manifold dimension. We then extend this result to more general CVAEs, demonstrating practical scenarios whereby the conditioning variables allow the model to adaptively learn manifolds of varying dimension across samples. 
Our analyses, which have practical implications for various CVAE design choices, are also supported by numerical results on both synthetic and real-world datasets.", "output": "Learning Manifold Dimensions with Conditional Variational Autoencoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. 
It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.", "output": "Revisiting the Gumbel-Softmax in MADDPG."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.", "output": "iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human behavior is conditioned by codes and norms that constrain action. Rules, \"manners,\" laws, and moral imperatives are examples of classes of constraints that govern human behavior. 
These systems of constraints are \"messy:\" individual constraints are often poorly defined, what constraints are relevant in a particular situation may be unknown or ambiguous, constraints interact and conflict with one another, and determining how to act within the bounds of the relevant constraints may be a significant challenge, especially when rapid decisions are needed. Despite such messiness, humans incorporate constraints in their decisions robustly and rapidly. General, artificially-intelligent agents must also be able to navigate the messiness of systems of real-world constraints in order to behave predictably and reliably. In this paper, we characterize sources of complexity in constraint processing for general agents and describe a computational-level analysis for such constraint compliance. We identify key algorithmic requirements based on the computational-level analysis and outline an initial, exploratory implementation of a general approach to constraint compliance.", "output": "Computational-level Analysis of Constraint Compliance for General Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent success of ChatGPT and GPT-4 has drawn widespread attention to multimodal dialogue systems. However, the academic community lacks a dataset that can validate the multimodal generation capabilities of Visual Language Models (VLMs) in textual-visual chat tasks. In this paper, we construct two new multimodal datasets: the synthetic CLEVR-ATVC dataset (620K) and the manually pictured Fruit-ATVC dataset (50K), both featuring visual and text-based inputs and outputs. Additionally, to enable the multimodal system to reject human requests (i.e., demonstrate accountability), as in language-based ChatGPT conversations, we develop and incorporate specific rules into the datasets as supervisory signals. 
This allows the trained VLM to provide a yes or no answer after visual and textual reasoning, accompanied by a language explanation as to why the human instruction cannot be executed. In our method, we propose a two-state training procedure to train the image auto-encoder and auto-regressive transformer from scratch. The first state involves a discrete variational autoencoder (dVAE) to compress each image into short tokens, which are then concatenated with text tokens as a single data stream to be fed into the decoder-based transformer for generating visual re-creation and textual feedback in the second state. We provide comprehensive analyses of experimental results in terms of re-created image quality, answer accuracy, and the model behavior when faced with uncertainty and imperfect user queries. We hope our explorations and findings contribute valuable insights regarding the accountability of textual-visual generative models.", "output": "Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of recovering a latent graph where the observations at each node are aliased, and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent cannot know in which node it is located. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. An algorithm for efficiently exploring (and ultimately recovering) the latent graph is provided. 
Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations while remaining competitive with existing baselines in the unaliased regime.", "output": "Fast exploration and learning of latent graphs with aliased observations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quality Diversity (QD) has emerged as a powerful alternative optimization paradigm that aims at generating large and diverse collections of solutions, notably with its flagship algorithm MAP-ELITES (ME) which evolves solutions through mutations and crossovers. While very effective for some unstructured problems, early ME implementations relied exclusively on random search to evolve the population of solutions, rendering them notoriously sample-inefficient for high-dimensional problems, such as when evolving neural networks. Follow-up works considered exploiting gradient information to guide the search in order to address these shortcomings through techniques borrowed from either Black-Box Optimization (BBO) or Reinforcement Learning (RL). While mixing RL techniques with ME unlocked state-of-the-art performance for robotics control problems that require a good amount of exploration, it also plagued these ME variants with limitations common among RL algorithms that ME was free of, such as hyperparameter sensitivity, high stochasticity as well as training instability, including when the population size increases as some components are shared across the population in recent approaches. Furthermore, existing approaches mixing ME with RL tend to be tied to a specific RL algorithm, which effectively prevents their use on problems where the corresponding RL algorithm fails. 
To address these shortcomings, we introduce a flexible framework that allows the use of any RL algorithm and alleviates the aforementioned limitations by evolving populations of agents (whose definition includes hyperparameters and all learnable parameters) instead of just policies. We demonstrate the benefits brought about by our framework through extensive numerical experiments on a number of robotics control problems, some of which have deceptive rewards, taken from the QD-RL literature.", "output": "Evolving Populations of Diverse RL Agents with MAP-Elites."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. 
Code is released at", "output": "LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space. 
Second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.", "output": "The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The fervor for Non-Fungible Tokens (NFTs) attracted countless creators, leading to a Big Bang of digital assets driven by latent or explicit forms of inspiration, as in many creative processes. This work exploits Vision Transformers and graph-based modeling to delve into visual inspiration phenomena between NFTs over the years. Our goals include unveiling the main structural traits that shape visual inspiration networks, exploring the interrelation between visual inspiration and asset performances, investigating crypto influence on inspiration processes, and explaining the inspiration relationships among NFTs. Our findings unveil how the pervasiveness of inspiration led to a temporary saturation of the visual feature space, the impact of the dichotomy between inspiring and inspired NFTs on their financial performance, and an intrinsic self-regulatory mechanism between markets and inspiration waves. 
Our work can serve as a starting point for gaining a broader view of the evolution of Web3.", "output": "Visually Wired NFTs: Exploring the Role of Inspiration in Non-Fungible Tokens."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\\tilde{O}(d\\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{\\min})$, where $d$ is the dimension of contexts, $p_{\\min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\\tilde{O}(d\\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB setting, improving existing results there as well. 
We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.", "output": "Contextual Combinatorial Bandits with Probabilistically Triggered Arms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively.", "output": "Machine-Learned Premise Selection for Lean."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Privacy-utility tradeoff remains as one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). 
This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff via significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on MNIST dataset, Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high.", "output": "On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. 
The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at ", "output": "OpenAGI: When LLM Meets Domain Experts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors in this field. To this end, we present a systematic investigation of tool learning in this paper. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models. Then we recapitulate existing tool learning research into tool-augmented and tool-oriented learning. 
We formulate a general tool learning framework: starting from understanding the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and effectively conquer each sub-task by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and facilitate the generalization in tool learning. Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools. Finally, we discuss several open problems that require further investigation for tool learning. Overall, we hope this paper could inspire future research in integrating tools with foundation models.", "output": "Tool Learning with Foundation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large data-driven image models are extensively used to support creative and artistic work. Under the currently predominant distribution-fitting paradigm, a dataset is treated as ground truth to be approximated as closely as possible. Yet, many creative applications demand a diverse range of output, and creators often strive to actively diverge from a given data distribution. We argue that an adjustment of modelling objectives, from pure mode coverage towards mode balancing, is necessary to accommodate the goal of higher output diversity. We present diversity weights, a training scheme that increases a model's output diversity by balancing the modes in the training dataset. First experiments in a controlled setting demonstrate the potential of our method. We discuss connections of our approach to diversity, equity, and inclusion in generative machine learning more generally, and computational creativity specifically. 
An implementation of our algorithm is available at", "output": "Towards Mode Balancing of Generative Models via Diversity Weights."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning new programming skills requires tailored guidance. With the emergence of advanced Natural Language Generation models like the ChatGPT API, there is now a possibility of creating a convenient and personalized tutoring system with AI for computer science education. This paper presents GPTutor, a ChatGPT-powered programming tool, which is a Visual Studio Code extension using the ChatGPT API to provide programming code explanations. By integrating Visual Studio Code API, GPTutor can comprehensively analyze the provided code by referencing the relevant source codes. As a result, GPTutor can use designed prompts to explain the selected code with a pop-up message. GPTutor is now published at the Visual Studio Code Extension Marketplace, and its source code is openly accessible on GitHub. Preliminary evaluation indicates that GPTutor delivers the most concise and accurate explanations compared to vanilla ChatGPT and GitHub Copilot. Moreover, the feedback from students and teachers indicated that GPTutor is user-friendly and can explain given codes satisfactorily. Finally, we discuss possible future research directions for GPTutor. This includes enhancing its performance and personalization via further prompt programming, as well as evaluating the effectiveness of GPTutor with real users.", "output": "GPTutor: a ChatGPT-powered programming tool for code explanation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language has a strong influence on our perceptions of time and rewards. 
This raises the question of whether large language models, when asked the same question in different languages, show different preferences for rewards over time and if their choices are similar to those of humans. In this study, we analyze the responses of GPT-3.5 (hereafter referred to as GPT) to prompts in multiple languages, exploring preferences between smaller, sooner rewards and larger, later rewards. Our results show that GPT displays greater patience when prompted in languages with weak future tense references (FTR), such as German and Mandarin, compared to languages with strong FTR, like English and French. These findings are consistent with the existing literature and suggest a correlation between GPT's choices and the preferences of speakers of these languages. However, further analysis reveals that the preference for earlier or later rewards does not systematically change with reward gaps, indicating a lexicographic preference for earlier payments. While GPT may capture intriguing variations across languages, our findings indicate that the choices made by these models do not correspond to those of human decision-makers.", "output": "Exploring the Influence of Language on Time-Reward Perceptions in Large Language Models: A Study Using GPT-3.5."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In his seminal paper \"Computing Machinery and Intelligence\", Alan Turing introduced the \"imitation game\" as part of exploring the concept of machine intelligence. The Turing Test has since been the subject of much analysis, debate, refinement and extension. Here we sidestep the question of whether a particular machine can be labeled intelligent, or can be said to match human capabilities in a given context. 
Instead, but inspired by Turing, we draw attention to the seemingly simpler challenge of determining whether one is interacting with a human or with a machine, in the context of everyday life. We are interested in reflecting upon the importance of this Human-or-Machine question and the use one may make of a reliable answer thereto. Whereas Turing's original test is widely considered to be more of a thought experiment, the Human-or-Machine question as discussed here has obvious practical significance. And while the jury is still not in regarding the possibility of machines that can mimic human behavior with high fidelity in everyday contexts, we argue that near-term exploration of the issues raised here can contribute to development methods for computerized systems, and may also improve our understanding of human behavior in general.", "output": "Human or Machine: Reflections on Turing-Inspired Testing for the Everyday."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spectral-temporal graph neural network is a promising abstraction underlying most time series forecasting models that are based on graph neural networks (GNNs). However, more is needed to know about the underpinnings of this branch of methods. In this paper, we establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our results show that linear spectral-temporal GNNs are universal under mild assumptions, and their expressive power is bounded by our extended first-order Weisfeiler-Leman algorithm on discrete-time dynamic graphs. To make our findings useful in practice on valid instantiations, we discuss related constraints in detail and outline a theoretical blueprint for designing spatial and temporal modules in spectral domains. 
Building on these insights and to demonstrate how powerful spectral-temporal GNNs are based on our framework, we propose a simple instantiation named Temporal Graph GegenConv (TGC), which significantly outperforms most existing models with only linear components and shows better model efficiency.", "output": "How Expressive are Spectral-Temporal Graph Neural Networks for Time Series Forecasting?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diverse evaluation benchmarks play a crucial role to assess a wide range of capabilities of large language models (LLM). Although plenty of endeavors have been dedicated to building valuable benchmarks, there is still little work aiming at evaluating the capability of LLM in multistep interactive environments. Noticing that LLM requires a text representation of the environment observations for interaction, we choose to fill such a blank by building a novel benchmark based on the information user interface (InfoUI). InfoUI consists of rich text contents and can be represented in some text formats, thus is suitable for the assessment of interaction ability of LLM. Additionally, the complex structures of InfoUI can further raise a challenge for LLM to understand structured texts rather than plain texts. An interaction platform is always used to evaluate an agent, however, there is still a lack of a satisfactory interaction platform dedicated to InfoUI. Consequently, we propose to build a novel easily-extendable, adaptable, and close-to-reality interaction platform, Mobile-Env, to provide a base for an appropriate benchmark. Based on Mobile-Env, an InfoUI task set WikiHow is then built to establish a benchmark for the multistep interaction capability of LLM in structured text-based environments. 
Agents based on a series of LLMs are tested on the task set to obtain an insight into the potential and challenge of LLM for InfoUI interaction. It is sincerely welcome that the community contribute new environments and new task sets for Mobile-Env to provide better test benchmarks and facilitate the development of the corresponding domains.", "output": "Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Online surgical phase recognition plays a significant role towards building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints which can affect the analysis of long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information that combines a temporally-rich spatial feature extractor and a multi-scale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multi-scale temporal head then combines local and global features and classifies surgical phases using phase transition-aware supervision. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently. Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. 
Moreover, it achieves a 5.3 pp improvement in phase-level Jaccard on AutoLaparo and a 1.55 pp improvement on Cholec80. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance of surgical phase recognition on two datasets of different surgical procedures and temporal sequencing characteristics whilst introducing mechanisms that cope with long videos.", "output": "LoViT: Long Video Transformer for Surgical Phase Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e. node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models with better performance. 
The code will be released at ", "output": "Graph Propagation Transformer for Graph Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming which cannot generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically layout roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% against existing baseline methods. Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. 
The code and data are available at", "output": "Road Planning for Slums via Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Indoor localization has gained significant attention in recent years due to its various applications in smart homes, industrial automation, and healthcare, especially since more people rely on their wireless devices for location-based services. Deep learning-based solutions have shown promising results in accurately estimating the position of wireless devices in indoor environments using wireless parameters such as Channel State Information (CSI) and Received Signal Strength Indicator (RSSI). However, despite the success of deep learning-based approaches in achieving high localization accuracy, these models suffer from a lack of generalizability and can not be readily-deployed to new environments or operate in dynamic environments without retraining. In this paper, we propose meta-learning-based localization models to address the lack of generalizability that persists in conventionally trained DL-based localization models. Furthermore, since meta-learning algorithms require diverse datasets from several different scenarios, which can be hard to collect in the context of localization, we design and propose a new meta-learning algorithm, TB-MAML (Task Biased Model Agnostic Meta Learning), intended to further improve generalizability when the dataset is limited. 
Lastly, we evaluate the performance of TB-MAML-based localization against conventionally trained localization models and localization done using other meta-learning algorithms.", "output": "A Meta-learning based Generalizable Indoor Localization Model using Channel State Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural architecture search (NAS) for Graph neural networks (GNNs), called NAS-GNNs, has achieved significant performance over manually designed GNN architectures. However, these methods inherit issues from the conventional NAS methods, such as high computational cost and optimization difficulty. More importantly, previous NAS methods have ignored the uniqueness of GNNs, where GNNs possess expressive power without training. With the randomly-initialized weights, we can then seek the optimal architecture parameters via the sparse coding objective and derive a novel NAS-GNNs method, namely neural architecture coding (NAC). Consequently, our NAC holds a no-update scheme on GNNs and can efficiently compute in linear time. Empirical evaluations on multiple GNN benchmark datasets demonstrate that our approach leads to state-of-the-art performance, which is up to $200\\times$ faster and $18.8\\%$ more accurate than the strong baselines.", "output": "Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. 
Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at", "output": "OVO: Open-Vocabulary Occupancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. 
However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at ", "output": "Scaling Data-Constrained Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. 
Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.", "output": "On the Correspondence Between Monotonic Max-Sum GNNs and Datalog."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Parallel neurosymbolic architectures have been applied effectively in NLP by distilling knowledge from a logic theory into a deep model. However, prior art faces several limitations including supporting restricted forms of logic theories and relying on the assumption of independence between the logic and the deep network. We present Concordia, a framework overcoming the limitations of prior art. Concordia is agnostic both to the deep network and the logic theory offering support for a wide range of probabilistic theories. Our framework can support supervised training of both components and unsupervised training of the neural component. Concordia has been successfully applied to tasks beyond NLP and data classification, improving the accuracy of state-of-the-art on collective activity detection, entity linking and recommendation tasks.", "output": "Parallel Neurosymbolic Integration with Concordia."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) successfully model natural language from vast amounts of text without the need for explicit supervision. In this paper, we investigate the efficacy of LLMs in modeling passwords. We present PassGPT, a LLM trained on password leaks for password generation. PassGPT outperforms existing methods based on generative adversarial networks (GAN) by guessing twice as many previously unseen passwords. 
Furthermore, we introduce the concept of guided password generation, where we leverage PassGPT sampling procedure to generate passwords matching arbitrary constraints, a feat lacking in current GAN-based strategies. Lastly, we conduct an in-depth analysis of the entropy and probability distribution that PassGPT defines over passwords and discuss their use in enhancing existing password strength estimators.", "output": "PassGPT: Password Modeling and (Guided) Generation with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.", "output": "A survey of Generative AI Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. 
In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising the identity. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry and identity-consistent novel view image. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in the pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images and generalising to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. More results can be found on our project page:", "output": "DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since the release of OpenAI's ChatGPT, generative language models have attracted extensive public attention. The increased usage has highlighted generative models' broad utility, but also revealed several forms of embedded bias. Some is induced by the pre-training corpus; but additional bias specific to generative models arises from the use of subjective fine-tuning to avoid generating harmful content. 
Fine-tuning bias may come from individual engineers and company policies, and affects which prompts the model chooses to refuse. In this experiment, we characterize ChatGPT's refusal behavior using a black-box attack. We first query ChatGPT with a variety of offensive and benign prompts (n=1,706), then manually label each response as compliance or refusal. Manual examination of responses reveals that refusal is not cleanly binary, and lies on a continuum; as such, we map several different kinds of responses to a binary of compliance or refusal. The small manually-labeled dataset is used to train a refusal classifier, which achieves an accuracy of 96%. Second, we use this refusal classifier to bootstrap a larger (n=10,000) dataset adapted from the Quora Insincere Questions dataset. With this machine-labeled data, we train a prompt classifier to predict whether ChatGPT will refuse a given question, without seeing ChatGPT's response. This prompt classifier achieves 76% accuracy on a test set of manually labeled questions (n=985). We examine our classifiers and the prompt n-grams that are most predictive of either compliance or refusal. Our datasets and code are available at", "output": "I'm Afraid I Can't Do That: Predicting Prompt Refusal in Black-Box Generative Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dialogue response selection aims to select an appropriate response from several candidates based on a given user and system utterance history. Recent studies have been improving the accuracy of dialogue response selection through post-training, mostly relying on naive masked language modeling methods. However, the recently developed generative methods have shown promising text representation capabilities in IR community, which could potentially lead to better dialogue semantics modeling. 
Thus, in this paper, we propose Dial-MAE (Dialogue Contextual Masking Auto-encoder), a straightforward yet effective post-training technique tailored for dialogue response selection. Dial-MAE uses an asymmetric encoder-decoder architecture that learns to better compress the semantics of the dialogue into dialogue-dense vectors. The process of Dial-MAE involves a deep encoder creating a dialogue embedding with the masked dialogue context, followed by a shallow decoder that uses this embedding along with the highly masked response to restore the original response. Our experiments have demonstrated that Dial-MAE is highly effective, achieving state-of-the-art performance on two commonly evaluated benchmarks.", "output": "ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding regarding their full potential, primarily due to the black-box nature of many models and the absence of holistic evaluation studies. To address these challenges, we present INSTRUCTEVAL, a more comprehensive evaluation suite designed specifically for instruction-tuned large language models. Unlike previous works, our evaluation involves a rigorous assessment of models based on problem-solving, writing ability, and alignment to human values. We take a holistic approach to analyze various factors affecting model performance, including the pretraining foundation, instruction-tuning data, and training methods. 
Our findings reveal that the quality of instruction data is the most crucial factor in scaling model performance. While open-source models demonstrate impressive writing abilities, there is substantial room for improvement in problem-solving and alignment. We are encouraged by the rapid development of models by the open-source community, but we also highlight the need for rigorous evaluation to support claims made about these models. Through INSTRUCTEVAL, we aim to foster a deeper understanding of instruction-tuned models and advancements in their capabilities. INSTRUCTEVAL is publicly available at", "output": "INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompt-based learning has been proved to be an effective way in pre-trained language models (PLMs), especially in low-resource scenarios like few-shot settings. However, the trustworthiness of PLMs is of paramount significance and potential vulnerabilities have been shown in prompt-based templates that could mislead the predictions of language models, causing serious security concerns. In this paper, we will shed light on some vulnerabilities of PLMs, by proposing a prompt-based adversarial attack on manual templates in black box scenarios. First of all, we design character-level and word-level heuristic approaches to break manual templates separately. Then we present a greedy algorithm for the attack based on the above heuristic destructive approaches. Finally, we evaluate our approach with the classification tasks on three variants of BERT series models and eight datasets. And comprehensive experimental results justify the effectiveness of our approach in terms of attack success rate and attack speed. 
Further experimental studies indicate that our proposed method also displays good capabilities in scenarios with varying shot counts, template lengths and query counts, exhibiting good generalizability.", "output": "COVER: A Heuristic Greedy Adversarial Attack on Prompt-based Learning in Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel adversarial defence mechanism for image classification -- CARSO -- inspired by cues from cognitive neuroscience. The method is synergistically complementary to adversarial training and relies on knowledge of the internal representation of the attacked classifier. Exploiting a generative model for adversarial purification, conditioned on such representation, it samples reconstructions of inputs to be finally classified. Experimental evaluation by a well-established benchmark of varied, strong adaptive attacks, across diverse image datasets and classifier architectures, shows that CARSO is able to defend the classifier significantly better than state-of-the-art adversarial training alone -- with a tolerable clean accuracy toll. Furthermore, the defensive architecture succeeds in effectively shielding itself from unforeseen threats, and end-to-end attacks adapted to fool stochastic defences. Code and pre-trained models are available at .", "output": "CARSO: Counter-Adversarial Recall of Synthetic Observations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a new real and synthetic dataset called NeRFBK specifically designed for testing and comparing NeRF-based 3D reconstruction algorithms. 
High-quality 3D reconstruction has significant potential in various fields, and advancements in image-based algorithms make it essential to evaluate new advanced techniques. However, gathering diverse data with precise ground truth is challenging and may not encompass all relevant applications. The NeRFBK dataset addresses this issue by providing multi-scale, indoor and outdoor datasets with high-resolution images and videos and camera parameters for testing and comparing NeRF-based algorithms. This paper presents the design and creation of the NeRFBK benchmark, various examples and application scenarios, and highlights its potential for advancing the field of 3D reconstruction.", "output": "NERFBK: A High-Quality Benchmark for NERF-Based 3D Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper tackles the fundamental passive eavesdropping problem in modern wireless communications in which the location and the channel state information (CSI) of the attackers are unknown. In this regard, we propose deploying an unmanned aerial vehicle (UAV) that serves as a mobile aerial relay (AR) to help ground base station (GBS) support a subset of vulnerable users. More precisely, our solution (1) clusters the single-antenna users in two groups to be either served by the GBS directly or via the AR, (2) employs optimal multi-user beamforming to the directly served users, and (3) optimizes the AR's 3D position, its multi-user beamforming matrix and transmit powers by combining closed-form solutions with machine learning techniques. Specifically, we design a plain beamforming and power optimization combined with a deep reinforcement learning (DRL) algorithm for an AR to optimize its trajectory for the security maximization of the served users. 
Numerical results show that the multi-user multiple input, single output (MU-MISO) system split between a GBS and an AR with optimized transmission parameters without knowledge of the eavesdropping channels achieves high secrecy capacities that scale well with increasing the number of users.", "output": "UAV Trajectory and Multi-User Beamforming Optimization for Clustered Users Against Passive Eavesdropping Attacks With Unknown CSI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) offer significant promise as a knowledge source for robotic task learning. Prompt engineering has been shown to be effective for eliciting knowledge from an LLM but alone is insufficient for acquiring relevant, situationally grounded knowledge for an embodied robotic agent learning novel tasks. We describe a cognitive-agent approach that extends and complements prompt engineering, mitigating its limitations, and thus enabling a robot to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences. The approach is to increase the response space of LLMs and deploy general strategies, embedded within the autonomous robot, to evaluate, repair, and select among candidate responses produced by the LLM. We describe the approach and experiments that show how a robot, by retrieving and evaluating a breadth of responses from the LLM, can achieve >75% task completion in one-shot learning without user oversight. 
The approach achieves 100% task completion when human oversight (such as indication of preference) is provided, while greatly reducing how much human oversight is needed.", "output": "Improving Knowledge Extraction from LLMs for Robotic Task Learning through Agent Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Event extraction (EE) is a crucial task aiming at extracting events from texts, which includes two subtasks: event detection (ED) and event argument extraction (EAE). In this paper, we check the reliability of EE evaluations and identify three major pitfalls: (1) The data preprocessing discrepancy makes the evaluation results on the same dataset not directly comparable, but the data preprocessing details are not widely noted and specified in papers. (2) The output space discrepancy of different model paradigms makes different-paradigm EE models lack grounds for comparison and also leads to unclear mapping issues between predictions and annotations. (3) The absence of pipeline evaluation of many EAE-only works makes them hard to be directly compared with EE works and may not well reflect the model performance in real-world pipeline scenarios. We demonstrate the significant influence of these pitfalls through comprehensive meta-analyses of recent papers and empirical experiments. To avoid these pitfalls, we suggest a series of remedies, including specifying data preprocessing, standardizing outputs, and providing pipeline evaluation results. 
To help implement these remedies, we develop a consistent evaluation framework OMNIEVENT, which can be obtained from", "output": "The Devil is in the Details: On the Pitfalls of Event Extraction Evaluation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks. Many exhaustive taxonomies are possible, and some are useful -- particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose actions lead to the risk, are the actors unified, and are they deliberate? We also provide stories to illustrate how the various risk types could each play out, including risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are indicated.", "output": "TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Strokes2Surface, an offline geometry-reconstruction pipeline built upon a 4D Sketching Interface, MR.Sketch, targeted at architectural design. The pipeline recovers a curve network from designer-drawn strokes, thus bridging between concept design and digital modeling stages in architectural design. The input to our pipeline consists of 3D strokes' polyline vertices and their corresponding timestamps (as of the fourth dimension), along with additional geometric and stylus-related recorded properties. 
Inspired by sketchconsolidation and sketch-based modeling methods, our pipeline leverages suchdata and combines three Machine Learning (ML) models; a classifier and twoclustering models. In particular, based on observations of practices designerstypically employ in architectural design sketches, we solve a binaryclassification problem to recognize whether a stroke depicts a boundary andedge or is used to fill in the enclosing areas and faces of the intendedarchitectural object. Followed by the two clustering models, strokes of eachtype are further parsed into groups, each representing either a single edge ora single face. Next, groups representing edges are approximated with B-splinecurves, followed by a topology-recovering process identifying and fixingdesired connectivities between the curves forming a well-connected curvenetwork. Next, groups representing the faces are employed to detect the cyclesbounding patches in the curve network, resulting in the final surface meshgeometry of the architectural object. We confirm the usability ofStrokes2Surface via a user study and further validate and compare our resultsagainst a range of reconstructions computed using alternative methods. We alsointroduce our manually labeled dataset of 4D architectural design sketches forfurther use in the community.", "output": "Strokes2Surface: Recovering Curve Networks From 4D Architectural Design Sketches."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quadruped animals seamlessly transition between gaits as they changelocomotion speeds. While the most widely accepted explanation for gaittransitions is energy efficiency, there is no clear consensus on thedetermining factor, nor on the potential effects from terrain properties. Inthis article, we propose that viability, i.e. the avoidance of falls,represents an important criterion for gait transitions. 
We investigate theemergence of gait transitions through the interaction between supraspinal drive(brain), the central pattern generator in the spinal cord, the body, andexteroceptive sensing by leveraging deep reinforcement learning and roboticstools. Consistent with quadruped animal data, we show that the walk-trot gaittransition for quadruped robots on flat terrain improves both viability andenergy efficiency. Furthermore, we investigate the effects of discrete terrain(i.e. crossing successive gaps) on imposing gait transitions, and find theemergence of trot-pronk transitions to avoid non-viable states. Compared withother potential criteria such as peak forces and energy efficiency, viabilityis the only improved factor after gait transitions on both flat and discretegap terrains, suggesting that viability could be a primary and universalobjective of gait transitions, while other criteria are secondary objectivesand/or a consequence of viability. Moreover, we deploy our learned controllerin sim-to-real hardware experiments and demonstrate state-of-the-art quadrupedagility in challenging scenarios, where the Unitree A1 quadruped autonomouslytransitions gaits between trot and pronk to cross consecutive gaps of up to 30cm (83.3 % of the body-length) at over 1.3 m/s.", "output": "DeepTransition: Viability Leads to the Emergence of Gait Transitions in Learning Anticipatory Quadrupedal Locomotion Skills."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, diffusion models have achieved remarkable performance in datageneration, e.g., generating high-quality images. Nevertheless, chemistrymolecules often have complex non-Euclidean spatial structures, with thebehavior changing dynamically and unpredictably. 
Most existing diffusion modelshighly rely on computing the probability distribution, i.e., Gaussiandistribution, in Euclidean space, which cannot capture internal non-Euclideanstructures of molecules, especially the hierarchical structures of the implicitmanifold surface represented by molecules. It has been observed that thecomplex hierarchical structures in hyperbolic embedding space become moreprominent and easier to be captured. In order to leverage both the datageneration power of diffusion models and the strong capability to extractcomplex geometric features of hyperbolic embedding, we propose to extend thediffusion model to hyperbolic manifolds for molecule generation, namely,Hyperbolic Graph Diffusion Model (HGDM). The proposed HGDM employs a hyperbolicvariational autoencoder to generate the hyperbolic hidden representation ofnodes and then a score-based hyperbolic graph neural network is used to learnthe distribution in hyperbolic space. Numerical experimental results show thatthe proposed HGDM achieves higher performance on several molecular datasets,compared with state-of-the-art methods.", "output": "Hyperbolic Graph Diffusion Model for Molecule Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Advanced persistent threats (APTs) have novel features such as multi-stagepenetration, highly-tailored intention, and evasive tactics. APTs defenserequires fusing multi-dimensional Cyber threat intelligence data to identifyattack intentions and conducts efficient knowledge discovery strategies bydata-driven machine learning to recognize entity relationships. However,data-driven machine learning lacks generalization ability on fresh or unknownsamples, reducing the accuracy and practicality of the defense model. 
Besides,the private deployment of these APT defense models on heterogeneousenvironments and various network devices requires significant investment incontext awareness (such as known attack entities, continuous network states,and current security strategies). In this paper, we propose a few-shotmulti-domain knowledge rearming (FMKR) scheme for context-aware defense againstAPTs. By completing multiple small tasks that are generated from differentnetwork domains with meta-learning, the FMKR firstly trains a model with gooddiscrimination and generalization ability for fresh and unknown APT attacks. Ineach FMKR task, both threat intelligence and local entities are fused into thesupport/query sets in meta-learning to identify possible attack stages.Secondly, to rearm current security strategies, an finetuning-based deploymentmechanism is proposed to transfer learned knowledge into the student model,while minimizing the defense cost. Compared to multiple model replacementstrategies, the FMKR provides a faster response to attack behaviors whileconsuming less scheduling cost. Based on the feedback from multiple real usersof the Industrial Internet of Things (IIoT) over 2 months, we demonstrate thatthe proposed scheme can improve the defense satisfaction rate.", "output": "Few-shot Multi-domain Knowledge Rearming for Context-aware Defence against Advanced Persistent Threats."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompt learning has been proven to be highly effective in improvingpre-trained language model (PLM) adaptability, surpassing conventionalfine-tuning paradigms, and showing exceptional promise in an ever-growinglandscape of applications and APIs tailored for few-shot learning scenarios.Despite the growing prominence of prompt learning-based APIs, their securityconcerns remain underexplored. 
In this paper, we undertake a pioneering studyon the Trojan susceptibility of prompt-learning PLM APIs. We identified severalkey challenges, including discrete-prompt, few-shot, and black-box settings,which limit the applicability of existing backdoor attacks. To address thesechallenges, we propose TrojPrompt, an automatic and black-box framework toeffectively generate universal and stealthy triggers and insert Trojans intohard prompts. Specifically, we propose a universal API-driven trigger discoveryalgorithm for generating universal triggers for various inputs by queryingvictim PLM APIs using few-shot data samples. Furthermore, we introduce a novelprogressive trojan poisoning algorithm designed to generate poisoned promptsthat retain efficacy and transferability across a diverse range of models. Ourexperiments and results demonstrate TrojPrompt's capacity to effectively insertTrojans into text prompts in real-world black-box PLM APIs, while maintainingexceptional performance on clean test sets and significantly outperformingbaseline models. Our work sheds light on the potential security risks incurrent models and offers a potential defensive approach.", "output": "TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The present article highlights the pressing need for identifying andcontrolling illicit activities on the dark web. While only 4% of theinformation available on the internet is accessible through regular searchengines, the deep web contains a plethora of information, including personaldata and online accounts, that is not indexed by search engines. The dark web,which constitutes a subset of the deep web, is a notorious breeding ground forvarious illegal activities, such as drug trafficking, weapon sales, and moneylaundering. 
Against this backdrop, the authors propose a novel search enginethat leverages deep learning to identify and extract relevant images related toillicit activities on the dark web. Specifically, the system can detect thetitles of illegal activities on the dark web and retrieve pertinent images fromwebsites with a .onion extension. The authors have collected a comprehensivedataset named darkoob and the proposed method achieves an accuracy of 94% onthe test dataset. Overall, the proposed search engine represents a significantstep forward in identifying and controlling illicit activities on the dark web.By contributing to internet and community security, this technology has thepotential to mitigate a wide range of social, economic, and politicalchallenges arising from illegal activities on the dark web.", "output": "Dark web activity classification using deep learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With rich visual data, such as images, becoming readily associated withitems, visually-aware recommendation systems (VARS) have been widely used indifferent applications. Recent studies have shown that VARS are vulnerable toitem-image adversarial attacks, which add human-imperceptible perturbations tothe clean images associated with those items. Attacks on VARS pose new securitychallenges to a wide range of applications such as e-Commerce and socialnetworks where VARS are widely used. How to secure VARS from such adversarialattacks becomes a critical problem. Currently, there is still a lack ofsystematic study on how to design secure defense strategies against visualattacks on VARS. In this paper, we attempt to fill this gap by proposing anadversarial image reconstruction and detection framework to secure VARS. 
Ourproposed method can simultaneously (1) secure VARS from adversarial attackscharacterized by local perturbations by image reconstruction based on globalvision transformers; and (2) accurately detect adversarial examples using anovel contrastive learning approach. Meanwhile, our framework is designed to beused as both a filter and a detector so that they can be jointly trained toimprove the flexibility of our defense strategy to a variety of attacks andVARS models. We have conducted extensive experimental studies with two popularattack methods (FGSM and PGD). Our experimental results on two real-worlddatasets show that our defense strategy against visual attacks is effective andoutperforms existing methods on different attacks. Moreover, our method candetect adversarial examples with high accuracy.", "output": "Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The accurate modelling of the Point Spread Function (PSF) is of paramountimportance in astronomical observations, as it allows for the correction ofdistortions and blurring caused by the telescope and atmosphere. PSF modellingis crucial for accurately measuring celestial objects' properties. The lastdecades brought us a steady increase in the power and complexity ofastronomical telescopes and instruments. Upcoming galaxy surveys like Euclidand LSST will observe an unprecedented amount and quality of data. Modellingthe PSF for these new facilities and surveys requires novel modellingtechniques that can cope with the ever-tightening error requirements. Thepurpose of this review is three-fold. First, we introduce the opticalbackground required for a more physically-motivated PSF modelling and proposean observational model that can be reused for future developments. 
Second, weprovide an overview of the different physical contributors of the PSF,including the optic- and detector-level contributors and the atmosphere. Weexpect that the overview will help better understand the modelled effects.Third, we discuss the different methods for PSF modelling from the parametricand non-parametric families for ground- and space-based telescopes, with theiradvantages and limitations. Validation methods for PSF models are thenaddressed, with several metrics related to weak lensing studies discussed indetail. Finally, we explore current challenges and future directions in PSFmodelling for astronomical telescopes.", "output": "Point spread function modelling for astronomical telescopes: a review focused on weak gravitational lensing studies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we develop machine learning techniques to identify unknownprinters in early modern (c.~1500--1800) English printed books. Specifically,we focus on matching uniquely damaged character type-imprints in anonymouslyprinted books to works with known printers in order to provide evidence oftheir origins. Until now, this work has been limited to manual investigationsby analytical bibliographers. We present a Contrastive Attention-based MetricLearning approach to identify similar damage across character image pairs,which is sensitive to very subtle differences in glyph shapes, yet robust tovarious confounding sources of noise associated with digitized historicalbooks. To overcome the scarce amount of supervised data, we design a randomdata synthesis procedure that aims to simulate bends, fractures, and inkingvariations induced by the early printing process. Our method successfullyimproves downstream damaged type-imprint matching among printed works from thisperiod, as validated by in-domain human experts. 
The results of our approach ontwo important philosophical works from the Early Modern period demonstratepotential to extend the extant historical research about the origins andcontent of these books.", "output": "Contrastive Attention Networks for Attribution of Early Modern Print."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ensuring diagnostic performance of AI models before clinical use is key tothe safe and successful adoption of these technologies. Studies reporting AIapplied to digital pathology images for diagnostic purposes have rapidlyincreased in number in recent years. The aim of this work is to provide anoverview of the diagnostic accuracy of AI in digital pathology images from allareas of pathology. This systematic review and meta-analysis includeddiagnostic accuracy studies using any type of artificial intelligence appliedto whole slide images (WSIs) in any disease type. The reference standard wasdiagnosis through histopathological assessment and / or immunohistochemistry.Searches were conducted in PubMed, EMBASE and CENTRAL in June 2022. Weidentified 2976 studies, of which 100 were included in the review and 48 in thefull meta-analysis. Risk of bias and concerns of applicability were assessedusing the QUADAS-2 tool. Data extraction was conducted by two investigators andmeta-analysis was performed using a bivariate random effects model. 100 studieswere identified for inclusion, equating to over 152,000 whole slide images(WSIs) and representing many disease types. Of these, 48 studies were includedin the meta-analysis. These studies reported a mean sensitivity of 96.3% (CI94.1-97.7) and mean specificity of 93.3% (CI 90.5-95.4) for AI. There wassubstantial heterogeneity in study design and all 100 studies identified forinclusion had at least one area at high or unclear risk of bias. 
This reviewprovides a broad overview of AI performance across applications in whole slideimaging. However, there is huge variability in study design and availableperformance data, with details around the conduct of the study and make up ofthe datasets frequently missing. Overall, AI offers good accuracy when appliedto WSIs but requires more rigorous evaluation of its performance.", "output": "Diagnostic test accuracy (DTA) of artificial intelligence in digital pathology: a systematic review, meta-analysis and quality assessment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in zero-shot learning have enabled the use of pairedimage-text data to replace structured labels, replacing the need for expertannotated datasets. Models such as CLIP-based CheXzero utilize theseadvancements in the domain of chest X-ray interpretation. We hypothesize thatdomain pre-trained models such as CXR-BERT, BlueBERT, and ClinicalBERT offerthe potential to improve the performance of CLIP-like models with specificdomain knowledge by replacing BERT weights at the cost of breaking the originalmodel's alignment. We evaluate the performance of zero-shot classificationmodels with domain-specific pre-training for detecting low-prevalencepathologies. Even though replacing the weights of the original CLIP-BERTdegrades model performance on commonly found pathologies, we show thatpre-trained text towers perform exceptionally better on low-prevalencediseases. 
This motivates future ensemble models with a combination ofdifferently trained language models for maximal performance.", "output": "Improving Zero-Shot Detection of Low Prevalence Chest Pathologies using Domain Pre-trained Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Motion retargeting is a fundamental problem in computer graphics and computervision. Existing approaches usually have many strict requirements, such as thesource-target skeletons needing to have the same number of joints or share thesame topology. To tackle this problem, we note that skeletons with differentstructure may have some common body parts despite the differences in jointnumbers. Following this observation, we propose a novel, flexible motionretargeting framework. The key idea of our method is to regard the body part asthe basic retargeting unit rather than directly retargeting the whole bodymotion. To enhance the spatial modeling capability of the motion encoder, weintroduce a pose-aware attention network (PAN) in the motion encoding phase.The PAN is pose-aware since it can dynamically predict the joint weights withineach body part based on the input pose, and then construct a shared latentspace for each body part by feature pooling. Extensive experiments show thatour approach can generate better motion retargeting results both qualitativelyand quantitatively than state-of-the-art methods. Moreover, we also show thatour framework can generate reasonable results even for a more challengingretargeting scenario, like retargeting between bipedal and quadrupedalskeletons because of the body part retargeting strategy and PAN. 
Our code is publicly available.", "output": "Pose-aware Attention Network for Flexible Motion Retargeting by Body Part."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation has not been seriously discussed (and overlooked) even though the quality of the evaluation entirely depends on it. In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP&R strong for noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.", "output": "TopP\\&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a novel superpoint-based transformer architecture for efficient semantic segmentation of large-scale 3D scenes. 
Our method incorporates a fast algorithm to partition point clouds into a hierarchical superpoint structure, which makes our preprocessing 7 times faster than existing superpoint-based approaches. Additionally, we leverage a self-attention mechanism to capture the relationships between superpoints at multiple scales, leading to state-of-the-art performance on three challenging benchmark datasets: S3DIS (76.0% mIoU 6-fold validation), KITTI-360 (63.5% on Val), and DALES (79.6%). With only 212k parameters, our approach is up to 200 times more compact than other state-of-the-art models while maintaining similar performance. Furthermore, our model can be trained on a single GPU in 3 hours for a fold of the S3DIS dataset, which is 7x to 70x fewer GPU-hours than the best-performing methods. Our code and models are accessible at github.com/drprojects/superpoint_transformer.", "output": "Efficient 3D Semantic Segmentation with Superpoint Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. 
In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on object-centric slot-based representations of scenes. On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.", "output": "DORSal: Diffusion for Object-centric Representations of Scenes $\\textit{et al.}$."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point cloud segmentation is one of the most important tasks in computer vision with widespread scientific, industrial, and commercial applications. The research thereof has resulted in many breakthroughs in 3D object and scene understanding. Previous methods typically utilized hierarchical architectures for feature representation. However, the commonly used sampling and grouping methods in hierarchical networks are only based on point-wise three-dimensional coordinates, ignoring local semantic homogeneity of point clusters. Additionally, the prevalent Farthest Point Sampling (FPS) method is often a computational bottleneck. To address these issues, we propose a novel 3D point cloud representation network, called Dynamic Clustering Transformer Network (DCTNet). It has an encoder-decoder architecture, allowing for both local and global feature learning. Specifically, we propose novel semantic feature-based dynamic sampling and clustering methods in the encoder, which enables the model to be aware of local semantic homogeneity for local feature aggregation. Furthermore, in the decoder, we propose an efficient semantic feature-guided upsampling method. 
Our method was evaluated on an object-based dataset (ShapeNet), an urban navigation dataset (Toronto-3D), and a multispectral LiDAR dataset, verifying the performance of DCTNet across a wide variety of practical engineering applications. The inference speed of DCTNet is 3.8-16.8x faster than existing State-of-the-Art (SOTA) models on the ShapeNet dataset, while achieving an instance-wise mIoU of 86.6%, the current top score. Our method similarly outperforms previous methods on the other datasets, verifying it as the new State-of-the-Art in point cloud segmentation.", "output": "Dynamic Clustering Transformer Network for Point Cloud Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current approaches for knowledge distillation in semantic segmentation tend to adopt a holistic approach that treats all spatial locations equally. However, for dense prediction tasks, it is crucial to consider the knowledge representation for different spatial locations in a different manner. Furthermore, edge regions between adjacent categories are highly uncertain due to context information leakage, which is particularly pronounced for compact networks. To address this challenge, this paper proposes a novel approach called boundary-privileged knowledge distillation (BPKD). BPKD distills the knowledge of the teacher model's body and edges separately from the compact student model. Specifically, we employ two distinct loss functions: 1) Edge Loss, which aims to distinguish between ambiguous classes at the pixel level in edge regions. 2) Body Loss, which utilizes shape constraints and selectively attends to the inner-semantic regions. Our experiments demonstrate that the proposed BPKD method provides extensive refinements and aggregation for edge and body regions. 
Additionally, the method achieves state-of-the-art distillation performance for semantic segmentation on three popular benchmark datasets, highlighting its effectiveness and generalization ability. BPKD shows consistent improvements over various lightweight semantic segmentation structures. The code is available at url{", "output": "BPKD: Boundary Privileged Knowledge Distillation For Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Speckle noise has long been an extensively studied problem in medical imaging. In recent years, there have been significant advances in leveraging deep learning methods for noise reduction. Nevertheless, adaptation of supervised learning models to unseen domains remains a challenging problem. Specifically, deep neural networks (DNNs) trained for computational imaging tasks are vulnerable to changes in the acquisition system's physical parameters, such as: sampling space, resolution, and contrast. Even within the same acquisition system, performance degrades across datasets of different biological tissues. In this work, we propose a few-shot supervised learning framework for optical coherence tomography (OCT) noise reduction, that offers a dramatic increase in training speed and requires only a single image, or part of an image, and a corresponding speckle suppressed ground truth, for training. Furthermore, we formulate the domain shift problem for OCT diverse imaging systems, and prove that the output resolution of a despeckling trained model is determined by the source domain resolution. We also provide possible remedies. We propose different practical implementations of our approach, verify and compare their applicability, robustness, and computational efficiency. 
Our results demonstrate significant potential for generally improving sample complexity, generalization, and time efficiency, for coherent and non-coherent noise reduction via supervised learning models, that can also be leveraged for other real-time computer vision applications.", "output": "Domain-Aware Few-Shot Learning for Optical Coherence Tomography Noise Reduction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have emerged as a powerful method of generative modeling across a range of fields, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure of the objects in the generated images. In this paper, we propose a novel method that incorporates 3D geometry control into diffusion models, making them generate even more realistic and diverse images. To achieve this, our method exploits ControlNet, which extends diffusion models by using visual prompts in addition to text prompts. We generate images of 3D objects taken from a 3D shape repository (e.g., ShapeNet and Objaverse), render them from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically. This allows us to use the generated images to improve a lot of vision tasks, e.g., classification and 3D pose estimation, in both in-distribution (ID) and out-of-distribution (OOD) settings. We demonstrate the effectiveness of our method through extensive experiments on ImageNet-50, ImageNet-R, PASCAL3D+, ObjectNet3D, and OOD-CV datasets. 
The results show that our method significantly outperforms existing methods across multiple benchmarks (e.g., 4.6 percentage points on ImageNet-50 using ViT and 3.5 percentage points on PASCAL3D+ and ObjectNet3D using NeMo).", "output": "Adding 3D Geometry Control to Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperspectral images are typically composed of hundreds of narrow and contiguous spectral bands, each containing information about the material composition of the imaged scene. However, these images can be affected by various sources of noise, distortions, or data losses, which can significantly degrade their quality and usefulness. To address these problems, we introduce two novel self-supervised Hyperspectral Images (HSI) inpainting algorithms: Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP), and its extension LRS-PnP-DIP, which features the strong learning capability, but is still free of external training data. We conduct the stability analysis under some mild assumptions which guarantees the algorithm to converge. It is specifically very helpful for the practical applications. Extensive experiments demonstrate that the proposed solution is able to produce visually and qualitatively superior inpainting results, achieving state-of-the-art performance. The code for reproducing the results is available at url{", "output": "Self-supervised Deep Hyperspectral Inpainting with the Sparsity and Low-Rank Considerations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. 
Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external knowledge, such as \"What event is commemorated by the building depicted in this image?\", is a complex task. This task presents a combinatorial search space that demands a sequence of actions, including invoking APIs, analyzing their responses, and making informed decisions. We conduct a user study to collect a variety of instances of human decision-making when faced with this task. This data is then used to design a system comprised of three components: an LLM-powered planner that dynamically determines which tool to use next, an LLM-powered reasoner that analyzes and extracts key information from the tool outputs, and a working memory component that retains the acquired information throughout the process. The collected user behavior serves as a guide for our system in two key ways. First, we create a transition graph by analyzing the sequence of decisions made by users. This graph delineates distinct states and confines the set of actions available at each state. Second, we use examples of user decision-making to provide our LLM-powered planner and reasoner with relevant contextual instances, enhancing their capacity to make informed decisions. We show that AVIS achieves state-of-the-art results on knowledge-intensive visual question answering benchmarks such as Infoseek and OK-VQA.", "output": "AVIS: Autonomous Visual Information Seeking with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As generative AI becomes more prevalent, it is important to study how human users interact with such models. 
In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to study the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. 
These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.", "output": "ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning (ML) models that achieve high average accuracy can still underperform on semantically coherent subsets (i.e. \"slices\") of data. This behavior can have significant societal consequences for the safety or bias of the model in deployment, but identifying these underperforming slices can be difficult in practice, especially in domains where practitioners lack access to group annotations to define coherent subsets of their data. Motivated by these challenges, ML researchers have developed new slice discovery algorithms that aim to group together coherent and high-error subsets of data. However, there has been little evaluation focused on whether these tools help humans form correct hypotheses about where (for which groups) their model underperforms. We conduct a controlled user study (N = 15) where we show 40 slices output by two state-of-the-art slice discovery algorithms to users, and ask them to form hypotheses about where an object detection model underperforms. Our results provide positive evidence that these tools provide some benefit over a naive baseline, and also shed light on challenges faced by users during the hypothesis formation step. We conclude by discussing design opportunities for ML and HCI researchers. Our findings point to the importance of centering users when designing and evaluating new tools for slice discovery.", "output": "Where Does My Model Underperform? 
A Human Evaluation of Slice Discovery Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current state-of-the-art methods for text-to-shape generation either require supervised training using a labeled dataset of pre-defined 3D shapes, or perform expensive inference-time optimization of implicit neural representations. In this work, we present ZeroForge, an approach for zero-shot text-to-shape generation that avoids both pitfalls. To achieve open-vocabulary shape generation, we require careful architectural adaptation of existing feed-forward approaches, as well as a combination of data-free CLIP-loss and contrastive losses to avoid mode collapse. Using these techniques, we are able to considerably expand the generative ability of existing feed-forward text-to-shape models such as CLIP-Forge. We support our method via extensive qualitative and quantitative evaluations", "output": "ZeroForge: Feedforward Text-to-Shape Without 3D Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Encoding whole slide images (WSI) as graphs is well motivated since it makes it possible for the gigapixel resolution WSI to be represented in its entirety for the purpose of graph learning. To this end, WSIs can be broken into smaller patches that represent the nodes of the graph. Then, graph-based learning methods can be utilized for the grading and classification of cancer. Message passing among neighboring nodes is the foundation of graph-based learning methods. However, they do not take into consideration any positional information for any of the patches, and if two patches are found in topologically isomorphic neighborhoods, their embeddings are nearly similar to one another. 
In this work, classification of cancer from WSIs is performed with positional embedding and graph attention. In order to represent the positional embedding of the nodes in graph classification, the proposed method makes use of spline convolutional neural networks (CNN). The algorithm is then tested with the WSI dataset for grading prostate cancer and kidney cancer. A comparison of the proposed method with leading approaches in cancer diagnosis and grading verifies improved performance. The identification of cancerous regions in WSIs is another critical task in cancer diagnosis. In this work, the explainability of the proposed model is also addressed. A gradient-based explainability approach is used to generate the saliency mapping for the WSIs. This can be used to look into regions of WSI that are responsible for cancer diagnosis thus rendering the proposed model explainable.", "output": "Explainable and Position-Aware Learning in Digital Pathology."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datasets, provide an interesting substrate for the solution of the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question is, however, how to learn both prompts that are task specific and prompts that are global, i.e. capture cross-task information. 
In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specified prompts and a group of global prompts, denoted as POP, to integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting, while still outperforming competing methods trained on the entire dataset.", "output": "POP: Prompt Of Prompts for Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image segmentation based on deep learning often fails when deployed on images from a different domain. The domain adaptation methods aim to solve domain-shift challenges, but still face some problems. The transfer learning methods require annotation on the target domain, and the generative unsupervised domain adaptation (UDA) models ignore domain-specific representations, whose generated quality highly restricts segmentation performance. In this study, we propose a novel Structure-Modal Constrained (SMC) UDA framework based on a discriminative paradigm and introduce edge structure as a bridge between domains. The proposed multi-modal learning backbone distills structure information from image texture to distinguish domain-invariant edge structure. With the structure-constrained self-learning and progressive ROI, our methods segment the kidney by locating the 3D spatial structure of the edge. We evaluated SMC-UDA on public renal segmentation datasets, adapting from the labeled source domain (CT) to the unlabeled target domain (CT/MRI). 
The experiments show that our proposed SMC-UDA has a strong generalization and outperforms generative UDA methods.", "output": "SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents CLIPXPlore, a new framework that leverages a vision-language model to guide the exploration of the 3D shape space. Many recent methods have been developed to encode 3D shapes into a learned latent shape space to enable generative design and modeling. Yet, existing methods lack effective exploration mechanisms, despite the rich information. To this end, we propose to leverage CLIP, a powerful pre-trained vision-language model, to aid the shape-space exploration. Our idea is threefold. First, we couple the CLIP and shape spaces by generating paired CLIP and shape codes through sketch images and training a mapper network to connect the two spaces. Second, to explore the space around a given shape, we formulate a co-optimization strategy to search for the CLIP code that better matches the geometry of the shape. Third, we design three exploration modes, binary-attribute-guided, text-guided, and sketch-guided, to locate suitable exploration trajectories in shape space and induce meaningful changes to the shape. 
We perform a series of experiments to quantitatively and visually compare CLIPXPlore with different baselines in each of the three exploration modes, showing that CLIPXPlore can produce many meaningful exploration results that cannot be achieved by the existing solutions.", "output": "CLIPXPlore: Coupled CLIP and Shape Spaces for 3D Shape Exploration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the widespread application of optimal transport (OT), its calculation becomes essential, and various algorithms have emerged. However, the existing methods either have low efficiency or cannot represent discontinuous maps. A novel reusable neural OT solver OT-Net is thus presented, which first learns Brenier's height representation via the neural network to obtain its potential, and then gained the OT map by computing the gradient of the potential. The algorithm has two merits, 1) it can easily represent discontinuous maps, which allows it to match any target distribution with discontinuous supports and achieve sharp boundaries. This can well eliminate mode collapse in the generated models. 2) The OT map can be calculated straightly by the proposed algorithm when new target samples are added, which greatly improves the efficiency and reusability of the map. Moreover, the theoretical error bound of the algorithm is analyzed, and we have demonstrated the empirical success of our approach in image generation, color transfer, and domain adaptation.", "output": "OT-Net: A Reusable Neural Optimal Transport Solver."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cell recognition is a fundamental task in digital histopathology image analysis. 
Point-based cell recognition (PCR) methods normally require a vast number of annotations, which is extremely costly, time-consuming and labor-intensive. Semi-supervised learning (SSL) can provide a shortcut to make full use of cell information in gigapixel whole slide images without exhaustive labeling. However, research into semi-supervised point-based cell recognition (SSPCR) remains largely overlooked. Previous SSPCR works are all built on density map-based PCR models, which suffer from unsatisfactory accuracy, slow inference speed and high sensitivity to hyper-parameters. To address these issues, end-to-end PCR models are proposed recently. In this paper, we develop a SSPCR framework suitable for the end-to-end PCR models for the first time. Overall, we use the current models to generate pseudo labels for unlabeled images, which are in turn utilized to supervise the models training. Besides, we introduce a co-teaching strategy to overcome the confirmation bias problem that generally exists in self-training. A distribution alignment technique is also incorporated to produce high-quality, unbiased pseudo labels for unlabeled data. Experimental results on four histopathology datasets concerning different types of staining styles show the effectiveness and versatility of the proposed framework. Code is available at url{", "output": "Semi-supervised Cell Recognition under Point Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autism spectrum disorder (ASD) is a developmental disorder characterized by significant social communication impairments and difficulties perceiving and presenting communication cues. Machine learning techniques have been broadly adopted to facilitate autism studies and assessments. 
However, computational models are primarily concentrated on specific analysis and validated on private datasets in the autism community, which limits comparisons across models due to privacy-preserving data sharing complications. This work presents a novel privacy-preserving open-source dataset, MMASD as a MultiModal ASD benchmark dataset, collected from play therapy interventions of children with Autism. MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings. To promote public access, each data sample consists of four privacy-preserving modalities of data: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children, e.g., ADOS scores. MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It also has inspiration for downstream tasks such as action quality assessment and interpersonal synchrony estimation. MMASD dataset can be easily accessed at", "output": "MMASD: A Multimodal Dataset for Autism Intervention Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-Image (T2I) generation with diffusion models allows users to control the semantic content in the synthesized images given text conditions. As a further step toward a more customized image creation application, we introduce a new multi-modality generation setting that synthesizes images based on not only the semantic-level textual input but also on the pixel-level visual conditions. Existing literature first converts the given visual information to semantic-level representation by connecting it to languages, and then incorporates it into the original denoising process. 
Seemingly intuitive, such methodological design loses the pixel values during the semantic transition, thus failing to fulfill the task scenario where the preservation of low-level vision is desired (e.g., ID of a given face image). To this end, we propose Cyclic One-Way Diffusion (COW), a training-free framework for creating customized images with respect to semantic text and pixel-visual conditioning. Notably, we observe that sub-regions of an image impose mutual interference, just like physical diffusion, to achieve ultimate harmony along the denoising trajectory. Thus we propose to repetitively utilize the given visual condition in a cyclic way, by planting the visual condition as a high-concentration \"seed\" at the initialization step of the denoising process, and \"diffuse\" it into a harmonious picture by controlling a one-way information flow from the visual condition. We repeat the destroy-and-construct process multiple times to gradually but steadily impose the internal diffusion process within the image. Experiments on the challenging one-shot face and text-conditioned image synthesis task demonstrate our superiority in terms of speed, image quality, and conditional fidelity compared to learning-based text-vision conditional methods.", "output": "Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Masked autoencoder (MAE) has attracted unprecedented attention and achieves remarkable performance in many vision tasks. It reconstructs random masked image patches (known as proxy task) during pretraining and learns meaningful semantic representations that can be transferred to downstream tasks. However, MAE has not been thoroughly explored in ultrasound imaging. In this work, we investigate the potential of MAE for ultrasound image recognition. 
Motivated by the unique property of ultrasound imaging in high noise-to-signal ratio, we propose a novel deblurring MAE approach that incorporates deblurring into the proxy task during pretraining. The addition of deblurring facilitates the pretraining to better recover the subtle details presented in the ultrasound images, thus improving the performance of the downstream classification task. Our experimental results demonstrate the effectiveness of our deblurring MAE, achieving state-of-the-art performance in ultrasound image classification. Overall, our work highlights the potential of MAE for ultrasound image recognition and presents a novel approach that incorporates deblurring to further improve its effectiveness.", "output": "Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The bokeh effect is an artistic technique that blurs out-of-focus areas in a photograph and has gained interest due to recent developments in text-to-image synthesis and the ubiquity of smart-phone cameras and photo-sharing apps. Prior work on rendering bokeh effects have focused on post hoc image manipulation to produce similar blurring effects in existing photographs using classical computer graphics or neural rendering techniques, but have either depth discontinuity artifacts or are restricted to reproducing bokeh effects that are present in the training data. More recent diffusion based models can synthesize images with an artistic style, but either require the generation of high-dimensional masks, expensive fine-tuning, or affect global image characteristics. 
In this paper, we present GBSD, the first generative text-to-image model that synthesizes photorealistic images with a bokeh style. Motivated by how image synthesis occurs progressively in diffusion models, our approach combines latent diffusion models with a 2-stage conditioning algorithm to render bokeh effects on semantically defined objects. Since we can focus the effect on objects, this semantic bokeh effect is more versatile than classical rendering techniques. We evaluate GBSD both quantitatively and qualitatively and demonstrate its ability to be applied in both text-to-image and image-to-image settings.", "output": "GBSD: Generative Bokeh with Stage Diffusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Latent diffusion models achieve state-of-the-art performance on a variety of generative tasks, such as image synthesis and image editing. However, the robustness of latent diffusion models is not well studied. Previous works only focus on the adversarial attacks against the encoder or the output image under white-box settings, regardless of the denoising process. Therefore, in this paper, we aim to analyze the robustness of latent diffusion models more thoroughly. We first study the influence of the components inside latent diffusion models on their white-box robustness. In addition to white-box scenarios, we evaluate the black-box robustness of latent diffusion models via transfer attacks, where we consider both prompt-transfer and model-transfer settings and possible defense mechanisms. However, all these explorations need a comprehensive benchmark dataset, which is missing in the literature. Therefore, to facilitate the research of the robustness of latent diffusion models, we propose two automatic dataset construction pipelines for two kinds of image editing models and release the whole dataset. 
Our code and dataset are available at url{", "output": "On the Robustness of Latent Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Albeit achieving high predictive accuracy across many challenging computer vision problems, recent studies suggest that deep neural networks (DNNs) tend to make overconfident predictions, rendering them poorly calibrated. Most of the existing attempts for improving DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, very little to no attempts have been made in studying the calibration of object detection methods, which occupy a pivotal space in vision-based security-sensitive, and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and box localization by leveraging their predictive uncertainties. We perform extensive experiments on several in-domain and out-of-domain detection benchmarks. Results demonstrate that our proposed train-time calibration method consistently outperforms several baselines in reducing calibration error for both in-domain and out-of-domain predictions. Our code and models are available at ", "output": "Multiclass Confidence and Localization Calibration for Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semi-supervised learning (SSL) methods, which can leverage a large amount of unlabeled data for improved performance, has attracted increasing attention recently. In this paper, we introduce a novel Context-aware Conditional Cross Pseudo Supervision method (referred as C$^3$PS) for semi-supervised medical image segmentation. 
Unlike previously published Cross Pseudo Supervision (CPS) works, this paper introduces a novel Conditional Cross Pseudo Supervision (CCPS) mechanism where the cross pseudo supervision is conditioned on a given class label. Context-awareness is further introduced in the CCPS to improve the quality of pseudo-labels for cross pseudo supervision. The proposed method has the additional advantage that in the later training stage, it can focus on the learning of hard organs. Validated on two typical yet challenging medical image segmentation tasks, our method demonstrates superior performance over the state-of-the-art methods.", "output": "C$^3$PS: Context-aware Conditional Cross Pseudo Supervision for Semi-supervised Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic detail-preserving visualization of the garment, while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet include: 1) garment is warped implicitly via a cross attention mechanism, 2) garment warp and person blend happen as part of a unified process as opposed to a sequence of two separate tasks. 
Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.", "output": "TryOnDiffusion: A Tale of Two UNets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Action anticipation, which aims to recognize the action with a partial observation, becomes increasingly popular due to a wide range of applications. In this paper, we investigate the problem of 3D action anticipation from streaming videos with the target of understanding best practices for solving this problem. We first introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification. To achieve better performance, we then investigate two important factors, i.e., the length of the training clip and clip sampling method. We also explore multi-task learning strategies by incorporating auxiliary information from two aspects: the full action representation and the class-agnostic action label. Our comprehensive experiments uncover the best practices for 3D action anticipation, and accordingly we propose a novel method with a multi-task loss. The proposed method considerably outperforms the recent methods and exhibits the state-of-the-art performance on standard benchmarks.", "output": "Delving into 3D Action Anticipation from Streaming Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The importance of inference in Machine Learning (ML) has led to an explosive number of different proposals in ML, and particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks, we propose a Volterra filter-inspired Network architecture. 
This architecture introducescontrolled non-linearities in the form of interactions between the delayedinput samples of data. We propose a cascaded implementation of VolterraFiltering so as to significantly reduce the number of parameters required tocarry out the same classification task as that of a conventional NeuralNetwork. We demonstrate an efficient parallel implementation of this VolterraNeural Network (VNN), along with its remarkable performance while retaining arelatively simpler and potentially more tractable structure. Furthermore, weshow a rather sophisticated adaptation of this network to nonlinearly fuse theRGB (spatial) information and the Optical Flow (temporal) information of avideo sequence for action recognition. The proposed approach is evaluated onUCF-101 and HMDB-51 datasets for action recognition, and is shown to outperformstate of the art CNN approaches.", "output": "Volterra Neural Networks (VNNs)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Convolutional Neural Networks (DCNNs) have demonstrated impressiverobustness to recognize objects under transformations (eg. blur or noise) whenthese transformations are included in the training set. A hypothesis to explainsuch robustness is that DCNNs develop invariant neural representations thatremain unaltered when the image is transformed. However, to what extent thishypothesis holds true is an outstanding question, as robustness totransformations could be achieved with properties different from invariance,eg. parts of the network could be specialized to recognize either transformedor non-transformed images. This paper investigates the conditions under whichinvariant neural representations emerge by leveraging that they facilitaterobustness to transformations beyond the training distribution. 
Concretely, weanalyze a training paradigm in which only some object categories are seentransformed during training and evaluate whether the DCNN is robust totransformations across categories not seen transformed. Our results withstate-of-the-art DCNNs indicate that invariant neural representations do notalways drive robustness to transformations, as networks show robustness forcategories seen transformed during training even in the absence of invariantneural representations. Invariance only emerges as the number of transformedcategories in the training set is increased. This phenomenon is much moreprominent with local transformations such as blurring and high-pass filteringthan geometric transformations such as rotation and thinning, which entailchanges in the spatial arrangement of the object. Our results contribute to abetter understanding of invariant neural representations in deep learning andthe conditions under which it spontaneously emerges.", "output": "Robustness to Transformations Across Categories: Is Robustness To Transformations Driven by Invariant Neural Representations?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is very challenging for various visual tasks such as image fusion,pedestrian detection and image-to-image translation in low light conditions dueto the loss of effective target areas. In this case, infrared and visibleimages can be used together to provide both rich detail information andeffective target areas. In this paper, we present LLVIP, a visible-infraredpaired dataset for low-light vision. This dataset contains 30976 images, or15488 pairs, most of which were taken at very dark scenes, and all of theimages are strictly aligned in time and space. Pedestrians in the dataset arelabeled. 
We compare the dataset with other visible-infrared datasets andevaluate the performance of some popular visual algorithms including imagefusion, pedestrian detection and image-to-image translation on the dataset. Theexperimental results demonstrate the complementary effect of fusion on imageinformation, and find the deficiency of existing algorithms of the three visualtasks in very low-light conditions. We believe the LLVIP dataset willcontribute to the community of computer vision by promoting image fusion,pedestrian detection and image-to-image translation in very low-lightapplications. The dataset is being released in Raw data is also provided for furtherresearch such as image registration.", "output": "LLVIP: A Visible-infrared Paired Dataset for Low-light Vision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual scenes are composed of visual concepts and have the property ofcombinatorial explosion. An important reason for humans to efficiently learnfrom diverse visual scenes is the ability of compositional perception, and itis desirable for artificial intelligence to have similar abilities.Compositional scene representation learning is a task that enables suchabilities. In recent years, various methods have been proposed to apply deepneural networks, which have been proven to be advantageous in representationlearning, to learn compositional scene representations via reconstruction,advancing this research direction into the deep learning era. Learning viareconstruction is advantageous because it may utilize massive unlabeled dataand avoid costly and laborious data annotation. 
In this survey, we firstoutline the current progress on reconstruction-based compositional scenerepresentation learning with deep neural networks, including developmenthistory and categorizations of existing methods from the perspectives of themodeling of visual scenes and the inference of scene representations; thenprovide benchmarks, including an open source toolbox to reproduce the benchmarkexperiments, of representative methods that consider the most extensivelystudied problem setting and form the foundation for other methods; and finallydiscuss the limitations of existing methods and future directions of thisresearch topic.", "output": "Compositional Scene Representation Learning via Reconstruction: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing demand for deep learning models on mobile devices,splitting neural network computation between the device and a more powerfuledge server has become an attractive solution. However, existing splitcomputing approaches often underperform compared to a naive baseline of remotecomputation on compressed data. Recent studies propose learning compressedrepresentations that contain more relevant information for superviseddownstream tasks, showing improved tradeoffs between compressed data size andsupervised performance. However, existing evaluation metrics only provide anincomplete picture of split computing. This study introduces supervisedcompression for split computing (SC2) and proposes new evaluation criteria:minimizing computation on the mobile device, minimizing transmitted data size,and maximizing model accuracy. We conduct a comprehensive benchmark study using10 baseline methods, three computer vision tasks, and over 180 trained models,and discuss various aspects of SC2. We also release sc2bench, a Python packagefor future research on SC2. 
Our proposed metrics and package will helpresearchers better understand the tradeoffs of supervised compression in splitcomputing.", "output": "SC2 Benchmark: Supervised Compression for Split Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision Transformers have witnessed prevailing success in a series of visiontasks. However, these Transformers often rely on extensive computational coststo achieve high performance, which is burdensome to deploy onresource-constrained devices. To alleviate this issue, we draw lessons fromdepthwise separable convolution and imitate its ideology to design an efficientTransformer backbone, i.e., Separable Vision Transformer, abbreviated asSepViT. SepViT helps to carry out the local-global information interactionwithin and among the windows in sequential order via a depthwise separableself-attention. The novel window token embedding and grouped self-attention areemployed to compute the attention relationship among windows with negligiblecost and establish long-range visual interactions across multiple windows,respectively. Extensive experiments on general-purpose vision benchmarksdemonstrate that SepViT can achieve a state-of-the-art trade-off betweenperformance and latency. Among them, SepViT achieves 84.2% top-1 accuracy onImageNet-1K classification while decreasing the latency by 40%, compared to theones with similar accuracy (e.g., CSWin). 
Furthermore, SepViT achieves 51.0%mIoU on ADE20K semantic segmentation task, 47.9 AP on the RetinaNet-based COCOdetection task, 49.4 box AP and 44.6 mask AP on Mask R-CNN-based COCO objectdetection and instance segmentation tasks.", "output": "SepViT: Separable Vision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine Learning with Deep Neural Networks (DNNs) has become a successfultool in solving tasks across various fields of application. However, thecomplexity of DNNs makes it difficult to understand how they solve theirlearned task. To improve the explainability of DNNs, we adapt methods fromneuroscience that analyze complex and opaque systems. Here, we draw inspirationfrom how neuroscience uses topographic maps to visualize brain activity. Toalso visualize activations of neurons in DNNs as topographic maps, we researchtechniques to layout the neurons in a two-dimensional space such that neuronsof similar activity are in the vicinity of each other. In this work, weintroduce and compare methods to obtain a topographic layout of neurons in aDNN layer. Moreover, we demonstrate how to use topographic activation maps toidentify errors or encoded biases and to visualize training processes. Ournovel visualization technique improves the transparency of DNN-baseddecision-making systems and is interpretable without expert knowledge inMachine Learning.", "output": "Visualizing Deep Neural Networks with Topographic Activation Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel generative saliency prediction framework that adopts aninformative energy-based model as a prior distribution. 
The energy-based priormodel is defined on the latent space of a saliency generator network thatgenerates the saliency map based on a continuous latent variables and anobserved image. Both the parameters of saliency generator and the energy-basedprior are jointly trained via Markov chain Monte Carlo-based maximum likelihoodestimation, in which the sampling from the intractable posterior and priordistributions of the latent variables are performed by Langevin dynamics. Withthe generative saliency model, we can obtain a pixel-wise uncertainty map froman image, indicating model confidence in the saliency prediction. Differentfrom existing generative models, which define the prior distribution of thelatent variables as a simple isotropic Gaussian distribution, our model uses anenergy-based informative prior which can be more expressive in capturing thelatent space of the data. With the informative energy-based prior, we extendthe Gaussian distribution assumption of generative models to achieve a morerepresentative distribution of the latent space, leading to more reliableuncertainty estimation. We apply the proposed frameworks to both RGB and RGB-Dsalient object detection tasks with both transformer and convolutional neuralnetwork backbones. We further propose an adversarial learning algorithm and avariational inference algorithm as alternatives to train the proposedgenerative framework. Experimental results show that our generative saliencymodel with an energy-based prior can achieve not only accurate saliencypredictions but also reliable uncertainty maps that are consistent with humanperception. Results and code are available at", "output": "An Energy-Based Prior for Generative Saliency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision-Language Transformers can be learned without low-level human labels(e.g. 
class labels, bounding boxes, etc). Existing work, whether explicitlyutilizing bounding boxes or patches, assumes that the visual backbone mustfirst be trained on ImageNet class prediction before being integrated into amultimodal linguistic pipeline. We show that this is not necessary andintroduce a new model Vision-Language from Captions (VLC) built on top ofMasked Auto-Encoders that does not require this supervision. In fact, in ahead-to-head comparison between ViLT, the current state-of-the-art patch-basedvision-language transformer which is pretrained with supervised objectclassification, and our model, VLC, we find that our approach 1. outperformsViLT on standard benchmarks, 2. provides more interpretable and intuitive patchvisualizations, and 3. is competitive with many larger models that utilize ROIstrained on annotated bounding-boxes.", "output": "Training Vision-Language Transformers from Captions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Facial semantic guidance (including facial landmarks, facial heatmaps, andfacial parsing maps) and facial generative adversarial networks (GAN) priorhave been widely used in blind face restoration (BFR) in recent years. Althoughexisting BFR methods have achieved good performance in ordinary cases, thesesolutions have limited resilience when applied to face images with seriousdegradation and pose-varied (e.g., looking right, looking left, laughing, etc.)in real-world scenarios. In this work, we propose a well-designed blind facerestoration network with generative facial prior. The proposed network ismainly comprised of an asymmetric codec and a StyleGAN2 prior network. In theasymmetric codec, we adopt a mixed multi-path residual block (MMRB) togradually extract weak texture features of input images, which can betterpreserve the original facial features and avoid excessive fantasy. 
The MMRB canalso be plug-and-play in other networks. Furthermore, thanks to the affluentand diverse facial priors of the StyleGAN2 model, we adopt it as the primarygenerator network in our proposed method and specially design a novelself-supervised training strategy to fit the distribution closer to the targetand flexibly restore natural and realistic facial details. Extensiveexperiments on synthetic and real-world datasets demonstrate that our modelperforms superior to the prior art for face restoration and facesuper-resolution tasks.", "output": "Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a convolution structure for learning SE(3)-equivariantfeatures from 3D point clouds. It can be viewed as an equivariant version ofkernel point convolutions (KPConv), a widely used convolution form to processpoint cloud data. Compared with existing equivariant networks, our design issimple, lightweight, fast, and easy to be integrated with existingtask-specific point cloud learning pipelines. We achieve these desirableproperties by combining group convolutions and quotient representations.Specifically, we discretize SO(3) to finite groups for their simplicity whileusing SO(2) as the stabilizer subgroup to form spherical quotient featurefields to save computations. We also propose a permutation layer to recoverSO(3) features from spherical features to preserve the capacity to distinguishrotations. Experiments show that our method achieves comparable or superiorperformance in various tasks, including object classification, pose estimation,and keypoint-matching, while consuming much less memory and running faster thanexisting work. 
The proposed method can foster the development of equivariantmodels for real-world applications based on point clouds.", "output": "E2PN: Efficient SE(3)-Equivariant Point Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the significant advances achieved in Artificial Neural Networks(ANNs), their design process remains notoriously tedious, depending primarilyon intuition, experience and trial-and-error. This human-dependent process isoften time-consuming and prone to errors. Furthermore, the models are generallybound to their training contexts, with no considerations to their surroundingenvironments. Continual adaptiveness and automation of neural networks is ofparamount importance to several domains where model accessibility is limitedafter deployment (e.g IoT devices, self-driving vehicles, etc.). Additionally,even accessible models require frequent maintenance post-deployment to overcomeissues such as Concept/Data Drift, which can be cumbersome and restrictive. Byleveraging and combining approaches from Neural Architecture Search (NAS) andContinual Learning (CL), more robust and adaptive agents can be developed. Thisstudy conducts the first extensive review on the intersection between NAS andCL, formalizing the prospective Continually-Adaptive Neural Networks (CANNs)paradigm and outlining research directions for lifelong autonomous ANNs.", "output": "Exploring the Intersection between Neural Architecture Search and Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep convolutional autoencoders provide an effective tool for learningnon-linear dimensionality reduction in an unsupervised way. 
Recently, they havebeen used for the task of anomaly detection in the visual domain. By optimisingfor the reconstruction error using anomaly-free examples, the common belief isthat a corresponding network should fail to accurately reconstruct anomalousregions in the application phase. This goal is typically addressed bycontrolling the capacity of the network by either reducing the size of thebottleneck layer or enforcing sparsity constraints on its activations. However,neither of these techniques does explicitly penalize reconstruction ofanomalous signals often resulting in poor detection. We tackle this problem byadapting a self-supervised learning regime, which allows to use discriminativeinformation during training focusing on the data manifold by means of amodified reconstruction error. This regularizes the model to produce locallyconsistent reconstructions, while replacing irregularities by acting as afilter for anomalous patterns. In contrast to related approaches, inferencewith our method is very efficient during training and prediction processing theentire input image in one single step. Our experiments on the MVTec AD datasetdemonstrate high recognition and localization performance of the proposedmethod. On the texture-subset, in particular, our approach consistentlyoutperforms a bunch of recent anomaly detection methods by a big margin.", "output": "Self-Supervised Training with Autoencoders for Visual Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Besides standard cameras, autonomous vehicles typically include multipleadditional sensors, such as lidars and radars, which help acquire richerinformation for perceiving the content of the driving scene. 
While severalrecent works focus on fusing certain pairs of sensors - such as camera withlidar or radar - by using architectural components specific to the examinedsetting, a generic and modular sensor fusion architecture is missing from theliterature. In this work, we propose HRFuser, a modular architecture formulti-modal 2D object detection. It fuses multiple sensors in amulti-resolution fashion and scales to an arbitrary number of input modalities.The design of HRFuser is based on state-of-the-art high-resolution networks forimage-only dense prediction and incorporates a novel multi-windowcross-attention block as the means to perform fusion of multiple modalities atmultiple resolutions. We demonstrate via extensive experiments on nuScenes andthe adverse conditions DENSE datasets that our model effectively leveragescomplementary features from additional modalities, substantially improving uponcamera-only performance and consistently outperforming state-of-the-art 3D and2D fusion methods evaluated on 2D object detection metrics. The source code ispublicly available.", "output": "HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As two fundamental representation modalities of 3D objects, 3D point cloudsand multi-view 2D images record shape information from different domains ofgeometric structures and visual appearances. 
In the current deep learning era,remarkable progress in processing such two data modalities has been achievedthrough respectively customizing compatible 3D and 2D network architectures.However, unlike multi-view image-based 2D visual modeling paradigms, which haveshown leading performance in several common 3D shape recognition benchmarks,point cloud-based 3D geometric modeling paradigms are still highly limited byinsufficient learning capacity, due to the difficulty of extractingdiscriminative features from irregular geometric signals. In this paper, weexplore the possibility of boosting deep 3D point cloud encoders bytransferring visual knowledge extracted from deep 2D image encoders under astandard teacher-student distillation workflow. Generally, we propose PointMCD,a unified multi-view cross-modal distillation architecture, including apretrained deep image encoder as the teacher and a deep point encoder as thestudent. To perform heterogeneous feature alignment between 2D visual and 3Dgeometric domains, we further investigate visibility-aware feature projection(VAFP), by which point-wise embeddings are reasonably aggregated intoview-specific geometric descriptors. By pair-wisely aligning multi-view visualand geometric descriptors, we can obtain more powerful deep point encoderswithout exhausting and complicated network modification. Experiments on 3Dshape classification, part segmentation, and unsupervised learning stronglyvalidate the effectiveness of our method. The code and data will be publiclyavailable at ", "output": "PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Few-shot learning (FSL) aims to learn a classifier that can be easily adaptedto recognize novel classes with only a few labeled examples. 
Some recent work about FSL has yielded promising classification performance, where the image-level feature is used to calculate the similarity among samples for classification. However, the image-level feature ignores abundant fine-grained and structural information of objects that may be transferable and consistent between seen and unseen classes. How can humans easily identify novel classes with several samples? Some study from cognitive science argues that humans can recognize novel categories through primitives. Although base and novel categories are non-overlapping, they can share some primitives in common. Inspired by above research, we propose a Primitive Mining and Reasoning Network (PMRN) to learn primitive-aware representations based on metric-based FSL model. Concretely, we first add Self-supervision Jigsaw task (SSJ) for feature extractor parallelly, guiding the model to encode visual pattern corresponding to object parts into feature channels. To further mine discriminative representations, an Adaptive Channel Grouping (ACG) method is applied to cluster and weight spatially and semantically related visual patterns to generate a group of visual primitives. To further enhance the discriminability and transferability of primitives, we propose a visual primitive Correlation Reasoning Network (CRN) based on graph convolutional network to learn abundant structural information and internal correlation among primitives. Finally, a primitive-level metric is conducted for classification in a meta-task based on episodic training strategy. 
Extensive experiments show that our method achieves state-of-the-art results on six standard benchmarks.", "output": "Learning Primitive-aware Discriminative Representations for Few-shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In computer vision, camera pose estimation from correspondences between 3D geometric entities and their projections into the image has been a widely investigated problem. Although most state-of-the-art methods exploit low-level primitives such as points or lines, the emergence of very effective CNN-based object detectors in the recent years has paved the way to the use of higher-level features carrying semantically meaningful information. Pioneering works in that direction have shown that modelling 3D objects by ellipsoids and 2D detections by ellipses offers a convenient manner to link 2D and 3D data. However, the mathematical formalism most often used in the related literature does not enable to easily distinguish ellipsoids and ellipses from other quadrics and conics, leading to a loss of specificity potentially detrimental in some developments. Moreover, the linearization process of the projection equation creates an over-representation of the camera parameters, also possibly causing an efficiency loss. In this paper, we therefore introduce an ellipsoid-specific theoretical framework and demonstrate its beneficial properties in the context of pose estimation. More precisely, we first show that the proposed formalism enables to reduce the pose estimation problem to a position or orientation-only estimation problem in which the remaining unknowns can be derived in closed-form. Then, we demonstrate that it can be further reduced to a 1 Degree-of-Freedom (1DoF) problem and provide the analytical derivations of the pose as a function of that unique scalar unknown. 
We illustrate our theoretical considerations by visual examples and include a discussion on the practical aspects. Finally, we release this paper along with the corresponding source code in order to contribute towards more efficient resolutions of ellipsoid-related pose estimation problems.", "output": "Perspective-1-Ellipsoid: Formulation, Analysis and Solutions of the Camera Pose Estimation Problem from One Ellipse-Ellipsoid Correspondence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "described by multiple instances (e.g., image patches) and simultaneously associated with multiple labels. Existing MIML methods are useful in many applications but most of which suffer from relatively low accuracy and training efficiency due to several issues: i) the inter-label correlations (i.e., the probabilistic correlations between the multiple labels corresponding to an object) are neglected; ii) the inter-instance correlations (i.e., the probabilistic correlations of different instances in predicting the object label) cannot be learned directly (or jointly) with other types of correlations due to the missing instance labels; iii) diverse inter-correlations (e.g., inter-label correlations, inter-instance correlations) can only be learned in multiple stages. To resolve these issues, a new single-stage framework called broad multi-instance multi-label learning (BMIML) is proposed. 
In BMIML, there are three innovative modules: i) an auto-weighted label enhancement learning (AWLEL) based on broad learning system (BLS) is designed, which simultaneously and efficiently captures the inter-label correlations while traditional BLS cannot; ii) A specific MIML neural network called scalable multi-instance probabilistic regression (SMIPR) is constructed to effectively estimate the inter-instance correlations using the object label only, which can provide additional probabilistic information for learning; iii) Finally, an interactive decision optimization (IDO) is designed to combine and optimize the results from AWLEL and SMIPR and form a single-stage framework. Experiments show that BMIML is highly competitive to (or even better than) existing methods in accuracy and much faster than most MIML methods even for large medical image data sets (> 90K images).", "output": "Single-Stage Broad Multi-Instance Multi-Label Learning (BMIML) with Diverse Inter-Correlations and its application to medical image classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-based Visual Question Answering (TextVQA) aims to produce correct answers for given questions about the images with multiple scene texts. In most cases, the texts naturally attach to the surface of the objects. Therefore, spatial reasoning between texts and objects is crucial in TextVQA. However, existing approaches are constrained within 2D spatial information learned from the input images and rely on transformer-based architectures to reason implicitly during the fusion process. Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grain spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models. 
In this paper, we introduce 3D geometric information into a human-like spatial reasoning process to capture the contextual knowledge of key objects step-by-step. Specifically, to enhance the model's understanding of 3D spatial relationships, (i) we propose a relation prediction module for accurately locating the region of interest of critical objects; (ii) we design a depth-aware attention calibration module for calibrating the OCR tokens' attention according to critical objects. Extensive experiments show that our method achieves state-of-the-art performance on TextVQA and ST-VQA datasets. More encouragingly, our model surpasses others by clear margins of 5.7% and 12.1% on questions that involve spatial reasoning in TextVQA and ST-VQA valid split. Besides, we also verify the generalizability of our model on the text-based image captioning task.", "output": "Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Keypoint detection serves as the basis for many computer vision and robotics applications. Despite the fact that colored point clouds can be readily obtained, most existing keypoint detectors extract only geometry-salient keypoints, which can impede the overall performance of systems that intend to (or have the potential to) leverage color information. To promote advances in such systems, we propose an efficient multi-modal keypoint detector that can extract both geometry-salient and color-salient keypoints in colored point clouds. 
The proposed CEntroid Distance (CED) keypoint detector comprises an intuitive and effective saliency measure, the centroid distance, that can be used in both 3D space and color space, and a multi-modal non-maximum suppression algorithm that can select keypoints with high saliency in two or more modalities. The proposed saliency measure leverages directly the distribution of points in a local neighborhood and does not require normal estimation or eigenvalue decomposition. We evaluate the proposed method in terms of repeatability and computational efficiency (i.e. running time) against state-of-the-art keypoint detectors on both synthetic and real-world datasets. Results demonstrate that our proposed CED keypoint detector requires minimal computational time while attaining high repeatability. To showcase one of the potential applications of the proposed method, we further investigate the task of colored point cloud registration. Results suggest that our proposed CED detector outperforms state-of-the-art handcrafted and learning-based keypoint detectors in the evaluated scenes. The C++ implementation of the proposed method is made publicly available at", "output": "Centroid Distance Keypoint Detector for Colored Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Perception is crucial for robots that act in real-world environments, as autonomous systems need to see and understand the world around them to act properly. Panoptic segmentation provides an interpretation of the scene by computing a pixelwise semantic label together with instance IDs. In this paper, we address panoptic segmentation using RGB-D data of indoor scenes. We propose a novel encoder-decoder neural network that processes RGB and depth separately through two encoders.
The features of the individual encoders are progressively merged at different resolutions, such that the RGB features are enhanced using complementary depth information. We propose a novel merging approach called ResidualExcite, which reweighs each entry of the feature map according to its importance. With our double-encoder architecture, we are robust to missing cues. In particular, the same model can train and infer on RGB-D, RGB-only, and depth-only input data, without the need to train specialized models. We evaluate our method on publicly available datasets and show that our approach achieves superior results compared to other common approaches for panoptic segmentation.", "output": "Robust Double-Encoder Network for RGB-D Panoptic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal representation learning has shown promising improvements on various vision-language tasks. Most existing methods excel at building global-level alignment between vision and language while lacking effective fine-grained image-text interaction. In this paper, we propose a jointly masked multimodal modeling method to learn fine-grained multimodal representations. Our method performs joint masking on image-text input and integrates both implicit and explicit targets for the masked signals to recover. The implicit target provides a unified and debiased objective for vision and language, where the model predicts latent multimodal representations of the unmasked input. The explicit target further enriches the multimodal representations by recovering high-level and semantically meaningful information: momentum visual features of image patches and concepts of word tokens.
Through such a masked modeling process, our model not only learns fine-grained multimodal interaction, but also avoids the semantic gap between high-level representations and low- or mid-level prediction targets (e.g. image pixels), thus producing semantically rich multimodal representations that perform well on both zero-shot and fine-tuned settings. Our pre-trained model (named MAMO) achieves state-of-the-art performance on various downstream vision-language tasks, including image-text retrieval, visual question answering, visual reasoning, and weakly-supervised visual grounding.", "output": "MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Plant phenotyping is a central task in agriculture, as it describes plants' growth stage, development, and other relevant quantities. Robots can help automate this process by accurately estimating plant traits such as the number of leaves, leaf area, and the plant size. In this paper, we address the problem of joint semantic, plant instance, and leaf instance segmentation of crop fields from RGB data. We propose a single convolutional neural network that addresses the three tasks simultaneously, exploiting their underlying hierarchical structure. We introduce task-specific skip connections, which our experimental evaluation proves to be more beneficial than the usual schemes. We also propose a novel automatic post-processing, which explicitly addresses the problem of spatially close instances, common in the agricultural domain because of overlapping leaves. Our architecture simultaneously tackles these problems jointly in the agricultural context. Previous works either focus on plant or leaf segmentation, or do not optimise for semantic segmentation.
Results show that our system has superior performance compared to state-of-the-art approaches, while having a reduced number of parameters and is operating at camera frame rate.", "output": "Hierarchical Approach for Joint Semantic, Plant Instance, and Leaf Instance Segmentation in the Agricultural Domain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Lane marking detection is fundamental for both advanced driving assistance systems. However, detecting lane is highly challenging when the visibility of a road lane marking is low due to real-life challenging environment and adverse weather. Most of the lane detection methods suffer from four types of challenges: (i) light effects i.e., shadow, glare of light, reflection etc.; (ii) Obscured visibility of eroded, blurred, colored and cracked lane caused by natural disasters and adverse weather; (iii) lane marking occlusion by different objects from surroundings (wiper, vehicles etc.); and (iv) presence of confusing lane like lines inside the lane view e.g., guardrails, pavement marking, road divider etc. Here, we propose a robust lane detection and tracking method with three key technologies. First, we introduce a comprehensive intensity threshold range (CITR) to improve the performance of the canny operator in detecting low intensity lane edges. Second, we propose a two-step lane verification technique, the angle based geometric constraint (AGC) and length-based geometric constraint (LGC) followed by Hough Transform, to verify the characteristics of lane marking and to prevent incorrect lane detection. Finally, we propose a novel lane tracking technique, by defining a range of horizontal lane position (RHLP) along the x axis which will be updating with respect to the lane position of previous frame.
It can keep track of the lane position when either left or right or both lane markings are partially and fully invisible. To evaluate the performance of the proposed method we used the DSDLDE [1] and SLD [2] dataset with 1080x1920 and 480x720 resolutions at 24 and 25 frames/sec respectively. Experimental results show that the average detection rate is 97.55%, and the average processing time is 22.33 msec/frame, which outperform the state-of-the-art method.", "output": "Vision-Based Robust Lane Detection and Tracking under Different Challenging Environmental Conditions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample-efficiency by concurrently learning an internal model of the world, and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, and in stark contrast to humans that rely heavily on world understanding and visual cues for learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models.
By offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch; we improve mean performance of model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.", "output": "On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven approaches recently achieved remarkable success in magnetic resonance imaging (MRI) reconstruction, but integration into clinical routine remains challenging due to a lack of generalizability and interpretability. In this paper, we address these challenges in a unified framework based on generative image priors. We propose a novel deep neural network based regularizer which is trained in a generative setting on reference magnitude images only. After training, the regularizer encodes higher-level domain statistics which we demonstrate by synthesizing images without data. Embedding the trained model in a classical variational approach yields high-quality reconstructions irrespective of the sub-sampling pattern. In addition, the model shows stable behavior when confronted with out-of-distribution data in the form of contrast variation. Furthermore, a probabilistic interpretation provides a distribution of reconstructions and hence allows uncertainty quantification. To reconstruct parallel MRI, we propose a fast algorithm to jointly estimate the image and the sensitivity maps.
The results demonstrate competitive performance, on par with state-of-the-art end-to-end deep learning methods, while preserving the flexibility with respect to sub-sampling patterns and allowing for uncertainty quantification.", "output": "Stable Deep MRI Reconstruction using Generative Priors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a deep learning-based approach for skull reconstruction for MONAI, which has been pre-trained on the MUG500+ skull dataset. The implementation follows the MONAI contribution guidelines, hence, it can be easily tried out and used, and extended by MONAI users. The primary goal of this paper lies in the investigation of open-sourcing codes and pre-trained deep learning models under the MONAI framework. Nowadays, open-sourcing software, especially (pre-trained) deep learning models, has become increasingly important. Over the years, medical image analysis experienced a tremendous transformation. Over a decade ago, algorithms had to be implemented and optimized with low-level programming languages, like C or C++, to run in a reasonable time on a desktop PC, which was not as powerful as today's computers. Nowadays, users have high-level scripting languages like Python, and frameworks like PyTorch and TensorFlow, along with a sea of public code repositories at hand. As a result, implementations that had thousands of lines of C or C++ code in the past, can now be scripted with a few lines and in addition executed in a fraction of the time. To put this even on a higher level, the Medical Open Network for Artificial Intelligence (MONAI) framework tailors medical imaging research to an even more convenient process, which can boost and push the whole field.
The MONAI framework is a freely available, community-supported, open-source and PyTorch-based framework, that also enables to provide research contributions with pre-trained models to others. Codes and pre-trained weights for skull reconstruction are publicly available at:", "output": "Open-Source Skull Reconstruction with MONAI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most neural networks for computer vision are designed to infer using RGB images. However, these RGB images are commonly encoded in JPEG before saving to disk; decoding them imposes an unavoidable overhead for RGB networks. Instead, our work focuses on training Vision Transformers (ViT) directly from the encoded features of JPEG. This way, we can avoid most of the decoding overhead, accelerating data load. Existing works have studied this aspect but they focus on CNNs. Due to how these encoded features are structured, CNNs require heavy modification to their architecture to accept such data. Here, we show that this is not the case for ViTs. In addition, we tackle data augmentation directly on these encoded features, which to our knowledge, has not been explored in-depth for training in this setting.
With these two improvements -- ViT and data augmentation -- we show that our ViT-Ti model achieves up to 39.2% faster training and 17.9% faster inference with no accuracy loss compared to the RGB counterpart.", "output": "RGB no more: Minimally-decoded JPEG Vision Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of deep learning based image representation learning (IRL) methods has attracted great attention for various image understanding problems. Most of these methods require the availability of a high quantity and quality of annotated training images, which can be time-consuming and costly to gather. To reduce labeling costs, crowdsourced data, automatic labeling procedures or citizen science projects can be considered. However, such approaches increase the risk of including label noise in training data. It may result in overfitting on noisy labels when discriminative reasoning is employed. This leads to sub-optimal learning procedures, and thus inaccurate characterization of images. To address this, we introduce a generative reasoning integrated label noise robust deep representation learning (GRID) approach. Our approach aims to model the complementary characteristics of discriminative and generative reasoning for IRL under noisy labels. To this end, we first integrate generative reasoning into discriminative reasoning through a supervised variational autoencoder. This allows GRID to automatically detect training samples with noisy labels. Then, through our label noise robust hybrid representation learning strategy, GRID adjusts the whole learning procedure for IRL of these samples through generative reasoning and that of other samples through discriminative reasoning. Our approach learns discriminative image representations while preventing interference of noisy labels independently from the IRL method being selected.
Thus, unlike the existing methods, GRID does not depend on the type of annotation, neural network architecture, loss function or learning task, and thus can be directly utilized for various problems. Experimental results show its effectiveness compared to state-of-the-art methods. The code of GRID is publicly available at", "output": "Generative Reasoning Integrated Label Noise Robust Deep Image Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A prominent paradigm for graph neural networks is based on the message-passing framework. In this framework, information communication is realized only between neighboring nodes. The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to oversmoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. Our approach allows for optimizing the spatial extent of diffusion across various tasks and network channels, thus enabling medium and long-distance communication efficiently. Furthermore, we show that our architecture design also enables local message-passing and thus inherits from the capabilities of local message-passing approaches. We show that on both widely used graph benchmarks and synthetic mesh and graph datasets, the proposed framework outperforms state-of-the-art methods by a significant margin", "output": "TIDE: Time Derivative Diffusion for Deep Learning on Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Point cloud registration (PCR) is a popular research topic in computer vision.
Recently, the registration method in an evolutionary way has received continuous attention because of its robustness to the initial pose and flexibility in objective function design. However, most evolving registration methods cannot tackle the local optimum well and they have rarely investigated the success ratio, which implies the probability of not falling into local optima and is closely related to the practicality of the algorithm. Evolutionary multi-task optimization (EMTO) is a widely used paradigm, which can boost exploration capability through knowledge transfer among related tasks. Inspired by this concept, this study proposes a novel evolving registration algorithm via EMTO, where the multi-task configuration is based on the idea of solution space cutting. Concretely, one task searching in cut space assists another task with complex function landscape in escaping from local optima and enhancing successful registration ratio. To reduce unnecessary computational cost, a sparse-to-dense strategy is proposed. In addition, a novel fitness function robust to various overlap rates as well as a problem-specific metric of computational cost is introduced. Compared with 8 evolving approaches, 4 traditional approaches and 3 deep learning approaches on the object-scale and scene-scale registration datasets, experimental results demonstrate that the proposed method has superior performances in terms of precision and tackling local optima.", "output": "Evolutionary Multitasking with Solution Space Cutting for Point Cloud Registration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks.
We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area.", "output": "On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Natural data is redundant yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Networks (RINs), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, this is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models.
We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024X1024 images without cascades or guidance, while being domain-agnostic and up to 10X more efficient than 2D and 3D U-Nets.", "output": "Scalable Adaptive Computation for Iterative Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.", "output": "MixupE: Understanding and Improving Mixup from Directional Derivative Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation looks convincing to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and it provides new insight into common explanation methods that we analyze as case studies.", "output": "A Theoretical Framework for AI Models Explainability with Application in Biomedicine."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image segmentation is the technique that helps doctor view and has a precise diagnosis, particularly in Colorectal Cancer.
Specifically, with the increase in cases, the diagnosis and identification need to be faster and more accurate for many patients; in endoscopic images, the segmentation task has been vital to helping the doctor identify the position of the polyps or the ache in the system correctly. As a result, many efforts have been made to apply deep learning to automate polyp segmentation, mostly to ameliorate the U-shape structure. However, the simple skip connection scheme in UNet leads to deficient context information and the semantic gap between feature maps from the encoder and decoder. To deal with this problem, we propose a novel framework composed of ConvNeXt backbone and Multi Kernel Positional Embedding block. Thanks to the suggested module, our method can attain better accuracy and generalization in the polyps segmentation task. Extensive experiments show that our model achieves the Dice coefficient of 0.8818 and the IOU score of 0.8163 on the Kvasir-SEG dataset. Furthermore, on various datasets, we make competitive achievement results with other previous state-of-the-art methods.", "output": "Multi Kernel Positional Embedding ConvNeXt for Polyp Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on a subset of findings that can be simultaneously present in a patient, making it challenging to train models that aggregate multiple datasets together. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically useful models with a complete representation of abnormalities that may occur within the thorax.
To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial annotations. We evaluate surgical aggregation across synthetic and real-world heterogeneous datasets with partial annotations. Our results indicate that surgical aggregation outperforms current strategies, generalizes better, and has the potential to facilitate the development of clinically useful models even when using datasets with heterogeneous disease labels.", "output": "Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer, which is pre-trained in two stages. The first stage bootstraps vision-language representation learning from a frozen image encoder. The second stage bootstraps vision-to-language generative learning from a frozen language model. BLIP-2 achieves state-of-the-art performance on various vision-language tasks, despite having significantly fewer trainable parameters than existing methods. For example, our model outperforms Flamingo80B by 8.7% on zero-shot VQAv2 with 54x fewer trainable parameters.
We also demonstrate the model's emerging capabilities of zero-shot image-to-text generation that can follow natural language instructions.", "output": "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. Our method leverages the abilities of language models learnt from large scale text-only pretraining, such as in-context learning and free-form text generation. We keep the language model frozen, and finetune input and output linear layers to enable cross-modality interactions. This allows our model to process arbitrarily interleaved image-and-text inputs, and generate free-form text interleaved with retrieved images. We achieve strong zero-shot performance on grounded tasks such as contextual image retrieval and multimodal dialogue, and showcase compelling interactive abilities. Our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pretrained language models in visually grounded settings.", "output": "Grounding Language Models to Images for Multimodal Inputs and Outputs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we provide an intuitive viewing to simplify the Siamese-based trackers by converting the tracking task to a classification.
Under this viewing, we perform an in-depth analysis for them through visual simulations and real tracking examples, and find that the failure cases in some challenging situations can be regarded as the issue of missing decisive samples in offline training. Since the samples in the initial (first) frame contain rich sequence-specific information, we can regard them as the decisive samples to represent the whole sequence. To quickly adapt the base model to new scenes, a compact latent network is presented via fully using these decisive samples. Specifically, we present a statistics-based compact latent feature for fast adjustment by efficiently extracting the sequence-specific information. Furthermore, a new diverse sample mining strategy is designed for training to further improve the discrimination ability of the proposed compact latent network. Finally, a conditional updating strategy is proposed to efficiently update the basic models to handle scene variation during the tracking phase. To evaluate the generalization ability and effectiveness of our method, we apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN. Extensive experimental results on six recent datasets demonstrate that all three adjusted trackers obtain the superior performance in terms of the accuracy, while having high running speed.", "output": "Adaptive Siamese Tracking with a Compact Latent Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the employment of deep learning methods has led to several significant breakthroughs in artificial intelligence. Different from traditional machine learning models, deep learning-based approaches are able to extract features autonomously from raw data. This allows for bypassing the feature engineering process, which is generally considered to be both error-prone and tedious.
Moreover, deep learning strategies often outperform traditional models in terms of accuracy.", "output": "Stop overkilling simple tasks with black-box models and use transparent models instead."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately detecting crack boundaries is crucial for reliability assessment and risk management of structures and materials, such as structural health monitoring, diagnostics, prognostics, and maintenance scheduling. Uncertainty quantification of crack detection is challenging due to various stochastic factors, such as measurement noises, signal processing, and model simplifications. A machine learning-based approach is proposed to quantify both epistemic and aleatoric uncertainties concurrently. We introduce a Bayesian Boundary-Aware Convolutional Network (B-BACN) that emphasizes uncertainty-aware boundary refinement to generate precise and reliable crack boundary detections. The proposed method employs a multi-task learning approach, where we use Monte Carlo Dropout to learn the epistemic uncertainty and a Gaussian sampling function to predict each sample's aleatoric uncertainty. Moreover, we include a boundary refinement loss to B-BACN to enhance the determination of defect boundaries. The proposed method is demonstrated with benchmark experimental results and compared with several existing methods. 
The experimental results illustrate the effectiveness of our proposed approach in uncertainty-aware crack boundary detection, minimizing misclassification rate, and improving model calibration capabilities.", "output": "B-BACN: Bayesian Boundary-Aware Convolutional Network for Crack Characterization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Monocular depth estimation plays a fundamental role in computer vision. Due to the costly acquisition of depth ground truth, self-supervised methods that leverage adjacent frames to establish a supervisory signal have emerged as the most promising paradigms. In this work, we propose two novel ideas to improve self-supervised monocular depth estimation: 1) self-reference distillation and 2) disparity offset refinement. Specifically, we use a parameter-optimized model as the teacher updated as the training epochs to provide additional supervision during the training process. The teacher model has the same structure as the student model, with weights inherited from the historical student model. In addition, a multiview check is introduced to filter out the outliers produced by the teacher model. Furthermore, we leverage the contextual consistency between high-scale and low-scale features to obtain multiscale disparity offsets, which are used to refine the disparity output incrementally by aligning disparity information at different scales. 
The experimental results on the KITTI and Make3D datasets show that our method outperforms previous state-of-the-art competitors.", "output": "Self-Supervised Monocular Depth Estimation with Self-Reference Distillation and Disparity Offset Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We use concept-based interpretable models to mitigate shortcut learning. Existing methods lack interpretability. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each expert explains a subset of data using First Order Logic (FOL). While explaining a sample, the FOL from biased BB-derived MoIE detects the shortcut effectively. Finetuning the BB with Metadata Normalization (MDN) eliminates the shortcut. The FOLs from the finetuned-BB-derived MoIE verify the elimination of the shortcut. Our experiments show that MoIE does not hurt the accuracy of the original BB and eliminates shortcuts effectively.", "output": "Tackling Shortcut Learning in Deep Neural Networks: An Iterative Approach with Interpretable Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Under good conditions, Neural Radiance Fields (NeRFs) have shown impressive results on novel view synthesis tasks. NeRFs learn a scene's color and density fields by minimizing the photometric discrepancy between training views and differentiable renderings of the scene. Once trained from a sufficient set of views, NeRFs can generate novel views from arbitrary camera positions. 
However, the scene geometry and color fields are severely under-constrained, which can lead to artifacts, especially when trained with few input views. To alleviate this problem we learn a prior over scene geometry and color, using a denoising diffusion model (DDM). Our DDM is trained on RGBD patches of the synthetic Hypersim dataset and can be used to predict the gradient of the logarithm of a joint probability distribution of color and depth patches. We show that these gradients of logarithms of RGBD patch priors serve to regularize geometry and color of a scene. During NeRF training, random RGBD patches are rendered and the estimated gradient of the log-likelihood is backpropagated to the color and density fields. Evaluations on LLFF, the most relevant dataset, show that our learned prior achieves improved quality in the reconstructed geometry and improved generalization to novel views. Evaluations on DTU show improved reconstruction quality among NeRF methods.", "output": "DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. 
We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully-supervised models, but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.", "output": "Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent success of ChatGPT and GPT-4 has drawn widespread attention to multimodal dialogue systems. However, the academic community lacks a dataset that can validate the multimodal generation capabilities of Visual Language Models (VLMs) in textual-visual chat tasks. In this paper, we construct two new multimodal datasets: the synthetic CLEVR-ATVC dataset (620K) and the manually pictured Fruit-ATVC dataset (50K), both featuring visual and text-based inputs and outputs. Additionally, to enable the multimodal system to reject human requests (i.e., demonstrate accountability), as in language-based ChatGPT conversations, we develop and incorporate specific rules into the datasets as supervisory signals. This allows the trained VLM to provide a yes or no answer after visual and textual reasoning, accompanied by a language explanation as to why the human instruction cannot be executed. In our method, we propose a two-state training procedure to train the image auto-encoder and auto-regressive transformer from scratch. 
The first state involves a discrete variational autoencoder (dVAE) to compress each image into short tokens, which are then concatenated with text tokens as a single data stream to be fed into the decoder-based transformer for generating visual re-creation and textual feedback in the second state. We provide comprehensive analyses of experimental results in terms of re-created image quality, answer accuracy, and the model behavior when faced with uncertainty and imperfect user queries. We hope our explorations and findings contribute valuable insights regarding the accountability of textual-visual generative models.", "output": "Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a real-world dataset of stereoscopic videos for color-mismatch correction. It includes real-world distortions achieved using a beam splitter. Our dataset is larger than any other for this task. We compared eight color-mismatch-correction methods on artificial and real-world datasets and showed that local methods are best suited to artificial distortions and that global methods are best suited to real-world distortions. Our efforts improved on the latest local neural-network method for color-mismatch correction in stereoscopic images, making it work faster and better on both artificial and real-world distortions.", "output": "Color Mismatches in Stereoscopic Video: Real-World Dataset and Deep Correction Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. 
However, the lack of large-scale annotated medical datasets makes achieving performances equivalent to those in natural images challenging. Convolutional networks, in contrast, have higher inductive biases and consequently, are easily trainable to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to challenges of data-scarce medical settings. We introduce MedNeXt, a Transformer-inspired large kernel segmentation network which introduces - 1) A fully ConvNeXt 3D Encoder-Decoder Network for medical image segmentation, 2) Residual ConvNeXt up and downsampling blocks to preserve semantic richness across scales, 3) A novel technique to iteratively increase kernel sizes by upsampling small kernel networks, to prevent performance saturation on limited medical data, 4) Compound scaling at multiple levels (depth, width, kernel size) of MedNeXt. This leads to state-of-the-art performance on 4 tasks on CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation. Our code is made publicly available at:", "output": "MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Place recognition is an essential and challenging task in loop closing and global localization for robotics and autonomous driving applications. Benefiting from the recent advances in deep learning techniques, the performance of LiDAR place recognition (LPR) has been greatly improved. However, current deep learning-based methods suffer from two major problems: poor generalization ability and catastrophic forgetting. 
In this paper, we propose a continual contrastive learning method, named CCL, to tackle the catastrophic forgetting problem and generally improve the robustness of LPR approaches. Our CCL constructs a contrastive feature pool and utilizes contrastive loss to train more transferable representations of places. When transferred into new environments, our CCL continuously reviews the contrastive memory bank and applies a distribution-based knowledge distillation to maintain the retrieval ability of the past data while continually learning to recognize new places from the new data. We thoroughly evaluate our approach on Oxford, MulRan, and PNV datasets using three different LPR methods. The experimental results show that our CCL consistently improves the performance of different methods in different environments, outperforming the state-of-the-art continual learning method. The implementation of our method has been released at", "output": "CCL: Continual Contrastive Learning for LiDAR Place Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. 
Besides language commands, our approach can be simply extended to multi-modal instructions for learning an image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at", "output": "LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The fervor for Non-Fungible Tokens (NFTs) attracted countless creators, leading to a Big Bang of digital assets driven by latent or explicit forms of inspiration, as in many creative processes. This work exploits Vision Transformers and graph-based modeling to delve into visual inspiration phenomena between NFTs over the years. Our goals include unveiling the main structural traits that shape visual inspiration networks, exploring the interrelation between visual inspiration and asset performances, investigating crypto influence on inspiration processes, and explaining the inspiration relationships among NFTs. Our findings unveil how the pervasiveness of inspiration led to a temporary saturation of the visual feature space, the impact of the dichotomy between inspiring and inspired NFTs on their financial performance, and an intrinsic self-regulatory mechanism between markets and inspiration waves. 
Our work can serve as a starting point for gaining a broader view of the evolution of Web3.", "output": "Visually Wired NFTs: Exploring the Role of Inspiration in Non-Fungible Tokens."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model inversion (MI) attacks aim to infer and reconstruct private training data by abusing access to a model. MI attacks have raised concerns about the leaking of sensitive information (e.g. private face images used in training a face recognition system). Recently, several algorithms for MI have been proposed to improve the attack performance. In this work, we revisit MI, study two fundamental issues pertaining to all state-of-the-art (SOTA) MI algorithms, and propose solutions to these issues which lead to a significant boost in attack performance for all SOTA MI. In particular, our contributions are two-fold: 1) We analyze the optimization objective of SOTA MI algorithms, argue that the objective is sub-optimal for achieving MI, and propose an improved optimization objective that boosts attack performance significantly. 2) We analyze \"MI overfitting\", show that it would prevent reconstructed images from learning semantics of training data, and propose a novel \"model augmentation\" idea to overcome this issue. Our proposed solutions are simple and improve all SOTA MI attack accuracy significantly. E.g., in the standard CelebA benchmark, our solutions improve accuracy by 11.8% and achieve for the first time over 90% attack accuracy. Our findings demonstrate that there is a clear risk of leaking sensitive information from deep learning models. We urge serious consideration to be given to the privacy implications. 
Our code, demo, and models are available at", "output": "Re-thinking Model Inversion Attacks Against Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As a neuromorphic sensor with high temporal resolution, the spike camera shows enormous potential in high-speed visual tasks. However, the high-speed sampling of light propagation processes by existing cameras brings unavoidable noise phenomena. Eliminating the unique noise in spike stream is always a key point for spike-based methods. No previous work has addressed the detailed noise mechanism of the spike camera. To this end, we propose a systematic noise model for spike camera based on its unique circuit. In addition, we carefully constructed the noise evaluation equation and experimental scenarios to measure noise variables. Based on our noise model, the first benchmark for spike stream denoising is proposed which includes clear (noisy) spike stream. Further, we design a tailored spike stream denoising framework (DnSS) where denoised spike stream is obtained by decoding inferred inter-spike intervals. Experiments show that DnSS has promising performance on the proposed benchmark. Eventually, DnSS can be generalized well on real spike stream.", "output": "Spike Stream Denoising via Spike Camera Simulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a new synthesis-based approach for batch image processing. Unlike existing tools that can only apply global edits to the entire image, our method can apply fine-grained edits to individual objects within the image. For example, our method can selectively blur or crop specific objects that have a certain property. 
To facilitate such fine-grained image editing tasks, we propose a neuro-symbolic domain-specific language (DSL) that combines pre-trained neural networks for image classification with other language constructs that enable symbolic reasoning. Our method can automatically learn programs in this DSL from user demonstrations by utilizing a novel synthesis algorithm. We have implemented the proposed technique in a tool called ImageEye and evaluated it on 50 image editing tasks. Our evaluation shows that ImageEye is able to automate 96% of these tasks.", "output": "ImageEye: Batch Image Processing Using Program Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The success of the Neural Radiance Fields (NeRFs) for modeling and free-view rendering static objects has inspired numerous attempts on dynamic scenes. Current techniques that utilize neural rendering for facilitating free-view videos (FVVs) are restricted to either offline rendering or are capable of processing only brief sequences with minimal motion. In this paper, we present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time FVV rendering on long-duration dynamic scenes. ReRF explicitly models the residual information between adjacent timestamps in the spatial-temporal feature space, with a global coordinate-based tiny MLP as the feature decoder. Specifically, ReRF employs a compact motion grid along with a residual feature grid to exploit inter-frame feature similarities. We show such a strategy can handle large motions without sacrificing quality. We further present a sequential training scheme to maintain the smoothness and the sparsity of the motion/residual grids. 
Based on ReRF, we design a special FVV codec that achieves three orders of magnitude compression rate and provides a companion ReRF player to support online streaming of long-duration FVVs of dynamic scenes. Extensive experiments demonstrate the effectiveness of ReRF for compactly representing dynamic radiance fields, enabling an unprecedented free-viewpoint viewing experience in speed and quality.", "output": "Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since the number of incident energies is limited, it is difficult to directly acquire hyperspectral images (HSI) with high spatial resolution. Considering the high dimensionality and correlation of HSI, super-resolution (SR) of HSI remains a challenge in the absence of auxiliary high-resolution images. Furthermore, it is very important to extract the spatial features effectively and make full use of the spectral information. This paper proposes a novel HSI super-resolution algorithm, termed dual-domain network based on hybrid convolution (SRDNet). Specifically, a dual-domain network is designed to fully exploit the spatial-spectral and frequency information among the hyper-spectral data. To capture inter-spectral self-similarity, a self-attention learning mechanism (HSL) is devised in the spatial domain. Meanwhile the pyramid structure is applied to increase the acceptance field of attention, which further reinforces the feature representation ability of the network. Moreover, to further improve the perceptual quality of HSI, a frequency loss (HFL) is introduced to optimize the model in the frequency domain. The dynamic weighting mechanism drives the network to gradually refine the generated frequency and excessive smoothing caused by spatial loss. 
Finally, in order to better obtain the mapping relationship between high-resolution space and low-resolution space, a hybrid module of 2D and 3D units with progressive upsampling strategy is utilized in our method. Experiments on a widely used benchmark dataset illustrate that the proposed SRDNet method enhances the texture information of HSI and is superior to state-of-the-art methods.", "output": "Hyperspectral Image Super-Resolution via Dual-domain Network Based on Hybrid Convolution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Chen et al. [Chen2022] recently published the article 'Fast and scalable search of whole-slide images via self-supervised deep learning' in Nature Biomedical Engineering. The authors call their method 'self-supervised image search for histology', short SISH. We express our concerns that SISH is an incremental modification of Yottixel, has used MinMax binarization but does not cite the original works, and is based on a misnomer 'self-supervised image search'. As well, we point to several other concerns regarding experiments and comparisons performed by Chen et al.", "output": "Comments on 'Fast and scalable search of whole-slide images via self-supervised deep learning'."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments. The traditional method of creating offline HD maps involves labor-intensive manual annotation processes, which are not only costly but also insufficient for timely updates. Recent studies have proposed an alternative approach that generates local maps using online sensor observations. 
However, this approach is limited by the sensor's perception range and its susceptibility to occlusions. In this study, we propose Neural Map Prior (NMP), a neural representation of global maps. This representation automatically updates itself and improves the performance of local map inference. Specifically, we utilize two approaches to achieve this. Firstly, to integrate a strong map prior into local map inference, we apply cross-attention, a mechanism that dynamically identifies correlations between current and prior features. Secondly, to update the global neural map prior, we utilize a learning-based fusion module that guides the network in fusing features from previous traversals. Our experimental results, based on the nuScenes dataset, demonstrate that our framework is highly compatible with various map segmentation and detection architectures. It significantly improves map prediction performance, even in challenging weather conditions and situations with a longer perception range. To the best of our knowledge, this is the first learning-based system for creating a global map prior.", "output": "Neural Map Prior for Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision Transformers have shown promising progress in various object detection tasks, including monocular 2D/3D detection and surround-view 3D detection. However, when used in essential and classic stereo 3D object detection, directly adopting those surround-view Transformers leads to slow convergence and significant precision drops. We argue that one of the causes of this defect is that the surround-view Transformers do not consider the stereo-specific image correspondence information. In a surround-view system, the overlapping areas are small, and thus correspondence is not a primary issue. 
In this paper, we explore the model design of vision Transformers in stereo 3D object detection, focusing particularly on extracting and encoding the task-specific image correspondence information. To achieve this goal, we present TS3D, a Transformer-based Stereo-aware 3D object detector. In the TS3D, a Disparity-Aware Positional Encoding (DAPE) model is proposed to embed the image correspondence information into stereo features. The correspondence is encoded as normalized disparity and is used in conjunction with sinusoidal 2D positional encoding to provide the location information of the 3D scene. To extract enriched multi-scale stereo features, we propose a Stereo Reserving Feature Pyramid Network (SRFPN). The SRFPN is designed to reserve the correspondence information while fusing intra-scale and aggregating cross-scale stereo features. Our proposed TS3D achieves a 41.29% Moderate Car detection average precision on the KITTI test set and takes 88 ms to detect objects from each binocular image pair. It is competitive with advanced counterparts in terms of both precision and inference speed.", "output": "Transformer-based stereo-aware 3D object detection from binocular images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Face recognition (FR) systems continue to spread in our daily lives with an increasing demand for higher explainability and interpretability of FR systems that are mainly based on deep learning. While bias across demographic groups in FR systems has already been studied, the bias of explainability tools has not yet been investigated. As such tools aim at steering further development and enabling a better understanding of computer vision problems, the possible existence of bias in their outcome can lead to a chain of biased decisions. 
In this paper, we explore the existence of bias in the outcome of explainability tools by investigating the use case of face presentation attack detection. By utilizing two different explainability tools on models with different levels of bias, we investigate the bias in the outcome of such tools. Our study shows that these tools show clear signs of gender bias in the quality of their explanations.", "output": "Are Explainability Tools Gender Biased? A Case Study on Face Presentation Attack Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robotic perception requires the modeling of both 3D geometry and semantics. Existing methods typically focus on estimating 3D bounding boxes, neglecting finer geometric details and struggling to handle general, out-of-vocabulary objects. 3D occupancy prediction, which estimates the detailed occupancy states and semantics of a scene, is an emerging task to overcome these limitations. To support 3D occupancy prediction, we develop a label generation pipeline that produces dense, visibility-aware labels for any given scene. This pipeline comprises three stages: voxel densification, occlusion reasoning, and image-guided voxel refinement. We establish two benchmarks, derived from the Waymo Open Dataset and the nuScenes Dataset, namely Occ3D-Waymo and Occ3D-nuScenes benchmarks. Furthermore, we provide an extensive analysis of the proposed dataset with various baseline models. Lastly, we propose a new model, dubbed Coarse-to-Fine Occupancy (CTF-Occ) network, which demonstrates superior performance on the Occ3D benchmarks. 
The code, data, and benchmarks are released at ", "output": "Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observation is only justified with curated data distribution, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, a long-tailed data distribution appears more common and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we set from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality in both quantitative and qualitative ways. Our method benchmarked the generation results on CIFAR100/CIFAR100LT dataset and shows outstanding performance on the downstream recognition task.", "output": "Class-Balancing Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "NeRF acquisition typically requires careful choice of near planes for the different cameras or suffers from background collapse, creating floating artifacts on the edges of the captured scene. 
The key insight of this work is that background collapse is caused by a higher density of samples in regions near cameras. As a result of this sampling imbalance, near-camera volumes receive significantly more gradients, leading to incorrect density buildup. We propose a gradient scaling approach to counter-balance this sampling imbalance, removing the need for near planes, while preventing background collapse. Our method can be implemented in a few lines, does not induce any significant overhead, and is compatible with most NeRF implementations.", "output": "Floaters No More: Radiance Field Gradient Scaling for Improved Near-Camera Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately segmenting blood vessels in retinal fundus images is crucial in the early screening, diagnosing, and evaluating some ocular diseases, yet it poses a nontrivial uncertainty for the segmentation task due to various factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. As a result, a useful approach based on multiple attention mechanisms and deep learning is proposed to accurately detect blood vessels in retinal fundus images. To enrich contextual information for the loss of scene information compensation, an attention fusion mechanism that combines the channel attention with spatial attention mechanisms constructed by Transformer is employed to extract various features of blood vessels from retinal fundus images in both spatial and channel dimensions. Subsequently, a unique spatial attention mechanism is introduced in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features.
In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance.", "output": "Fundus vascular image segmentation based on multiple attention mechanisms and deep learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents an unsupervised approach for writer retrieval based on clustering SIFT descriptors detected at keypoint locations resulting in pseudo-cluster labels. With those cluster labels, a residual network followed by our proposed NetRVLAD, an encoding layer with reduced complexity compared to NetVLAD, is trained on 32x32 patches at keypoint locations. Additionally, we suggest a graph-based reranking algorithm called SGR to exploit similarities of the page embeddings to boost the retrieval performance. Our approach is evaluated on two historical datasets (Historical-WI and HisIR19). We include an evaluation of different backbones and NetRVLAD. It competes with related work on historical datasets without using explicit encodings. We set a new State-of-the-art on both datasets by applying our reranking scheme and show that our approach achieves comparable performance on a modern dataset as well.", "output": "Towards Writer Retrieval for Historical Datasets."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input.
Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide variety of tasks and capabilities, and transform them into instruction tuning format. Additionally, we introduce an instruction-aware Query Transformer, which extracts informative features tailored to the given instruction. Trained on 13 held-in datasets, InstructBLIP attains state-of-the-art zero-shot performance across all 13 held-out datasets, substantially outperforming BLIP-2 and larger Flamingo models. Our models also lead to state-of-the-art performance when finetuned on individual downstream tasks (e.g., 90.7% accuracy on ScienceQA questions with image contexts). Furthermore, we qualitatively demonstrate the advantages of InstructBLIP over concurrent multimodal models. All InstructBLIP models are open-sourced at", "output": "InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual information extraction (VIE), which aims to simultaneously perform OCR and information extraction in a unified framework, has drawn increasing attention due to its essential role in various applications like understanding receipts, goods, and traffic signs. However, as existing benchmark datasets for VIE mainly consist of document images without the adequate diversity of layout structures, background disturbs, and entity categories, they cannot fully reveal the challenges of real-world applications.
In this paper, we propose a large-scale dataset consisting of camera images for VIE, which contains not only the larger variance of layout, backgrounds, and fonts but also much more types of entities. Besides, we propose a novel framework for end-to-end VIE that combines the stages of OCR and information extraction in an end-to-end learning fashion. Different from the previous end-to-end approaches that directly adopt OCR features as the input of an information extraction module, we propose to use contrastive learning to narrow the semantic gap caused by the difference between the tasks of OCR and information extraction. We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods has a distinguishable drop from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance of layout and entities. These results demonstrate our dataset is more practical for promoting advanced VIE algorithms. In addition, experiments demonstrate that the proposed VIE method consistently achieves the obvious performance gains on the proposed and SROIE datasets.", "output": "Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamics prediction, which is the problem of predicting future states of scene objects based on current and prior states, is drawing increasing attention as an instance of learning physics. To solve this problem, Region Proposal Convolutional Interaction Network (RPCIN), a vision-based model, was proposed and achieved state-of-the-art performance in long-term prediction. RPCIN only takes raw images and simple object descriptions, such as the bounding box and segmentation mask of each object, as input.
However, despite its success, the model's capability can be compromised under conditions of environment misalignment. In this paper, we investigate two challenging conditions for environment misalignment: Cross-Domain and Cross-Context by proposing four datasets that are designed for these challenges: SimB-Border, SimB-Split, BlenB-Border, and BlenB-Split. The datasets cover two domains and two contexts. Using RPCIN as a probe, experiments conducted on the combinations of the proposed datasets reveal potential weaknesses of the vision-based long-term dynamics prediction model. Furthermore, we propose a promising direction to mitigate the Cross-Domain challenge and provide concrete evidence supporting such a direction, which provides dramatic alleviation of the challenge on the proposed datasets.", "output": "A Critical View of Vision-Based Long-Term Dynamics Prediction Under Environment Misalignment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, polyp segmentation has gained significant importance, and many methods have been developed using CNN, Vision Transformer, and Transformer techniques to achieve competitive results. However, these methods often face difficulties when dealing with out-of-distribution datasets, missing boundaries, and small polyps. In 2022, Meta-Former was introduced as a new baseline for vision, which not only improved the performance of multi-task computer vision but also addressed the limitations of the Vision Transformer and CNN family backbones. To further enhance segmentation, we propose a fusion of Meta-Former with UNet, along with the introduction of a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance the texture, also we propose the Convformer block base on the idea of the Meta-former to enhance the crucial information of the local feature.
These blocks enable the combination of global information, such as the overall shape of the polyp, with local information and boundary information, which is crucial for the decision of the medical segmentation. Our proposed approach achieved competitive performance and obtained the top result in the State of the Art on the CVC-300 dataset, Kvasir, and CVC-ColonDB dataset. Apart from Kvasir-SEG, others are out-of-distribution datasets. The implementation can be found at:", "output": "Meta-Polyp: a baseline for efficient Polyp segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Online surgical phase recognition plays a significant role towards building contextual tools that could quantify performance and oversee the execution of surgical workflows. Current approaches are limited since they train spatial feature extractors using frame-level supervision that could lead to incorrect predictions due to similar frames appearing at different phases, and poorly fuse local and global features due to computational constraints which can affect the analysis of long videos commonly encountered in surgical interventions. In this paper, we present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information that combines a temporally-rich spatial feature extractor and a multi-scale temporal aggregator consisting of two cascaded L-Trans modules based on self-attention, followed by a G-Informer module based on ProbSparse self-attention for processing global temporal information. The multi-scale temporal head then combines local and global features and classifies surgical phases using phase transition-aware supervision. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
Compared to Trans-SVNet, LoViT achieves a 2.4 pp (percentage point) improvement in video-level accuracy on Cholec80 and a 3.1 pp improvement on AutoLaparo. Moreover, it achieves a 5.3 pp improvement in phase-level Jaccard on AutoLaparo and a 1.55 pp improvement on Cholec80. Our results demonstrate the effectiveness of our approach in achieving state-of-the-art performance of surgical phase recognition on two datasets of different surgical procedures and temporal sequencing characteristics whilst introducing mechanisms that cope with long videos.", "output": "LoViT: Long Video Transformer for Surgical Phase Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training medical AI algorithms requires large volumes of accurately labeled datasets, which are difficult to obtain in the real world. Synthetic images generated from deep generative models can help alleviate the data scarcity problem, but their effectiveness relies on their fidelity to real-world images. Typically, researchers select synthesis models based on image quality measurements, prioritizing synthetic images that appear realistic. However, our empirical analysis shows that high-fidelity and visually appealing synthetic images are not necessarily superior. In fact, we present a case where low-fidelity synthetic images outperformed their high-fidelity counterparts in downstream tasks. Our findings highlight the importance of comprehensive analysis before incorporating synthetic data into real-world applications.
We hope our results will raise awareness among the research community of the value of low-fidelity synthetic images in medical AI algorithm training.", "output": "The Beauty or the Beast: Which Aspect of Synthetic Medical Images Deserves Our Focus?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models, either with or without text condition, have demonstrated impressive capability in synthesizing photorealistic images given a few or even no words. These models may not fully satisfy user need, as normal users or artists intend to control the synthesized images with specific guidance, like overall layout, color, structure, object shape, and so on. To adapt diffusion models for controllable image synthesis, several methods have been proposed to incorporate the required conditions as regularization upon the intermediate features of the diffusion denoising network. These methods, known as early-constraint ones in this paper, have difficulties in handling multiple conditions with a single solution. They intend to train separate models for each specific condition, which require much training cost and result in non-generalizable solutions. To address these difficulties, we propose a new approach namely late-constraint: we leave the diffusion networks unchanged, but constrain its output to be aligned with the required conditions. Specifically, we train a lightweight condition adapter to establish the correlation between external conditions and internal representations of diffusion models. During the iterative denoising process, the conditional guidance is sent into corresponding condition adapter to manipulate the sampling process with the established correlation.
We further equip the introduced late-constraint strategy with a timestep resampling method and an early stopping technique, which boost the quality of synthesized image meanwhile complying with the guidance. Our method outperforms the existing early-constraint methods and generalizes better to unseen condition. Our code would be available.", "output": "Late-Constraint Diffusion Guidance for Controllable Image Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Flow field segmentation and classification help researchers to understand vortex structure and thus turbulent flow. Existing deep learning methods mainly based on global information and focused on 2D circumstance. Based on flow field theory, we propose novel flow field segmentation and classification deep learning methods in three-dimensional space. We construct segmentation criterion based on local velocity information and classification criterion based on the relationship between local vorticity and vortex wake, to identify vortex structure in 3D flow field, and further classify the type of vortex wakes accurately and rapidly. Simulation experiment results showed that, compared with existing methods, our segmentation method can identify the vortex area more accurately, while the time consumption is reduced more than 50%; our classification method can reduce the time consumption by more than 90% while maintaining the same classification accuracy level.", "output": "Novel deep learning methods for 3D flow field segmentation and classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Biogenic Volatile Organic Compounds (BVOCs) emitted from the terrestrial ecosystem into the Earth's atmosphere are an important component of atmospheric chemistry.
Due to the scarcity of measurement, a reliable enhancement of BVOCs emission maps can aid in providing denser data for atmospheric chemical, climate, and air quality models. In this work, we propose a strategy to super-resolve coarse BVOC emission maps by simultaneously exploiting the contributions of different compounds. To this purpose, we first accurately investigate the spatial inter-connections between several BVOC species. Then, we exploit the found similarities to build a Multi-Image Super-Resolution (MISR) system, in which a number of emission maps associated with diverse compounds are aggregated to boost Super-Resolution (SR) performance. We compare different configurations regarding the species and the number of joined BVOCs. Our experimental results show that incorporating BVOCs' relationship into the process can substantially improve the accuracy of the super-resolved maps. Interestingly, the best results are achieved when we aggregate the emission maps of strongly uncorrelated compounds. This peculiarity seems to confirm what was already guessed for other data-domains, i.e., joined uncorrelated information are more helpful than correlated ones to boost MISR performance. Nonetheless, the proposed work represents the first attempt in SR of BVOC emissions through the fusion of multiple different compounds.", "output": "Multi-BVOC Super-Resolution Exploiting Compounds Inter-Connection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "After discovering that Language Models (LMs) can be good in-context few-shot learners, numerous strategies have been proposed to optimize in-context sequence configurations. Recently, researchers in Vision-Language (VL) domains also develop their few-shot learners, while they only use the simplest way, i.e., randomly sampling, to configure in-context image-text pairs.
In order to explore the effects of varying configurations on VL in-context learning, we devised four strategies for image selection and four for caption assignment to configure in-context image-text pairs for image captioning. Here Image Captioning is used as the case study since it can be seen as the visually-conditioned LM. Our comprehensive experiments yield two counter-intuitive but valuable insights, highlighting the distinct characteristics of VL in-context learning due to multi-modal synergy, as compared to the NLP case.", "output": "Exploring Diverse In-Context Configurations for Image Captioning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, growing interest has been aroused in extending the multimodal capability of large language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next milestone of artificial general intelligence. However, existing solutions are prohibitively expensive, which not only need to optimize excessive parameters, but also require another large-scale pre-training before VL instruction tuning. In this paper, we propose a novel and affordable solution for the effective VL adaption of LLMs, called Mixture-of-Modality Adaptation (MMA). Instead of using large neural networks to connect the image encoder and LLM, MMA adopts lightweight modules, i.e., adapters, to bridge the gap between LLMs and VL tasks, which also enables the joint optimization of the image and language models. Meanwhile, MMA is also equipped with a routing algorithm to help LLMs achieve an automatic shift between single- and multi-modal instructions without compromising their ability of natural language understanding. To validate MMA, we apply it to a recent LLM called LLaMA and term this formed large vision-language instructed model as LaVIN.
To validate MMA and LaVIN, we conduct extensive experiments under two setups, namely multimodal science question answering and multimodal dialogue. The experimental results not only demonstrate the competitive performance and the superior training efficiency of LaVIN than existing multimodal LLMs, but also confirm its great potential as a general-purpose chatbot. More importantly, the actual expenditure of LaVIN is extremely cheap, e.g., only 1.4 training hours with 3.8M trainable parameters, greatly confirming the effectiveness of MMA. Our project is released at", "output": "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches.
Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at", "output": "OVO: Open-Vocabulary Occupancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In search of robust and generalizable machine learning models, Domain Generalization (DG) has gained significant traction during the past few years. The goal in DG is to produce models which continue to perform well when presented with data distributions different from the ones available during training. While deep convolutional neural networks (CNN) have been able to achieve outstanding performance on downstream computer vision tasks, they still often fail to generalize on previously unseen data Domains. Therefore, in this work we focus on producing a model which is able to remain robust under data distribution shift and propose an alternative regularization technique for convolutional neural network architectures in the single-source DG image classification setting. To mitigate the problem caused by domain shift between source and target data, we propose augmenting intermediate feature maps of CNNs. Specifically, we pass them through a novel Augmentation Layer to prevent models from overfitting on the training set and improve their cross-domain generalization. To the best of our knowledge, this is the first paper proposing such a setup for the DG image classification setting.
Experiments on the DG benchmark datasets of PACS, VLCS, Office-Home and TerraIncognita validate the effectiveness of our method, in which our model surpasses state-of-the-art algorithms in most cases.", "output": "CNN Feature Map Augmentation for Single-Source Domain Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to generate coherent image (and text) outputs. To achieve strong performance on image generation, we propose an efficient mapping network to ground the LLM to an off-the-shelf text-to-image generation model. This mapping network translates hidden representations of text into the embedding space of the visual models, enabling us to leverage the strong text representations of the LLM for visual outputs. Our approach outperforms baseline generation models on tasks with longer and more complex language. In addition to novel image generation, our model is also capable of image retrieval from a prespecified dataset, and decides whether to retrieve or generate at inference time. This is done with a learnt decision module which conditions on the hidden representations of the LLM. Our model exhibits a wider range of capabilities compared to prior multimodal language models.
It can process image-and-text inputs, and produce retrieved images, generated images, and generated text -- outperforming non-LLM based generation models across several text-to-image tasks that measure context dependence.", "output": "Generating Images with Multimodal Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compositional Zero-Shot Learning (CZSL) aims to train models to recognize novel compositional concepts based on learned concepts such as attribute-object combinations. One of the challenges is to model attributes interacted with different objects, e.g., the attribute \"wet\" in \"wet apple\" and \"wet cat\" is different. As a solution, we provide analysis and argue that attributes are conditioned on the recognized object and input image and explore learning conditional attribute embeddings by a proposed attribute learning framework containing an attribute hyper learner and an attribute base learner. By encoding conditional attributes, our model enables to generate flexible attribute embeddings for generalization from seen to unseen compositions. Experiments on CZSL benchmarks, including the more challenging C-GQA dataset, demonstrate better performances compared with other state-of-the-art approaches and validate the importance of learning conditional attributes. Code is available at ", "output": "Learning Conditional Attributes for Compositional Zero-Shot Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One tough problem of image inpainting is to restore complex structures in the corrupted regions. It motivates interactive image inpainting which leverages additional hints, e.g., sketches, to assist the inpainting process.
Sketch is simple and intuitive to end users, but meanwhile has free forms with much randomness. Such randomness may confuse the inpainting models, and incur severe artifacts in completed images. To address this problem, we propose a two-stage image inpainting method termed SketchRefiner. In the first stage, we propose using a cross-correlation loss function to robustly calibrate and refine the user-provided sketches in a coarse-to-fine fashion. In the second stage, we learn to extract informative features from the abstracted sketches in the feature space and modulate the inpainting process. We also propose an algorithm to simulate real sketches automatically and build a test protocol with different applications. Experimental results on public datasets demonstrate that SketchRefiner effectively utilizes sketch information and eliminates the artifacts due to the free-form sketches. Our method consistently outperforms the state-of-the-art ones both qualitatively and quantitatively, meanwhile revealing great potential in real-world applications. Our code and dataset are available.", "output": "Towards Interactive Image Inpainting via Sketch Refinement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present DiffRoom, a novel framework for tackling the problem of high-quality 3D indoor room reconstruction and generation, both of which are challenging due to the complexity and diversity of the room geometry. Although diffusion-based generative models have previously demonstrated impressive performance in image generation and object-level 3D generation, they have not yet been applied to room-level 3D generation due to their computationally intensive costs. In DiffRoom, we propose a sparse 3D diffusion network that is efficient and possesses strong generative performance for Truncated Signed Distance Field (TSDF), based on a rough occupancy prior.
Inspired by KinectFusion's incremental alignment and fusion of local SDFs, we propose a diffusion-based TSDF fusion approach that iteratively diffuses and fuses TSDFs, facilitating the reconstruction and generation of an entire room environment. Additionally, to ease training, we introduce a curriculum diffusion learning paradigm that speeds up the training convergence process and enables high-quality reconstruction. According to the user study, the mesh quality generated by our DiffRoom can even outperform the ground truth mesh provided by ScanNet. Please visit our project page for the latest progress and demonstrations: ", "output": "DiffRoom: Diffusion-based High-Quality 3D Room Reconstruction and Generation with Occupancy Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We focus on the weakly-supervised audio-visual video parsing task (AVVP), which aims to identify and locate all the events in audio/visual modalities. Previous works only concentrate on video-level overall label denoising across modalities, but overlook the segment-level label noise, where adjacent video segments (i.e., 1-second video clips) may contain different events. However, recognizing events in the segment is challenging because its label could be any combination of events that occur in the video. To address this issue, we consider tackling AVVP from the language perspective, since language could freely describe how various events appear in each segment beyond fixed labels. Specifically, we design language prompts to describe all cases of event appearance for each video. Then, the similarity between language prompts and segments is calculated, where the event of the most similar prompt is regarded as the segment-level label. In addition, to deal with the mislabeled segments, we propose to perform dynamic re-weighting on the unreliable segments to adjust their labels.
Experiments show that our simple yet effective approach outperforms state-of-the-art methods by a large margin.", "output": "Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce MoviePuzzle, a novel challenge that targets visual narrative reasoning and holistic movie understanding. Despite the notable progress that has been witnessed in the realm of video understanding, most prior works fail to present tasks and models to address holistic video understanding and the innate visual narrative structures existing in long-form videos. To tackle this quandary, we put forth MoviePuzzle task that amplifies the temporal feature learning and structure learning of video models by reshuffling the shot, frame, and clip layers of movie segments in the presence of video-dialogue information. We start by establishing a carefully refined dataset based on MovieNet by dissecting movies into hierarchical layers and randomly permuting the orders. Besides benchmarking the MoviePuzzle with prior arts on movie understanding, we devise a Hierarchical Contrastive Movie Clustering (HCMC) model that considers the underlying structure and visual semantic orders for movie reordering. Specifically, through a pairwise and contrastive learning approach, we train models to predict the correct order of each layer. This equips them with the knack for deciphering the visual narrative structure of movies and handling the disorder lurking in video data.
Experiments show that our approach outperforms existing state-of-the-art methods on the MoviePuzzle benchmark, underscoring its efficacy.", "output": "MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle with producing high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising the identity. To address these problems, we propose DreamSparse, a framework that enables the frozen pre-trained diffusion model to generate geometry and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without tuning it. Leveraging the strong image priors in the pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object and scene-level images and generalising to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines in both trained and open-set category images. 
More results can be found on our project page:", "output": "DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is an important need for methods to process myocardial perfusion imaging (MPI) SPECT images acquired at lower radiation dose and/or acquisition time such that the processed images improve observer performance on the clinical task of detecting perfusion defects. To address this need, we build upon concepts from model-observer theory and our understanding of the human visual system to propose a Detection task-specific deep-learning-based approach for denoising MPI SPECT images (DEMIST). The approach, while performing denoising, is designed to preserve features that influence observer performance on detection tasks. We objectively evaluated DEMIST on the task of detecting perfusion defects using a retrospective study with anonymized clinical data in patients who underwent MPI studies across two scanners (N = 338). The evaluation was performed at low-dose levels of 6.25%, 12.5% and 25% and using an anthropomorphic channelized Hotelling observer. Performance was quantified using area under the receiver operating characteristics curve (AUC). Images denoised with DEMIST yielded significantly higher AUC compared to corresponding low-dose images and images denoised with a commonly used task-agnostic DL-based denoising method. Similar results were observed with stratified analysis based on patient sex and defect type. Additionally, DEMIST improved visual fidelity of the low-dose images as quantified using root mean squared error and structural similarity index metric. A mathematical analysis revealed that DEMIST preserved features that assist in detection tasks while improving the noise properties, resulting in improved observer performance. 
The results provide strong evidence for further clinical evaluation of DEMIST to denoise low-count images in MPI SPECT.", "output": "DEMIST: A deep-learning-based task-specific denoising approach for myocardial perfusion SPECT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present a simple but performant semi-supervised semantic segmentation approach, termed CorrMatch. Our goal is to mine more high-quality regions from the unlabeled images to leverage the unlabeled data more efficiently via consistency regularization. The key contributions of our CorrMatch are two novel and complementary strategies. First, we introduce an adaptive threshold updating strategy with a relaxed initialization to expand the high-quality regions. Furthermore, we propose to propagate high-confidence predictions through measuring the pairwise similarities between pixels. Despite its simplicity, we show that CorrMatch achieves great performance on popular semi-supervised semantic segmentation benchmarks. Taking the DeepLabV3+ framework with ResNet-101 backbone as our segmentation model, we receive a 76%+ mIoU score on the Pascal VOC 2012 segmentation benchmark with only 92 annotated images provided. We also achieve a consistent improvement over previous semi-supervised semantic segmentation models. Code will be made publicly available.", "output": "CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. 
Our key idea is to regularize the reconstruction, which is severely ill-posed and leaves significant gaps between the sparse views, by learning a set of neural templates that act as surface priors. Our method, coined DiViNet, operates in two stages. The first stage learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help \"stitch\" the surfaces over sparse regions. We demonstrate that our approach is not only able to complete the surface geometry but also reconstructs surface details to a reasonable extent from few disparate input views. On the DTU and BlendedMVS datasets, our approach achieves the best reconstruction quality among existing methods in the presence of such sparse views, and performs on par, if not better, with competing methods when dense views are employed as inputs.", "output": "DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The corticospinal tract (CST) is a critically important white matter fiber tract in the human brain that enables control of voluntary movements of the body. Diffusion MRI tractography is the only method that enables the study of the anatomy and variability of the CST pathway in human health. In this work, we explored the performance of six widely used tractography methods for reconstructing the CST and its somatotopic organization. We perform experiments using diffusion MRI data from the Human Connectome Project. Four quantitative measurements, including reconstruction rate, the WM-GM interface coverage, anatomical distribution of streamlines, and correlation with cortical volumes, are used to assess the advantages and limitations of each method. 
Overall, we conclude that while current tractography methods have made progress toward the well-known challenge of improving the reconstruction of the lateral projections of the CST, the overall problem of performing a comprehensive CST reconstruction, including clinically important projections in the lateral (hand and face area) and medial portions (leg area), remains an important challenge for diffusion MRI tractography.", "output": "Reconstructing the somatotopic organization of the corticospinal tract remains a challenge for modern tractography methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel adversarial defence mechanism for image classification -- CARSO -- inspired by cues from cognitive neuroscience. The method is synergistically complementary to adversarial training and relies on knowledge of the internal representation of the attacked classifier. Exploiting a generative model for adversarial purification, conditioned on such representation, it samples reconstructions of inputs to be finally classified. Experimental evaluation by a well-established benchmark of varied, strong adaptive attacks, across diverse image datasets and classifier architectures, shows that CARSO is able to defend the classifier significantly better than state-of-the-art adversarial training alone -- with a tolerable clean accuracy toll. Furthermore, the defensive architecture succeeds in effectively shielding itself from unforeseen threats, and end-to-end attacks adapted to fool stochastic defences. 
Code and pre-trained models are available at .", "output": "CARSO: Counter-Adversarial Recall of Synthetic Observations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a new real and synthetic dataset called NeRFBK specifically designed for testing and comparing NeRF-based 3D reconstruction algorithms. High-quality 3D reconstruction has significant potential in various fields, and advancements in image-based algorithms make it essential to evaluate new advanced techniques. However, gathering diverse data with precise ground truth is challenging and may not encompass all relevant applications. The NeRFBK dataset addresses this issue by providing multi-scale, indoor and outdoor datasets with high-resolution images and videos and camera parameters for testing and comparing NeRF-based algorithms. This paper presents the design and creation of the NeRFBK benchmark, various examples and application scenarios, and highlights its potential for advancing the field of 3D reconstruction.", "output": "NERFBK: A High-Quality Benchmark for NERF-Based 3D Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Real-time 3D fluorescence microscopy is crucial for the spatiotemporal analysis of live organisms, such as neural activity monitoring. The eXtended field-of-view light field microscope (XLFM), also known as Fourier light field microscope, is a straightforward, single snapshot solution to achieve this. The XLFM acquires spatial-angular information in a single camera exposure. 
In a subsequent step, a 3D volume can be algorithmically reconstructed, making it exceptionally well-suited for real-time 3D acquisition and potential analysis. Unfortunately, traditional reconstruction methods (like deconvolution) require lengthy processing times (0.0220 Hz), hampering the speed advantages of the XLFM. Neural network architectures can overcome the speed constraints at the expense of lacking certainty metrics, which renders them untrustworthy for the biomedical realm. This work proposes a novel architecture to perform fast 3D reconstructions of live immobilized zebrafish neural activity based on a conditional normalizing flow. It reconstructs volumes at 8 Hz spanning 512x512x96 voxels, and it can be trained in under two hours due to the small dataset requirements (10 image-volume pairs). Furthermore, normalizing flows allow for exact likelihood computation, enabling distribution monitoring, followed by out-of-distribution detection and retraining of the system when a novel sample is detected. We evaluate the proposed method on a cross-validation approach involving multiple in-distribution samples (genetically identical zebrafish) and various out-of-distribution ones.", "output": "Fast light-field 3D microscopy with out-of-distribution detection and adaptation through Conditional Normalizing Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advent of 2019 Coronavirus (COVID-19) has engendered a momentous global health crisis, necessitating the identification of the ailment in individuals through diverse diagnostic modalities. Radiological imaging, particularly the deployment of X-ray imaging, has been recognized as a pivotal instrument in the detection and characterization of COVID-19. 
Recent investigations have unveiled invaluable insights pertaining to the virus within X-ray images, instigating the exploration of methodologies aimed at augmenting diagnostic accuracy through the utilization of artificial intelligence (AI) techniques. The current research endeavor posits an innovative framework for the automated diagnosis of COVID-19, harnessing raw chest X-ray images, specifically by means of fine-tuning pre-trained Vision Transformer (ViT) models. The developed models were appraised in terms of their binary classification performance, discerning COVID-19 from Normal cases, as well as their ternary classification performance, discriminating COVID-19 from Pneumonia and Normal instances, and lastly, their quaternary classification performance, discriminating COVID-19 from Bacterial Pneumonia, Viral Pneumonia, and Normal conditions, employing distinct datasets. The proposed model evinced extraordinary precision, registering results of 99.92% and 99.84% for binary classification, 97.95% and 86.48% for ternary classification, and 86.81% for quaternary classification, respectively, on the respective datasets.", "output": "Enhancing COVID-19 Diagnosis through Vision Transformer-Based Analysis of Chest X-ray Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperspectral Images (HSIs) cover hundreds or thousands of narrow spectral bands, conveying a wealth of spatial and spectral information. However, due to the instrumental errors and the atmospheric changes, the HSIs obtained in practice are often contaminated by noise and dead pixels (lines), resulting in missing information that may severely compromise the subsequent applications. We introduce here a novel HSI missing pixel prediction algorithm, called Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP). 
It is shown that LRS-PnP is able to predict missing pixels and bands even when all spectral bands of the image are missing. The proposed LRS-PnP algorithm is further extended to a self-supervised model by combining the LRS-PnP with the Deep Image Prior (DIP), called LRS-PnP-DIP. In a series of experiments with real data, it is shown that the LRS-PnP-DIP either achieves state-of-the-art inpainting performance compared to other learning-based methods, or outperforms them.", "output": "Self-Supervised Hyperspectral Inpainting with the Optimisation inspired Deep Neural Network Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we present a robust approach for joint part and object segmentation. Specifically, we reformulate object and part segmentation as an optimization problem and build a hierarchical feature representation including pixel, part, and object-level embeddings to solve it in a bottom-up clustering manner. Pixels are grouped into several clusters where the part-level embeddings serve as cluster centers. Afterwards, object masks are obtained by compositing the part proposals. This bottom-up interaction is shown to be effective in integrating information from lower semantic levels to higher semantic levels. Based on that, our novel approach Compositor produces part and object segmentation masks simultaneously while improving the mask quality. Compositor achieves state-of-the-art performance on PartImageNet and Pascal-Part by outperforming previous methods by around 0.9% and 1.3% on PartImageNet, 0.4% and 1.7% on Pascal-Part in terms of part and object mIoU and demonstrates better robustness against occlusion by around 4.4% and 7.1% on part and object respectively. 
Code will be available at", "output": "Compositor: Bottom-up Clustering and Compositing for Robust Part and Object Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weakly supervised grounded image captioning (WSGIC) aims to generate the caption and ground (localize) predicted object words in the input image without using bounding box supervision. Recent two-stage solutions mostly apply a bottom-up pipeline: (1) first apply an off-the-shelf object detector to encode the input image into multiple region features; (2) and then leverage a soft-attention mechanism for captioning and grounding. However, object detectors are mainly designed to extract object semantics (i.e., the object category). Besides, they break down the structural images into pieces of individual proposals. As a result, the subsequent grounded captioner is often overfitted to find the correct object words, while overlooking the relation between objects (e.g., what is the person doing?), and selecting incompatible proposal regions for grounding. To address these difficulties, we propose a one-stage weakly supervised grounded captioner that directly takes the RGB image as input to perform captioning and grounding at the top-down image level. In addition, we explicitly inject a relation module into our one-stage framework to encourage the relation understanding through multi-label classification. The relation semantics aid the prediction of relation words in the caption. We observe that the relation words not only assist the grounded captioner in generating a more accurate caption but also improve the grounding performance. We validate the effectiveness of our proposed method on two challenging datasets (Flickr30k Entities captioning and MSCOCO captioning). 
The experimental results demonstrate that our method achieves state-of-the-art grounding performance.", "output": "Top-Down Viewing for Weakly Supervised Grounded Image Captioning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One-shot medical landmark detection gains much attention and achieves great success for its label-efficient training process. However, existing one-shot learning methods are highly specialized in a single domain and suffer domain preference heavily in the situation of multi-domain unlabeled data. Moreover, one-shot learning is not robust that it faces performance drop when annotating a sub-optimal image. To tackle these issues, we resort to developing a domain-adaptive one-shot landmark detection framework for handling multi-domain medical images, named Universal One-shot Detection (UOD). UOD consists of two stages and two corresponding universal models which are designed as combinations of domain-specific modules and domain-shared modules. In the first stage, a domain-adaptive convolution model is self-supervised learned to generate pseudo landmark labels. In the second stage, we design a domain-adaptive transformer to eliminate domain preference and build the global context for multi-domain data. Even though only one annotated sample from each domain is available for training, the domain-shared modules help UOD aggregate all one-shot samples to detect more robust and accurate landmarks. 
We investigated both qualitatively and quantitatively the proposed UOD on three widely-used public X-ray datasets in different anatomical domains (i.e., head, hand, chest) and obtained state-of-the-art performances in each domain.", "output": "UOD: Universal One-shot Detection of Anatomical Landmarks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel learning-based method that achieves state-of-the-art performance on several heart rate estimation benchmarks extracted from photoplethysmography signals (PPG). We consider the evolution of the heart rate in the context of a discrete-time stochastic process that we represent as a hidden Markov model. We derive a distribution over possible heart rate values for a given PPG signal window through a trained neural network. Using belief propagation, we incorporate the statistical distribution of heart rate changes to refine these estimates in a temporal context. From this, we obtain a quantized probability distribution over the range of possible heart rate values that captures a meaningful and well-calibrated estimate of the inherent predictive uncertainty. We show the robustness of our method on eight public datasets with three different cross-validation experiments.", "output": "BeliefPPG: Uncertainty-aware Heart Rate Estimation from PPG signals via Belief Propagation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Object detection on Lidar point cloud data is a promising technology for autonomous driving and robotics which has seen a significant rise in performance and accuracy during recent years. 
Particularly, uncertainty estimation is a crucial component for down-stream tasks and deep neural networks remain error-prone even for predictions with high confidence. Previously proposed methods for quantifying prediction uncertainty tend to alter the training scheme of the detector or rely on prediction sampling which results in vastly increased inference time. In order to address these two issues, we propose LidarMetaDetect (LMD), a light-weight post-processing scheme for prediction quality estimation. Our method can easily be added to any pre-trained Lidar object detector without altering anything about the base model and is purely based on post-processing, therefore, only leading to a negligible computational overhead. Our experiments show a significant increase of statistical reliability in separating true from false predictions. We propose and evaluate an additional application of our method leading to the detection of annotation errors. Explicit samples and a conservative count of annotation error proposals indicate the viability of our method for large-scale datasets like KITTI and nuScenes. On the widely-used nuScenes test dataset, 43 out of the top 100 proposals of our method indicate, in fact, erroneous annotations.", "output": "LMD: Light-weight Prediction Quality Estimation for Object Detection in Lidar Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Simultaneous localization and mapping (SLAM) stands as one of the critical challenges in robot navigation. Recent advancements suggest that methods based on supervised learning deliver impressive performance in front-end odometry, while traditional optimization-based methods still play a vital role in the back-end for minimizing estimation drift. 
In this paper, we found that such a decoupled paradigm can lead to only sub-optimal performance, consequently curtailing system capabilities and generalization potential. To solve this problem, we proposed a novel self-supervised learning framework, imperative SLAM (iSLAM), which fosters reciprocal correction between the front-end and back-end, thus enhancing performance without necessitating any external supervision. Specifically, we formulate a SLAM system as a bi-level optimization problem so that the two components are bidirectionally connected. As a result, the front-end model is able to learn global geometric knowledge obtained through pose graph optimization by back-propagating the residuals from the back-end. This significantly improves the generalization ability of the entire system and thus achieves an accuracy improvement of up to 45%. To the best of our knowledge, iSLAM is the first SLAM system showing that the front-end and back-end can learn jointly and mutually contribute to each other in a self-supervised manner.", "output": "iSLAM: Imperative SLAM."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Since the inception of permissionless blockchains with Bitcoin in 2008, it became apparent that their most well-suited use case is related to making the financial system and its advantages available to everyone seamlessly without depending on any trusted intermediaries. Smart contracts across chains provide an ecosystem of decentralized finance (DeFi), where users can interact with lending pools, Automated Market Maker (AMM) exchanges, stablecoins, derivatives, etc. with a cumulative locked value which had exceeded 160B USD. While DeFi comes with high rewards, it also carries plenty of risks. Many financial crimes have occurred over the years making the early detection of malicious activity an issue of high priority. 
The proposed framework introduces an effective method for extracting a set of features from different chains, including the largest one, Ethereum, and it is evaluated over an extensive dataset we gathered with the transactions of the most widely used DeFi protocols (23 in total, including Aave, Compound, Curve, Lido, and Yearn) based on a novel dataset in collaboration with Covalent. Different Machine Learning methods were employed, such as XGBoost and a Neural Network, for identifying fraud accounts interacting with DeFi, and we demonstrate that the introduction of novel DeFi-related features significantly improves the evaluation results, where Accuracy, Precision, Recall, F1-score and F2-score were utilized.", "output": "Leveraging Machine Learning for Multichain DeFi Fraud Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Edge-cloud collaborative inference empowers resource-limited IoT devices to support deep learning applications without disclosing their raw data to the cloud server, thus preserving privacy. Nevertheless, prior research has shown that collaborative inference still results in the exposure of data and predictions from edge devices. To enhance the privacy of collaborative inference, we introduce a defense strategy called PrivaScissors, which is designed to reduce the mutual information between a model's intermediate outcomes and the device's data and predictions. 
We evaluate PrivaScissors's performance on several datasets in the context of diverse attacks and offer a theoretical robustness guarantee.", "output": "PrivaScissors: Enhance the Privacy of Collaborative Inference through the Lens of Mutual Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rise of cryptocurrencies like Bitcoin, which enable transactions with a degree of pseudonymity, has led to a surge in various illicit activities, including ransomware payments and transactions on darknet markets. These illegal activities often utilize Bitcoin as the preferred payment method. However, current tools for detecting illicit behavior either rely on a few heuristics and laborious data collection processes or employ computationally inefficient graph neural network (GNN) models that are challenging to interpret. To overcome the computational and interpretability limitations of existing techniques, we introduce an effective solution called Chainlet Orbits. This approach embeds Bitcoin addresses by leveraging their topological characteristics in transactions. By employing our innovative address embedding, we investigate e-crime in Bitcoin networks by focusing on distinctive substructures that arise from illicit behavior. The results of our node classification experiments demonstrate superior performance compared to state-of-the-art methods, including both topological and GNN-based approaches. 
Moreover, our approach enables the use of interpretable and explainable machine learning models in as little as 15 minutes for most days on the Bitcoin transaction network.", "output": "Chainlet Orbits: Topological Address Embedding for the Bitcoin Blockchain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the most significant challenges in the field of software code auditing is the presence of vulnerabilities in software source code. Every year, more and more software flaws are discovered, either internally in proprietary code or publicly disclosed. These flaws are highly likely to be exploited and can lead to system compromise, data leakage, or denial of service. To create a large-scale machine learning system for function level vulnerability identification, we utilized a sizable dataset of C and C++ open-source code containing millions of functions with potential buffer overflow exploits. We have developed an efficient and scalable vulnerability detection method based on neural network models that learn features extracted from the source codes. The source code is first converted into an intermediate representation to remove unnecessary components and shorten dependencies. We maintain the semantic and syntactic information using state of the art word embedding algorithms such as GloVe and fastText. The embedded vectors are subsequently fed into neural networks such as LSTM, BiLSTM, LSTM Autoencoder, word2vec, BERT, and GPT2 to classify the possible vulnerabilities. 
Furthermore, we have proposed a neural network model that can overcome issues associated with traditional neural networks. We have used evaluation metrics such as F1 score, precision, recall, accuracy, and total execution time to measure the performance. We have conducted a comparative analysis between results derived from features containing a minimal text representation and semantic and syntactic information.", "output": "Feature Engineering-Based Detection of Buffer Overflow Vulnerability in Source Code Using Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the increase in non-Windows malware threats has turned the focus of the cybersecurity community. Research works on hunting Windows PE-based malwares are maturing, whereas the developments on Linux malware threat hunting are relatively scarce. With the advent of the Internet of Things (IoT) era, smart devices that are getting integrated into human life have become a hacker's highway for their malicious activities. The IoT devices employ various Unix-based architectures that follow ELF (Executable and Linkable Format) as their standard binary file specification. This study aims at providing a comprehensive survey on the latest developments in cross-architectural IoT malware detection and classification approaches. Aided by a modern taxonomy, we discuss the feature representations, feature extraction techniques, and machine learning models employed in the surveyed works. 
We further provide more insights on the practical challenges involved in cross-architectural IoT malware threat hunting and discuss various avenues to instill potential future research.", "output": "A Survey on Cross-Architectural IoT Malware Threat Hunting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications such as e-Commerce and social networks where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic study on how to design secure defense strategies against visual attacks on VARS. In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks.
Moreover, our method can detect adversarial examples with high accuracy.", "output": "Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid growth of distributed energy resources (DERs), such as renewable energy sources, generators, consumers, and prosumers in the smart grid infrastructure, poses significant cybersecurity and trust challenges to the grid controller. Consequently, it is crucial to identify adversarial tactics and measure the strength of the attacker's DER. To enable a trustworthy smart grid controller, this work investigates a trustworthy artificial intelligence (AI) mechanism for proactive identification and explanation of the cyber risk caused by the control/status message of DERs. Thus, we propose and develop a trustworthy AI framework to facilitate the deployment of any AI algorithms for detecting potential cyber threats and analyzing root causes based on Shapley value interpretation while dynamically quantifying the risk of an attack based on Ward's minimum variance formula. The experiment with a state-of-the-art dataset establishes the proposed framework as a trustworthy AI by fulfilling the capabilities of reliability, fairness, explainability, transparency, reproducibility, and accountability.", "output": "Trustworthy Artificial Intelligence Framework for Proactive Detection and Risk Explanation of Cyber Attacks in Smart Grid."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The unsupervised text style transfer task aims to rewrite a text into a target style while preserving its main content.
Traditional methods rely on the use of a fixed-sized vector to regulate text style, which makes it difficult to accurately convey the style strength of each individual token. In fact, each token of a text carries a different style intensity and makes a different contribution to the overall style. Our proposed method addresses this issue by assigning an individual style vector to each token in a text, allowing for fine-grained control and manipulation of the style strength. Additionally, an adversarial training framework integrated with teacher-student learning is introduced to enhance training stability and reduce the complexity of high-dimensional optimization. The results of our experiments demonstrate the efficacy of our method in terms of clearly improved style transfer accuracy and content preservation in both two-style transfer and multi-style transfer settings.", "output": "MSSRNet: Manipulating Sequential Style Representation for Unsupervised Text Style Transfer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, neural networks have spread into numerous fields including many safety-critical systems. Neural networks are built (and trained) by programming in frameworks such as TensorFlow and PyTorch. Developers apply a rich set of pre-defined layers to manually program neural networks or to automatically generate them (e.g., through AutoML). Composing neural networks with different layers is error-prone due to the non-trivial constraints that must be satisfied in order to use those layers. In this work, we propose an approach to automatically repair erroneous neural networks. The challenge is in identifying a minimal modification to the network so that it becomes valid. Modifying a layer might have cascading effects on subsequent layers and thus our approach must search recursively to identify a \"globally\" minimal modification.
Our approach is based on an executable semantics of deep learning layers and focuses on four kinds of errors which are common in practice. We evaluate our approach for two usage scenarios, i.e., repairing automatically generated neural networks and manually written ones suffering from common model bugs. The results show that we are able to repair 100% of a set of randomly generated neural networks (which are produced with an existing AI framework testing approach) effectively and efficiently (with an average repair time of 21.08 s) and 93.75% of a collection of real neural network bugs (with an average time of 3 min 40 s).", "output": "Semantic-Based Neural Network Repair."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Firewalls are critical components in securing communication networks by screening all incoming (and occasionally exiting) data packets. Filtering is carried out by comparing incoming data packets to a set of rules designed to prevent malicious code from entering the network. To regulate the flow of data packets entering and leaving a network, an Internet firewall keeps track of all activity. While the primary function of log files is to aid in troubleshooting and diagnostics, the information they contain is also very relevant to system audits and forensics. A firewall's primary function is to prevent malicious data packets from being sent. In order to better defend against cyberattacks and understand when and how malicious actions are influencing the internet, it is necessary to examine log files. As a result, the firewall decides whether to 'allow,' 'deny,' 'drop,' or 'reset-both' the incoming and outgoing packets.
In this research, we apply various categorization algorithms to make sense of data logged by a firewall device. Harmonic mean F1 score, recall, and sensitivity measurement data with a 99% accuracy score in the random forest technique are used to compare the classifier's performance. To be sure, the proposed characteristics did significantly contribute to enhancing the firewall classification rate, as seen by the high accuracy rates generated by the other methods.", "output": "Machine Learning Approach on Multiclass Classification of Internet Firewall Log Files."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in zero-shot learning have enabled the use of paired image-text data to replace structured labels, replacing the need for expert annotated datasets. Models such as CLIP-based CheXzero utilize these advancements in the domain of chest X-ray interpretation. We hypothesize that domain pre-trained models such as CXR-BERT, BlueBERT, and ClinicalBERT offer the potential to improve the performance of CLIP-like models with specific domain knowledge by replacing BERT weights at the cost of breaking the original model's alignment. We evaluate the performance of zero-shot classification models with domain-specific pre-training for detecting low-prevalence pathologies. Even though replacing the weights of the original CLIP-BERT degrades model performance on commonly found pathologies, we show that pre-trained text towers perform exceptionally better on low-prevalence diseases.
This motivates future ensemble models with a combination of differently trained language models for maximal performance.", "output": "Improving Zero-Shot Detection of Low Prevalence Chest Pathologies using Domain Pre-trained Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Active learning algorithms have been an integral part of recent advances in artificial intelligence. However, the research in the field is widely varying and lacks an overall organizing lens. We outline a Markovian formalism for the field of active learning and survey the literature to demonstrate the organizing capability of our proposed formalism. Our formalism takes a partially observable Markovian system approach to the active learning process as a whole. We specifically outline how querying, dataset augmentation, reward updates, and other aspects of active learning can be viewed as a transition between meta-states in a Markovian system, and give direction into how other aspects of active learning can fit into our formalism.", "output": "A Markovian Formalism for Active Querying."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increase in the use of photovoltaic (PV) energy in the world has shown that the useful life and maintenance of a PV plant directly depend on the ability to quickly detect severe faults on a PV plant. To solve this problem of detection, data-based approaches have been proposed in the literature. However, these previous solutions consider only specific behavior of one or few faults. Most of these approaches can be qualified as supervised, requiring an enormous labelling effort (fault types clearly identified in each technology). In addition, most of them are validated in PV cells or one PV module.
That is hardly applicable in large-scale PV plants considering their complexity. Alternatively, some unsupervised well-known approaches based on data try to detect anomalies but are not able to identify precisely the type of fault. The most performant of these methods do manage to efficiently group healthy panels and separate them from faulty panels. In that way, this article presents an unsupervised approach called DTW K-means. This approach takes advantage of both the dynamic time warping (DTW) metric and the K-means clustering algorithm as a data-driven approach. The results of this mixed method in a PV string are compared to diagnostic labels established by visual inspection of the panels.", "output": "DTW k-means clustering for fault detection in photovoltaic modules."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diagnosis in PV systems aims to detect, locate and identify faults. Diagnosing these faults is vital to guarantee energy production and extend the useful life of PV power plants. In the literature, multiple machine learning approaches have been proposed for this purpose. However, few of these works have paid special attention to the detection of fine faults and the specialized process of extraction and selection of features for their classification. A fine fault is one whose characteristic signature is difficult to distinguish from that of a healthy panel. As a contribution to the detection of fine faults (especially of the snail trail type), this article proposes an innovative approach based on the Random Forest (RF) algorithm.
This approach uses a complex feature extraction and selection method that improves the computational time of fault classification while maintaining high accuracy.", "output": "Detection and classification of faults aimed at preventive maintenance of PV systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The brain is a remarkably capable and efficient system. It can process and store huge amounts of noisy and unstructured information using minimal energy. In contrast, current artificial intelligence (AI) systems require vast resources for training while still struggling to compete in tasks that are trivial for biological agents. Thus, brain-inspired engineering has emerged as a promising new avenue for designing sustainable, next-generation AI systems. Here, we describe how dendritic mechanisms of biological neurons have inspired innovative solutions for significant AI problems, including credit assignment in multilayer networks, catastrophic forgetting, and high energy consumption. These findings provide exciting alternatives to existing architectures, showing how dendritic research can pave the way for building more powerful and energy-efficient artificial learning systems.", "output": "Leveraging dendritic properties to advance machine learning and neuro-inspired computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep reinforcement learning algorithms typically act on the same set of actions. However, this is not sufficient for a wide range of real-world applications where different subsets are available at each step. In this thesis, we consider the problem of interval restrictions as they occur in pathfinding with dynamic obstacles.
When actions that lead to collisions are avoided, the continuous action space is split into variable parts. Recent research learns with strong assumptions on the number of intervals, is limited to convex subsets, and the available actions are learned from the observations. Therefore, we propose two approaches that are independent of the state of the environment by extending parameterized reinforcement learning and ConstraintNet to handle an arbitrary number of intervals. We demonstrate their performance in an obstacle avoidance task and compare the methods to penalties, projection, replacement, as well as discrete and continuous masking from the literature. The results suggest that discrete masking of action-values is the only effective method when constraints did not emerge during training. When restrictions are learned, the decision between projection, masking, and our ConstraintNet modification seems to depend on the task at hand. We compare the results with varying complexity and give directions for future work.", "output": "Dynamic Interval Restrictions on Action Spaces in Deep Reinforcement Learning for Obstacle Avoidance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Backdoor attacks have emerged as an urgent threat to Deep Neural Networks (DNNs), where victim DNNs are furtively implanted with malicious neurons that could be triggered by the adversary. To defend against backdoor attacks, many works establish a staged pipeline to remove backdoors from victim DNNs: inspecting, locating, and erasing. However, in a scenario where a few clean data can be accessible, such pipeline is fragile and cannot erase backdoors completely without sacrificing model accuracy.
To address this issue, in this paper, we propose a novel data-free holistic backdoor erasing (DHBE) framework. Instead of the staged pipeline, the DHBE treats the backdoor erasing task as a unified adversarial procedure, which seeks equilibrium between two different competing processes: distillation and backdoor regularization. In distillation, the backdoored DNN is distilled into a proxy model, transferring its knowledge about clean data, yet backdoors are simultaneously transferred. In backdoor regularization, the proxy model is holistically regularized to prevent infection by any possible backdoor transferred from distillation. These two processes jointly proceed with data-free adversarial optimization until a clean, high-accuracy proxy model is obtained. With the novel adversarial design, our framework demonstrates its superiority in three aspects: 1) minimal detriment to model accuracy, 2) high tolerance for hyperparameters, and 3) no demand for clean data. Extensive experiments on various backdoor attacks and datasets are performed to verify the effectiveness of the proposed framework. Code is available at url{", "output": "DHBE: Data-free Holistic Backdoor Erasing in Deep Neural Networks via Restricted Adversarial Distillation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) naturally faces the problem of data heterogeneity in real-world scenarios, but this is often overlooked by studies on FL security and privacy. On the one hand, the effectiveness of backdoor attacks on FL may drop significantly under non-IID scenarios. On the other hand, malicious clients may steal private data through privacy inference attacks. Therefore, it is necessary to have a comprehensive perspective of data heterogeneity, backdoor, and privacy inference.
In this paper, we propose a novel privacy inference-empowered stealthy backdoor attack (PI-SBA) scheme for FL under non-IID scenarios. Firstly, a diverse data reconstruction mechanism based on generative adversarial networks (GANs) is proposed to produce a supplementary dataset, which can improve the attacker's local data distribution and support more sophisticated strategies for backdoor attacks. Based on this, we design a source-specified backdoor learning (SSBL) strategy as a demonstration, allowing the adversary to arbitrarily specify which classes are susceptible to the backdoor trigger. Since the PI-SBA has an independent poisoned data synthesis process, it can be integrated into existing backdoor attacks to improve their effectiveness and stealthiness in non-IID scenarios. Extensive experiments based on MNIST, CIFAR10 and YouTube Aligned Face datasets demonstrate that the proposed PI-SBA scheme is effective in non-IID FL and stealthy against state-of-the-art defense methods.", "output": "Privacy Inference-Empowered Stealthy Backdoor Attack on Federated Learning under Non-IID Scenarios."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a robust and reliable evaluation metric for generative models by introducing topological and statistical treatments for rigorous support estimation. Existing metrics, such as Inception Score (IS), Fréchet Inception Distance (FID), and the variants of Precision and Recall (P&R), heavily rely on supports that are estimated from sample features. However, the reliability of their estimation has been overlooked and not seriously discussed, even though the quality of the evaluation entirely depends on it.
In this paper, we propose Topological Precision and Recall (TopP&R, pronounced 'topper'), which provides a systematic approach to estimating supports, retaining only topologically and statistically important features with a certain level of confidence. This not only makes TopP&R strong for noisy features, but also provides statistical consistency. Our theoretical and experimental results show that TopP&R is robust to outliers and non-independent and identically distributed (Non-IID) perturbations, while accurately capturing the true trend of change in samples. To the best of our knowledge, this is the first evaluation metric focused on the robust estimation of the support and provides its statistical consistency under noise.", "output": "TopP\\&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Free Energy Principle (FEP) is a theoretical framework for describing how (intelligent) systems self-organise into coherent, stable structures by minimising a free energy functional. Active Inference (AIF) is a corollary of the FEP that specifically details how systems that are able to plan for the future (agents) function by minimising particular free energy functionals that incorporate information seeking components. This paper is the first in a series of two where we derive a synthetic version of AIF on free form factor graphs. The present paper focuses on deriving a local version of the free energy functionals used for AIF. This enables us to construct a version of AIF which applies to arbitrary graphical models and interfaces with prior work on message passing algorithms. The resulting messages are derived in our companion paper. We also identify a gap in the graphical notation used for factor graphs.
While factor graphs are great at expressing a generative model, they have so far been unable to specify the full optimisation problem including constraints. To solve this problem we develop Constrained Forney-style Factor Graph (CFFG) notation which permits a fully graphical description of variational inference objectives. We then proceed to show how CFFGs can be used to reconstruct prior algorithms for AIF as well as derive new ones. The latter is demonstrated by deriving an algorithm that permits direct policy inference for AIF agents, circumventing a long-standing scaling issue that has so far hindered the application of AIF in industrial settings. We demonstrate our algorithm on the classic T-maze task and show that it reproduces the information seeking behaviour that is a hallmark feature of AIF.", "output": "Realising Synthetic Active Inference Agents, Part I: Epistemic Objectives and Graphical Specification Language."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions, each curated to enhance the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors.
Through extensive instruction tuning experiments on the representative LLM, we underscore the potency of Mol-Instructions to enhance the adaptability and cognitive acuity of large models within the complex sphere of biomolecular studies, thereby promoting advancements in the biomolecular research community. Mol-Instructions is made publicly accessible for future research endeavors and will be subjected to continual updates for enhanced applicability.", "output": "Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Finding optimal channel dimensions (i.e., the number of filters in DNN layers) is essential to design DNNs that perform well under computational resource constraints. Recent work in neural architecture search aims at automating the optimization of the DNN model implementation. However, existing neural architecture search methods for channel dimensions rely on fixed search spaces, which prevents achieving an efficient and fully automated solution. In this work, we propose a novel differentiable neural architecture search method with an efficient dynamic channel allocation algorithm to enable a flexible search space for channel dimensions. We show that the proposed framework is able to find DNN architectures that are equivalent to previous methods in task accuracy and inference latency for the CIFAR-10 dataset with an improvement of $1.3-1.7\\times$ in GPU-hours and $1.5-1.7\\times$ in the memory requirements during the architecture search stage.
Moreover, the proposed frameworks do not require a well-engineered search space a priori, which is an important step towards fully automated design of DNN architectures.", "output": "Flexible Channel Dimensions for Differentiable Architecture Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside it. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the inverse Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.", "output": "On Faking a Nash Equilibrium."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most medical treatment decisions are sequential in nature. Hence, there is substantial hope that reinforcement learning may make it possible to formulate precise data-driven treatment plans.
However, a key challenge for most applications in this field is the sparse nature of primarily mortality-based reward functions, leading to decreased stability of offline estimates. In this work, we introduce a deep Q-learning approach able to obtain more reliable critical care policies. This method integrates relevant but noisy intermediate biomarker signals into the reward specification, without compromising the optimization of the main outcome of interest (e.g. patient survival). We achieve this by first pruning the action set based on all available rewards, and second training a final model based on the sparse main reward but with a restricted action set. By disentangling accurate and approximated rewards through action pruning, potential distortions of the main objective are minimized, all while enabling the extraction of valuable information from intermediate signals that can guide the learning process. We evaluate our method in both off-policy and offline settings using simulated environments and real health records of patients in intensive care units. Our empirical results indicate that pruning significantly reduces the size of the action space while staying mostly consistent with the actions taken by physicians, outperforming the current state-of-the-art offline reinforcement learning method conservative Q-learning. Our work is a step towards developing reliable policies by effectively harnessing the wealth of available information in data-intensive critical care environments.", "output": "Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute.
Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the \"black magic\" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models).", "output": "Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning symbolic expressions directly from experiment data is a vital step in AI-driven scientific discovery. Nevertheless, state-of-the-art approaches are limited to learning simple expressions. Regressing expressions involving many independent variables still remains out of reach. Motivated by the control variable experiments widely utilized in science, we propose Control Variable Genetic Programming (CVGP) for symbolic regression over many independent variables.
CVGP expedites symbolic expression discovery via customized experiment design, rather than learning from a fixed dataset collected a priori. CVGP starts by fitting simple expressions involving a small set of independent variables using genetic programming, under controlled experiments where other variables are held as constants. It then extends expressions learned in previous generations by adding new independent variables, using new control variable experiments in which these variables are allowed to vary. Theoretically, we show CVGP as an incremental building approach can yield an exponential reduction in the search space when learning a class of expressions. Experimentally, CVGP outperforms several baselines in learning symbolic expressions involving multiple independent variables.", "output": "Symbolic Regression via Control Variable Genetic Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The software supply chain (SSC) attack has become one of the crucial issues that are increasing rapidly with the advancement of the software development domain. In general, SSC attacks executed during the software development processes lead to vulnerabilities in software products targeting downstream customers and even involved stakeholders. Machine Learning approaches are proven in detecting and preventing software security vulnerabilities. Besides, emerging quantum machine learning can be promising in addressing SSC attacks. Considering the distinction between traditional and quantum machine learning, performance could vary based on the proportions of the experimenting dataset. In this paper, we conduct a comparative analysis between quantum neural networks (QNN) and conventional neural networks (NN) with a software supply chain attack dataset known as ClaMP.
Our goal is to distinguish the performance between QNN and NN. To conduct the experiment, we develop two different models for QNN and NN by utilizing Pennylane for quantum and TensorFlow and Keras for traditional respectively. We evaluated the performance of both models with different proportions of the ClaMP dataset to identify the f1 score, recall, precision, and accuracy. We also measure the execution time to check the efficiency of both models. The demonstration result indicates that execution time for QNN is slower than NN with a higher percentage of datasets. Due to recent advancements in QNN, a large level of experiments shall be carried out to understand both models accurately in our future research.", "output": "Software Supply Chain Vulnerabilities Detection in Source Code: Performance Comparison between Traditional and Quantum Machine Learning Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of input images, and controllable scene generation that supports editing, is now possible. However, training jointly on a large number of scenes typically compromises rendering quality when compared to single-scene optimized models such as NeRFs. In this paper, we leverage recent progress in diffusion models to equip 3D scene representation learning models with the ability to render high-fidelity novel views, while retaining benefits such as object-level scene editing to a large degree. In particular, we propose DORSal, which adapts a video diffusion architecture for 3D scene generation conditioned on object-centric slot-based representations of scenes. 
On both complex synthetic multi-object scenes and on the real-world large-scale Street View dataset, we show that DORSal enables scalable neural rendering of 3D scenes with object-level editing and improves upon existing approaches.", "output": "DORSal: Diffusion for Object-centric Representations of Scenes $\\textit{et al.}$."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Out-of-distribution (OOD) generalization deals with the prevalent learning scenario where test distribution shifts from training distribution. With rising application demands and inherent complexity, graph OOD problems call for specialized solutions. While data-centric methods exhibit performance enhancements on many generic machine learning tasks, there is a notable absence of data augmentation methods tailored for graph OOD generalization. In this work, we propose to achieve graph OOD generalization with the novel design of non-Euclidean-space linear extrapolation. The proposed augmentation strategy extrapolates both structure and feature spaces to generate OOD graph data. Our design tailors OOD samples for specific shifts without corrupting underlying causal mechanisms. Theoretical analysis and empirical results evidence the effectiveness of our method in solving target shifts, showing substantial and constant improvements across various graph OOD tasks.", "output": "Graph Structure and Feature Extrapolation for Out-of-Distribution Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks in modern communication systems can be susceptible to internal numerical errors that can drastically affect decision results. 
Such structures are composed of many sections each of which generally contain weighting operations and activation function evaluations. The safe use comes from methods employing number based codes that can detect arithmetic errors in the network's processing steps. Each set of operations generates parity values dictated by a code in two ways. One set of parities is obtained from a section's outputs while a second comparable set is developed directly from the original inputs. The parity values protecting the activation functions involve a Taylor series approximation to the activation functions. We focus on using long numerically based convolutional codes because of the large size of datasets. The codes are based on Discrete Fourier Transform kernels and there are many design options available. Mathematical program simulations show our error-detecting techniques are effective and efficient.", "output": "Safe Use of Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The surge in Reinforcement Learning (RL) applications in Intelligent Transportation Systems (ITS) has contributed to its growth as well as highlighted key challenges. However, defining objectives of RL agents in traffic control and management tasks, as well as aligning policies with these goals through an effective formulation of Markov Decision Process (MDP), can be challenging and often require domain experts in both RL and ITS. Recent advancements in Large Language Models (LLMs) such as GPT-4 highlight their broad general knowledge, reasoning capabilities, and commonsense priors across various domains. In this work, we conduct a large-scale user study involving 70 participants to investigate whether novices can leverage ChatGPT to solve complex mixed traffic control problems. Three environments are tested, including ring road, bottleneck, and intersection. 
We find ChatGPT has mixed results. For intersection and bottleneck, ChatGPT increases number of successful policies by 150% and 136% compared to solely beginner capabilities, with some of them even outperforming experts. However, ChatGPT does not provide consistent improvements across all scenarios.", "output": "Can ChatGPT Enable ITS? The Case of Mixed Traffic Control via Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Speckle noise has long been an extensively studied problem in medical imaging. In recent years, there have been significant advances in leveraging deep learning methods for noise reduction. Nevertheless, adaptation of supervised learning models to unseen domains remains a challenging problem. Specifically, deep neural networks (DNNs) trained for computational imaging tasks are vulnerable to changes in the acquisition system's physical parameters, such as: sampling space, resolution, and contrast. Even within the same acquisition system, performance degrades across datasets of different biological tissues. In this work, we propose a few-shot supervised learning framework for optical coherence tomography (OCT) noise reduction, that offers a dramatic increase in training speed and requires only a single image, or part of an image, and a corresponding speckle suppressed ground truth, for training. Furthermore, we formulate the domain shift problem for OCT diverse imaging systems, and prove that the output resolution of a despeckling trained model is determined by the source domain resolution. We also provide possible remedies. We propose different practical implementations of our approach, verify and compare their applicability, robustness, and computational efficiency. 
Our results demonstrate significant potential for generally improving sample complexity, generalization, and time efficiency, for coherent and non-coherent noise reduction via supervised learning models, that can also be leveraged for other real-time computer vision applications.", "output": "Domain-Aware Few-Shot Learning for Optical Coherence Tomography Noise Reduction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Crowding is widely regarded as one of the most important risk factors in designing portfolio strategies. In this paper, we analyze stock crowding using network analysis of fund holdings, which is used to compute crowding scores for stocks. These scores are used to construct costless long-short portfolios, computed in a distribution-free (model-free) way and without using any numerical optimization, with desirable properties of hedge portfolios. More specifically, these long-short portfolios provide protection for both small and large market price fluctuations, due to their negative correlation with the market and positive convexity as a function of market returns. By adding our long-short portfolio to a baseline portfolio such as a traditional 60/40 portfolio, our method provides an alternative way to hedge portfolio risk including tail risk, which does not require costly option-based strategies or complex numerical optimization. The total cost of such hedging amounts to the total cost of rebalancing the hedge portfolio.", "output": "Model-Free Market Risk Hedging Using Crowding Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The fields of both Natural Language Processing (NLP) and Automated Machine Learning (AutoML) have achieved remarkable results over the past years. 
In NLP, especially Large Language Models (LLMs) have experienced a rapid series of breakthroughs very recently. We envision that the two fields can radically push the boundaries of each other through tight integration. To showcase this vision, we explore the potential of a symbiotic relationship between AutoML and LLMs, shedding light on how they can benefit each other. In particular, we investigate both the opportunities to enhance AutoML approaches with LLMs from different perspectives and the challenges of leveraging AutoML to further improve LLMs. To this end, we survey existing work, and we critically assess risks. We strongly believe that the integration of the two fields has the potential to disrupt both fields, NLP and AutoML. By highlighting conceivable synergies, but also risks, we aim to foster further exploration at the intersection of AutoML and LLMs.", "output": "AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current state-of-the-art analyses on the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Lojasiewicz (PL) condition and the restricted strong convexity. While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov's momentum enjoys accelerated convergence under similar settings and assumptions. In this work, we consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show Nesterov's momentum achieves acceleration in theory for this objective class. 
We provide two realizations of the problem class, one of which is deep ReLU networks, which --to the best of our knowledge--constitutes this work the first that proves accelerated convergence rate for non-trivial neural network architectures.", "output": "Accelerated Convergence of Nesterov's Momentum for Deep Neural Networks under Partial Strong Convexity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ciphers are a powerful tool for encrypting communication. There are many different cipher types, which makes it computationally expensive to solve a cipher using brute force. In this paper, we frame the decryption task as a classification problem. We first create a dataset of transpositions, substitutions, text reversals, word reversals, sentence shifts, and unencrypted text. Then, we evaluate the performance of various tokenizer-model combinations on this task.", "output": "CipherSniffer: Classifying Cipher Types."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training good representations for items is critical in recommender models. Typically, an item is assigned a unique randomly generated ID, and is commonly represented by learning an embedding corresponding to the value of the random ID. Although widely used, this approach has limitations when the number of items is large and items are power-law distributed -- typical characteristics of real-world recommendation systems. This leads to the item cold-start problem, where the model is unable to make reliable inferences for tail and previously unseen items. Removing these ID features and their learned embeddings altogether to combat cold-start issue severely degrades the recommendation quality. 
Content-based item embeddings are more reliable, but they are expensive to store and use, particularly for users' past item interaction sequence. In this paper, we use Semantic IDs, a compact discrete item representations learned from content embeddings using RQ-VAE that captures hierarchy of concepts in items. We showcase how we use them as a replacement of item IDs in a resource-constrained ranking model used in an industrial-scale video sharing platform. Moreover, we show how Semantic IDs improves the generalization ability of our system, without sacrificing top-level metrics.", "output": "Better Generalization with Semantic IDs: A case study in Ranking for Recommendations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques, offering quantifiable metrics at both sentence and document levels for easier interpretation by human evaluators. Our method employs a multi-faceted approach, generating multiple paraphrased versions of a given question and inputting them into the LLM to generate answers. By using a contrastive loss function based on cosine similarity, we match generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. 
This method improves with LLM advancements, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.", "output": "Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural network compression has been an increasingly important subject, due to its practical implications in terms of reducing the computational requirements and its theoretical implications, as there is an explicit connection between compressibility and the generalization error. Recent studies have shown that the choice of the hyperparameters of stochastic gradient descent (SGD) can have an effect on the compressibility of the learned parameter vector. Even though these results have shed some light on the role of the training dynamics over compressibility, they relied on unverifiable assumptions and the resulting theory does not provide a practical guideline due to its implicitness. In this study, we propose a simple modification for SGD, such that the outputs of the algorithm will be provably compressible without making any nontrivial assumptions. We consider a one-hidden-layer neural network trained with SGD and we inject additive heavy-tailed noise to the iterates at each iteration. We then show that, for any compression rate, there exists a level of overparametrization (i.e., the number of hidden units), such that the output of the algorithm will be compressible with high probability. To achieve this result, we make two main technical contributions: (i) we build on a recent study on stochastic analysis and prove a 'propagation of chaos' result with improved rates for a class of heavy-tailed stochastic differential equations, and (ii) we derive strong-error estimates for their Euler discretization. 
We finally illustrate our approach on experiments, where the results suggest that the proposed approach achieves compressibility with a slight compromise from the training and test error.", "output": "Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperspectral images are typically composed of hundreds of narrow and contiguous spectral bands, each containing information about the material composition of the imaged scene. However, these images can be affected by various sources of noise, distortions, or data losses, which can significantly degrade their quality and usefulness. To address these problems, we introduce two novel self-supervised Hyperspectral Images (HSI) inpainting algorithms: Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP), and its extension LRS-PnP-DIP, which features the strong learning capability, but is still free of external training data. We conduct the stability analysis under some mild assumptions which guarantees the algorithm to converge. It is specifically very helpful for the practical applications. Extensive experiments demonstrate that the proposed solution is able to produce visually and qualitatively superior inpainting results, achieving state-of-the-art performance. The code for reproducing the results is available at url{", "output": "Self-supervised Deep Hyperspectral Inpainting with the Sparsity and Low-Rank Considerations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. 
To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to study the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. 
These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.", "output": "ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces a deep reinforcement learning (RL) framework for optimizing the operations of power plants pairing renewable energy with storage. The objective is to maximize revenue from energy markets while minimizing storage degradation costs and renewable curtailment. The framework handles complexities such as time coupling by storage devices, uncertainty in renewable generation and energy prices, and non-linear storage models. The study treats the problem as a hierarchical Markov Decision Process (MDP) and uses component-level simulators for storage. It utilizes RL to incorporate complex storage models, overcoming restrictions of optimization-based methods that require convex and differentiable component models. A significant aspect of this approach is ensuring policy actions respect system constraints, achieved via a novel method of projecting potentially infeasible actions onto a safe state-action set. The paper demonstrates the efficacy of this approach through extensive experiments using data from US and Indian electricity markets, comparing the learned RL policies with a baseline control policy and a retrospective optimal control policy. 
It validates the adaptability of the learning framework with various storage models and shows the effectiveness of RL in a complex energy optimization setting, in the context of multi-market bidding, probabilistic forecasts, and accurate storage component models.", "output": "Multi-market Energy Optimization with Renewables via Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models to study both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning by integrating them with neural networks, these integrations are currently limited to linear person-specific parameters: ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience at least moderate symptoms of depression. 
Furthermore, we evaluate NME for two model architectures, including for neural conditional random fields (CRF) to predict affective state sequences where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.", "output": "Neural Mixed Effects for Nonlinear Personalized Predictions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Matrix factorization (MF) mechanisms for differential privacy (DP) have substantially improved the state-of-the-art in privacy-utility-computation tradeoffs for ML applications in a variety of scenarios, but in both the centralized and federated settings there remain instances where either MF cannot be easily applied, or other algorithms provide better tradeoffs (typically, as $\\epsilon$ becomes small). In this work, we show how MF can subsume prior state-of-the-art algorithms in both federated and centralized training settings, across all privacy budgets. The key technique throughout is the construction of MF mechanisms with banded matrices. For cross-device federated learning (FL), this enables multiple-participations with a relaxed device participation schema compatible with practical FL infrastructure (as demonstrated by a production deployment). In the centralized setting, we prove that banded matrices enjoy the same privacy amplification results as for the ubiquitous DP-SGD algorithm, but can provide strictly better performance in most scenarios -- this lets us always at least match DP-SGD, and often outperform it even at $\\epsilon\\ll2$. 
Finally, $\\hat{b}$-banded matrices substantially reduce the memory and time complexity of per-step noise generation from $\\mathcal{O}(n)$, $n$ the total number of iterations, to a constant $\\mathcal{O}(\\hat{b})$, compared to general MF mechanisms.", "output": "(Amplified) Banded Matrix Factorization: A unified approach to private training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cryptocurrencies have gained popularity across various sectors, especially in finance and investment. The popularity is partly due to their unique specifications originating from blockchain-related characteristics such as privacy, decentralisation, and untraceability. Despite their growing popularity, cryptocurrencies remain a high-risk investment due to their price volatility and uncertainty. The inherent volatility in cryptocurrency prices, coupled with internal cryptocurrency-related factors and external influential global economic factors makes predicting their prices and price movement directions challenging. Nevertheless, the knowledge obtained from predicting the direction of cryptocurrency prices can provide valuable guidance for investors in making informed investment decisions. To address this issue, this paper proposes a dynamic Bayesian network (DBN) approach, which can model complex systems in multivariate settings, to predict the price movement direction of five popular altcoins (cryptocurrencies other than Bitcoin) in the next trading day. The efficacy of the proposed model in predicting cryptocurrency price directions is evaluated from two perspectives. 
Firstly, our proposed approach is compared to two baseline models, namely an auto-regressive integrated moving average and support vector regression. Secondly, from a feature engineering point of view, the impact of twenty-three different features, grouped into four categories, on the DBN's prediction performance is investigated. The experimental results demonstrate that the DBN significantly outperforms the baseline models. In addition, among the groups of features, technical indicators are found to be the most effective predictors of cryptocurrency price directions.", "output": "Causal Feature Engineering of Price Directions of Cryptocurrencies using Dynamic Bayesian Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks often learn unintended biases during training, which might have harmful effects when deployed in real-world settings. This paper surveys 209 papers on bias in NLP models, most of which address sociodemographic bias. To better understand the distinction between bias and real-world harm, we turn to ideas from psychology and behavioral economics to propose a definition for sociodemographic bias. We identify three main categories of NLP bias research: types of bias, quantifying bias, and debiasing. We conclude that current approaches on quantifying bias face reliability issues, that many of the bias metrics do not relate to real-world biases, and that current debiasing techniques are superficial and hide bias rather than removing it. 
Finally, we provide recommendations for future work.", "output": "Survey on Sociodemographic Bias in Natural Language Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Foundation Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their real-world applications through natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of Large Language Models (LLMs) based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source GPTs. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 Billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share AI benefits fairly, and H2O.ai will continue to democratize AI and LLMs.", "output": "h2oGPT: Democratizing Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. 
First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization process. Our method reduces the memory requirements by up to 5.6 times, which enables fine-tuning a 7 billion parameter Large Language Model (LLM) on consumer laptops. At the same time, we propose a Low-Rank Error Correction (LREC) method that exploits the added LoRA layers to ameliorate the gap between the quantized model and its float point counterpart. Our error correction framework leads to a fully functional INT2 quantized LLM with the capacity to generate coherent English text. To the best of our knowledge, this is the first INT2 Large Language Model that has been able to reach such a performance. The overhead of our method is merely a 1.05 times increase in model size, which translates to an effective precision of INT2.1. Also, our method readily generalizes to other quantization standards, such as INT3, INT4, and INT8, restoring their lost performance, which marks a significant milestone in the field of model quantization. The strategies delineated in this paper hold promising implications for the future development and optimization of quantized models, marking a pivotal shift in the landscape of low-resource machine learning computations.", "output": "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Proteolysis-Targeting Chimeras (PROTACs) represent a novel class of small molecules which are designed to act as a bridge between an E3 ligase and a disease-relevant protein, thereby promoting its subsequent degradation. 
PROTACs are composed of two protein binding \"active\" domains, linked by a \"linker\" domain. The design of the linker domain is challenging due to geometric and chemical constraints given by its interactions, and the need to maximize drug-likeness. To tackle these challenges, we introduce ShapeLinker, a method for de novo design of linkers. It performs fragment-linking using reinforcement learning on an autoregressive SMILES generator. The method optimizes for a composite score combining relevant physicochemical properties and a novel, attention-based point cloud alignment score. This new method successfully generates linkers that satisfy both relevant 2D and 3D requirements, and achieves state-of-the-art results in producing novel linkers assuming a target linker conformation. This allows for more rational and efficient PROTAC design and optimization. Code and data are available at", "output": "Reinforcement Learning-Driven Linker Design via Fast Attention-based Point Cloud Alignment."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning (ML) models that achieve high average accuracy can still underperform on semantically coherent subsets (i.e. \"slices\") of data. This behavior can have significant societal consequences for the safety or bias of the model in deployment, but identifying these underperforming slices can be difficult in practice, especially in domains where practitioners lack access to group annotations to define coherent subsets of their data. Motivated by these challenges, ML researchers have developed new slice discovery algorithms that aim to group together coherent and high-error subsets of data. However, there has been little evaluation focused on whether these tools help humans form correct hypotheses about where (for which groups) their model underperforms. 
We conduct a controlled user study (N = 15) where we show 40 slices output by two state-of-the-art slice discovery algorithms to users, and ask them to form hypotheses about where an object detection model underperforms. Our results provide positive evidence that these tools provide some benefit over a naive baseline, and also shed light on challenges faced by users during the hypothesis formation step. We conclude by discussing design opportunities for ML and HCI researchers. Our findings point to the importance of centering users when designing and evaluating new tools for slice discovery.", "output": "Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose sensitive information necessitates the integration of privacy-preserving mechanisms. We introduce a differentially private adaptation of the Contrastive Language-Image Pretraining (CLIP) model that effectively addresses privacy concerns while retaining accuracy. Our proposed method, Dp-CLIP, is rigorously evaluated on benchmark datasets encompassing diverse vision-and-language tasks such as image classification and visual question answering. We demonstrate that our approach retains performance on par with the standard non-private CLIP model. Furthermore, we analyze our proposed algorithm under linear representation settings. 
We derive the convergence rate of our algorithm and show a trade-off between utility and privacy when gradients are clipped per-batch and the loss function does not satisfy smoothness conditions assumed in the literature for the analysis of DP-SGD.", "output": "Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Conformer-based end-to-end models have become ubiquitous these days and are commonly used in both streaming and non-streaming automatic speech recognition (ASR). Techniques like dual-mode and dynamic chunk training helped unify streaming and non-streaming systems. However, there remains a performance gap between streaming with a full and limited past context. To address this issue, we propose the integration of a novel dynamic contextual carry-over mechanism in a state-of-the-art (SOTA) unified ASR system. Our proposed dynamic context Conformer (DCTX-Conformer) utilizes a non-overlapping contextual carry-over mechanism that takes into account both the left context of a chunk and one or more preceding context embeddings. We outperform the SOTA by a relative 25.0% word error rate, with a negligible latency impact due to the additional context embeddings.", "output": "DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adobe Fonts has a rich library of over 20,000 unique fonts that Adobe users utilize for creating graphics, posters, composites etc. Due to the nature of the large library, knowing what font to select can be a daunting task that requires a lot of experience. 
For most users in Adobe products, especially casual users of Adobe Express, this often means choosing the default font instead of utilizing the rich and diverse fonts available. In this work, we create an intent-driven system to provide contextual font recommendations to users to aid in their creative journey. Our system takes in multilingual text input and recommends suitable fonts based on the user's intent. Based on user entitlements, the mix of free and paid fonts is adjusted. The feature is currently used by millions of Adobe Express users with a CTR of >25%.", "output": "Contextual Font Recommendations based on User Intent."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over the past decade, deep learning research has been accelerated by increasingly powerful hardware, which facilitated rapid growth in the model complexity and the amount of data ingested. This is becoming unsustainable and therefore refocusing on efficiency is necessary. In this paper, we employ transfer learning to improve training efficiency for large-scale spatial problems. We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation, and provide a theoretical bound on the resulting generalization error. Our proof leverages shift-equivariance of CNNs, a property that is underexploited in transfer learning. The theoretical results are experimentally supported in the context of mobile infrastructure on demand (MID). 
The proposed approach is able to tackle MID at large scales with hundreds of agents, which was computationally intractable prior to this work.", "output": "Solving Large-scale Spatial Problems with Convolutional Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meta-learning has emerged as a powerful training strategy for few-shot node classification, demonstrating its effectiveness in the transductive setting. However, the existing literature predominantly focuses on transductive few-shot node classification, neglecting the widely studied inductive setting in the broader few-shot learning community. This oversight limits our comprehensive understanding of the performance of meta-learning based methods on graph data. In this work, we conduct an empirical study to highlight the limitations of current frameworks in the inductive few-shot node classification setting. Additionally, we propose a simple yet competitive baseline approach specifically tailored for inductive few-shot node classification tasks. We hope our work can provide a new path forward to better understand how the meta-learning paradigm works in the graph domain.", "output": "Inductive Linear Probing for Few-shot Node Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite its centrality in the philosophy of cognitive science, there has been little prior philosophical work engaging with the notion of representation in contemporary NLP practice. 
This paper attempts to fill that lacuna: drawing on ideas from cognitive science, I introduce a framework for evaluating the representational claims made about components of neural NLP models, proposing three criteria with which to evaluate whether a component of a model represents a property and operationalising these criteria using probing classifiers, a popular analysis technique in NLP (and deep learning more broadly). The project of operationalising a philosophically-informed notion of representation should be of interest to both philosophers of science and NLP practitioners. It affords philosophers a novel testing-ground for claims about the nature of representation, and helps NLPers organise the large literature on probing experiments, suggesting novel avenues for empirical research.", "output": "Operationalising Representation in Natural Language Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Node classification on graphs is a significant task with a wide range of applications, including social analysis and anomaly detection. Even though graph neural networks (GNNs) have produced promising results on this task, current techniques often presume that label information of nodes is accurate, which may not be the case in real-world applications. To tackle this issue, we investigate the problem of learning on graphs with label noise and develop a novel approach dubbed Consistent Graph Neural Network (CGNN) to solve it. Specifically, we employ graph contrastive learning as a regularization term, which promotes two views of augmented nodes to have consistent representations. Since this regularization term cannot utilize label information, it can enhance the robustness of node representations to label noise. 
Moreover, to detect noisy labels on the graph, we present a sample selection technique based on the homophily assumption, which identifies noisy nodes by measuring the consistency between the labels with their neighbors. Finally, we purify these confident noisy labels to permit efficient semantic graph learning. Extensive experiments on three well-known benchmark datasets demonstrate the superiority of our CGNN over competing approaches.", "output": "Learning on Graphs under Label Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Encoding whole slide images (WSI) as graphs is well motivated since it makes it possible for the gigapixel resolution WSI to be represented in its entirety for the purpose of graph learning. To this end, WSIs can be broken into smaller patches that represent the nodes of the graph. Then, graph-based learning methods can be utilized for the grading and classification of cancer. Message passing among neighboring nodes is the foundation of graph-based learning methods. However, they do not take into consideration any positional information for any of the patches, and if two patches are found in topologically isomorphic neighborhoods, their embeddings are nearly similar to one another. In this work, classification of cancer from WSIs is performed with positional embedding and graph attention. In order to represent the positional embedding of the nodes in graph classification, the proposed method makes use of spline convolutional neural networks (CNN). The algorithm is then tested with the WSI dataset for grading prostate cancer and kidney cancer. A comparison of the proposed method with leading approaches in cancer diagnosis and grading verifies improved performance. The identification of cancerous regions in WSIs is another critical task in cancer diagnosis. In this work, the explainability of the proposed model is also addressed. 
A gradient-based explainability approach is used to generate the saliency mapping for the WSIs. This can be used to look into regions of WSI that are responsible for cancer diagnosis thus rendering the proposed model explainable.", "output": "Explainable and Position-Aware Learning in Digital Pathology."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual learning (CL) has attracted increasing attention in the recent past. It aims to mimic the human ability to learn new concepts without catastrophic forgetting. While existing CL methods accomplish this to some extent, they are still prone to semantic drift of the learned feature space. Foundation models, which are endowed with a robust feature representation, learned from very large datasets, provide an interesting substrate for the solution of the CL problem. Recent work has also shown that they can be adapted to specific tasks by prompt tuning techniques that leave the generality of the representation mostly unscathed. An open question is, however, how to learn both prompts that are task specific and prompts that are global, i.e. capture cross-task information. In this work, we propose the Prompt Of Prompts (POP) model, which addresses this goal by progressively learning a group of task-specified prompts and a group of global prompts, denoted as POP, to integrate information from the former. We show that a foundation model equipped with POP learning is able to outperform classic CL methods by a significant margin. 
Moreover, as prompt tuning only requires a small set of training samples, POP is able to perform CL in the few-shot setting, while still outperforming competing methods trained on the entire dataset.", "output": "POP: Prompt Of Prompts for Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A common challenge in applying graph machine learning methods is that the underlying graph of a system is often unknown. Although different graph inference methods have been proposed for continuous graph signals, inferring the graph structure underlying other types of data, such as discrete counts, is under-explored. In this paper, we generalize a graph signal processing (GSP) framework for learning a graph from smooth graph signals to the exponential family noise distribution to model various data types. We propose an alternating algorithm that estimates the graph Laplacian as well as the unobserved smooth representation from the noisy signals. We demonstrate in synthetic and real-world data that our new algorithm outperforms competing Laplacian estimation methods under noise model mismatch.", "output": "Graph Laplacian Learning with Exponential Family Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the pursuit of artificial general intelligence (AGI), we tackle Abstraction and Reasoning Corpus (ARC) tasks using a novel two-pronged approach. We employ the Decision Transformer in an imitation learning paradigm to model human problem-solving, and introduce an object detection algorithm, the Push and Pull clustering method. This dual strategy enhances AI's ARC problem-solving skills and provides insights for AGI progression. 
Yet, our work reveals the need for advanced data collection tools, robust training datasets, and refined model structures. This study highlights potential improvements for Decision Transformers and propels future AGI research.", "output": "Unraveling the ARC Puzzle: Mimicking Human Solutions with Object-Centric Decision Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks have shown impressive capabilities in solving various graph learning tasks, particularly excelling in node classification. However, their effectiveness can be hindered by the challenges arising from the widespread existence of noisy measurements associated with the topological or nodal information present in real-world graphs. These inaccuracies in observations can corrupt the crucial patterns within the graph data, ultimately resulting in undesirable performance in practical applications. To address these issues, this paper proposes a novel uncertainty-aware graph learning framework motivated by distributionally robust optimization. Specifically, we use a graph neural network-based encoder to embed the node features and find the optimal node embeddings by minimizing the worst-case risk through a minimax formulation. Such an uncertainty-aware learning process leads to improved node representations and a more robust graph predictive model that effectively mitigates the impact of uncertainty arising from data noise. 
Our experimental results show that the proposed framework achieves superior predictive performance compared to the state-of-the-art baselines under various noisy settings.", "output": "Uncertainty-Aware Robust Learning on Noisy Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes the transition-net, a robust transition strategy that expands the versatility of robot locomotion in the real-world setting. To this end, we start by distributing the complexity of different gaits into dedicated locomotion policies applicable to real-world robots. Next, we expand the versatility of the robot by unifying the policies with robust transitions into a single coherent meta-controller by examining the latent state representations. Our approach enables the robot to iteratively expand its skill repertoire and robustly transition between any policy pair in a library. In our framework, adding new skills does not introduce any process that alters the previously learned skills. Moreover, training of a locomotion policy takes less than an hour with a single consumer GPU. Our approach is effective in the real-world and achieves a 19% higher average success rate for the most challenging transition pairs in our experiments compared to existing approaches.", "output": "Expanding Versatility of Agile Locomotion through Policy Transitions Using Latent State Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "By composing graphical models with deep learning architectures, we learn generative models with the strengths of both frameworks. 
The structured variational autoencoder (SVAE) inherits structure and interpretability from graphical models, and flexible likelihoods for high-dimensional data from deep learning, but poses substantial optimization challenges. We propose novel algorithms for learning SVAEs, and are the first to demonstrate the SVAE's ability to handle multimodal uncertainty when data is missing by incorporating discrete latent variables. Our memory-efficient implicit differentiation scheme makes the SVAE tractable to learn via gradient descent, while demonstrating robustness to incomplete optimization. To more rapidly learn accurate graphical model parameters, we derive a method for computing natural gradients without manual derivations, which avoids biases found in prior work. These optimization innovations enable the first comparisons of the SVAE to state-of-the-art time series models, where the SVAE performs competitively while learning interpretable and structured discrete data representations.", "output": "Unbiased Learning of Deep Generative Models with Structured Discrete Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning, and has demonstrated its remarkable success in imitation learning. To promote expert-like behavior, existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert. However, these global designs are still limited by the redundant noise and error propagation problems, leading to the unsuitable reward assignment and thus downgrading the agent capability in complex multi-stage tasks. 
In this paper, we propose a novel Curricular Subgoal-based Inverse Reinforcement Learning (CSIRL) framework, that explicitly disentangles one task with several local subgoals to guide agent imitation. Specifically, CSIRL firstly introduces decision uncertainty of the trained agent over expert trajectories to dynamically select subgoals, which directly determines the exploration boundary of different task stages. To further acquire local reward functions for each stage, we customize a meta-imitation objective based on these curricular subgoals to train an intrinsic reward generator. Experiments on the D4RL and autonomous driving benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts, as well as better interpretability. Our code is available at ", "output": "Curricular Subgoals for Inverse Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autism spectrum disorder (ASD) is a developmental disorder characterized by significant social communication impairments and difficulties perceiving and presenting communication cues. Machine learning techniques have been broadly adopted to facilitate autism studies and assessments. However, computational models are primarily concentrated on specific analysis and validated on private datasets in the autism community, which limits comparisons across models due to privacy-preserving data sharing complications. This work presents a novel privacy-preserving open-source dataset, MMASD as a MultiModal ASD benchmark dataset, collected from play therapy interventions of children with Autism. MMASD includes data from 32 children with ASD, and 1,315 data samples segmented from over 100 hours of intervention recordings. 
To promote public access, each data sample consists of four privacy-preserving modalities of data: (1) optical flow, (2) 2D skeleton, (3) 3D skeleton, and (4) clinician ASD evaluation scores of children, e.g., ADOS scores. MMASD aims to assist researchers and therapists in understanding children's cognitive status, monitoring their progress during therapy, and customizing the treatment plan accordingly. It also has inspiration for downstream tasks such as action quality assessment and interpersonal synchrony estimation. MMASD dataset can be easily accessed at", "output": "MMASD: A Multimodal Dataset for Autism Intervention Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Objective: Seizure prediction is of great importance to improve the life of patients. The focal point is to distinguish preictal states from interictal ones. With the development of machine learning, seizure prediction methods have achieved significant progress. However, the severe imbalance problem between preictal and interictal data still poses a great challenge, restricting the performance of classifiers. Data augmentation is an intuitive way to solve this problem. Existing data augmentation methods generate samples by overlapping or recombining data. The distribution of generated samples is limited by original data, because such transformations cannot fully explore the feature space and offer new information. As the epileptic EEG representation varies among seizures, these generated samples cannot provide enough diversity to achieve high performance on a new seizure. As a consequence, we propose a novel data augmentation method with diffusion model called DiffEEG. 
Methods: Diffusion models are a class of generative models that consist of two processes. Specifically, in the diffusion process, the model adds noise to the input EEG sample step by step and converts the noisy sample into output random noise, exploring the distribution of data by minimizing the loss between the output and the noise added. In the denoised process, the model samples the synthetic data by removing the noise gradually, diffusing the data distribution to outward areas and narrowing the distance between different clusters. Results: We compared DiffEEG with existing methods, and integrated them into three representative classifiers. The experiments indicate that DiffEEG could further improve the performance and shows superiority to existing methods. Conclusion: This paper proposes a novel and effective method to solve the imbalanced problem and demonstrates the effectiveness and generality of our method.", "output": "Data Augmentation for Seizure Prediction with Generative Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. 
It encompasses a total number of 8,600 sensors with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at:", "output": "LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The sun is highly complex in nature and its observatory imagery features is one of the most important sources of information about the sun activity, space and Earth's weather conditions. The NASA Solar Dynamics Observatory captures approximately 70,000 images of the sun activity in a day and the continuous visual inspection of these solar observatory images is challenging. In this study, we developed a technique of tracking the sun's activity using 2D circular kernel time series transformation, statistical and entropy measures, with machine learning approaches. The technique involves transforming the solar observatory image section into 1-Dimensional time series (1-DTS) while the statistical and entropy measures (Approach 1) and direct classification (Approach 2) is used to capture the extraction features from the 1-DTS for machine learning classification into 'solar storm' and 'no storm'. We found that the potential accuracy of the model in tracking the activity of the sun is approximately 0.981 for Approach 1 and 0.999 for Approach 2. The stability of the developed approach to rotational transformation of the solar observatory image is evident. When training on the original dataset for Approach 1, the match index (T90) of the distribution of solar storm areas reaches T90 ~ 0.993, and T90 ~ 0.951 for Approach 2. 
In addition, when using the extended training base, the match indices increased to T90 ~ 0.994 and T90 ~ 1, respectively. This model consistently classifies areas with swirling magnetic lines associated with solar storms and is robust to image rotation, glare, and optical artifacts.", "output": "Imagery Tracking of Sun Activity Using 2D Circular Kernel Time Series Transformation, Entropy Measures and Machine Learning Approaches."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Node classification is one of the hottest tasks in graph analysis. In this paper, we focus on the choices of node representations (aggregated features vs. adjacency lists) and the edge direction of an input graph (directed vs. undirected), which have a large influence on classification results. We address the first empirical study to benchmark the performance of various GNNs that use either combination of node representations and edge directions. Our experiments demonstrate that no single combination stably achieves state-of-the-art results across datasets, which indicates that we need to select appropriate combinations depending on the characteristics of datasets. In response, we propose a simple yet holistic classification method A2DUG which leverages all combinations of node representation variants in directed and undirected graphs. We demonstrate that A2DUG stably performs well on various datasets. Surprisingly, it largely outperforms the current state-of-the-art methods in several datasets. This result validates the importance of the adaptive effect control on the combinations of node representations and edge directions.", "output": "Why Using Either Aggregated Features or Adjacency Lists in Directed or Undirected Graph? 
Empirical Study and Simple Classification Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modelling spatio-temporal processes on road networks is a task of growing importance. While significant progress has been made on developing spatio-temporal graph neural networks (Gnns), existing works are built upon three assumptions that are not practical on real-world road networks. First, they assume sensing on every node of a road network. In reality, due to budget-constraints or sensor failures, all locations (nodes) may not be equipped with sensors. Second, they assume that sensing history is available at all installed sensors. This is unrealistic as well due to sensor failures, loss of packets during communication, etc. Finally, there is an assumption of static road networks. Connectivity within networks change due to road closures, constructions of new roads, etc. In this work, we develop FRIGATE to address all these shortcomings. FRIGATE is powered by a spatio-temporal Gnn that integrates positional, topological, and temporal information into rich inductive node representations. The joint fusion of this diverse information is made feasible through a novel combination of gated Lipschitz embeddings with Lstms. We prove that the proposed Gnn architecture is provably more expressive than message-passing Gnns used in state-of-the-art algorithms. The higher expressivity of FRIGATE naturally translates to superior empirical performance conducted on real-world network-constrained traffic data. 
In addition, FRIGATE is robust to frugal sensor deployment, changes in road network connectivity, and temporal irregularity in sensing.", "output": "FRIGATE: Frugal Spatio-temporal Forecasting on Road Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel privacy-preserving uplink over-the-air computation (AirComp) method, termed FLORAS, for single-input single-output (SISO) wireless federated learning (FL) systems. From the communication design perspective, FLORAS eliminates the requirement of channel state information at the transmitters (CSIT) by leveraging the properties of orthogonal sequences. From the privacy perspective, we prove that FLORAS can offer both item-level and client-level differential privacy (DP) guarantees. Moreover, by adjusting the system parameters, FLORAS can flexibly achieve different DP levels at no additional cost. A novel FL convergence bound is derived which, combined with the privacy guarantees, allows for a smooth tradeoff between convergence rate and differential privacy levels. Numerical results demonstrate the advantages of FLORAS compared with the baseline AirComp method, and validate that our analytical results can guide the design of privacy-preserving FL with different tradeoff requirements on the model convergence and privacy levels.", "output": "Differentially Private Wireless Federated Learning Using Orthogonal Sequences."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The importance of inference in Machine Learning (ML) has led to an explosive number of different proposals in ML, and particularly in Deep Learning. In an attempt to reduce the complexity of Convolutional Neural Networks, we propose a Volterra filter-inspired Network architecture. 
This architecture introduces controlled non-linearities in the form of interactions between the delayed input samples of data. We propose a cascaded implementation of Volterra Filtering so as to significantly reduce the number of parameters required to carry out the same classification task as that of a conventional Neural Network. We demonstrate an efficient parallel implementation of this Volterra Neural Network (VNN), along with its remarkable performance while retaining a relatively simpler and potentially more tractable structure. Furthermore, we show a rather sophisticated adaptation of this network to nonlinearly fuse the RGB (spatial) information and the Optical Flow (temporal) information of a video sequence for action recognition. The proposed approach is evaluated on UCF-101 and HMDB-51 datasets for action recognition, and is shown to outperform state of the art CNN approaches.", "output": "Volterra Neural Networks (VNNs)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretely, the goal is to generate drug-like molecules that are scored highly by SMINA, a popular docking software. We observe that popular graph-based generative models fail to generate molecules with a high docking score when trained using a realistically sized training set. 
This suggests a limitation of the current incarnation of models for de novo drug design. Finally, we propose a simplified version of the benchmark based on a simpler scoring function, and show that the tested models are able to partially solve it. We release the benchmark as an easy to use package available at We hope that our benchmark will serve as a stepping stone towards the goal of automatically generating promising drug candidates.", "output": "We Should at Least Be Able to Design Molecules That Dock Well."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Convolutional Neural Networks (DCNNs) have demonstrated impressive robustness to recognize objects under transformations (eg. blur or noise) when these transformations are included in the training set. A hypothesis to explain such robustness is that DCNNs develop invariant neural representations that remain unaltered when the image is transformed. However, to what extent this hypothesis holds true is an outstanding question, as robustness to transformations could be achieved with properties different from invariance, eg. parts of the network could be specialized to recognize either transformed or non-transformed images. This paper investigates the conditions under which invariant neural representations emerge by leveraging that they facilitate robustness to transformations beyond the training distribution. Concretely, we analyze a training paradigm in which only some object categories are seen transformed during training and evaluate whether the DCNN is robust to transformations across categories not seen transformed. Our results with state-of-the-art DCNNs indicate that invariant neural representations do not always drive robustness to transformations, as networks show robustness for categories seen transformed during training even in the absence of invariant neural representations. 
Invariance only emerges as the number of transformed categories in the training set is increased. This phenomenon is much more prominent with local transformations such as blurring and high-pass filtering than geometric transformations such as rotation and thinning, which entail changes in the spatial arrangement of the object. Our results contribute to a better understanding of invariant neural representations in deep learning and the conditions under which it spontaneously emerges.", "output": "Robustness to Transformations Across Categories: Is Robustness To Transformations Driven by Invariant Neural Representations?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Navigating the complex landscape of single-cell transcriptomic data presents significant challenges. Central to this challenge is the identification of a meaningful representation of high-dimensional gene expression patterns that sheds light on the structural and functional properties of cell types. Pursuing model interpretability and computational simplicity, we often look for a linear transformation of the original data that aligns with key phenotypic features of cells. In response to this need, we introduce factorized linear discriminant analysis (FLDA), a novel method for linear dimensionality reduction. The crux of FLDA lies in identifying a linear function of gene expression levels that is highly correlated with one phenotypic feature while minimizing the influence of others. To augment this method, we integrate it with a sparsity-based regularization algorithm. This integration is crucial as it selects a subset of genes pivotal to a specific phenotypic feature or a combination thereof. To illustrate the effectiveness of FLDA, we apply it to transcriptomic datasets from neurons in the Drosophila optic lobe. 
We demonstrate that FLDA not only captures the inherent structural patterns aligned with phenotypic features but also uncovers key genes associated with each phenotype.", "output": "Factorized linear discriminant analysis and its application in computational biology."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training machine learning models on privacy-sensitive data has become a popular practice, driving innovation in ever-expanding fields. This has opened the door to new attacks that can have serious privacy implications. One such attack, the Membership Inference Attack (MIA), exposes whether or not a particular data point was used to train a model. A growing body of literature uses Differentially Private (DP) training algorithms as a defence against such attacks. However, these works evaluate the defence under the restrictive assumption that all members of the training set, as well as non-members, are independent and identically distributed. This assumption does not hold for many real-world use cases in the literature. Motivated by this, we evaluate membership inference with statistical dependencies among samples and explain why DP does not provide meaningful protection (the privacy parameter $\\epsilon$ scales with the training set size $n$) in this more general case. We conduct a series of empirical evaluations with off-the-shelf MIAs using training sets built from real-world data showing different types of dependencies among samples. 
Our results reveal that training set dependencies can severely increase the performance of MIAs, and therefore assuming that data samples are statistically independent can significantly underestimate the performance of MIAs.", "output": "Investigating Membership Inference Attacks under Data Dependencies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement to attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve the SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms that with the phoneme-based E2E-ASR. The results suggest that objectives with misclassification of phonemes by the ASR system may lead to imperfect feedback, and BPC could be a potentially better choice. 
Finally, it is noted that combining the most-confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve the SE performance.", "output": "Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computing an AUC as a performance measure to compare the quality of different machine learning models is one of the final steps of many research projects. Many of these methods are trained on privacy-sensitive data and there are several different approaches like $\\epsilon$-differential privacy, federated machine learning and cryptography if the datasets cannot be shared or used jointly at one place for training and/or testing. In this setting, it can also be a problem to compute the global AUC, since the labels might also contain privacy-sensitive information. There have been approaches based on $\\epsilon$-differential privacy to address this problem, but to the best of our knowledge, no exact privacy preserving solution has been introduced. In this paper, we propose an MPC-based solution, called ppAURORA, with private merging of individually sorted lists from multiple sources to compute the exact AUC as one could obtain on the pooled original test samples. With ppAURORA, the computation of the exact area under precision-recall and receiver operating characteristic curves is possible even when ties between prediction confidence values exist. We use ppAURORA to evaluate two different models predicting acute myeloid leukemia therapy response and heart disease, respectively. We also assess its scalability via synthetic data experiments. 
All these experiments show that we efficiently and privately compute the exact same AUC with both evaluation metrics as one can obtain on the pooled test samples in plaintext according to the semi-honest adversary setting.", "output": "ppAURORA: Privacy Preserving Area Under Receiver Operating Characteristic and Precision-Recall Curves."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Solving math word problems (MWPs) is an important and challenging problem in natural language processing. Existing approaches to solve MWPs require full supervision in the form of intermediate equations. However, labeling every MWP with its corresponding equations is a time-consuming and expensive task. In order to address this challenge of equation annotation, we propose a weakly supervised model for solving MWPs by requiring only the final answer as supervision. We approach this problem by first learning to generate the equation using the problem description and the final answer, which we subsequently use to train a supervised MWP solver. We propose and compare various weakly supervised techniques to learn to generate equations directly from the problem description and answer. Through extensive experiments, we demonstrate that without using equations for supervision, our approach achieves accuracy gains of 4.5% and 32% over the state-of-the-art weakly supervised approach, on the standard Math23K and AllArith datasets respectively. Additionally, we curate and release new datasets of roughly 10k MWPs each in English and in Hindi (a low resource language). These datasets are suitable for training weakly supervised models. 
We also present an extension of WARMM to semi-supervised learning and present further improvements on results, along with insights.", "output": "WARM: A Weakly (+Semi) Supervised Model for Solving Math word Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning individualized treatment rules (ITRs) is an important topic in precision medicine. Current literature mainly focuses on deriving ITRs from a single source population. We consider the observational data setting when the source population differs from a target population of interest. Compared with causal generalization for the average treatment effect which is a scalar quantity, ITR generalization poses new challenges due to the need to model and generalize the rules based on a prespecified class of functions which may not contain the unrestricted true optimal ITR. The aim of this paper is to develop a weighting framework to mitigate the impact of such misspecification and thus facilitate the generalizability of optimal ITRs from a source population to a target population. Our method seeks covariate balance over a non-parametric function class characterized by a reproducing kernel Hilbert space and can improve many ITR learning methods that rely on weights. 
We show that the proposed method encompasses importance weights and overlap weights as two extreme cases, allowing for a better bias-variance trade-off in between. Numerical examples demonstrate that the use of our weighting method can greatly improve ITR estimation for the target population compared with other weighting methods.", "output": "Robust Sample Weighting to Facilitate Individualized Treatment Rule Learning for a Target Population."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Likelihood is a standard estimate for outlier detection. The specific role of the normalization constraint is to ensure that the out-of-distribution (OOD) regime has a small likelihood when samples are learned using maximum likelihood. Because autoencoders do not possess such a process of normalization, they often fail to recognize outliers even when they are obviously OOD. We propose the Normalized Autoencoder (NAE), a normalized probabilistic model constructed from an autoencoder. The probability density of NAE is defined using the reconstruction error of an autoencoder, which is differently defined in the conventional energy-based model. In our model, normalization is enforced by suppressing the reconstruction of negative samples, significantly improving the outlier detection performance. Our experimental results confirm the efficacy of NAE, both in detecting outliers and in generating in-distribution samples.", "output": "Autoencoding Under Normalization Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies federated learning (FL)--especially cross-silo FL--with data from people who do not trust the server or other silos. In this setting, each silo (e.g. hospital) has data from different people (e.g. 
patients) and must maintain the privacy of each person's data (e.g. medical record), even if the server or other silos act as adversarial eavesdroppers. This requirement motivates the study of Inter-Silo Record-Level Differential Privacy (ISRL-DP), which requires silo i's communications to satisfy record/item-level differential privacy (DP). ISRL-DP ensures that the data of each person (e.g. patient) in silo i (e.g. hospital i) cannot be leaked. ISRL-DP is different from well-studied privacy notions. Central and user-level DP assume that people trust the server/other silos. On the other end of the spectrum, local DP assumes that people do not trust anyone at all (even their own silo). Sitting between central and local DP, ISRL-DP makes the realistic assumption (in cross-silo FL) that people trust their own silo, but not the server or other silos. In this work, we provide tight (up to logarithms) upper and lower bounds for ISRL-DP FL with convex/strongly convex loss functions and homogeneous (i.i.d.) silo data. Remarkably, we show that similar bounds are attainable for smooth losses with arbitrary heterogeneous silo data distributions, via an accelerated ISRL-DP algorithm. We also provide tight upper and lower bounds for ISRL-DP federated empirical risk minimization, and use acceleration to attain the optimal bounds in fewer rounds of communication than the state-of-the-art. Finally, with a secure \"shuffler\" to anonymize silo messages (but without a trusted server), our algorithm attains the optimal central DP rates under more practical trust assumptions. 
Numerical experiments show favorable privacy-accuracy tradeoffs for our algorithm in classification and regression tasks.", "output": "Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep generative models trained by maximum likelihood remain very popular methods for reasoning about data probabilistically. However, it has been observed that they can assign higher likelihoods to out-of-distribution (OOD) data than in-distribution data, thus calling into question the meaning of these likelihood values. In this work we provide a novel perspective on this phenomenon, decomposing the average likelihood into a KL divergence term and an entropy term. We argue that the latter can explain the curious OOD behaviour mentioned above, suppressing likelihood values on datasets with higher entropy. Although our idea is simple, we have not seen it explored yet in the literature. This analysis provides further explanation for the success of OOD detection methods based on likelihood ratios, as the problematic entropy term cancels out in expectation. Finally, we discuss how this observation relates to recent success in OOD detection with manifold-supported models, for which the above decomposition does not hold directly.", "output": "Entropic Issues in Likelihood-Based OOD Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Parameterized Quantum Circuits (PQC) are promising towards quantum advantage on near-term quantum hardware. However, due to the large quantum noises (errors), the performance of PQC models has a severe degradation on real quantum devices. 
Take Quantum Neural Network (QNN) as an example, the accuracy gap between noise-free simulation and noisy results on IBMQ-Yorktown for MNIST-4 classification is over 60%. Existing noise mitigation methods are general ones without leveraging unique characteristics of PQC; on the other hand, existing PQC work does not consider noise effect. To this end, we present QuantumNAT, a PQC-specific framework to perform noise-aware optimizations in both training and inference stages to improve robustness. We experimentally observe that the effect of quantum noise to PQC measurement outcome is a linear map from noise-free outcome with a scaling and a shift factor. Motivated by that, we propose post-measurement normalization to mitigate the feature distribution differences between noise-free and noisy scenarios. Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to PQC according to realistic noise models of quantum hardware. Finally, post-measurement quantization is introduced to quantize the measurement outcomes to discrete values, achieving the denoising effect. Extensive experiments on 8 classification tasks using 6 quantum devices demonstrate that QuantumNAT improves accuracy by up to 43%, and achieves over 94% 2-class, 80% 4-class, and 34% 10-class classification accuracy measured on real quantum computers. 
The code for construction and noise-aware training of PQC is available in the TorchQuantum library.", "output": "QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into predefined named entity classes. While deep learning-based pre-trained language models help to achieve good predictive performances in NER, many domain-specific NER applications still call for a substantial amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for NER tasks to minimize the annotation cost without sacrificing model performance. However, the heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose several AL sentence query evaluation functions that pay more attention to potential positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. 
We also propose a better data-driven normalization approach to penalize sentences that are too long or too short. Our experiments on three datasets from different domains reveal that the proposed approach reduces the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.", "output": "Focusing on Potential Named Entities During Active Label Acquisition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated machine learning (AutoML) strives for the automatic configuration of machine learning algorithms and their composition into an overall (software) solution - a machine learning pipeline - tailored to the learning task (dataset) at hand. Over the last decade, AutoML has developed into an independent research field with hundreds of contributions. At the same time, AutoML is being criticised for its high resource consumption as many approaches rely on the (costly) evaluation of many machine learning pipelines, as well as the expensive large scale experiments across many datasets and approaches. In the spirit of recent work on Green AI, this paper proposes Green AutoML, a paradigm to make the whole AutoML process more environmentally friendly. Therefore, we first elaborate on how to quantify the environmental footprint of an AutoML tool. Afterward, different strategies on how to design and benchmark an AutoML tool wrt. their \"greenness\", i.e. sustainability, are summarized. Finally, we elaborate on how to be transparent about the environmental footprint and what kind of research incentives could direct the community into a more sustainable AutoML research direction. 
Additionally, we propose a sustainability checklist to be attached to every AutoML paper featuring all core aspects of Green AutoML.", "output": "Towards Green Automated Machine Learning: Status Quo and Future Directions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The quasiparticle effective mass $m^\\ast$ of interacting electrons is a fundamental quantity in the Fermi liquid theory. However, the precise value of the effective mass of uniform electron gas is still elusive after decades of research. The newly developed neural canonical transformation approach [Xie et al., J. Mach. Learn. 1, (2022)] offers a principled way to extract the effective mass of electron gas by directly calculating the thermal entropy at low temperature. The approach models a variational many-electron density matrix using two generative neural networks: an autoregressive model for momentum occupation and a normalizing flow for electron coordinates. Our calculation reveals a suppression of effective mass in the two-dimensional spin-polarized electron gas, which is more pronounced than previous reports in the low-density strong-coupling region. This prediction calls for verification in two-dimensional electron gas experiments.", "output": "$m^\\ast$ of two-dimensional electron gas: a neural canonical transformation study."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Cognitive Ledger Project is an effort to develop a modular system for turning users' personal data into structured information and machine learning models based on a blockchain-based infrastructure. In this work-in-progress paper, we propose a cognitive architecture for cognitive digital twins. The suggested design embraces a cognitive blockchain (Cognitive ledger) at its core. 
The architecture includes several modules that turn users' activities in the digital environment into reusable knowledge objects and artificial intelligence that one day can work together to form the cognitive digital twin of users.", "output": "Cognitive Ledger Project: Towards Building Personal Digital Twins Through Cognitive Blockchain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the cross-over between these two regimes in the high-dimensional setting, and in particular investigate the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad & Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high-dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.", "output": "Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). 
In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous number of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift. (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in-between them for resource efficiency improvements, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. resource efficiency tradeoff, which we show is an NP-hard signomial programming problem. We finally solve this problem through proposing a general optimization solver. 
Our numerical results reveal new findings on the interdependencies between the idle times in-between the global aggregations, model/concept drift, and D2D cooperation configuration.", "output": "Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Suppose we are given two datasets: a labeled dataset and unlabeled dataset which also has additional auxiliary features not present in the first dataset. What is the most principled way to use these datasets together to construct a predictor? The answer should depend upon whether these datasets are generated by the same or different distributions over their mutual feature sets, and how similar the test distribution will be to either of those distributions. In many applications, the two datasets will likely follow different distributions, but both may be close to the test distribution. We introduce the problem of building a predictor which minimizes the maximum loss over all probability distributions over the original features, auxiliary features, and binary labels, whose Wasserstein distance is $r_1$ away from the empirical distribution over the labeled dataset and $r_2$ away from that of the unlabeled dataset. This can be thought of as a generalization of distributionally robust optimization (DRO), which allows for two data sources, one of which is unlabeled and may contain auxiliary features.", "output": "Distributionally Robust Data Join."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual scenes are composed of visual concepts and have the property of combinatorial explosion. 
An important reason for humans to efficiently learn from diverse visual scenes is the ability of compositional perception, and it is desirable for artificial intelligence to have similar abilities. Compositional scene representation learning is a task that enables such abilities. In recent years, various methods have been proposed to apply deep neural networks, which have been proven to be advantageous in representation learning, to learn compositional scene representations via reconstruction, advancing this research direction into the deep learning era. Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation. In this survey, we first outline the current progress on reconstruction-based compositional scene representation learning with deep neural networks, including development history and categorizations of existing methods from the perspectives of the modeling of visual scenes and the inference of scene representations; then provide benchmarks, including an open source toolbox to reproduce the benchmark experiments, of representative methods that consider the most extensively studied problem setting and form the foundation for other methods; and finally discuss the limitations of existing methods and future directions of this research topic.", "output": "Compositional Scene Representation Learning via Reconstruction: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the loss landscape of training problems for deep artificial neural networks with a one-dimensional real output whose activation functions contain an affine segment and whose hidden layers have width at least two. It is shown that such problems possess a continuum of spurious (i.e., not globally optimal) local minima for all target functions that are not affine. 
In contrast to previous works, our analysis covers all sampling and parameterization regimes, general differentiable loss functions, arbitrary continuous nonpolynomial activation functions, and both the finite- and infinite-dimensional setting. It is further shown that the appearance of the spurious local minima in the considered training problems is a direct consequence of the universal approximation theorem and that the underlying mechanisms also cause, e.g., $L^p$-best approximation problems to be ill-posed in the sense of Hadamard for all networks that do not have a dense image. The latter result also holds without the assumption of local affine linearity and without any conditions on the hidden layers.", "output": "On the Omnipresence of Spurious Local Minima in Certain Neural Network Training Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing demand for deep learning models on mobile devices, splitting neural network computation between the device and a more powerful edge server has become an attractive solution. However, existing split computing approaches often underperform compared to a naive baseline of remote computation on compressed data. Recent studies propose learning compressed representations that contain more relevant information for supervised downstream tasks, showing improved tradeoffs between compressed data size and supervised performance. However, existing evaluation metrics only provide an incomplete picture of split computing. This study introduces supervised compression for split computing (SC2) and proposes new evaluation criteria: minimizing computation on the mobile device, minimizing transmitted data size, and maximizing model accuracy. We conduct a comprehensive benchmark study using 10 baseline methods, three computer vision tasks, and over 180 trained models, and discuss various aspects of SC2. 
We also release sc2bench, a Python package for future research on SC2. Our proposed metrics and package will help researchers better understand the tradeoffs of supervised compression in split computing.", "output": "SC2 Benchmark: Supervised Compression for Split Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer. This strategy often uniformly and randomly samples or prioritizes data sampling based on measures such as the temporal difference (TD) error. Such sampling strategies can be inefficient at learning Q-function because a state's Q-value depends on the Q-value of successor states. If the data sampling strategy ignores the precision of the Q-value estimate of the next state, it can lead to useless and often incorrect updates to the Q-values. To mitigate this issue, we organize the agent's experience into a graph that explicitly tracks the dependency between Q-values of states. Each edge in the graph represents a transition between two states by executing a single action. We perform value backups via a breadth-first search that expands vertices in the graph starting from the set of terminal states and successively moving backward. We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks. 
Notably, the proposed method also outperforms baselines that consume more batches of training experience and operates from high-dimensional observational data such as images.", "output": "Topological Experience Replay."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. However, the complexity of DNNs makes it difficult to understand how they solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience that analyze complex and opaque systems. Here, we draw inspiration from how neuroscience uses topographic maps to visualize brain activity. To also visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space such that neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare methods to obtain a topographic layout of neurons in a DNN layer. Moreover, we demonstrate how to use topographic activation maps to identify errors or encoded biases and to visualize training processes. Our novel visualization technique improves the transparency of DNN-based decision-making systems and is interpretable without expert knowledge in Machine Learning.", "output": "Visualizing Deep Neural Networks with Topographic Activation Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Even a few hours of delay in the treatment of sepsis results in increased mortality. 
Early detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to compare sepsis detection performance at ED triage (prior to the use of laboratory diagnostics) of the standard sepsis screening algorithm (SIRS with source of infection) and a machine learning algorithm trained on EHR triage data. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis and standard screening were retrospectively evaluated on the adult population of 512,949 medical records. KATE Sepsis demonstrates an AUC of 0.9423 (0.9401 - 0.9441) with sensitivity of 71.09% (70.12% - 71.98%) and specificity of 94.81% (94.75% - 94.87%). Standard screening demonstrates an AUC of 0.6826 (0.6774 - 0.6878) with sensitivity of 40.8% (39.71% - 41.86%) and specificity of 95.72% (95.68% - 95.78%). The KATE Sepsis model trained to detect sepsis demonstrates 77.67% (75.78% - 79.42%) sensitivity in detecting severe sepsis and 86.95% (84.2% - 88.81%) sensitivity in detecting septic shock. The standard screening protocol demonstrates 43.06% (41% - 45.87%) sensitivity in detecting severe sepsis and 40% (36.55% - 43.26%) sensitivity in detecting septic shock. Future research should focus on the prospective impact of KATE Sepsis on administration of antibiotics, readmission rate, morbidity and mortality.", "output": "Detection of sepsis during emergency department triage using machine learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The dominant framework for off-policy multi-goal reinforcement learning involves estimating goal conditioned Q-value function. 
When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de-facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s; whereas the second component, {phi}(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency, and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.", "output": "Bilinear value networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The transition to a fully renewable energy grid requires better forecasting of demand at the low-voltage level to increase efficiency and ensure reliable control. However, high fluctuations and increasing electrification cause huge forecast variability, not reflected in traditional point estimates. Probabilistic load forecasts take future uncertainties into account and thus allow more informed decision-making for the planning and operation of low-carbon energy systems. We propose an approach for flexible conditional density forecasting of short-term load based on Bernstein polynomial normalizing flows, where a neural network controls the parameters of the flow. In an empirical study with 363 smart meter customers, our density predictions compare favorably against Gaussian and Gaussian mixture densities. 
Also, they outperform a non-parametric approach based on the pinball loss for 24h-ahead load forecasting for two different neural network architectures.", "output": "Short-Term Density Forecasting of Low-Voltage Load using Bernstein-Polynomial Normalizing Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated scoring of open-ended student responses has the potential to significantly reduce human grader effort. Recent advances in automated scoring often leverage textual representations based on pre-trained language models such as BERT and GPT as input to scoring models. Most existing approaches train a separate model for each item/question, which is suitable for scenarios such as essay scoring where items can be quite different from one another. However, these approaches have two limitations: 1) they fail to leverage item linkage for scenarios such as reading comprehension where multiple items may share a reading passage; 2) they are not scalable since storing one model per item becomes difficult when models have a large number of parameters. In this paper, we report our (grand prize-winning) solution to the National Assessment of Education Progress (NAEP) automated scoring challenge for reading comprehension. Our approach, in-context BERT fine-tuning, produces a single shared scoring model for all items with a carefully-designed input structure to provide contextual information on each item. We demonstrate the effectiveness of our approach via local evaluations using the training dataset provided by the challenge. 
We also discuss the biases, common error types, and limitations of our approach.", "output": "Automated Scoring for Reading Comprehension via In-context BERT Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, neural models have been leveraged to significantly improve the performance of information extraction from semi-structured websites. However, a barrier for continued progress is the small number of datasets large enough to train these models. In this work, we introduce the PLAtE (Pages of Lists Attribute Extraction) benchmark dataset as a challenging new web extraction task. PLAtE focuses on shopping data, specifically extractions from product review pages with multiple items encompassing the tasks of: (1) finding product-list segmentation boundaries and (2) extracting attributes for each product. PLAtE is composed of 52,898 items collected from 6,694 pages and 156,014 attributes, making it the first large-scale list page web extraction dataset. We use a multi-stage approach to collect and annotate the dataset and adapt three state-of-the-art web extraction models to the two tasks comparing their strengths and weaknesses both quantitatively and qualitatively.", "output": "PLAtE: A Large-scale Dataset for List Page Web Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Due to the popularity of Graph Neural Networks (GNNs), various GNN-based methods have been designed to reason on knowledge graphs (KGs). An important design component of GNN-based KG reasoning methods is called the propagation path, which contains a set of involved entities in each propagation step. Existing methods use hand-designed propagation paths, ignoring the correlation between the entities and the query relation. 
In addition, the number of involved entities will explosively grow at larger propagation steps. In this work, we are motivated to learn an adaptive propagation path in order to filter out irrelevant entities while preserving promising targets. First, we design an incremental sampling mechanism where the nearby targets and layer-wise connections can be preserved with linear complexity. Second, we design a learning-based sampling distribution to identify the semantically related entities. Extensive experiments show that our method is powerful, efficient, and semantic-aware. The code is available at", "output": "AdaProp: Learning Adaptive Propagation for Graph Neural Network based Knowledge Graph Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. 
Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.", "output": "The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper considers subject level privacy in the FL setting, where a subject is an individual whose private information is embodied by several data items either confined within a single federation user or distributed across multiple federation users. We propose two new algorithms that enforce subject level DP at each federation user locally. Our first algorithm, called LocalGroupDP, is a straightforward application of group differential privacy in the popular DP-SGD algorithm. Our second algorithm is based on a novel idea of hierarchical gradient averaging (HiGradAvgDP) for subjects participating in a training mini-batch. We also show that user level Local Differential Privacy (LDP) naturally guarantees subject level DP. We observe the problem of horizontal composition of subject level privacy loss in FL - subject level privacy loss incurred at individual users composes across the federation. We formally prove the subject level DP guarantee for our algorithms, and also show their effect on model utility loss. Our empirical evaluation on FEMNIST and Shakespeare datasets shows that LocalGroupDP delivers the best performance among our algorithms. 
However, its model utility lags behind that of models trained using a DP-SGD based algorithm that provides a weaker item level privacy guarantee. Privacy loss amplification due to subject sampling fractions and horizontal composition remain key challenges for model utility.", "output": "Subject Granular Differential Privacy in Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A final exam in machine learning at a top institution such as MIT, Harvard, or Cornell typically takes faculty days to write, and students hours to solve. We demonstrate that large language models pass machine learning finals at a human level on a corpus drawn from MIT, Harvard, and Cornell and automatically generate new human-quality final exam questions in seconds. Previous work has developed program synthesis and few-shot learning methods to solve university-level problem set questions in mathematics and STEM courses. In this work, we develop and compare methods that solve final exams, which differ from problem sets in several ways: the questions are longer, have multiple parts, are more complicated, and span a broader set of topics. We provide a new dataset and benchmark of questions from machine learning final exams and code for answering these questions and generating new questions. We show how to generate new questions from other questions and course notes. We evaluate a large open language model, Meta's OPT, and compare the results with OpenAI's closed models. A student survey comparing the quality, appropriateness, and difficulty of machine-generated questions with human-written questions shows that across multiple aspects, machine-generated questions are indistinguishable from human-generated questions and are suitable for final exams. 
We perform ablation studies comparing zero-shot learning with few-shot learning and chain-of-thought prompting using GPT-3, OPT, Codex, and ChatGPT across machine learning topics and find that few-shot learning methods perform best. We highlight the transformative potential of language models to streamline the writing and solution of large-scale assessments, significantly reducing the workload from human days to machine seconds.", "output": "From Human Days to Machine Seconds: Automatically Answering and Generating Machine Learning Final Exams."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep convolutional autoencoders provide an effective tool for learning non-linear dimensionality reduction in an unsupervised way. Recently, they have been used for the task of anomaly detection in the visual domain. By optimising for the reconstruction error using anomaly-free examples, the common belief is that a corresponding network should fail to accurately reconstruct anomalous regions in the application phase. This goal is typically addressed by controlling the capacity of the network by either reducing the size of the bottleneck layer or enforcing sparsity constraints on its activations. However, neither of these techniques explicitly penalizes reconstruction of anomalous signals, often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime, which allows the use of discriminative information during training, focusing on the data manifold by means of a modified reconstruction error. This regularizes the model to produce locally consistent reconstructions, while replacing irregularities by acting as a filter for anomalous patterns. In contrast to related approaches, inference with our method is very efficient during training and prediction, processing the entire input image in one single step. 
Our experiments on the MVTec AD dataset demonstrate high recognition and localization performance of the proposed method. On the texture-subset, in particular, our approach consistently outperforms a bunch of recent anomaly detection methods by a big margin.", "output": "Self-Supervised Training with Autoencoders for Visual Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Besides standard cameras, autonomous vehicles typically include multiple additional sensors, such as lidars and radars, which help acquire richer information for perceiving the content of the driving scene. While several recent works focus on fusing certain pairs of sensors - such as camera with lidar or radar - by using architectural components specific to the examined setting, a generic and modular sensor fusion architecture is missing from the literature. In this work, we propose HRFuser, a modular architecture for multi-modal 2D object detection. It fuses multiple sensors in a multi-resolution fashion and scales to an arbitrary number of input modalities. The design of HRFuser is based on state-of-the-art high-resolution networks for image-only dense prediction and incorporates a novel multi-window cross-attention block as the means to perform fusion of multiple modalities at multiple resolutions. We demonstrate via extensive experiments on nuScenes and the adverse conditions DENSE datasets that our model effectively leverages complementary features from additional modalities, substantially improving upon camera-only performance and consistently outperforming state-of-the-art 3D and 2D fusion methods evaluated on 2D object detection metrics. 
The source code is publicly available.", "output": "HRFuser: A Multi-resolution Sensor Fusion Architecture for 2D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Novel test selectors used in simulation-based verification have been shown to significantly accelerate coverage closure regardless of the number of coverage holes. This paper presents a configurable and highly-automated framework for novel test selection based on neural networks. Three configurations of this framework are tested with a commercial signal processing unit. All three convincingly outperform random test selection with the largest saving of simulation being 49.37% to reach 99.5% coverage. The computational expense of the configurations is negligible compared to the simulation reduction. We compare the experimental results and discuss important characteristics related to the performance of the configurations.", "output": "Using Neural Networks for Novelty-based Test Selection to Accelerate Functional Coverage Closure."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem to solve. In response to this, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a defined notion of fairness. However, fair solutions are reliant on the quality of the training data, and can be highly sensitive to noise. 
Recent studies have shown that robustness (the ability for a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem and, hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies - the robustness ratio. We conduct multiple extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair for low noise scenarios but fairer for high noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline in choosing the most suitable fairness strategy for various data sets.", "output": "How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Increasing the size of overparameterized neural networks has been key in achieving state-of-the-art performance. This is captured by the double descent phenomenon, where the test loss follows a decreasing-increasing-decreasing pattern as model width increases. However, the effect of label noise on the test loss curve has not been fully explored. In this work, we uncover an intriguing phenomenon where label noise leads to a textit{final ascent} in the originally observed double descent curve. 
Specifically, under a sufficiently large noise-to-sample-size ratio, optimal generalization is achieved at intermediate widths. Through theoretical analysis, we attribute this phenomenon to the shape transition of test loss variance induced by label noise. Furthermore, we extend the final ascent phenomenon to model density and provide the first theoretical characterization showing that reducing density by randomly dropping trainable parameters improves generalization under label noise. We also thoroughly examine the roles of regularization and sample size. Surprisingly, we find that larger $ell_2$ regularization and robust learning methods against label noise exacerbate the final ascent. We confirm the validity of our findings through extensive experiments on ReLU networks trained on MNIST, ResNets trained on CIFAR-10/100, and InceptionResNet-v2 trained on Stanford Cars with real-world noisy labels.", "output": "Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rate of heart morbidity and heart mortality is increasing significantly, which affects global public health and the world economy. Early prediction of heart disease is crucial for reducing heart morbidity and mortality. This paper proposes two quantum machine learning methods, i.e., a hybrid quantum neural network and a hybrid random forest quantum neural network, for early detection of heart disease. The methods are applied on the Cleveland and Statlog datasets. The results show that the hybrid quantum neural network and the hybrid random forest quantum neural network are suitable for high dimensional and low dimensional problems respectively. The hybrid quantum neural network is sensitive to outlier data while the hybrid random forest is robust on outlier data. 
A comparison between different machine learning methods shows that the proposed quantum methods are more appropriate for early heart disease prediction where 96.43% and 97.78% area under curve are obtained for Cleveland and Statlog dataset respectively.", "output": "Early heart disease prediction using hybrid quantum classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In reinforcement learning (RL), adversarial policies can be developed by training an adversarial agent to minimize a target agent's rewards. Prior work has studied black-box versions of these attacks where the adversary only observes the world state and treats the target agent as any other part of the environment. However, this does not take into account additional structure in the problem. In this work, we take inspiration from the literature on white-box attacks to train more effective adversarial policies. We study white-box adversarial policies and show that having access to a target agent's internal state can be useful for identifying its vulnerabilities. We make two contributions. (1) We introduce white-box adversarial policies where an attacker observes both a target's internal state and the world state at each timestep. We formulate ways of using these policies to attack agents in 2-player games and text-generating language models. (2) We demonstrate that these policies can achieve higher initial and asymptotic performance against a target agent than black-box controls. Code is available at", "output": "White-Box Adversarial Policies in Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A treatment policy defines when and what treatments are applied to affect some outcome of interest. 
Data-driven decision-making requires the ability to predict what happens if a policy is changed. Existing methods that predict how the outcome evolves under different scenarios assume that the tentative sequences of future treatments are fixed in advance, while in practice the treatments are determined stochastically by a policy and may depend, for example, on the efficiency of previous treatments. Therefore, the current methods are not applicable if the treatment policy is unknown or a counterfactual analysis is needed. To handle these limitations, we model the treatments and outcomes jointly in continuous time, by combining Gaussian processes and point processes. Our model enables the estimation of a treatment policy from observational sequences of treatments and outcomes, and it can predict the interventional and counterfactual progression of the outcome after an intervention on the treatment policy (in contrast with the causal effect of a single treatment). We show with real-world and semi-synthetic data on blood glucose progression that our method can answer causal queries more accurately than existing alternatives.", "output": "Causal Modeling of Policy Interventions From Sequences of Treatments and Outcomes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. 
We show that both the standard and balanced error rates are special cases of the EC. Further, we show its relation with F-score and MCC and argue that EC is superior to these traditional metrics, being more elegant, general, and intuitive, as well as being based on basic principles from statistics. The metrics above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics and argue that they are the most principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute the system's calibration loss and compare this metric with the standard expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for a variety of reasons.", "output": "Analysis and Comparison of Classification Metrics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The manifold hypothesis, which assumes that data lies on or close to an unknown manifold of low intrinsic dimension, is a staple of modern machine learning research. However, recent work has shown that real-world data exhibits distinct non-manifold structures, i.e. singularities, that can lead to erroneous findings. Detecting such singularities is therefore crucial as a precursor to interpolation and inference tasks. 
We address this issue by developing a topological framework that (i) quantifies the local intrinsic dimension, and (ii) yields a Euclidicity score for assessing the 'manifoldness' of a point along multiple scales. Our approach identifies singularities of complex spaces, while also capturing singular structures and local geometric complexity in image data.", "output": "Topological Singularity Detection at Multiple Scales."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. 
The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.", "output": "Learning under Data Drift with Time-Varying Importance Weights."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. However, these policies tend to be overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term - often derived from information theory - in the objective function optimized by RL. However these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely-used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. 
Through an extensive empirical evaluation comparing eight state-of-the-art algorithms (four flagship algorithms from each line of work) on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning; QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. As no single method is found to provide near-optimal performance across all environments, there is a rich scope for further research which we support by proposing future directions and providing optimized open-source implementations.", "output": "Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal representation learning has shown promising improvements on various vision-language tasks. Most existing methods excel at building global-level alignment between vision and language while lacking effective fine-grained image-text interaction. In this paper, we propose a jointly masked multimodal modeling method to learn fine-grained multimodal representations. Our method performs joint masking on image-text input and integrates both implicit and explicit targets for the masked signals to recover. The implicit target provides a unified and debiased objective for vision and language, where the model predicts latent multimodal representations of the unmasked input. The explicit target further enriches the multimodal representations by recovering high-level and semantically meaningful information: momentum visual features of image patches and concepts of word tokens. 
Through such a masked modeling process, our model not only learns fine-grained multimodal interaction, but also avoids the semantic gap between high-level representations and low- or mid-level prediction targets (e.g. image pixels), thus producing semantically rich multimodal representations that perform well on both zero-shot and fine-tuned settings. Our pre-trained model (named MAMO) achieves state-of-the-art performance on various downstream vision-language tasks, including image-text retrieval, visual question answering, visual reasoning, and weakly-supervised visual grounding.", "output": "MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Score-based generative models (SGMs) learn a family of noise-conditional score functions corresponding to the data density perturbed with increasingly large amounts of noise. These perturbed data densities are linked together by the Fokker-Planck equation (FPE), a partial differential equation (PDE) governing the spatial-temporal evolution of a density undergoing a diffusion process. In this work, we derive a corresponding equation called the score FPE that characterizes the noise-conditional scores of the perturbed data densities (i.e., their gradients). Surprisingly, despite the impressive empirical performance, we observe that scores learned through denoising score matching (DSM) fail to fulfill the underlying score FPE, which is an inherent self-consistency property of the ground truth score. We prove that satisfying the score FPE is desirable as it improves the likelihood and the degree of conservativity. 
Hence, we propose to regularize the DSM objective to enforce satisfaction of the score FPE, and we show the effectiveness of this approach across various datasets.", "output": "FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "CORL is an open-source library that provides thoroughly benchmarked single-file implementations of both deep offline and offline-to-online reinforcement learning algorithms. It emphasizes a simple developing experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate methods implementation into separate single files, making performance-relevant details easier to recognize. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking commonly employed D4RL datasets providing a transparent source of results that can be reused for robust evaluation tools such as performance profiles, probability of improvement, or expected online performance.", "output": "CORL: Research-oriented Deep Offline Reinforcement Learning Library."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm. The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset. Distribution-free upper and lower bounds on the gen-error can also be obtained. 
Our findings offer new insights that the generalization performance of SSL with pseudo-labeling is affected not only by the information between the output hypothesis and input training data but also by the information shared between the labeled and pseudo-labeled data samples. This serves as a guideline to choose an appropriate pseudo-labeling method from a given family of methods. To deepen our understanding, we further explore two examples -- mean estimation and logistic regression. In particular, we analyze how the ratio of the number of unlabeled to labeled data $\\lambda$ affects the gen-error under both scenarios. As $\\lambda$ increases, the gen-error for mean estimation decreases and then saturates at a value larger than when all the samples are labeled, and the gap can be quantified exactly with our analysis, and is dependent on the cross-covariance between the labeled and pseudo-labeled data samples. For logistic regression, the gen-error and the variance component of the excess risk also decrease as $\\lambda$ increases.", "output": "How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a new perspective on time series forecasting. In existing time series forecasting methods, the models take a sequence of numerical values as input and yield numerical values as output. The existing SOTA models are largely based on the Transformer architecture, modified with multiple encoding mechanisms to incorporate the context and semantics around the historical data. Inspired by the successes of pre-trained language foundation models, we pose a question about whether these models can also be adapted to solve time-series forecasting. Thus, we propose a new forecasting paradigm: prompt-based time series forecasting (PromptCast). 
In this novel task, the numerical input and output are transformed into prompts and the forecasting task is framed in a sentence-to-sentence manner, making it possible to directly apply language models for forecasting purposes. To support and facilitate the research of this task, we also present a large-scale dataset (PISA) that includes three real-world forecasting scenarios. We evaluate different SOTA numerical-based forecasting methods and language generation models. The benchmark results with various forecasting settings demonstrate the proposed PromptCast with language generation models is a promising research direction. Additionally, in comparison to conventional numerical-based forecasting, PromptCast shows a much better generalization ability under the zero-shot setting.", "output": "PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution, superior to all its single-task trained counterparts. Since there is often not a unique solution optimal for all tasks, practitioners have to balance tradeoffs between tasks' performance, and resort to optimality in the Pareto sense. Most MTL methodologies either completely neglect this aspect, and instead of aiming at learning a Pareto Front, produce one solution predefined by their optimization schemes, or produce diverse but discrete solutions. Recent approaches parameterize the Pareto Front via neural networks, leading to complex mappings from tradeoff to objective space. In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose Pareto Manifold Learning, an ensembling method in weight space. 
Our approach produces a continuous Pareto Front in a single training run, that allows to modulate the performance on each task during inference. Experiments on multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, show that Pareto Manifold Learning outperforms state-of-the-art single-point algorithms, while learning a better Pareto parameterization than multi-point baselines.", "output": "Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to available items' description, such as product-to-product recommendation on e-commerce platforms. As users' interests and item inventory are expected to change, it is important for a text-matching system to generalize to data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large, base language model on paired item relevance data (e.g., user clicks) can be counter-productive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting, because unlike in images, there exist no universally spurious features in a text-matching task (the same token may be spurious or causal depending on the text it is being matched to). 
For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to the base model. Results on Amazon product and 3 question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios when the base model is not accurate.", "output": "Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Recommendation Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample-efficiency by concurrently learning an internal model of the world, and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, and in stark contrast to humans that rely heavily on world understanding and visual cues for learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. 
By offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch; we improve mean performance of model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.", "output": "On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual reinforcement learning aims to sequentially learn a variety of tasks, retaining the ability to perform previously encountered tasks while simultaneously developing new policies for novel tasks. However, current continual RL approaches overlook the fact that certain tasks are identical under basic group operations like rotations or translations, especially with visual inputs. They may unnecessarily learn and maintain a new policy for each similar task, leading to poor sample efficiency and weak generalization capability. To address this, we introduce a unique Continual Vision-based Reinforcement Learning method that recognizes Group Symmetries, called COVERS, cultivating a policy for each group of equivalent tasks rather than individual tasks. COVERS employs a proximal policy optimization-based RL algorithm with an equivariant feature extractor and a novel task grouping mechanism that relies on the extracted invariant features. We evaluate COVERS on sequences of table-top manipulation tasks that incorporate image observations and robot proprioceptive information in both simulations and on real robot platforms. 
Our results show that COVERS accurately assigns tasks to their respective groups and significantly outperforms existing methods in terms of generalization capability.", "output": "Continual Vision-based Reinforcement Learning with Group Symmetries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data-driven approaches recently achieved remarkable success in magnetic resonance imaging (MRI) reconstruction, but integration into clinical routine remains challenging due to a lack of generalizability and interpretability. In this paper, we address these challenges in a unified framework based on generative image priors. We propose a novel deep neural network based regularizer which is trained in a generative setting on reference magnitude images only. After training, the regularizer encodes higher-level domain statistics which we demonstrate by synthesizing images without data. Embedding the trained model in a classical variational approach yields high-quality reconstructions irrespective of the sub-sampling pattern. In addition, the model shows stable behavior when confronted with out-of-distribution data in the form of contrast variation. Furthermore, a probabilistic interpretation provides a distribution of reconstructions and hence allows uncertainty quantification. To reconstruct parallel MRI, we propose a fast algorithm to jointly estimate the image and the sensitivity maps. 
The results demonstrate competitive performance, on par with state-of-the-art end-to-end deep learning methods, while preserving the flexibility with respect to sub-sampling patterns and allowing for uncertainty quantification.", "output": "Stable Deep MRI Reconstruction using Generative Priors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a smoothly broken power law functional form (that we refer to as a Broken Neural Scaling Law (BNSL)) that accurately models & extrapolates the scaling behaviors of deep neural networks (i.e. how the evaluation metric of interest varies as amount of compute used for training (or inference), number of model parameters, training dataset size, model input size, number of training steps, or upstream performance varies) for various architectures & for each of various tasks within a large & diverse set of upstream & downstream tasks, in zero-shot, prompted, & finetuned settings. This set includes large-scale vision, language, audio, video, diffusion, generative modeling, multimodal learning, contrastive learning, AI alignment, AI capabilities, robotics, out-of-distribution (OOD) generalization, continual learning, transfer learning, uncertainty estimation / calibration, OOD detection, adversarial robustness, distillation, sparsity, retrieval, quantization, pruning, fairness, molecules, computer programming/coding, math word problems, \"emergent phase transitions\", arithmetic, supervised learning, unsupervised/self-supervised learning, & reinforcement learning (single agent & multi-agent). When compared to other functional forms for neural scaling, this functional form yields extrapolations of scaling behavior that are considerably more accurate on this set. 
Moreover, this functional form accurately models & extrapolates scaling behavior that other functional forms are incapable of expressing such as the nonmonotonic transitions present in the scaling behavior of phenomena such as double descent & the delayed, sharp inflection points present in the scaling behavior of tasks such as arithmetic. Lastly, we use this functional form to glean insights about the limit of the predictability of scaling behavior. Code is available at", "output": "Broken Neural Scaling Laws."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although the security of automatic speaker verification (ASV) is seriously threatened by recently emerged adversarial attacks, there have been some countermeasures to alleviate the threat. However, many defense approaches not only require the prior knowledge of the attackers but also possess weak interpretability. To address this issue, in this paper, we propose an attacker-independent and interpretable method, named learnable mask detector (LMD), to separate adversarial examples from the genuine ones. It utilizes score variation as an indicator to detect adversarial examples, where the score variation is the absolute discrepancy between the ASV scores of an original audio recording and its transformed audio synthesized from its masked complex spectrogram. A core component of the score variation detector is to generate the masked spectrogram by a neural network. The neural network needs only genuine examples for training, which makes it an attacker-independent approach. Its interpretability lies that the neural network is trained to minimize the score variation of the targeted ASV, and maximize the number of the masked spectrogram bins of the genuine training examples. 
Its foundation is based on the observation that, masking out the vast majority of the spectrogram bins with little speaker information will inevitably introduce a large score variation to the adversarial example, and a small score variation to the genuine example. Experimental results with 12 attackers and two representative ASV systems show that our proposed method outperforms five state-of-the-art baselines. The extensive experimental results can also be a benchmark for the detection-based ASV defenses.", "output": "LMD: A Learnable Mask Network to Detect Adversarial Examples for Speaker Verification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Differentially Private Stochastic Gradient Descent (DP-SGD) is a key method for applying privacy in the training of deep learning models. This applies isotropic Gaussian noise to gradients during training, which can perturb these gradients in any direction, damaging utility. Metric DP, however, can provide alternative mechanisms based on arbitrary metrics that might be more suitable for preserving utility. In this paper, we apply directional privacy, via a mechanism based on the von Mises-Fisher (VMF) distribution, to perturb gradients in terms of angular distance so that gradient direction is broadly preserved. 
We show that this provides both $\\epsilon$-DP and $\\epsilon d$-privacy for deep learning training, rather than the $(\\epsilon, \\delta)$-privacy of the Gaussian mechanism; we observe that the $\\epsilon d$-privacy guarantee does not require a $\\delta>0$ term but degrades smoothly according to the dissimilarity of the input gradients. As $\\epsilon$s between these different frameworks cannot be directly compared, we examine empirical privacy calibration mechanisms that go beyond previous work on empirically calibrating privacy within standard DP frameworks using membership inference attacks (MIA); we show that a combination of enhanced MIA and reconstruction attacks provides a suitable method for privacy calibration. Experiments on key datasets then indicate that the VMF mechanism can outperform the Gaussian in the utility-privacy trade-off. In particular, our experiments provide a direct comparison of privacy between the two approaches in terms of their ability to defend against reconstruction and membership inference.", "output": "Directional Privacy for Deep Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Leveraging the fact that speaker identity and content vary on different time scales, factorized hierarchical variational autoencoder (FHVAE) uses different latent variables to symbolize these two attributes. Disentanglement of these attributes is carried out by different prior settings of the corresponding latent variables. For the prior of speaker identity variable, FHVAE assumes it is a Gaussian distribution with an utterance-scale varying mean and a fixed variance. By setting a small fixed variance, the training process promotes identity variables within one utterance gathering close to the mean of their prior. However, this constraint is relatively weak, as the mean of the prior changes between utterances. 
Therefore, we introduce contrastive learning into the FHVAE framework, to make the speaker identity variables gathering when representing the same speaker, while distancing themselves as far as possible from those of other speakers. The model structure has not been changed in this work but only the training process, thus no additional cost is needed during testing. Voice conversion has been chosen as the application in this paper. Latent variable evaluations include speaker verification and identification for the speaker identity variable, and speech recognition for the content variable. Furthermore, assessments of voice conversion performance are on the grounds of fake speech detection experiments. Results show that the proposed method improves both speaker identity and content feature extraction compared to FHVAE, and has better performance than baseline on conversion.", "output": "Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Predicting the evolution of diseases is challenging, especially when the data availability is scarce and incomplete. The most popular tools for modelling and predicting infectious disease epidemics are compartmental models. They stratify the population into compartments according to health status and model the dynamics of these compartments using dynamical systems. However, these predefined systems may not capture the true dynamics of the epidemic due to the complexity of the disease transmission and human interactions. In order to overcome this drawback, we propose Sparsity and Delay Embedding based Forecasting (SPADE4) for predicting epidemics. SPADE4 predicts the future trajectory of an observable variable without the knowledge of the other variables or the underlying system. 
We use random features model with sparse regression to handle the data scarcity issue and employ Takens' delay embedding theorem to capture the nature of the underlying system from the observed variable. We show that our approach outperforms compartmental models when applied to both simulated and real data.", "output": "SPADE4: Sparsity and Delay Embedding based Forecasting of Epidemics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the prior and is the first distribution-dependent bound in this setting. We prove it using a frequentist-like argument, where we carry the prior through, and then integrate out the bandit instance at the end. We also provide a lower bound on the probability of misidentification in a $2$-armed Bayesian bandit and show that our upper bound (almost) matches it for any budget. Our experiments show that Bayesian elimination is superior to frequentist methods and competitive with the state-of-the-art Bayesian algorithms that have no guarantees in our setting.", "output": "Bayesian Fixed-Budget Best-Arm Identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The present work investigates the use of physics-informed neural networks (PINNs) for the 3D reconstruction of unsteady gravity currents from limited data. 
In the PINN context, the flow fields are reconstructed by training a neural network whose objective function penalizes the mismatch between the network predictions and the observed data and embeds the underlying equations using automatic differentiation. This study relies on a high-fidelity numerical experiment of the canonical lock-exchange configuration. This allows us to benchmark quantitatively the PINNs reconstruction capabilities on several training databases that mimic state-of-the-art experimental measurement techniques for density and velocity. Notably, spatially averaged density measurements by light attenuation technique (LAT) are employed for the training procedure. An optimal experimental setup for flow reconstruction by PINNs is proposed according to two criteria: the implementation complexity and the accuracy of the inferred fields.", "output": "Physics-informed neural networks for gravity currents reconstruction from limited data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The traditional Neural Network-development process requires substantial expert knowledge and relies heavily on intuition and trial-and-error. Neural Architecture Search (NAS) frameworks were introduced to robustly search for network topologies, as well as facilitate the automated development of Neural Networks. While some optimization approaches -- such as Genetic Algorithms -- have been extensively explored in the NAS context, other Metaheuristic Optimization algorithms have not yet been investigated. In this study, we evaluate the viability of Artificial Bee Colony optimization for Neural Architecture Search. 
Our proposed framework, HiveNAS, outperforms existing state-of-the-art Swarm Intelligence-based NAS frameworks in a fraction of the time.", "output": "HiveNAS: Neural Architecture Search using Artificial Bee Colony Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks in reinforcement learning (RL) often assume highly-privileged access to the victim's parameters, environment, or data. Instead, this paper proposes a novel adversarial setting called a Cheap Talk MDP in which an Adversary can merely append deterministic messages to the Victim's observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, influence underlying environment dynamics or reward signals, introduce non-stationarity, add stochasticity, see the Victim's actions, or access their parameters. Additionally, we present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT still significantly influences the Victim's training and testing performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages during train-time to directly and arbitrarily control the Victim at test-time. 
Project video and code are available at", "output": "Adversarial Cheap Talk."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deployed models decay over time due to shifting inputs, changing user needs, or emergent knowledge gaps. When harmful behaviors are identified, targeted edits are required. However, current model editors, which adjust specific behaviors of pre-trained models, degrade model performance over multiple edits. We propose GRACE, a Lifelong Model Editing method, which implements spot-fixes on streaming errors of a deployed model, ensuring minimal impact on unrelated inputs. GRACE writes new mappings into a pre-trained model's latent space, creating a discrete, local codebook of edits without altering model weights. This is the first method enabling thousands of sequential edits using only streaming errors. Our experiments on T5, BERT, and GPT models show GRACE's state-of-the-art performance in making and retaining edits, while generalizing to unseen inputs. Our code is available at", "output": "Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adapters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sports analytics has captured increasing attention since analysis of the various data enables insights for training strategies, player evaluation, etc. In this paper, we focus on predicting what types of returning strokes will be made, and where players will move to based on previous strokes. As this problem has not been addressed to date, movement forecasting can be tackled through sequence-based and graph-based models by formulating as a sequence prediction task. 
However, existing sequence-based models neglect the effects of interactions between players, and graph-based models still suffer from multifaceted perspectives on the next movement. Moreover, there is no existing work on representing strategic relations among players' shot types and movements. To address these challenges, we first introduce the procedure of the Player Movements (PM) graph to exploit the structural movements of players with strategic relations. Based on the PM graph, we propose a novel Dynamic Graphs and Hierarchical Fusion for Movement Forecasting model (DyMF) with interaction style extractors to capture the mutual interactions of players themselves and between both players within a rally, and dynamic players' tactics across time. In addition, hierarchical fusion modules are designed to incorporate the style influence of both players and rally interactions. Extensive experiments show that our model empirically outperforms both sequence- and graph-based methods and demonstrate the practical usage of movement forecasting.", "output": "Where Will Players Move Next? Dynamic Graphs and Hierarchical Fusion for Movement Forecasting in Badminton."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods typically have been designed on top of the graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers paid attention to reaction equations in conjunctions with diffusion equations. However, they all consider limited forms of reaction equations. 
To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.", "output": "GREAD: Graph Neural Reaction-Diffusion Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual Learning (CL) is the process of learning ceaselessly a sequence of tasks. Most existing CL methods deal with independent data (e.g., images and text) for which many benchmark frameworks and results under standard experimental settings are available. However, CL methods for graph data (graph CL) are surprisingly underexplored because of (a) the lack of standard experimental settings, especially regarding how to deal with the dependency between instances, (b) the lack of benchmark datasets and scenarios, and (c) high complexity in implementation and evaluation due to the dependency. In this paper, regarding (a), we define four standard incremental settings (task-, class-, domain-, and time-incremental) for graph data, which are naturally applied to many node-, link-, and graph-level problems. Regarding (b), we provide 25 benchmark scenarios based on 15 real-world graphs. Regarding (c), we develop BeGin, an easy and fool-proof framework for graph CL. BeGin is easily extended since it is modularized with reusable modules for data processing, algorithm design, and evaluation. Especially, the evaluation module is completely separated from user code to eliminate potential mistakes. 
Using all the above, we report extensive benchmark results of 10 graph CL methods. Compared to the latest benchmark for graph CL, using BeGin, we cover 3x more combinations of incremental settings and levels of problems. All assets for the benchmark framework are available at ", "output": "BeGin: Extensive Benchmark Scenarios and An Easy-to-use Framework for Graph Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Powerful hardware services and software libraries are vital tools for quickly and affordably designing, testing, and executing quantum algorithms. A robust large-scale study of how the performance of these platforms scales with the number of qubits is key to providing quantum solutions to challenging industry problems. This work benchmarks the runtime and accuracy for a representative sample of specialized high-performance simulated and physical quantum processing units. Results show the QMware simulator can reduce the runtime for executing a quantum circuit by up to 78% compared to the next fastest option for algorithms with fewer than 27 qubits. The AWS SV1 simulator offers a runtime advantage for larger circuits, up to the maximum 34 qubits available with SV1. Beyond this limit, QMware can execute circuits as large as 40 qubits. Physical quantum devices, such as Rigetti's Aspen-M2, can provide an exponential runtime advantage for circuits with more than 30 qubits. However, the high financial cost of physical quantum processing units presents a serious barrier to practical use. Moreover, only IonQ's Harmony quantum device achieves high fidelity with more than four qubits. 
This study paves the way to understanding the optimal combination of available software and hardware for executing practical quantum algorithms.", "output": "Benchmarking simulated and physical quantum processing units using quantum and hybrid algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal effect estimation from observational data is a fundamental task in empirical sciences. It becomes particularly challenging when unobserved confounders are involved in a system. This paper focuses on front-door adjustment -- a classic technique which, using observed mediators allows to identify causal effects even in the presence of unobserved confounding. While the statistical properties of the front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Barenboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given directed acyclic graph (DAG), with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the causal graph. In our work, we give the first linear-time, i.e., $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity. This result implies an $O(n(n+m))$ delay enumeration algorithm of all front-door adjustment sets, again improving previous work by Jeong et al. by a factor of $n^3$. Moreover, we provide the first linear-time algorithm for finding a minimal front-door adjustment set. 
We offer implementations of our algorithms in multiple programming languages to facilitate practical usage and empirically validate their feasibility, even for large graphs.", "output": "Linear-Time Algorithms for Front-Door Adjustment in Causal Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial training is well-known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a \"convex relaxation barrier\" that prevent them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. 
Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.", "output": "Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetical complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetical complexity of second-order algorithms by a factor $\\sqrt{d}$.", "output": "Second-order optimization with lazy Hessians."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A prominent paradigm for graph neural networks is based on the message-passing framework. In this framework, information communication is realized only between neighboring nodes. The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to oversmoothing. 
In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. Our approach allows for optimizing the spatial extent of diffusion across various tasks and network channels, thus enabling medium and long-distance communication efficiently. Furthermore, we show that our architecture design also enables local message-passing and thus inherits from the capabilities of local message-passing approaches. We show that on both widely used graph benchmarks and synthetic mesh and graph datasets, the proposed framework outperforms state-of-the-art methods by a significant margin", "output": "TIDE: Time Derivative Diffusion for Deep Learning on Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. 
Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area.", "output": "On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes across several modalities. We do not encode masked tokens, use a fast convolutional decoder and amortize the effort to build teacher representations. data2vec 2.0 benefits from the rich contextualized target representations introduced in data2vec which enable a fast self-supervised learner. Experiments on ImageNet-1K image classification show that data2vec 2.0 matches the accuracy of Masked Autoencoders in 16.4x lower pre-training time, on Librispeech speech recognition it performs as well as wav2vec 2.0 in 10.6x less time, and on GLUE natural language understanding it matches a retrained RoBERTa model in half the time. 
Trading some speed for accuracy results in ImageNet-1K top-1 accuracy of 86.8% with a ViT-L model trained for 150 epochs.", "output": "Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Through in-context learning (ICL), large-scale language models are effective few-shot learners without additional model fine-tuning. However, the ICL performance does not scale well with the number of available training samples as it is limited by the inherent input length constraint of the underlying language model. Meanwhile, many studies have revealed that language models are also powerful feature extractors, allowing them to be utilized in a black-box manner and enabling the linear probing paradigm, where lightweight discriminators are trained on top of the pre-extracted input representations. This paper proposes prompt-augmented linear probing (PALP), a hybrid of linear probing and ICL, which leverages the best of both worlds. PALP inherits the scalability of linear probing and the capability of enforcing language models to derive more meaningful representations via tailoring input into a more conceivable form. 
Throughout in-depth investigations on various datasets, we verified that PALP significantly enhances the input representations closing the gap between ICL in the data-hungry scenario and fine-tuning in the data-abundant scenario with little training overhead, potentially making PALP a strong alternative in a black-box scenario.", "output": "Prompt-Augmented Linear Probing: Scaling beyond the Limit of Few-shot In-Context Learners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Natural data is redundant yet predominant architectures tile computation uniformly across their input and output space. We propose the Recurrent Interface Networks (RINs), an attention-based architecture that decouples its core computation from the dimensionality of the data, enabling adaptive computation for more scalable generation of high-dimensional data. RINs focus the bulk of computation (i.e. global self-attention) on a set of latent tokens, using cross-attention to read and write (i.e. route) information between latent and data tokens. Stacking RIN blocks allows bottom-up (data to latent) and top-down (latent to data) feedback, leading to deeper and more expressive routing. While this routing introduces challenges, this is less problematic in recurrent computation settings where the task (and routing problem) changes gradually, such as iterative generation with diffusion models. We show how to leverage recurrence by conditioning the latent tokens at each forward pass of the reverse diffusion process with those from prior computation, i.e. latent self-conditioning. 
RINs yield state-of-the-art pixel diffusion models for image and video generation, scaling to 1024X1024 images without cascades or guidance, while being domain-agnostic and up to 10X more efficient than 2D and 3D U-Nets.", "output": "Scalable Adaptive Computation for Iterative Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mixup is a popular data augmentation technique for training deep neural networks where additional samples are generated by linearly interpolating pairs of inputs and their labels. This technique is known to improve the generalization performance in many learning paradigms and applications. In this work, we first analyze Mixup and show that it implicitly regularizes infinitely many directional derivatives of all orders. Based on this new insight, we propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup. To demonstrate the effectiveness of the proposed method, we conduct experiments across various domains such as images, tabular data, speech, and graphs. Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures, for instance, exhibiting an improvement over Mixup by 0.8% in ImageNet top-1 accuracy.", "output": "MixupE: Understanding and Improving Mixup from Directional Derivative Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "EXplainable Artificial Intelligence (XAI) is a vibrant research topic in the artificial intelligence community, with growing interest across methods and domains. Much has been written about the subject, yet XAI still lacks shared terminology and a framework capable of providing structural soundness to explanations. 
In our work, we address these issues by proposing a novel definition of explanation that is a synthesis of what can be found in the literature. We recognize that explanations are not atomic but the combination of evidence stemming from the model and its input-output mapping, and the human interpretation of this evidence. Furthermore, we fit explanations into the properties of faithfulness (i.e., the explanation being a true description of the model's inner workings and decision-making process) and plausibility (i.e., how much the explanation looks convincing to the user). Using our proposed theoretical framework simplifies how these properties are operationalized and it provides new insight into common explanation methods that we analyze as case studies.", "output": "A Theoretical Framework for AI Models Explainability with Application in Biomedicine."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on a subset of findings that can be simultaneously present in a patient, making it challenging to train models that aggregate multiple datasets together. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically useful models with a complete representation of abnormalities that may occur within the thorax. To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial annotations. We evaluate surgical aggregation across synthetic and real-world heterogeneous datasets with partial annotations. 
Our results indicate that surgical aggregation outperforms current strategies, generalizes better, and has the potential to facilitate the development of clinically useful models even when using datasets with heterogeneous disease labels.", "output": "Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Synthetic data generation is a promising solution to address privacy issues with the distribution of sensitive health data. Recently, diffusion models have set new standards for generative models for different data modalities. Also very recently, structured state space models emerged as a powerful modeling paradigm to capture long-term dependencies in time series. We put forward SSSD-ECG, as the combination of these two technologies, for the generation of synthetic 12-lead electrocardiograms conditioned on more than 70 ECG statements. Due to a lack of reliable baselines, we also propose conditional variants of two state-of-the-art unconditional generative models. We thoroughly evaluate the quality of the generated samples, by evaluating pretrained classifiers on the generated data and by evaluating the performance of a classifier trained only on synthetic data, where SSSD-ECG clearly outperforms its GAN-based competitors. 
We demonstrate the soundness of our approach through further experiments, including conditional class interpolation and a clinical Turing test demonstrating the high quality of the SSSD-ECG samples across a wide range of conditions.", "output": "Diffusion-based Conditional ECG Generation with Structured State Space Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Batch Bayesian optimisation and Bayesian quadrature have been shown to be sample-efficient methods of performing optimisation and quadrature where expensive-to-evaluate objective functions can be queried in parallel. However, current methods do not scale to large batch sizes -- a frequent desideratum in practice (e.g. drug discovery or simulation-based inference). We present a novel algorithm, SOBER, which permits scalable and diversified batch global optimisation and quadrature with arbitrary acquisition functions and kernels over discrete and mixed spaces. The key to our approach is to reformulate batch selection for global optimisation as a quadrature problem, which relaxes acquisition function maximisation (non-convex) to kernel recombination (convex). Bridging global optimisation and quadrature can efficiently solve both tasks by balancing the merits of exploitative Bayesian optimisation and explorative Bayesian quadrature. 
We show that SOBER outperforms 11 competitive baselines on 12 synthetic and diverse real-world tasks.", "output": "SOBER: Highly Parallel Bayesian Optimization and Bayesian Quadrature over Discrete and Mixed Spaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose an efficient method to ground pretrained text-only language models to the visual domain, enabling them to process arbitrarily interleaved image-and-text data, and generate text interleaved with retrieved images. Our method leverages the abilities of language models learnt from large scale text-only pretraining, such as in-context learning and free-form text generation. We keep the language model frozen, and finetune input and output linear layers to enable cross-modality interactions. This allows our model to process arbitrarily interleaved image-and-text inputs, and generate free-form text interleaved with retrieved images. We achieve strong zero-shot performance on grounded tasks such as contextual image retrieval and multimodal dialogue, and showcase compelling interactive abilities. Our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pretrained language models in visually grounded settings.", "output": "Grounding Language Models to Images for Multimodal Inputs and Outputs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Training complex machine learning (ML) architectures requires a compute and time consuming process of selecting the right optimizer and tuning its hyper-parameters. A new paradigm of learning optimizers from data has emerged as a better alternative to hand-designed ML optimizers. 
We propose Mnemosyne optimizer, that uses Performers: implicit low-rank attention Transformers. It can learn to train entire neural network architectures including other Transformers without any task-specific optimizer tuning. We show that Mnemosyne: (a) generalizes better than popular LSTM optimizer, (b) in particular can successfully train Vision Transformers (ViTs) while meta-trained on standard MLPs and (c) can initialize optimizers for faster convergence in Robotics applications. We believe that these results open the possibility of using Transformers to build foundational optimization models that can address the challenges of regular Transformer training. We complement our results with an extensive theoretical analysis of the compact associative memory used by Mnemosyne.", "output": "Mnemosyne: Learning to Train Transformers with Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Determining the extent to which the perceptual world can be recovered from language is a longstanding problem in philosophy and cognitive science. We show that state-of-the-art large language models can unlock new insights into this problem by providing a lower bound on the amount of perceptual information that can be extracted from language. Specifically, we elicit pairwise similarity judgments from GPT models across six psychophysical datasets. We show that the judgments are significantly correlated with human data across all domains, recovering well-known representations like the color wheel and pitch spiral. Surprisingly, we find that a model (GPT-4) co-trained on vision and language does not necessarily lead to improvements specific to the visual modality. To study the influence of specific languages on perception, we also apply the models to a multilingual color-naming task. 
We find that GPT-4 replicates cross-linguistic variation in English and Russian illuminating the interaction of language and perception.", "output": "Large language models predict human sensory judgments across six modalities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser's global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channels and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem. 
Finally, we argue that all our results hold for both single-item and multi-item auctions from which channels procure impressions on advertisers' behalf.", "output": "Multi-channel Autobidding with Budget and ROI Constraints."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we present a variety of novel information-theoretic generalization bounds for learning algorithms, from the supersample setting of Steinke & Zakynthinou (2020)-the setting of the \"conditional mutual information\" framework. Our development exploits projecting the loss pair (obtained from a training instance and a testing instance) down to a single number and correlating loss values with a Rademacher sequence (and its shifted variants). The presented bounds include square-root bounds, fast-rate bounds, including those based on variance and sharpness, and bounds for interpolating algorithms etc. We show theoretically or empirically that these bounds are tighter than all information-theoretic bounds known to date on the same supersample setting.", "output": "Tighter Information-Theoretic Generalization Bounds from Supersamples."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the employment of deep learning methods has led to several significant breakthroughs in artificial intelligence. Different from traditional machine learning models, deep learning-based approaches are able to extract features autonomously from raw data. This allows for bypassing the feature engineering process, which is generally considered to be both error-prone and tedious. 
Moreover, deep learning strategies often outperform traditional models in terms of accuracy.", "output": "Stop overkilling simple tasks with black-box models and use transparent models instead."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks that model 3D data, such as point clouds or atoms, are typically desired to be $SO(3)$ equivariant, i.e., equivariant to 3D rotations. Unfortunately equivariant convolutions, which are a fundamental operation for equivariant networks, increase significantly in computational complexity as higher-order tensors are used. In this paper, we address this issue by reducing the $SO(3)$ convolutions or tensor products to mathematically equivalent convolutions in $SO(2)$. This is accomplished by aligning the node embeddings' primary axis with the edge vectors, which sparsifies the tensor product and reduces the computational complexity from $O(L^6)$ to $O(L^3)$, where $L$ is the degree of the representation. We demonstrate the potential implications of this improvement by proposing the Equivariant Spherical Channel Network (eSCN), a graph neural network utilizing our novel approach to equivariant convolutions, which achieves state-of-the-art results on the large-scale OC-20 and OC-22 datasets.", "output": "Reducing SO(3) Convolutions to SO(2) for Efficient Equivariant GNNs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a model-agnostic federated learning method for networks of heterogeneous data and models. The network structure reflects similarities between the (statistics of) local datasets and, in turn, their associated local (\"personal\") models. 
Our method is an instance of empirical risk minimization, with the regularization term derived from the network structure of data. In particular, we require well-connected local models, forming clusters, to yield similar predictions on a common test set. The proposed method allows for a wide range of local models. The only restriction on these local models is that they allow for efficient implementation of regularized empirical risk minimization (training). For a wide range of models, such implementations are available in high-level programming libraries including scikit-learn, Keras or PyTorch.", "output": "Towards Model-Agnostic Federated Learning over Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recommendation systems aim to predict users' feedback on items not exposed to them. Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback. Existing methods either (1) make untenable assumptions about these unmeasured variables or (2) directly infer latent confounders from users' exposure. However, they cannot guarantee the identification of counterfactual feedback, which can lead to biased predictions. In this work, we propose a novel method, i.e., identifiable deconfounder (iDCF), which leverages a set of proxy variables (e.g., observed user features) to resolve the aforementioned non-identification issue. The proposed iDCF is a general deconfounded recommendation framework that applies proximal causal inference to infer the unmeasured confounders and identify the counterfactual feedback with theoretical guarantees. 
Extensive experiments on various real-world and synthetic datasets verify the proposed method's effectiveness and robustness.", "output": "Debiasing Recommendation by Learning Identifiable Latent Confounders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately detecting crack boundaries is crucial for reliability assessment and risk management of structures and materials, such as structural health monitoring, diagnostics, prognostics, and maintenance scheduling. Uncertainty quantification of crack detection is challenging due to various stochastic factors, such as measurement noises, signal processing, and model simplifications. A machine learning-based approach is proposed to quantify both epistemic and aleatoric uncertainties concurrently. We introduce a Bayesian Boundary-Aware Convolutional Network (B-BACN) that emphasizes uncertainty-aware boundary refinement to generate precise and reliable crack boundary detections. The proposed method employs a multi-task learning approach, where we use Monte Carlo Dropout to learn the epistemic uncertainty and a Gaussian sampling function to predict each sample's aleatoric uncertainty. Moreover, we include a boundary refinement loss to B-BACN to enhance the determination of defect boundaries. The proposed method is demonstrated with benchmark experimental results and compared with several existing methods. 
The experimental results illustrate the effectiveness of our proposed approach in uncertainty-aware crack boundary detection, minimizing misclassification rate, and improving model calibration capabilities.", "output": "B-BACN: Bayesian Boundary-Aware Convolutional Network for Crack Characterization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study social learning dynamics where the agents collectively follow a simple multi-armed bandit protocol. Agents arrive sequentially, choose arms and receive associated rewards. Each agent observes the full history (arms and rewards) of the previous agents, and there are no private signals. While collectively the agents face exploration-exploitation tradeoff, each agent acts myopically, without regard to exploration. Motivating scenarios concern reviews and ratings on online platforms. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals, including the \"unbiased\" behavior as well as various behavioral biases. While extreme versions of these behaviors correspond to well-known bandit algorithms, we prove that more moderate versions lead to stark exploration failures, and consequently to regret rates that are linear in the number of agents. We provide matching upper bounds on regret by analyzing \"moderately optimistic\" agents. As a special case of independent interest, we obtain a general result on failure of the greedy algorithm in multi-armed bandits. 
This is the first such result in the literature, to the best of our knowledge.", "output": "Bandit Social Learning: Exploration under Myopic Behavior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural network (GNN) is a powerful learning approach for graph-based recommender systems. Recently, GNNs integrated with contrastive learning have shown superior performance in recommendation with their data augmentation schemes, aiming at dealing with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on the heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot well preserve the intrinsic semantic structures and are easily biased by the noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues impairing the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables the unconstrained structural refinement with global collaborative relation modeling. Experiments conducted on several benchmark datasets demonstrate the significant improvement in performance of our model over the state-of-the-arts. Further analyses demonstrate the superiority of LightGCL's robustness against data sparsity and popularity bias. 
The source code of our model is available at ", "output": "LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language models (LMs) are pretrained to imitate internet text, including content that would violate human preferences if generated by an LM: falsehoods, offensive comments, personally identifiable information, low-quality or buggy code, and more. Here, we explore alternative objectives for pretraining LMs in a way that also guides them to generate text aligned with human preferences. We benchmark five objectives for pretraining with human feedback across three tasks and study how they affect the trade-off between alignment and capabilities of pretrained LMs. We find a Pareto-optimal and simple approach among those we explored: conditional training, or learning distribution over tokens conditional on their human preference scores given by a reward model. Conditional training reduces the rate of undesirable content by up to an order of magnitude, both when generating without a prompt and with an adversarially-chosen prompt. Moreover, conditional training maintains the downstream task performance of standard LM pretraining, both before and after task-specific finetuning. Pretraining with human feedback results in much better preference satisfaction than standard LM pretraining followed by finetuning with feedback, i.e., learning and then unlearning undesirable behavior. 
Our results suggest that we should move beyond imitation learning when pretraining LMs and incorporate human preferences from the start of training.", "output": "Pretraining Language Models with Human Preferences."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traffic congestion is a persistent problem in our society. Existing methods for traffic control have proven futile in alleviating current congestion levels leading researchers to explore ideas with robot vehicles given the increased emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that involve global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communication to potentially unwilling human drivers. We consider image observations as the alternative for mixed traffic control via RL: 1) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; 2) images do not require a complete re-imagination of the observation space from environment to environment; and 3) images only require communication to equipment. In this work, we show robot vehicles using image observations can achieve similar performance to using precise information on environments, including ring, figure eight, intersection, merge, and bottleneck. 
In certain scenarios, our approach even outperforms using precision observations, e.g., up to 26% increase in average vehicle velocity in the merge environment and a 6% increase in outflow in the bottleneck environment, despite only using local traffic information as opposed to global traffic information.", "output": "Mixed Traffic Control and Coordination from Pixels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recommender systems play an important role in many content platforms. While most recommendation research is dedicated to designing better models to improve user experience, we found that research on stabilizing the training for such models is severely under-explored. As recommendation models become larger and more sophisticated, they are more susceptible to training instability issues, i.e., loss divergence, which can make the model unusable, waste significant resources and block model developments. In this paper, we share our findings and best practices we learned for improving the training stability of a real-world multitask ranking model for YouTube recommendations. We show some properties of the model that lead to unstable training and conjecture on the causes. Furthermore, based on our observations of training dynamics near the point of training instability, we hypothesize why existing solutions would fail, and propose a new algorithm to mitigate the limitations of existing solutions. 
Our experiments on YouTube production dataset show the proposed algorithm can significantly improve training stability while not compromising convergence, comparing with several commonly used baseline methods.", "output": "Improving Training Stability for Multitask Ranking Models in Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work we propose deep learning-based algorithms for the computation of systemic shortfall risk measures defined via multivariate utility functions. We discuss the key related theoretical aspects, with a particular focus on the fairness properties of primal optima and associated risk allocations. The algorithms we provide allow for learning primal optimizers, optima for the dual representation and corresponding fair risk allocations. We test our algorithms by comparison to a benchmark model, based on a paired exponential utility function, for which we can provide explicit formulas. We also show evidence of convergence in a case for which explicit formulas are not available.", "output": "Multivariate Systemic Risk Measures and Computation by Deep Learning Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We use concept-based interpretable models to mitigate shortcut learning. Existing methods lack interpretability. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each expert explains a subset of data using First Order Logic (FOL). While explaining a sample, the FOL from biased BB-derived MoIE detects the shortcut effectively. Finetuning the BB with Metadata Normalization (MDN) eliminates the shortcut. The FOLs from the finetuned-BB-derived MoIE verify the elimination of the shortcut. 
Our experiments show that MoIE does not hurt the accuracy of the original BB and eliminates shortcuts effectively.", "output": "Tackling Shortcut Learning in Deep Neural Networks: An Iterative Approach with Interpretable Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "ML models are known to be vulnerable to adversarial query attacks. In these attacks, queries are iteratively perturbed towards a particular class without any knowledge of the target model besides its output. The prevalence of remotely-hosted ML classification models and Machine-Learning-as-a-Service platforms means that query attacks pose a real threat to the security of these systems. To deal with this, stateful defenses have been proposed to detect query attacks and prevent the generation of adversarial examples by monitoring and analyzing the sequence of queries received by the system. Several stateful defenses have been proposed in recent years. However, these defenses rely solely on similarity or out-of-distribution detection methods that may be effective in other domains. In the malware detection domain, the methods to generate adversarial examples are inherently different, and therefore we find that such detection mechanisms are significantly less effective. Hence, in this paper, we present MalProtect, which is a stateful defense against query attacks in the malware detection domain. MalProtect uses several threat indicators to detect attacks. Our results show that it reduces the evasion rate of adversarial query attacks by 80+% in Android and Windows malware, across a range of attacker scenarios. 
In the first evaluation of its kind, we show that MalProtect outperforms prior stateful defenses, especially under the peak adversarial threat.", "output": "MalProtect: Stateful Defense Against Adversarial Query Attacks in ML-based Malware Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although the variational autoencoder (VAE) and its conditional extension (CVAE) are capable of state-of-the-art results across multiple domains, their precise behavior is still not fully understood, particularly in the context of data (like images) that lie on or near a low-dimensional manifold. For example, while prior work has suggested that the globally optimal VAE solution can learn the correct manifold dimension, a necessary (but not sufficient) condition for producing samples from the true data distribution, this has never been rigorously proven. Moreover, it remains unclear how such considerations would change when various types of conditioning variables are introduced, or when the data support is extended to a union of manifolds (e.g., as is likely the case for MNIST digits and related). In this work, we address these points by first proving that VAE global minima are indeed capable of recovering the correct manifold dimension. We then extend this result to more general CVAEs, demonstrating practical scenarios whereby the conditioning variables allow the model to adaptively learn manifolds of varying dimension across samples. 
Our analyses, which have practical implications for various CVAE design choices, are also supported by numerical results on both synthetic and real-world datasets.", "output": "Learning Manifold Dimensions with Conditional Variational Autoencoders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "MADDPG is an algorithm in multi-agent reinforcement learning (MARL) that extends the popular single-agent method, DDPG, to multi-agent scenarios. Importantly, DDPG is an algorithm designed for continuous action spaces, where the gradient of the state-action value function exists. For this algorithm to work in discrete action spaces, discrete gradient estimation must be performed. For MADDPG, the Gumbel-Softmax (GS) estimator is used -- a reparameterisation which relaxes a discrete distribution into a similar continuous one. This method, however, is statistically biased, and a recent MARL benchmarking paper suggests that this bias makes MADDPG perform poorly in grid-world situations, where the action space is discrete. Fortunately, many alternatives to the GS exist, boasting a wide range of properties. This paper explores several of these alternatives and integrates them into MADDPG for discrete grid-world scenarios. The corresponding impact on various performance metrics is then measured and analysed. 
It is found that one of the proposed estimators performs significantly better than the original GS in several tasks, achieving up to 55% higher returns, along with faster convergence.", "output": "Revisiting the Gumbel-Softmax in MADDPG."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Communication scheduling has been shown to be effective in accelerating distributed training, which enables all-reduce communications to be overlapped with backpropagation computations. This has been commonly adopted in popular distributed deep learning frameworks. However, there exist two fundamental problems: (1) excessive startup latency proportional to the number of workers for each all-reduce operation; (2) it only achieves sub-optimal training performance due to the dependency and synchronization requirement of the feed-forward computation in the next iteration. We propose a novel scheduling algorithm, DeAR, that decouples the all-reduce primitive into two continuous operations, which overlaps with both backpropagation and feed-forward computations without extra communications. We further design a practical tensor fusion algorithm to improve the training performance. 
Experimental results with five popular models show that DeAR achieves up to 83% and 15% training speedup over the state-of-the-art solutions on a 64-GPU cluster with 10Gb/s Ethernet and 100Gb/s InfiniBand interconnects, respectively.", "output": "DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a new method for optimistic planning in infinite-horizon discounted Markov decision processes based on the idea of adding regularization to the updates of an otherwise standard approximate value iteration procedure. This technique allows us to avoid contraction and monotonicity arguments typically required by existing analyses of approximate dynamic programming methods, and in particular to use approximate transition functions estimated via least-squares procedures in MDPs with linear function approximation. We use our method to recover known guarantees in tabular MDPs and to provide a computationally efficient algorithm for learning near-optimal policies in discounted linear mixture MDPs from a single stream of experience, and show it achieves near-optimal statistical guarantees.", "output": "Optimistic Planning by Regularized Dynamic Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Out-of-distribution (OOD) detection recently has drawn attention due to its critical role in the safe deployment of modern neural network architectures in real-world applications. The OOD detectors aim to distinguish samples that lie outside the training distribution in order to avoid the overconfident predictions of machine learning models on OOD data. 
Existing detectors, which mainly rely on the logit, intermediate feature space, softmax score, or reconstruction loss, manage to produce promising results. However, most of these methods are developed for the image domain. In this study, we propose a novel reconstruction-based OOD detector to operate on the radar domain. Our method exploits an autoencoder (AE) and its latent representation to detect the OOD samples. We propose two scores incorporating the patch-based reconstruction loss and the energy value calculated from the latent representations of each patch. We achieve an AUROC of 90.72% on our dataset collected by using 60 GHz short-range FMCW Radar. The experiments demonstrate that, in terms of AUROC and AUPR, our method outperforms the baseline (AE) and the other state-of-the-art methods. Also, thanks to its model size of 641 kB, our detector is suitable for embedded usage.", "output": "Reconstruction-based Out-of-Distribution Detection for Short-Range FMCW Radar."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learning partial differential equations' (PDEs) solution operators is an essential problem in machine learning. However, there are several challenges for learning operators in practical applications like the irregular mesh, multiple input functions, and complexity of the PDEs' solution. To address these challenges, we propose a general neural operator transformer (GNOT), a scalable and effective transformer-based framework for learning operators. By designing a novel heterogeneous normalized attention layer, our model is highly flexible to handle multiple input functions and irregular meshes. Besides, we introduce a geometric gating mechanism which could be viewed as a soft domain decomposition to solve the multi-scale problems. 
The large model capacity of the transformer architecture grants our model the possibility to scale to large datasets and practical problems. We conduct extensive experiments on multiple challenging datasets from different domains and achieve a remarkable improvement compared with alternative methods. Our code and data are publicly available at ", "output": "GNOT: A General Neural Operator Transformer for Operator Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.", "output": "iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the field of reinforcement learning (RL), agents are often tasked with solving a variety of problems differing only in their reward functions. 
In order to quickly obtain solutions to unseen problems with new reward functions, a popular approach involves functional composition of previously solved tasks. However, previous work using such functional composition has primarily focused on specific instances of composition functions whose limiting assumptions allow for exact zero-shot composition. Our work unifies these examples and provides a more general framework for compositionality in both standard and entropy-regularized RL. We find that, for a broad class of functions, the optimal solution for the composite task of interest can be related to the known primitive task solutions. Specifically, we present double-sided inequalities relating the optimal composite value function to the value functions for the primitive tasks. We also show that the regret of using a zero-shot policy can be bounded for this class of functions. The derived bounds can be used to develop clipping approaches for reducing uncertainty during training, allowing agents to quickly adapt to new tasks.", "output": "Bounding the Optimal Value Function in Compositional Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of recovering a latent graph where the observations at each node are aliased, and transitions are stochastic. Observations are gathered by an agent traversing the graph. Aliasing means that multiple nodes emit the same observation, so the agent can not know in which node it is located. The agent needs to uncover the hidden topology as accurately as possible and in as few steps as possible. This is equivalent to efficient recovery of the transition probabilities of a partially observable Markov decision process (POMDP) in which the observation probabilities are known. An algorithm for efficiently exploring (and ultimately recovering) the latent graph is provided. 
Our approach is exponentially faster than naive exploration in a variety of challenging topologies with aliased observations while remaining competitive with existing baselines in the unaliased regime.", "output": "Fast exploration and learning of latent graphs with aliased observations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Probabilistic programming combines general computer programming, statistical inference, and formal semantics to help systems make decisions when facing uncertainty. Probabilistic programs are ubiquitous, including having a significant impact on machine intelligence. While many probabilistic algorithms have been used in practice in different domains, their automated verification based on formal semantics is still a relatively new research area. In the last two decades, it has attracted much interest. Many challenges, however, remain. The work presented in this paper, probabilistic relations, takes a step towards our vision to tackle these challenges. Our work is based on Hehner's predicative probabilistic programming, but there are several obstacles to the broader adoption of his work. 
Our contributions here include (1) the formalisation of its syntax and semantics by introducing an Iverson bracket notation to separate relations from arithmetic; (2) the formalisation of relations using Unifying Theories of Programming (UTP) and probabilities outside the brackets using summation over the topological space of the real numbers; (3) the constructive semantics for probabilistic loops using Kleene's fixed-point theorem; (4) the enrichment of its semantics from distributions to subdistributions and superdistributions to deal with the constructive semantics; (5) the unique fixed-point theorem to simplify the reasoning about probabilistic loops; and (6) the mechanisation of our theory in Isabelle/UTP, an implementation of UTP in Isabelle/HOL, for automated reasoning using theorem proving. We demonstrate our work with six examples, including problems in robot localisation, classification in machine learning, and the termination of probabilistic loops.", "output": "Probabilistic relations for modelling epistemic and aleatoric uncertainty: semantics and automated reasoning with theorem proving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There has been exploding interest in embracing Transformer-based architectures for medical image segmentation. However, the lack of large-scale annotated medical datasets make achieving performances equivalent to those in natural images challenging. Convolutional networks, in contrast, have higher inductive biases and consequently, are easily trainable to high performance. Recently, the ConvNeXt architecture attempted to modernize the standard ConvNet by mirroring Transformer blocks. In this work, we improve upon this to design a modernized and scalable convolutional architecture customized to challenges of data-scarce medical settings. 
We introduce MedNeXt, a Transformer-inspired large kernel segmentation network which introduces - 1) A fully ConvNeXt 3D Encoder-Decoder Network for medical image segmentation, 2) Residual ConvNeXt up and downsampling blocks to preserve semantic richness across scales, 3) A novel technique to iteratively increase kernel sizes by upsampling small kernel networks, to prevent performance saturation on limited medical data, 4) Compound scaling at multiple levels (depth, width, kernel size) of MedNeXt. This leads to state-of-the-art performance on 4 tasks on CT and MRI modalities and varying dataset sizes, representing a modernized deep architecture for medical image segmentation. Our code is made publicly available at:", "output": "MedNeXt: Transformer-driven Scaling of ConvNets for Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserves its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. 
Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at", "output": "LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C$^2$-UCB-T algorithm and propose a novel analysis that achieves an $\\tilde{O}(d\\sqrt{KT})$ regret bound, removing a potentially exponentially large factor $O(1/p_{min})$, where $d$ is the dimension of contexts, $p_{min}$ is the minimum positive probability that any arm can be triggered, and batch-size $K$ is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm VAC$^2$-UCB and derive a regret bound $\\tilde{O}(d\\sqrt{T})$, which is independent of the batch-size $K$. As a valuable by-product, our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C$^2$MAB setting, improving existing results there as well. 
We also include experiments that demonstrate the improved performance of our algorithms compared with benchmark algorithms on synthetic and real-world datasets.", "output": "Contextual Combinatorial Bandits with Probabilistically Triggered Arms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, the idea of using FP8 as a number format for neural network training has been floating around the deep learning world. Given that most training is currently conducted with entire networks in FP32, or sometimes FP16 with mixed-precision, the step to having some parts of a network run in FP8 with 8-bit weights is an appealing potential speed-up for the generally costly and time-intensive training procedures in deep learning. A natural question arises regarding what this development means for efficient inference on edge devices. In the efficient inference device world, workloads are frequently executed in INT8. Sometimes going even as low as INT4 when efficiency calls for it. In this whitepaper, we compare the performance for both the FP8 and INT formats for efficient on-device inference. We theoretically show the difference between the INT and FP formats for neural networks and present a plethora of post-training quantization and quantization-aware-training results to show how this theory translates to practice. We also provide a hardware analysis showing that the FP formats are somewhere between 50-180% less efficient in terms of compute in dedicated hardware than the INT format. Based on our research and a read of the research field, we conclude that although the proposed FP8 format could be good for training, the results for inference do not warrant a dedicated implementation of FP8 in favor of INT8 for efficient inference. 
We show that our results are mostly consistent with previous findings but that important comparisons between the formats have thus far been lacking. Finally, we discuss what happens when FP8-trained networks are converted to INT8 and conclude with a brief discussion on the most efficient way for on-device deployment and an extensive suite of INT8 results for many models.", "output": "FP8 versus INT8 for efficient deep learning inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a machine-learning-based tool for the Lean proof assistant that suggests relevant premises for theorems being proved by a user. The design principles for the tool are (1) tight integration with the proof assistant, (2) ease of use and installation, (3) a lightweight and fast approach. For this purpose, we designed a custom version of the random forest model, trained in an online fashion. It is implemented directly in Lean, which was possible thanks to the rich and efficient metaprogramming features of Lean 4. The random forest is trained on data extracted from mathlib -- Lean's mathematics library. We experiment with various options for producing training features and labels. The advice from a trained model is accessible to the user via the suggest_premises tactic which can be called in an editor while constructing a proof interactively.", "output": "Machine-Learned Premise Selection for Lean."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Privacy-utility tradeoff remains as one of the fundamental issues of differentially private machine learning. This paper introduces a geometrically inspired kernel-based approach to mitigate the accuracy-loss issue in classification. 
In this approach, a representation of the affine hull of given data points is learned in Reproducing Kernel Hilbert Spaces (RKHS). This leads to a novel distance measure that hides privacy-sensitive information about individual data points and improves the privacy-utility tradeoff via significantly reducing the risk of membership inference attacks. The effectiveness of the approach is demonstrated through experiments on MNIST dataset, Freiburg groceries dataset, and a real biomedical dataset. It is verified that the approach remains computationally practical. The application of the approach to federated learning is considered and it is observed that the accuracy-loss due to data being distributed is either marginal or not significantly high.", "output": "On Mitigating the Utility-Loss in Differentially Private Learning: A new Perspective by a Geometrically Inspired Kernel Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The intersection of machine learning and dynamical systems has generated considerable interest recently. Neural Ordinary Differential Equations (NODEs) represent a rich overlap between these fields. In this paper, we develop a continuous time neural network approach based on Delay Differential Equations (DDEs). Our model uses the adjoint sensitivity method to learn the model parameters and delay directly from data. Our approach is inspired by that of NODEs and extends earlier neural DDE models, which have assumed that the value of the delay is known a priori. We perform a sensitivity analysis on our proposed approach and demonstrate its ability to learn DDE parameters from benchmark systems. 
We conclude our discussion with potential future directions and applications.", "output": "Learning the Delay Using Neural Delay Differential Equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model inversion (MI) attacks aim to infer and reconstruct private training data by abusing access to a model. MI attacks have raised concerns about the leaking of sensitive information (e.g. private face images used in training a face recognition system). Recently, several algorithms for MI have been proposed to improve the attack performance. In this work, we revisit MI, study two fundamental issues pertaining to all state-of-the-art (SOTA) MI algorithms, and propose solutions to these issues which lead to a significant boost in attack performance for all SOTA MI. In particular, our contributions are two-fold: 1) We analyze the optimization objective of SOTA MI algorithms, argue that the objective is sub-optimal for achieving MI, and propose an improved optimization objective that boosts attack performance significantly. 2) We analyze \"MI overfitting\", show that it would prevent reconstructed images from learning semantics of training data, and propose a novel \"model augmentation\" idea to overcome this issue. Our proposed solutions are simple and improve all SOTA MI attack accuracy significantly. E.g., in the standard CelebA benchmark, our solutions improve accuracy by 11.8% and achieve for the first time over 90% attack accuracy. Our findings demonstrate that there is a clear risk of leaking sensitive information from deep learning models. We urge serious consideration to be given to the privacy implications. 
Our code, demo, and models are available at", "output": "Re-thinking Model Inversion Attacks Against Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Selective prediction aims to learn a reliable model that abstains from making predictions when the model uncertainty is high. These predictions can then be deferred to a human expert for further evaluation. In many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, necessitating increased human labeling, which can be difficult and expensive. Active learning circumvents this by only querying the most informative examples and, in several cases, has been shown to lower the overall labeling effort. In this work, we bridge selective prediction and active learning, proposing a new learning paradigm called active selective prediction which learns to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new problem, we propose a simple but effective solution, ASPEST, that utilizes ensembles of model snapshots with self-training with their aggregated outputs as pseudo labels. Extensive experiments on numerous image, text and structured datasets, particularly those suffer from domain shifts, demonstrate that our proposed method can significantly outperform prior work on selective prediction and active learning (e.g. 
on the MNIST$\\to$SVHN benchmark with the labeling budget of $100$, ASPEST improves the AUC metric from $79.36\\%$ to $88.84\\%$) and achieves more optimal utilization of humans in the loop.", "output": "ASPEST: Bridging the Gap Between Active Learning and Selective Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human intelligence has the remarkable ability to assemble basic skills into complex ones so as to solve complex tasks. This ability is equally important for Artificial Intelligence (AI), and thus, we assert that in addition to the development of large, comprehensive intelligent models, it is equally crucial to equip such models with the capability to harness various domain-specific expert models for complex task-solving in the pursuit of Artificial General Intelligence (AGI). Recent developments in Large Language Models (LLMs) have demonstrated remarkable learning and reasoning abilities, making them promising as a controller to select, synthesize, and execute external models to solve complex tasks. In this project, we develop OpenAGI, an open-source AGI research platform, specifically designed to offer complex, multi-step tasks and accompanied by task-specific datasets, evaluation metrics, and a diverse range of extensible models. OpenAGI formulates complex tasks as natural language queries, serving as input to the LLM. The LLM subsequently selects, synthesizes, and executes models provided by OpenAGI to address the task. Furthermore, we propose a Reinforcement Learning from Task Feedback (RLTF) mechanism, which uses the task-solving result as feedback to improve the LLM's task-solving ability. Thus, the LLM is responsible for synthesizing various external models for solving complex tasks, while RLTF provides feedback to improve its task-solving ability, enabling a feedback loop for self-improving AI. 
We believe that the paradigm of LLMs operating various expert models for complex task-solving is a promising approach towards AGI. To facilitate the community's long-term improvement and evaluation of AGI's ability, we open-source the code, benchmark, and evaluation methods of the OpenAGI project at ", "output": "OpenAGI: When LLM Meets Domain Experts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing novice-friendly machine learning (ML) modeling tools center around a solo user experience, where a single user collects only their own data to build a model. However, solo modeling experiences limit valuable opportunities for encountering alternative ideas and approaches that can arise when learners work together; consequently, it often precludes encountering critical issues in ML around data representation and diversity that can surface when different perspectives are manifested in a group-constructed data set. To address this issue, we created Co-ML -- a tablet-based app for learners to collaboratively build ML image classifiers through an end-to-end, iterative model-building process. In this paper, we illustrate the feasibility and potential richness of collaborative modeling by presenting an in-depth case study of a family (two children 11 and 14-years-old working with their parents) using Co-ML in a facilitated introductory ML activity at home. We share the Co-ML system design and contribute a discussion of how using Co-ML in a collaborative activity enabled beginners to collectively engage with dataset design considerations underrepresented in prior work such as data diversity, class imbalance, and data quality. 
We discuss how a distributed collaborative process, in which individuals can take on different model-building responsibilities, provides a rich context for children and adults to learn ML dataset design.", "output": "Collaborative Machine Learning Model Building with Families Using Co-ML."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Electrocardiography is the most common method to investigate the condition of the heart through the observation of cardiac rhythm and electrical activity, for both diagnosis and monitoring purposes. Analysis of electrocardiograms (ECGs) is commonly performed through the investigation of specific patterns, which are visually recognizable by trained physicians and are known to reflect cardiac (dis)function. In this work we study the use of $\\beta$-variational autoencoders (VAEs) as an explainable feature extractor, and improve on its predictive capacities by jointly optimizing signal reconstruction and cardiac function prediction. The extracted features are then used for cardiac function prediction using logistic regression. The method is trained and tested on data from 7255 patients, who were treated for acute coronary syndrome at the Leiden University Medical Center between 2010 and 2021. The results show that our method significantly improved prediction and explainability compared to a vanilla $\\beta$-VAE, while still yielding similar reconstruction performance.", "output": "Joint optimization of a $\\beta$-VAE for ECG task-specific feature extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs), such as ChatGPT and GPT-4, have dramatically transformed natural language processing research and shown promising strides towards Artificial General Intelligence (AGI). 
Nonetheless, the high costs associated with training and deploying LLMs present substantial obstacles to transparent, accessible academic research. While several large language models, such as LLaMA, have been open-sourced by the community, these predominantly focus on English corpora, limiting their usefulness for other languages. In this paper, we propose a method to augment LLaMA with capabilities for understanding and generating Chinese text and its ability to follow instructions. We achieve this by extending LLaMA's existing vocabulary with an additional 20,000 Chinese tokens, thereby improving its encoding efficiency and semantic understanding of Chinese. We further incorporate secondary pre-training using Chinese data and fine-tune the model with Chinese instruction datasets, significantly enhancing the model's ability to comprehend and execute instructions. Our experimental results indicate that the newly proposed model markedly enhances the original LLaMA's proficiency in understanding and generating Chinese content. Additionally, the results on the C-Eval dataset yield competitive performance among the models with several times the size of ours. We have made our pre-trained models, training scripts, and other resources available through GitHub, fostering open research for our community. GitHub repository: ", "output": "Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Chen et al. [Chen2022] recently published the article 'Fast and scalable search of whole-slide images via self-supervised deep learning' in Nature Biomedical Engineering. The authors call their method 'self-supervised image search for histology', short SISH. 
We express our concerns that SISH is an incremental modification of Yottixel, has used MinMax binarization but does not cite the original works, and is based on a misnomer 'self-supervised image search'. As well, we point to several other concerns regarding experiments and comparisons performed by Chen et al.", "output": "Comments on 'Fast and scalable search of whole-slide images via self-supervised deep learning'."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors in this field. To this end, we present a systematic investigation of tool learning in this paper. We first introduce the background of tool learning, including its cognitive origins, the paradigm shift of foundation models, and the complementary roles of tools and models. Then we recapitulate existing tool learning research into tool-augmented and tool-oriented learning. We formulate a general tool learning framework: starting from understanding the user instruction, models should learn to decompose a complex task into several subtasks, dynamically adjust their plan through reasoning, and effectively conquer each sub-task by selecting appropriate tools. We also discuss how to train models for improved tool-use capabilities and facilitate the generalization in tool learning. 
Considering the lack of a systematic tool learning evaluation in prior works, we experiment with 18 representative tools and show the potential of current foundation models in skillfully utilizing tools. Finally, we discuss several open problems that require further investigation for tool learning. Overall, we hope this paper could inspire future research in integrating tools with foundation models.", "output": "Tool Learning with Foundation Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Estimating heterogeneous treatment effects from observational data is a crucial task across many fields, helping policy and decision-makers take better actions. There has been recent progress on robust and efficient methods for estimating the conditional average treatment effect (CATE) function, but these methods often do not take into account the risk of hidden confounding, which could arbitrarily and unknowingly bias any causal estimate based on observational data. We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on the level of hidden confounding. We derive the B-Learner by adapting recent results for sharp and valid bounds of the average treatment effect (Dorn et al., 2021) into the framework given by Kallus & Oprescu (2023) for robust and model-agnostic learning of conditional distributional treatment effects. The B-Learner can use any function estimator such as random forests and deep neural networks, and we prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods. 
Semi-synthetic experimental comparisons validate the theoretical findings, and we use real-world data to demonstrate how the method might be used in practice.", "output": "B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a local energy distribution based hyperparameter determination for stochastic simulated annealing (SSA). SSA is capable of solving combinatorial optimization problems faster than typical simulated annealing (SA), but requires a time-consuming hyperparameter search. The proposed method determines hyperparameters based on the local energy distributions of spins (probabilistic bits). The spin is a basic computing element of SSA and is graphically connected to other spins with its weights. The distribution of the local energy can be estimated based on the central limit theorem (CLT). The CLT-based normal distribution is used to determine the hyperparameters, which reduces the time complexity for hyperparameter search from O(n^3) of the conventional method to O(1). The performance of SSA with the determined hyperparameters is evaluated on the Gset and K2000 benchmarks for maximum-cut problems. The results show that the proposed method achieves mean cut values of approximately 98% of the best-known cut values.", "output": "Local Energy Distribution Based Hyperparameter Determination for Stochastic Simulated Annealing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large data-driven image models are extensively used to support creative and artistic work. 
Under the currently predominant distribution-fitting paradigm, a dataset is treated as ground truth to be approximated as closely as possible. Yet, many creative applications demand a diverse range of output, and creators often strive to actively diverge from a given data distribution. We argue that an adjustment of modelling objectives, from pure mode coverage towards mode balancing, is necessary to accommodate the goal of higher output diversity. We present diversity weights, a training scheme that increases a model's output diversity by balancing the modes in the training dataset. First experiments in a controlled setting demonstrate the potential of our method. We discuss connections of our approach to diversity, equity, and inclusion in generative machine learning more generally, and computational creativity specifically. An implementation of our algorithm is available at", "output": "Towards Mode Balancing of Generative Models via Diversity Weights."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite their many desirable properties, Gaussian processes (GPs) are often compared unfavorably to deep neural networks (NNs) for lacking the ability to learn representations. Recent efforts to bridge the gap between GPs and deep NNs have yielded a new class of inter-domain variational GPs in which the inducing variables correspond to hidden units of a feedforward NN. In this work, we examine some practical issues associated with this approach and propose an extension that leverages the orthogonal decomposition of GPs to mitigate these limitations. 
In particular, we introduce spherical inter-domain features to construct more flexible data-dependent basis functions for both the principal and orthogonal components of the GP approximation and show that incorporating NN activation features under this framework not only alleviates these shortcomings but is more scalable than alternative strategies. Experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.", "output": "Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion-based models have shown the merits of generating high-quality visual data while preserving better diversity in recent studies. However, such observation is only justified with curated data distribution, where the data samples are nicely pre-processed to be uniformly distributed in terms of their labels. In practice, a long-tailed data distribution appears more common and how diffusion models perform on such class-imbalanced data remains unknown. In this work, we first investigate this problem and observe significant degradation in both diversity and fidelity when the diffusion model is trained on datasets with class-imbalanced distributions. Especially in tail classes, the generations largely lose diversity and we observe severe mode-collapse issues. To tackle this problem, we set from the hypothesis that the data distribution is not class-balanced, and propose Class-Balancing Diffusion Models (CBDM) that are trained with a distribution adjustment regularizer as a solution. Experiments show that images generated by CBDM exhibit higher diversity and quality in both quantitative and qualitative ways. 
Our method benchmarked the generation results on CIFAR100/CIFAR100LT dataset and shows outstanding performance on the downstream recognition task.", "output": "Class-Balancing Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spellchecking is one of the most fundamental and widely used search features. Correcting incorrectly spelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either lower accuracy than state-of-the-art solutions or too slow to be used for search use cases where latency is a key requirement. Furthermore, most innovative recent architectures focus on English and are not trained in a multilingual fashion and are trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary and hence speller output based on a specific product's needs. Furthermore, our speller out-performs general purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.", "output": "Contextual Multilingual Spellchecker for User Queries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. 
By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.", "output": "The Training Process of Many Deep Networks Explores the Same Low-Dimensional Manifold."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurately segmenting blood vessels in retinal fundus images is crucial in the early screening, diagnosing, and evaluating some ocular diseases, yet it poses a nontrivial uncertainty for the segmentation task due to various factors such as significant light variations, uneven curvilinear structures, and non-uniform contrast. As a result, a useful approach based on multiple attention mechanisms and deep learning is proposed to accurately detect blood vessels in retinal fundus images. To enrich contextual information for the loss of scene information compensation, an attention fusion mechanism that combines the channel attention with spatial attention mechanisms constructed by Transformer is employed to extract various features of blood vessels from retinal fundus images in both spatial and channel dimensions. 
Subsequently, a unique spatial attention mechanism is introduced in the skip connection to filter out redundant information and noise from low-level features, thus enabling better integration with high-level features. In addition, a DropOut layer is employed to randomly discard some neurons, which can prevent overfitting of the deep learning network and improve its generalization performance.", "output": "Fundus vascular image segmentation based on multiple attention mechanisms and deep learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide variety of tasks and capabilities, and transform them into instruction tuning format. Additionally, we introduce an instruction-aware Query Transformer, which extracts informative features tailored to the given instruction. Trained on 13 held-in datasets, InstructBLIP attains state-of-the-art zero-shot performance across all 13 held-out datasets, substantially outperforming BLIP-2 and larger Flamingo models. Our models also lead to state-of-the-art performance when finetuned on individual downstream tasks (e.g., 90.7% accuracy on ScienceQA questions with image contexts). Furthermore, we qualitatively demonstrate the advantages of InstructBLIP over concurrent multimodal models. 
All InstructBLIP models are open-sourced at", "output": "InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spectral-temporal graph neural network is a promising abstraction underlying most time series forecasting models that are based on graph neural networks (GNNs). However, more is needed to know about the underpinnings of this branch of methods. In this paper, we establish a theoretical framework that unravels the expressive power of spectral-temporal GNNs. Our results show that linear spectral-temporal GNNs are universal under mild assumptions, and their expressive power is bounded by our extended first-order Weisfeiler-Leman algorithm on discrete-time dynamic graphs. To make our findings useful in practice on valid instantiations, we discuss related constraints in detail and outline a theoretical blueprint for designing spatial and temporal modules in spectral domains. Building on these insights and to demonstrate how powerful spectral-temporal GNNs are based on our framework, we propose a simple instantiation named Temporal Graph GegenConv (TGC), which significantly outperforms most existing models with only linear components and shows better model efficiency.", "output": "How Expressive are Spectral-Temporal Graph Neural Networks for Time Series Forecasting?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, polyp segmentation has gained significant importance, and many methods have been developed using CNN, Vision Transformer, and Transformer techniques to achieve competitive results. However, these methods often face difficulties when dealing with out-of-distribution datasets, missing boundaries, and small polyps. 
In 2022, Meta-Former was introduced as a new baseline for vision, which not only improved the performance of multi-task computer vision but also addressed the limitations of the Vision Transformer and CNN family backbones. To further enhance segmentation, we propose a fusion of Meta-Former with UNet, along with the introduction of a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance the texture, also we propose the Convformer block base on the idea of the Meta-former to enhance the crucial information of the local feature. These blocks enable the combination of global information, such as the overall shape of the polyp, with local information and boundary information, which is crucial for the decision of the medical segmentation. Our proposed approach achieved competitive performance and obtained the top result in the State of the Art on the CVC-300 dataset, Kvasir, and CVC-ColonDB dataset. Apart from Kvasir-SEG, others are out-of-distribution datasets. The implementation can be found at:", "output": "Meta-Polyp: a baseline for efficient Polyp segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a novel transformer architecture for graph representation learning. The core insight of our method is to fully consider the information propagation among nodes and edges in a graph when building the attention module in the transformer blocks. Specifically, we propose a new attention mechanism called Graph Propagation Attention (GPA). It explicitly passes the information among nodes and edges in three ways, i.e. node-to-node, node-to-edge, and edge-to-node, which is essential for learning graph-structured data. On this basis, we design an effective transformer architecture named Graph Propagation Transformer (GPTrans) to further help learn graph data. 
We verify the performance of GPTrans in a wide range of graph learning experiments on several benchmark datasets. These results show that our method outperforms many state-of-the-art transformer-based graph models with better performance. The code will be released at ", "output": "Graph Propagation Transformer for Graph Representation Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Millions of slum dwellers suffer from poor accessibility to urban services due to inadequate road infrastructure within slums, and road planning for slums is critical to the sustainable development of cities. Existing re-blocking or heuristic methods are either time-consuming which cannot generalize to different slums, or yield sub-optimal road plans in terms of accessibility and construction costs. In this paper, we present a deep reinforcement learning based approach to automatically layout roads for slums. We propose a generic graph model to capture the topological structure of a slum, and devise a novel graph neural network to select locations for the planned roads. Through masked policy optimization, our model can generate road plans that connect places in a slum at minimal construction costs. Extensive experiments on real-world slums in different countries verify the effectiveness of our model, which can significantly improve accessibility by 14.3% against existing baseline methods. Further investigations on transferring across different tasks demonstrate that our model can master road planning skills in simple scenarios and adapt them to much more complicated ones, indicating the potential of applying our model in real-world slum upgrading. 
The code and data are available at", "output": "Road Planning for Slums via Deep Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Indoor localization has gained significant attention in recent years due to its various applications in smart homes, industrial automation, and healthcare, especially since more people rely on their wireless devices for location-based services. Deep learning-based solutions have shown promising results in accurately estimating the position of wireless devices in indoor environments using wireless parameters such as Channel State Information (CSI) and Received Signal Strength Indicator (RSSI). However, despite the success of deep learning-based approaches in achieving high localization accuracy, these models suffer from a lack of generalizability and can not be readily-deployed to new environments or operate in dynamic environments without retraining. In this paper, we propose meta-learning-based localization models to address the lack of generalizability that persists in conventionally trained DL-based localization models. Furthermore, since meta-learning algorithms require diverse datasets from several different scenarios, which can be hard to collect in the context of localization, we design and propose a new meta-learning algorithm, TB-MAML (Task Biased Model Agnostic Meta Learning), intended to further improve generalizability when the dataset is limited. 
Lastly, we evaluate the performance of TB-MAML-based localization against conventionally trained localization models and localization done using other meta-learning algorithms.", "output": "A Meta-learning based Generalizable Indoor Localization Model using Channel State Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural architecture search (NAS) for Graph neural networks (GNNs), called NAS-GNNs, has achieved significant performance over manually designed GNN architectures. However, these methods inherit issues from the conventional NAS methods, such as high computational cost and optimization difficulty. More importantly, previous NAS methods have ignored the uniqueness of GNNs, where GNNs possess expressive power without training. With the randomly-initialized weights, we can then seek the optimal architecture parameters via the sparse coding objective and derive a novel NAS-GNNs method, namely neural architecture coding (NAC). Consequently, our NAC holds a no-update scheme on GNNs and can efficiently compute in linear time. Empirical evaluations on multiple GNN benchmark datasets demonstrate that our approach leads to state-of-the-art performance, which is up to $200\\times$ faster and $18.8\\%$ more accurate than the strong baselines.", "output": "Do Not Train It: A Linear Neural Architecture Search of Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Biogenic Volatile Organic Compounds (BVOCs) emitted from the terrestrial ecosystem into the Earth's atmosphere are an important component of atmospheric chemistry. Due to the scarcity of measurement, a reliable enhancement of BVOCs emission maps can aid in providing denser data for atmospheric chemical, climate, and air quality models. 
In this work, we propose a strategy to super-resolve coarse BVOC emission maps by simultaneously exploiting the contributions of different compounds. To this purpose, we first accurately investigate the spatial inter-connections between several BVOC species. Then, we exploit the found similarities to build a Multi-Image Super-Resolution (MISR) system, in which a number of emission maps associated with diverse compounds are aggregated to boost Super-Resolution (SR) performance. We compare different configurations regarding the species and the number of joined BVOCs. Our experimental results show that incorporating BVOCs' relationship into the process can substantially improve the accuracy of the super-resolved maps. Interestingly, the best results are achieved when we aggregate the emission maps of strongly uncorrelated compounds. This peculiarity seems to confirm what was already guessed for other data-domains, i.e., joined uncorrelated information are more helpful than correlated ones to boost MISR performance. Nonetheless, the proposed work represents the first attempt in SR of BVOC emissions through the fusion of multiple different compounds.", "output": "Multi-BVOC Super-Resolution Exploiting Compounds Inter-Connection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment. Existing occupancy prediction methods are almost entirely trained on human-annotated volumetric data. Although of high quality, the generation of such 3D annotations is laborious and costly, restricting them to a few specific object categories in the training dataset. 
To address this limitation, this paper proposes Open Vocabulary Occupancy (OVO), a novel approach that allows semantic occupancy prediction of arbitrary classes but without the need for 3D annotations during training. Keys to our approach are (1) knowledge distillation from a pre-trained 2D open-vocabulary segmentation model to the 3D occupancy network, and (2) pixel-voxel filtering for high-quality training data generation. The resulting framework is simple, compact, and compatible with most state-of-the-art semantic occupancy prediction models. On NYUv2 and SemanticKITTI datasets, OVO achieves competitive performance compared to supervised semantic occupancy prediction approaches. Furthermore, we conduct extensive analyses and ablation studies to offer insights into the design of the proposed framework. Our code is publicly available at", "output": "OVO: Open-Vocabulary Occupancy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. 
Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at ", "output": "Scaling Data-Constrained Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to fuse frozen text-only large language models (LLMs) with pre-trained image encoder and decoder models, by mapping between their embedding spaces. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue. Ours is the first approach capable of conditioning on arbitrarily interleaved image and text inputs to generate coherent image (and text) outputs. To achieve strong performance on image generation, we propose an efficient mapping network to ground the LLM to an off-the-shelf text-to-image generation model. This mapping network translates hidden representations of text into the embedding space of the visual models, enabling us to leverage the strong text representations of the LLM for visual outputs. Our approach outperforms baseline generation models on tasks with longer and more complex language. In addition to novel image generation, our model is also capable of image retrieval from a prespecified dataset, and decides whether to retrieve or generate at inference time. This is done with a learnt decision module which conditions on the hidden representations of the LLM. Our model exhibits a wider range of capabilities compared to prior multimodal language models. 
It can process image-and-text inputs, and produce retrieved images, generated images, and generated text -- outperforming non-LLM based generation models across several text-to-image tasks that measure context dependence.", "output": "Generating Images with Multimodal Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider (stochastic) subgradient methods for strongly convex but potentially nonsmooth non-Lipschitz optimization. We provide new equivalent dual descriptions (in the style of dual averaging) for the classic subgradient method, the proximal subgradient method, and the switching subgradient method. These equivalences enable $O(1/T)$ convergence guarantees in terms of both their classic primal gap and a not previously analyzed dual gap for strongly convex optimization. Consequently, our theory provides these classic methods with simple, optimal stopping criteria and optimality certificates at no added computational cost. Our results apply under nearly any stepsize selection and for a range of non-Lipschitz ill-conditioned problems where the early iterations of the subgradient method may diverge exponentially quickly (a phenomenon which, to the best of our knowledge, no prior works address). Even in the presence of such undesirable behaviors, our theory still ensures and bounds eventual convergence.", "output": "Some Primal-Dual Theory for Subgradient Methods for Strongly Convex Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. 
However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.", "output": "Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automatic summarization of legal case judgements has traditionally been attempted by using extractive summarization methods. However, in recent years, abstractive summarization models are gaining popularity since they can generate more natural and coherent summaries. Legal domain-specific pre-trained abstractive summarization models are now available. Moreover, general-domain pre-trained Large Language Models (LLMs), such as ChatGPT, are known to generate high-quality text and have the capacity for text summarization. Hence it is natural to ask if these models are ready for off-the-shelf application to automatically generate abstractive summaries for case judgements. To explore this question, we apply several state-of-the-art domain-specific abstractive summarization models and general-domain LLMs on Indian court case judgements, and check the quality of the generated summaries. 
In addition to standard metrics for summary quality, we check for inconsistencies and hallucinations in the summaries. We see that abstractive summarization models generally achieve slightly higher scores than extractive models in terms of standard summary evaluation metrics such as ROUGE and BLEU. However, we often find inconsistent or hallucinated information in the generated abstractive summaries. Overall, our investigation indicates that the pre-trained abstractive summarization models and LLMs are not yet ready for fully automatic deployment for case judgement summarization; rather a human-in-the-loop approach including manual checks for inconsistencies is more suitable at present.", "output": "How Ready are Pre-trained Abstractive Models and LLMs for Legal Case Judgement Summarization?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative AI has experienced remarkable growth in recent years, leading to a wide array of applications across diverse domains. In this paper, we present a comprehensive survey of more than 350 generative AI applications, providing a structured taxonomy and concise descriptions of various unimodal and even multimodal generative AIs. The survey is organized into sections, covering a wide range of unimodal generative AI applications such as text, images, video, gaming and brain information. 
Our survey aims to serve as a valuable resource for researchers and practitioners to navigate the rapidly expanding landscape of generative AI, facilitating a better understanding of the current state-of-the-art and fostering further innovation in the field.", "output": "A survey of Generative AI Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The objective of the project is to explore synergies between classical control algorithms such as PID and contemporary reinforcement learning algorithms to come up with a pragmatic control mechanism to control the CrazyFlie 2.X quadrotor. The primary objective would be performing PID tuning using reinforcement learning strategies. The secondary objective is to leverage the learnings from the first task to implement control for navigation by integrating with the lighthouse positioning system. Two approaches are considered for navigation, a discrete navigation problem using Deep Q-Learning with finite predefined motion primitives, and deep reinforcement learning for a continuous navigation approach. Simulations for RL training will be performed on gym-pybullet-drones, an open-source gym-based environment for reinforcement learning, and the RL implementations are provided by stable-baselines3", "output": "Reinforcement Learning-Based Control of CrazyFlie 2.X Quadrotor."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we describe and analyze an island-based random dynamic voltage scaling (iRDVS) approach to thwart power side-channel attacks. We first analyze the impact of the number of independent voltage islands on the resulting signal-to-noise ratio and trace misalignment. 
As part of our analysis of misalignment, we propose a novel unsupervised machine learning (ML) based attack that is effective on systems with three or fewer independent voltages. Our results show that iRDVS with four voltage islands, however, cannot be broken with 200k encryption traces, suggesting that iRDVS can be effective. We finish the talk by describing an iRDVS test chip in a 12nm FinFet process that incorporates three variants of an AES-256 accelerator, all originating from the same RTL. This included a synchronous core, an asynchronous core with no protection, and a core employing the iRDVS technique using asynchronous logic. Lab measurements from the chips indicated that both unprotected variants failed the test vector leakage assessment (TVLA) security metric test, while the iRDVS was proven secure in a variety of configurations.", "output": "Island-based Random Dynamic Voltage Scaling vs ML-Enhanced Power Side-Channel Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider robust empirical risk minimization (ERM), where model parameters are chosen to minimize the worst-case empirical loss when each data point varies over a given convex uncertainty set. In some simple cases, such problems can be expressed in an analytical form. In general the problem can be made tractable via dualization, which turns a min-max problem into a min-min problem. Dualization requires expertise and is tedious and error-prone. We demonstrate how CVXPY can be used to automate this dualization procedure in a user-friendly manner. Our framework allows practitioners to specify and solve robust ERM problems with a general class of convex losses, capturing many standard regression and classification problems. 
Users can easily specify any complex uncertainty set that is representable via disciplined convex programming (DCP) constraints.", "output": "Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Phylogenetics is now fundamental in life sciences, providing insights into the earliest branches of life and the origins and spread of epidemics. However, finding suitable phylogenies from the vast space of possible trees remains challenging. To address this problem, for the first time, we perform both tree exploration and inference in a continuous space where the computation of gradients is possible. This continuous relaxation allows for major leaps across tree space in both rooted and unrooted trees, and is less susceptible to convergence to local minima. Our approach outperforms the current best methods for inference on unrooted trees and, in simulation, accurately infers the tree and root in ultrametric cases. The approach is effective in cases of empirical data with negligible amounts of data, which we demonstrate on the phylogeny of jawed vertebrates. Indeed, only a few genes with an ultrametric signal were generally sufficient for resolving the major lineages of vertebrate. With cubic-time complexity and efficient optimisation via automatic differentiation, our method presents an effective way forwards for exploring the most difficult, data-deficient phylogenetic questions.", "output": "Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interpretable time series prediction is crucial for safety-critical areas such as healthcare and autonomous driving. 
Most existing methods focus on interpreting predictions by assigning important scores to segments of time series. In this paper, we take a different and more challenging route and aim at developing a self-interpretable model, dubbed Counterfactual Time Series (CounTS), which generates counterfactual and actionable explanations for time series predictions. Specifically, we formalize the problem of time series counterfactual explanations, establish associated evaluation protocols, and propose a variational Bayesian deep learning model equipped with counterfactual inference capability of time series abduction, action, and prediction. Compared with state-of-the-art baselines, our self-interpretable model can generate better counterfactual explanations while maintaining comparable prediction accuracy.", "output": "Self-Interpretable Time Series Prediction with Counterfactual Explanations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a novel adversarial defence mechanism for image classification -- CARSO -- inspired by cues from cognitive neuroscience. The method is synergistically complementary to adversarial training and relies on knowledge of the internal representation of the attacked classifier. Exploiting a generative model for adversarial purification, conditioned on such representation, it samples reconstructions of inputs to be finally classified. Experimental evaluation by a well-established benchmark of varied, strong adaptive attacks, across diverse image datasets and classifier architectures, shows that CARSO is able to defend the classifier significantly better than state-of-the-art adversarial training alone -- with a tolerable clean accuracy toll. Furthermore, the defensive architecture succeeds in effectively shielding itself from unforeseen threats, and end-to-end attacks adapted to fool stochastic defences. 
Code and pre-trained models are available at .", "output": "CARSO: Counter-Adversarial Recall of Synthetic Observations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Leveraging second-order information at the scale of deep networks is one of the main lines of approach for improving the performance of current optimizers for deep learning. Yet, existing approaches for accurate full-matrix preconditioning, such as Full-Matrix Adagrad (GGT) or Matrix-Free Approximate Curvature (M-FAC) suffer from massive storage costs when applied even to medium-scale models, as they must store a sliding window of gradients, whose memory requirements are multiplicative in the model dimension. In this paper, we address this issue via an efficient and simple-to-implement error-feedback technique that can be applied to compress preconditioners by up to two orders of magnitude in practice, without loss of convergence. Specifically, our approach compresses the gradient information via sparsification or low-rank compression before it is fed into the preconditioner, feeding the compression error back into future iterations. Extensive experiments on deep neural networks for vision show that this approach can compress full-matrix preconditioners by up to two orders of magnitude without impact on accuracy, effectively removing the memory overhead of full-matrix preconditioning for implementations of full-matrix Adagrad (GGT) and natural gradient (M-FAC). Our code is available at ", "output": "Error Feedback Can Accurately Compress Preconditioners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Real-time 3D fluorescence microscopy is crucial for the spatiotemporal analysis of live organisms, such as neural activity monitoring. 
The eXtended field-of-view light field microscope (XLFM), also known as Fourier light field microscope, is a straightforward, single snapshot solution to achieve this. The XLFM acquires spatial-angular information in a single camera exposure. In a subsequent step, a 3D volume can be algorithmically reconstructed, making it exceptionally well-suited for real-time 3D acquisition and potential analysis. Unfortunately, traditional reconstruction methods (like deconvolution) require lengthy processing times (0.0220 Hz), hampering the speed advantages of the XLFM. Neural network architectures can overcome the speed constraints at the expense of lacking certainty metrics, which renders them untrustworthy for the biomedical realm. This work proposes a novel architecture to perform fast 3D reconstructions of live immobilized zebrafish neural activity based on a conditional normalizing flow. It reconstructs volumes at 8 Hz spanning 512x512x96 voxels, and it can be trained in under two hours due to the small dataset requirements (10 image-volume pairs). Furthermore, normalizing flows allow for exact Likelihood computation, enabling distribution monitoring, followed by out-of-distribution detection and retraining of the system when a novel sample is detected. We evaluate the proposed method on a cross-validation approach involving multiple in-distribution samples (genetically identical zebrafish) and various out-of-distribution ones.", "output": "Fast light-field 3D microscopy with out-of-distribution detection and adaptation through Conditional Normalizing Flows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Conventional solvers are often computationally expensive for constrained optimization, particularly in large-scale and time-critical problems. 
While this leads to a growing interest in using neural networks (NNs) as fast optimal solution approximators, incorporating the constraints with NNs is challenging. In this regard, we propose deep Lagrange dual with equality embedding (DeepLDE), a framework that learns to find an optimal solution without using labels. To ensure feasible solutions, we embed equality constraints into the NNs and train the NNs using the primal-dual method to impose inequality constraints. Furthermore, we prove the convergence of DeepLDE and show that the primal-dual learning method alone cannot ensure equality constraints without the help of equality embedding. Simulation results on convex, non-convex, and AC optimal power flow (AC-OPF) problems show that the proposed DeepLDE achieves the smallest optimality gap among all the NN-based approaches while always ensuring feasible solutions. Furthermore, the computation time of the proposed method is about 5 to 250 times faster than DC3 and the conventional solvers in solving constrained convex, non-convex optimization, and/or AC-OPF.", "output": "Self-supervised Equality Embedded Deep Lagrange Dual for Approximate Constrained Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advent of 2019 Coronavirus (COVID-19) has engendered a momentous global health crisis, necessitating the identification of the ailment in individuals through diverse diagnostic modalities. Radiological imaging, particularly the deployment of X-ray imaging, has been recognized as a pivotal instrument in the detection and characterization of COVID-19. Recent investigations have unveiled invaluable insights pertaining to the virus within X-ray images, instigating the exploration of methodologies aimed at augmenting diagnostic accuracy through the utilization of artificial intelligence (AI) techniques. 
The current research endeavor posits an innovative framework for the automated diagnosis of COVID-19, harnessing raw chest X-ray images, specifically by means of fine-tuning pre-trained Vision Transformer (ViT) models. The developed models were appraised in terms of their binary classification performance, discerning COVID-19 from Normal cases, as well as their ternary classification performance, discriminating COVID-19 from Pneumonia and Normal instances, and lastly, their quaternary classification performance, discriminating COVID-19 from Bacterial Pneumonia, Viral Pneumonia, and Normal conditions, employing distinct datasets. The proposed model evinced extraordinary precision, registering results of 99.92% and 99.84% for binary classification, 97.95% and 86.48% for ternary classification, and 86.81% for quaternary classification, respectively, on the respective datasets.", "output": "Enhancing COVID-19 Diagnosis through Vision Transformer-Based Analysis of Chest X-ray Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While several recent works have identified societal-scale and extinction-level risks to humanity arising from artificial intelligence, few have attempted an exhaustive taxonomy of such risks. Many exhaustive taxonomies are possible, and some are useful -- particularly if they reveal new risks or practical approaches to safety. This paper explores a taxonomy based on accountability: whose actions lead to the risk, are the actors unified, and are they deliberate? 
We also provide stories to illustrate how the various risk types could each play out, including risks arising from unanticipated interactions of many AI systems, as well as risks from deliberate misuse, for which combined technical and policy solutions are indicated.", "output": "TASRA: a Taxonomy and Analysis of Societal-Scale Risks from AI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spatial-temporal graph models are prevailing for abstracting and modelling spatial and temporal dependencies. In this work, we ask the following question: whether and to what extent can we localise spatial-temporal graph models? We limit our scope to adaptive spatial-temporal graph neural networks (ASTGNNs), the state-of-the-art model architecture. Our approach to localisation involves sparsifying the spatial graph adjacency matrices. To this end, we propose Adaptive Graph Sparsification (AGS), a graph sparsification algorithm which successfully enables the localisation of ASTGNNs to an extreme extent (fully localisation). We apply AGS to two distinct ASTGNN architectures and nine spatial-temporal datasets. Intriguingly, we observe that spatial graphs in ASTGNNs can be sparsified by over 99.5% without any decline in test accuracy. Furthermore, even when ASTGNNs are fully localised, becoming graph-less and purely temporal, we record no drop in accuracy for the majority of tested datasets, with only minor accuracy deterioration observed in the remaining datasets. However, when the partially or fully localised ASTGNNs are reinitialised and retrained on the same data, there is a considerable and consistent drop in accuracy. 
Based on these observations, we reckon that (i) in the tested data, the information provided by the spatial dependencies is primarily included in the information provided by the temporal dependencies and, thus, can be essentially ignored for inference; and (ii) although the spatial dependencies provide redundant information, it is vital for the effective training of ASTGNNs and thus cannot be ignored during training. Furthermore, the localisation of ASTGNNs holds the potential to reduce the heavy computation overhead required on large-scale spatial-temporal data and further enable the distributed deployment of ASTGNNs.", "output": "Localised Adaptive Spatial-Temporal Graph Neural Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This note shares some simple calculations and experiments related to absmax-based blockwise quantization, as used in Dettmers et al., 2023. Their proposed NF4 data type is said to be information theoretically optimal for representing normally distributed weights. I show that this can't quite be the case, as the distribution of the values to be quantized depends on the block-size. I attempt to apply these insights to derive an improved code based on minimizing the expected L1 reconstruction error, rather than the quantile based method. This leads to improved performance for larger quantization block sizes, while both codes perform similarly at smaller block sizes.", "output": "NF4 Isn't Information Theoretically Optimal (and that's Good)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Strokes2Surface, an offline geometry-reconstruction pipeline built upon a 4D Sketching Interface, MR.Sketch, targeted at architectural design. 
The pipeline recovers a curve network from designer-drawn strokes, thus bridging between concept design and digital modeling stages in architectural design. The input to our pipeline consists of 3D strokes' polyline vertices and their corresponding timestamps (as of the fourth dimension), along with additional geometric and stylus-related recorded properties. Inspired by sketch consolidation and sketch-based modeling methods, our pipeline leverages such data and combines three Machine Learning (ML) models; a classifier and two clustering models. In particular, based on observations of practices designers typically employ in architectural design sketches, we solve a binary classification problem to recognize whether a stroke depicts a boundary and edge or is used to fill in the enclosing areas and faces of the intended architectural object. Followed by the two clustering models, strokes of each type are further parsed into groups, each representing either a single edge or a single face. Next, groups representing edges are approximated with B-spline curves, followed by a topology-recovering process identifying and fixing desired connectivities between the curves forming a well-connected curve network. Next, groups representing the faces are employed to detect the cycles bounding patches in the curve network, resulting in the final surface mesh geometry of the architectural object. We confirm the usability of Strokes2Surface via a user study and further validate and compare our results against a range of reconstructions computed using alternative methods. 
We also introduce our manually labeled dataset of 4D architectural design sketches for further use in the community.", "output": "Strokes2Surface: Recovering Curve Networks From 4D Architectural Design Sketches."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Seven years ago, researchers proposed a postprocessing method to equalize the error rates of a model across different demographic groups. The work launched hundreds of papers purporting to improve over the postprocessing baseline. We empirically evaluate these claims through thousands of model evaluations on several tabular datasets. We find that the fairness-accuracy Pareto frontier achieved by postprocessing contains all other methods we were feasibly able to evaluate. In doing so, we address two common methodological errors that have confounded previous observations. One relates to the comparison of methods with different unconstrained base models. The other concerns methods achieving different levels of constraint relaxation. At the heart of our study is a simple idea we call unprocessing that roughly corresponds to the inverse of postprocessing. Unprocessing allows for a direct comparison of methods using different underlying models and levels of relaxation. Interpreting our findings, we recall a widely overlooked theoretical argument, present seven years ago, that accurately predicted what we observe.", "output": "Unprocessing Seven Years of Algorithmic Fairness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperspectral Image (HSI)s cover hundreds or thousands of narrow spectral bands, conveying a wealth of spatial and spectral information. 
However, due to the instrumental errors and the atmospheric changes, the HSI obtained in practice are often contaminated by noise and dead pixels(lines), resulting in missing information that may severely compromise the subsequent applications. We introduce here a novel HSI missing pixel prediction algorithm, called Low Rank and Sparsity Constraint Plug-and-Play (LRS-PnP). It is shown that LRS-PnP is able to predict missing pixels and bands even when all spectral bands of the image are missing. The proposed LRS-PnP algorithm is further extended to a self-supervised model by combining the LRS-PnP with the Deep Image Prior (DIP), called LRS-PnP-DIP. In a series of experiments with real data, It is shown that the LRS-PnP-DIP either achieves state-of-the-art inpainting performance compared to other learning-based methods, or outperforms them.", "output": "Self-Supervised Hyperspectral Inpainting with the Optimisation inspired Deep Neural Network Prior."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quadruped animals seamlessly transition between gaits as they change locomotion speeds. While the most widely accepted explanation for gait transitions is energy efficiency, there is no clear consensus on the determining factor, nor on the potential effects from terrain properties. In this article, we propose that viability, i.e. the avoidance of falls, represents an important criterion for gait transitions. We investigate the emergence of gait transitions through the interaction between supraspinal drive (brain), the central pattern generator in the spinal cord, the body, and exteroceptive sensing by leveraging deep reinforcement learning and robotics tools. Consistent with quadruped animal data, we show that the walk-trot gait transition for quadruped robots on flat terrain improves both viability and energy efficiency. 
Furthermore, we investigate the effects of discrete terrain (i.e. crossing successive gaps) on imposing gait transitions, and find the emergence of trot-pronk transitions to avoid non-viable states. Compared with other potential criteria such as peak forces and energy efficiency, viability is the only improved factor after gait transitions on both flat and discrete gap terrains, suggesting that viability could be a primary and universal objective of gait transitions, while other criteria are secondary objectives and/or a consequence of viability. Moreover, we deploy our learned controller in sim-to-real hardware experiments and demonstrate state-of-the-art quadruped agility in challenging scenarios, where the Unitree A1 quadruped autonomously transitions gaits between trot and pronk to cross consecutive gaps of up to 30 cm (83.3 % of the body-length) at over 1.3 m/s.", "output": "DeepTransition: Viability Leads to the Emergence of Gait Transitions in Learning Anticipatory Quadrupedal Locomotion Skills."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For content recommender systems such as TikTok and YouTube, the platform's decision algorithm shapes the incentives of content producers, including how much effort the content producers invest in the quality of their content. Many platforms employ online learning, which creates intertemporal incentives, since content produced today affects recommendations of future content. In this paper, we study the incentives arising from online learning, analyzing the quality of content produced at a Nash equilibrium. We show that classical online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content. In particular, the quality of content is upper bounded in terms of the learning rate and approaches zero for typical learning rate schedules. 
Motivated by this negative result, we design a different learning algorithm -- based on punishing producers who create low-quality content -- that correctly incentivizes producers to create high-quality content. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and opens the door towards designing platform learning algorithms that incentivize the creation of high-quality content.", "output": "Incentivizing High-Quality Content in Online Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, diffusion models have achieved remarkable performance in data generation, e.g., generating high-quality images. Nevertheless, chemistry molecules often have complex non-Euclidean spatial structures, with the behavior changing dynamically and unpredictably. Most existing diffusion models highly rely on computing the probability distribution, i.e., Gaussian distribution, in Euclidean space, which cannot capture internal non-Euclidean structures of molecules, especially the hierarchical structures of the implicit manifold surface represented by molecules. It has been observed that the complex hierarchical structures in hyperbolic embedding space become more prominent and easier to be captured. In order to leverage both the data generation power of diffusion models and the strong capability to extract complex geometric features of hyperbolic embedding, we propose to extend the diffusion model to hyperbolic manifolds for molecule generation, namely, Hyperbolic Graph Diffusion Model (HGDM). The proposed HGDM employs a hyperbolic variational autoencoder to generate the hyperbolic hidden representation of nodes and then a score-based hyperbolic graph neural network is used to learn the distribution in hyperbolic space. 
Numerical experimental results show that the proposed HGDM achieves higher performance on several molecular datasets, compared with state-of-the-art methods.", "output": "Hyperbolic Graph Diffusion Model for Molecule Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel learning-based method that achieves state-of-the-art performance on several heart rate estimation benchmarks extracted from photoplethysmography signals (PPG). We consider the evolution of the heart rate in the context of a discrete-time stochastic process that we represent as a hidden Markov model. We derive a distribution over possible heart rate values for a given PPG signal window through a trained neural network. Using belief propagation, we incorporate the statistical distribution of heart rate changes to refine these estimates in a temporal context. From this, we obtain a quantized probability distribution over the range of possible heart rate values that captures a meaningful and well-calibrated estimate of the inherent predictive uncertainty. We show the robustness of our method on eight public datasets with three different cross-validation experiments.", "output": "BeliefPPG: Uncertainty-aware Heart Rate Estimation from PPG signals via Belief Propagation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the non-convex optimization problem associated with the decomposition of a real symmetric tensor into a sum of rank one terms. Use is made of the rich symmetry structure to derive Puiseux series representations of families of critical points, and so obtain precise analytic estimates on the critical values and the Hessian spectrum. 
The sharp results make possible an analytic characterization of various geometric obstructions to local optimization methods, revealing in particular a complex array of saddles and local minima which differ by their symmetry, structure and analytic properties. A desirable phenomenon, occurring for all critical points considered, concerns the index of a point, i.e., the number of negative Hessian eigenvalues, increasing with the value of the objective function. Lastly, a Newton polytope argument is used to give a complete enumeration of all critical points of fixed symmetry, and it is shown that contrarily to the set of global minima which remains invariant under different choices of tensor norms, certain families of non-global minima emerge, others disappear.", "output": "Symmetry & Critical Points for Symmetric Tensor Decomposition Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prompt learning has been proven to be highly effective in improving pre-trained language model (PLM) adaptability, surpassing conventional fine-tuning paradigms, and showing exceptional promise in an ever-growing landscape of applications and APIs tailored for few-shot learning scenarios. Despite the growing prominence of prompt learning-based APIs, their security concerns remain underexplored. In this paper, we undertake a pioneering study on the Trojan susceptibility of prompt-learning PLM APIs. We identified several key challenges, including discrete-prompt, few-shot, and black-box settings, which limit the applicability of existing backdoor attacks. To address these challenges, we propose TrojPrompt, an automatic and black-box framework to effectively generate universal and stealthy triggers and insert Trojans into hard prompts. 
Specifically, we propose a universal API-driven trigger discovery algorithm for generating universal triggers for various inputs by querying victim PLM APIs using few-shot data samples. Furthermore, we introduce a novel progressive trojan poisoning algorithm designed to generate poisoned prompts that retain efficacy and transferability across a diverse range of models. Our experiments and results demonstrate TrojPrompt's capacity to effectively insert Trojans into text prompts in real-world black-box PLM APIs, while maintaining exceptional performance on clean test sets and significantly outperforming baseline models. Our work sheds light on the potential security risks in current models and offers a potential defensive approach.", "output": "TrojPrompt: A Black-box Trojan Attack on Pre-trained Language Models."}]
\ No newline at end of file