diff --git a/LLMs-papers.md b/LLMs-papers.md
index 1beb408..d0e380d 100644
--- a/LLMs-papers.md
+++ b/LLMs-papers.md
@@ -12,36 +12,37 @@
Translation](http://arxiv.org/abs/1909.04879v1)
8. [Large Language Models are not Models of Natural Language: they are
Corpus Models](http://arxiv.org/abs/2112.07055v2)
-9. [Unsupervised Improvement of Factual Knowledge in Language Models](http://arxiv.org/abs/2304.01597v1)
-10. [Multilingual Text Classification for Dravidian Languages](http://arxiv.org/abs/2112.01705v1)
-11. [Can Large Language Models design a Robot?](http://arxiv.org/abs/2303.15324v1)
-12. [HinFlair: pre-trained contextual string embeddings for pos tagging and
+9. [UIO at SemEval-2023 Task 12: Multilingual fine-tuning for sentiment
+ classification in low-resource languages](http://arxiv.org/abs/2304.14189v1)
+10. [Unsupervised Improvement of Factual Knowledge in Language Models](http://arxiv.org/abs/2304.01597v1)
+11. [Multilingual Text Classification for Dravidian Languages](http://arxiv.org/abs/2112.01705v1)
+12. [Can Large Language Models design a Robot?](http://arxiv.org/abs/2303.15324v1)
+13. [HinFlair: pre-trained contextual string embeddings for pos tagging and
text classification in the Hindi language](http://arxiv.org/abs/2101.06949v1)
-13. [Jigsaw: Large Language Models meet Program Synthesis](http://arxiv.org/abs/2112.02969v1)
-14. [Larger-Scale Transformers for Multilingual Masked Language Modeling](http://arxiv.org/abs/2105.00572v1)
-15. [Exploring Cross-lingual Textual Style Transfer with Large Multilingual
+14. [Jigsaw: Large Language Models meet Program Synthesis](http://arxiv.org/abs/2112.02969v1)
+15. [Larger-Scale Transformers for Multilingual Masked Language Modeling](http://arxiv.org/abs/2105.00572v1)
+16. [Exploring Cross-lingual Textual Style Transfer with Large Multilingual
Language Models](http://arxiv.org/abs/2206.02252v1)
-16. [Low-Resource Language Modelling of South African Languages](http://arxiv.org/abs/2104.00772v1)
-17. [In What Languages are Generative Language Models the Most Formal?
+17. [Low-Resource Language Modelling of South African Languages](http://arxiv.org/abs/2104.00772v1)
+18. [In What Languages are Generative Language Models the Most Formal?
Analyzing Formality Distribution across Languages](http://arxiv.org/abs/2302.12299v1)
-18. [GreenPLM: Cross-lingual pre-trained language models conversion with
+19. [GreenPLM: Cross-lingual pre-trained language models conversion with
(almost) no cost](http://arxiv.org/abs/2211.06993v2)
-19. [BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting](http://arxiv.org/abs/2212.09535v1)
-20. [LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with
+20. [BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting](http://arxiv.org/abs/2212.09535v1)
+21. [LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with
Knowledge Distillation](http://arxiv.org/abs/2302.08387v1)
-21. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
-22. [Can Character-based Language Models Improve Downstream Task Performance
+22. [Benchmarking Language Models for Code Syntax Understanding](http://arxiv.org/abs/2210.14473v1)
+23. [Can Character-based Language Models Improve Downstream Task Performance
in Low-Resource and Noisy Language Scenarios?](http://arxiv.org/abs/2110.13658v1)
-23. [Learning Translation Quality Evaluation on Low Resource Languages from
+24. [Learning Translation Quality Evaluation on Low Resource Languages from
Large Language Models](http://arxiv.org/abs/2302.03491v1)
-24. [Investigating the Translation Performance of a Large Multilingual
+25. [Investigating the Translation Performance of a Large Multilingual
Language Model: the Case of BLOOM](http://arxiv.org/abs/2303.01911v1)
-25. [Investigating Robustness of Dialog Models to Popular Figurative Language
+26. [Investigating Robustness of Dialog Models to Popular Figurative Language
Constructs](http://arxiv.org/abs/2110.00687v1)
-26. [DPRK-BERT: The Supreme Language Model](http://arxiv.org/abs/2112.00567v1)
-27. [MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural
+27. [DPRK-BERT: The Supreme Language Model](http://arxiv.org/abs/2112.00567v1)
+28. [MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural
Code Generation](http://arxiv.org/abs/2208.08227v4)
-28. [Subword Segmental Language Modelling for Nguni Languages](http://arxiv.org/abs/2210.06525v1)
-29. [TabLLM: Few-shot Classification of Tabular Data with Large Language
+29. [Subword Segmental Language Modelling for Nguni Languages](http://arxiv.org/abs/2210.06525v1)
+30. [TabLLM: Few-shot Classification of Tabular Data with Large Language
Models](http://arxiv.org/abs/2210.10723v2)
-30. [Language Model Supervision for Handwriting Recognition Model Adaptation](http://arxiv.org/abs/1808.01423v1)
diff --git a/data/2023-04-30.json b/data/2023-04-30.json
new file mode 100644
index 0000000..312712e
--- /dev/null
+++ b/data/2023-04-30.json
@@ -0,0 +1 @@
+[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Brain tumor is one of the leading causes of cancer death. The high-gradebrain tumors are easier to recurrent even after standard treatment. Therefore,developing a method to predict brain tumor recurrence location plays animportant role in the treatment planning and it can potentially prolongpatient's survival time. There is still little work to deal with this issue. Inthis paper, we present a deep learning-based brain tumor recurrence locationprediction network. Since the dataset is usually small, we propose to usetransfer learning to improve the prediction. We first train a multi-modal braintumor segmentation network on the public dataset BraTS 2021. Then, thepre-trained encoder is transferred to our private dataset for extracting therich semantic features. Following that, a multi-scale multi-channel featurefusion model and a nonlinear correlation learning module are developed to learnthe effective features. The correlation between multi-channel features ismodeled by a nonlinear equation. To measure the similarity between thedistributions of original features of one modality and the estimated correlatedfeatures of another modality, we propose to use Kullback-Leibler divergence.Based on this divergence, a correlation loss function is designed to maximizethe similarity between the two feature distributions. Finally, two decoders areconstructed to jointly segment the present brain tumor and predict its futuretumor recurrence location. To the best of our knowledge, this is the first workthat can segment the present tumor and at the same time predict future tumorrecurrence location, making the treatment planning more efficient and precise.The experimental results demonstrated the effectiveness of our proposed methodto predict the brain tumor recurrence location from the limited dataset.", "output": "Prediction of brain tumor recurrence location based on multi-modal fusion and nonlinear correlation learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Increased capabilities such as recognition and self-adaptability are nowrequired from IoT applications. While IoT node power consumption is a majorconcern for these applications, cloud-based processing is becomingunsustainable due to continuous sensor or image data transmission over thewireless network. Thus optimized ML capabilities and data transfers should beintegrated in the IoT node. Moreover, IoT applications are torn betweensporadic data-logging and energy-hungry data processing (e.g. imageclassification). Thus, the versatility of the node is key in addressing thiswide diversity of energy and processing needs. This paper presents SamurAI, aversatile IoT node bridging this gap in processing and in energy by leveragingtwo on-chip sub-systems: a low power, clock-less, event-drivenAlways-Responsive (AR) part and an energy-efficient On-Demand (OD) part. ARcontains a 1.7MOPS event-driven, asynchronous Wake-up Controller (WuC) with a207ns wake-up time optimized for sporadic computing, while OD combines adeep-sleep RISC-V CPU and 1.3TOPS/W Machine Learning (ML) for more complextasks up to 36GOPS. This architecture partitioning achieves best in classversatility metrics such as peak performance to idle power ratio. 
On an applicative classification scenario, it demonstrates system power gains, up to 3.5x compared to cloud-based processing, and thus extended battery lifetime.", "output": "SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To improve the recognition ability of computer-aided breast mass classification among mammographic images, in this work we explore the state-of-the-art classification networks to develop an ensemble mechanism. First, the regions of interest (ROIs) are obtained from the original dataset, and then three models, i.e., XceptionNet, DenseNet, and EfficientNet, are trained individually. After training, we ensemble the mechanism by summing the probabilities outputted from each network which enhances the performance up to 5%. The scheme has been validated on a public dataset and we achieved accuracy, precision, and recall 88%, 85%, and 76% respectively.", "output": "Ensemble CNNs for Breast Tumor Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The immense scale of the recent large language models (LLM) allows many interesting properties, such as, instruction- and chain-of-thought-based fine-tuning, that has significantly improved zero- and few-shot performance in many natural language processing (NLP) tasks. Inspired by such successes, we adopt such an instruction-tuned LLM Flan-T5 as the text encoder for text-to-audio (TTA) generation -- a task where the goal is to generate an audio from its textual description. The prior works on TTA either pre-trained a joint text-audio encoder or used a non-instruction-tuned model, such as, T5. Consequently, our latent diffusion model (LDM)-based approach TANGO outperforms the state-of-the-art AudioLDM on most metrics and stays comparable on the rest on AudioCaps test set, despite training the LDM on a 63 times smaller dataset and keeping the text encoder frozen. This improvement might also be attributed to the adoption of audio pressure level-based sound mixing for training set augmentation, whereas the prior methods take a random mix.", "output": "Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While Large Language Models (LLMs) have shown exceptional performance in various tasks, their (arguably) most prominent drawback is generating inaccurate or false information with a confident tone. In this paper, we hypothesize that the LLM's internal state can be used to reveal the truthfulness of a statement. Therefore, we introduce a simple yet effective method to detect the truthfulness of LLM-generated statements, which utilizes the LLM's hidden layer activations to determine the veracity of statements. To train and evaluate our method, we compose a dataset of true and false statements in six different topics. A classifier is trained to detect which statement is true or false based on an LLM's activation values. Specifically, the classifier receives as input the activation values from the LLM for each of the statements in the dataset.
Our experiments demonstrate that our method for detecting statement veracity significantly outperforms even few-shot prompting methods, highlighting its potential to enhance the reliability of LLM-generated content and its practical applicability in real-world scenarios.", "output": "The Internal State of an LLM Knows When its Lying."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of scalable, distributed AI frameworks leveraging cloud computing for enhanced deep learning performance and efficiency. We first provide an overview of popular AI frameworks and cloud services, highlighting their respective strengths and weaknesses. Next, we delve into the critical aspects of data storage and management in cloud-based AI systems, discussing data preprocessing, feature engineering, privacy, and security. We then explore parallel and distributed training techniques for AI models, focusing on model partitioning, communication strategies, and cloud-based training architectures. In subsequent chapters, we discuss optimization strategies for AI workloads in the cloud, covering load balancing, resource allocation, auto-scaling, and performance benchmarking. We also examine AI model deployment and serving in the cloud, outlining containerization, serverless deployment options, and monitoring best practices. To ensure the cost-effectiveness of cloud-based AI solutions, we present a thorough analysis of costs, optimization strategies, and case studies showcasing successful deployments. Finally, we summarize the key findings of this study, discuss the challenges and limitations of cloud-based AI, and identify emerging trends and future research opportunities in the field.", "output": "Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which \"translates\" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition. The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires no training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality.
Our code is available at", "output": "TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The impact of artificial intelligence systems on our society is increasing at an unprecedented speed. For instance, ChatGPT is being tested in mental health treatment applications such as Koko, Stable Diffusion generates pieces of art competitive with (or outperforming) human artists, and so on. Ethical concerns regarding the behavior and applications of generative AI systems have been increasing over the past years, and the field of AI alignment - steering the behavior of AI systems towards being aligned with human values - is a rapidly growing subfield of modern AI. In this paper, we address the challenges involved in ethical evaluation of a multimodal artificial intelligence system. The multimodal systems we focus on take both text and an image as input and output text, completing the sentence or answering the question asked as input. We perform the evaluation of these models in two steps: we first discuss the creation of a multimodal ethical database and then use this database to construct morality-evaluating algorithms. The creation of the multimodal ethical database is done interactively through human feedback. Users are presented with multiple examples and votes on whether they are ethical or not. Once these answers have been aggregated into a dataset, we built and tested different algorithms to automatically evaluate the morality of multimodal systems. These algorithms aim to classify the answers as ethical or not. The models we tested are a RoBERTa-large classifier and a multilayer perceptron classifier.", "output": "Towards ethical multimodal systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural network model compression techniques can address the computation issue of deep neural networks on embedded devices in industrial systems. The guaranteed output error computation problem for neural network compression with quantization is addressed in this paper. A merged neural network is built from a feedforward neural network and its quantized version to produce the exact output difference between two neural networks. Then, optimization-based methods and reachability analysis methods are applied to the merged neural network to compute the guaranteed quantization error. Finally, a numerical example is proposed to validate the applicability and effectiveness of the proposed approach.", "output": "Guaranteed Quantization Error Computation for Neural Network Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robots operating in the real world require both rich manipulation skills as well as the ability to semantically reason about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities.
However, we show that the conventional pretraining-finetuning pipeline for integrating such representations entangles the learning of domain-specific action information and domain-general visual information, leading to less data-efficient training and poor generalization to unseen objects and tasks. To this end, we propose ProgramPort, a modular approach to better leverage pretrained VL models by exploiting the syntactic and semantic structures of language instructions. Our framework uses a semantic parser to recover an executable program, composed of functional modules grounded on vision and action across different modalities. Each functional module is realized as a combination of deterministic computation and learnable neural networks. Program execution produces parameters to general manipulation primitives for a robotic end-effector. The entire modular network can be trained with end-to-end imitation learning objectives. Experiments show that our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors. Project webpage at:url{", "output": "Programmatically Grounded, Compositionally Generalizable Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper assesses the reliability of the RemOve-And-Retrain (ROAR) protocol, which is used to measure the performance of feature importance estimates. Our findings from the theoretical background and empirical experiments indicate that attributions that possess less information about the decision function can perform better in ROAR benchmarks, conflicting with the original purpose of ROAR. This phenomenon is also observed in the recently proposed variant RemOve-And-Debias (ROAD), and we propose a consistent trend of blurriness bias in ROAR attribution metrics. Our results caution against uncritical reliance on ROAR metrics.", "output": "On Pitfalls of $\\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In response to the global need for sustainable energy, green technology may help fight climate change. Before green infrastructure to be easily integrated into the world's energy system, it needs upgrading. By improving energy infrastructure and decision-making, artificial intelligence (AI) may help solve this challenge. EHVs have grown in popularity because to concerns about global warming and the need for more ecologically friendly transportation. EHVs may work better with cutting-edge technologies like AI. Electric vehicles (EVs) reduce greenhouse gas emissions and promote sustainable mobility. Electric automobiles (EVs) are growing in popularity due to their benefits for climate change mitigation and sustainable mobility. Unfortunately, EV production consumes a lot of energy and materials, which may harm nature. EV production is being improved using green technologies like artificial intelligence and predictive analysis. Electric and hybrid vehicles (EHVs) may help meet the need for ecologically friendly transportation. However, the Battery Management System (BMS) controls EHV performance and longevity. AI may improve EHV energy efficiency, emissions reduction, and sustainability.
Remote hijacking, security breaches, and unauthorized access are EHV cybersecurity vulnerabilities addressed in the article. AI research and development may help make transportation more sustainable, as may optimizing EHVs and charging infrastructure.", "output": "AI-based Predictive Analytic Approaches for safeguarding the Future of Electric/Hybrid Vehicles."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Open domain entity state tracking aims to predict reasonable state changes of entities (i.e., [attribute] of [entity] was [before_state] and [after_state] afterwards) given the action descriptions. It's important to many reasoning tasks to support human everyday activities. However, it's challenging as the model needs to predict an arbitrary number of entity state changes caused by the action while most of the entities are implicitly relevant to the actions and their attributes as well as states are from open vocabularies. To tackle these challenges, we propose a novel end-to-end Knowledge Informed framework for open domain Entity State Tracking, namely KIEST, which explicitly retrieves the relevant entities and attributes from external knowledge graph (i.e., ConceptNet) and incorporates them to autoregressively generate all the entity state changes with a novel dynamic knowledge grained encoder-decoder framework. To enforce the logical coherence among the predicted entities, attributes, and states, we design a new constraint decoding strategy and employ a coherence reward to improve the decoding process. Experimental results show that our proposed KIEST framework significantly outperforms the strong baselines on the public benchmark dataset OpenPI.", "output": "Understand the Dynamic World: An End-to-End Knowledge Informed Framework for Open Domain Entity State Tracking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet. Manually auditing models for biases can be very time and resource consuming and is further complicated by the unbounded and unconstrained nature of inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus the emergence of multistage multimodal models requires a different approach. In this paper, we propose Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models.
We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.", "output": "Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ensoul is a framework proposed for the purpose of creating technologies that create more technologies through the combined use of networks, and nests, of energy homeostatic (enerstatic) loops and open-ended evolutionary techniques. Generative technologies developed by such an approach serve as both simple, yet insightful models of thermodynamically driven complex systems and as powerful sources of novel technologies. \"Self Organizing intelligent Ultra Low power Systems\" (SOULS) is a term that well describes the technologies produced by such a generative technology, as well as the generative technology itself. The term is meant to capture the abstract nature of such technologies as being independent of the substrate in which they are embedded. In other words, SOULS can be biological, artificial or hybrid in form.", "output": "Ensoul: A framework for the creation of self organizing intelligent ultra low power systems (SOULS) through evolutionary enerstatic networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years brought advancements in using neural networks for representation learning of various language or visual phenomena. New methods freed data scientists from hand-crafting features for common tasks. Similarly, problems that require considering the spatial variable can benefit from pretrained map region representations instead of manually creating feature tables that one needs to prepare to solve a task. However, very few methods for map area representation exist, especially with respect to road network characteristics. In this paper, we propose a method for generating microregions' embeddings with respect to their road infrastructure characteristics. We base our representations on OpenStreetMap road networks in a selection of cities and use the H3 spatial index to allow reproducible and scalable representation learning. We obtained vector representations that detect how similar map hexagons are in the road networks they contain. Additionally, we observe that embeddings yield a latent space with meaningful arithmetic operations. Finally, clustering methods allowed us to draft a high-level typology of obtained representations. We are confident that this contribution will aid data scientists working on infrastructure-related prediction tasks with spatial variables.", "output": "highway2vec -- representing OpenStreetMap microregions with respect to their road network characteristics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs albeit using hand-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem.
In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent \"question\" functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and through qualitative analysis show that the learned representations are not only interpretable but also, centered around objects that are invariant to changes across tasks facilitating fast adaptation.", "output": "Discovering Object-Centric Generalized Value Functions From Pixels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While the use of the Internet of Things is becoming more and more popular, many security vulnerabilities are emerging with the large number of devices being introduced to the market. In this environment, IoT device identification methods provide a preventive security measure as an important factor in identifying these devices and detecting the vulnerabilities they suffer from. In this study, we present a method that identifies devices in the Aalto dataset using the convolutional neural network (CNN).", "output": "CNN based IoT Device Identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid growth of social media has caused tremendous effects on information propagation, raising extreme challenges in detecting rumors. Existing rumor detection methods typically exploit the reposting propagation of a rumor candidate for detection by regarding all reposts to a rumor candidate as a temporal sequence and learning semantics representations of the repost sequence. However, extracting informative support from the topological structure of propagation and the influence of reposting authors for debunking rumors is crucial, which generally has not been well addressed by existing methods. In this paper, we organize a claim post in circulation as an adhoc event tree, extract event elements, and convert it to bipartite adhoc event trees in terms of both posts and authors, i.e., author tree and post tree. Accordingly, we propose a novel rumor detection model with hierarchical representation on the bipartite adhoc event trees called BAET. Specifically, we introduce word embedding and feature encoder for the author and post tree, respectively, and design a root-aware attention module to perform node representation. Then we adopt the tree-like RNN model to capture the structural correlations and propose a tree-aware attention module to learn tree representation for the author tree and post tree, respectively.
Extensive experimental results on two public Twitter datasets demonstrate the effectiveness of BAET in exploring and exploiting the rumor propagation structure and the superior detection performance of BAET over state-of-the-art baseline methods.", "output": "Rumor Detection with Hierarchical Representation on Bipartite Adhoc Event Trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While the use of the Internet of Things is becoming more and more popular, many security vulnerabilities are emerging with the large number of devices being introduced to the market. In this environment, IoT device identification methods provide a preventive security measure as an important factor in identifying these devices and detecting the vulnerabilities they suffer from. In this study, we present a method that identifies devices in the Aalto dataset using Long short-term memory (LSTM)", "output": "LSTM based IoT Device Identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We investigate how to enhance answer precision in frequently asked questions posed by distributed users using cloud-based Large Language Models (LLMs). Our study focuses on a typical situations where users ask similar queries that involve identical mathematical reasoning steps and problem-solving procedures. Due to the unsatisfactory accuracy of LLMs' zero-shot prompting with standalone questions, we propose to improve the distributed synonymous questions using Self-Consistency (SC) and Chain-of-Thought (CoT) techniques. Specifically, we first retrieve synonymous questions from a crowd-sourced database and create a federated question pool. We call these federated synonymous questions with the same or different parameters SP-questions or DP-questions, respectively. We refer to our methods as Fed-SP-SC and Fed-DP-CoT, which can generate significantly more accurate answers for all user queries without requiring sophisticated model-tuning. Through extensive experiments, we demonstrate that our proposed methods can significantly enhance question accuracy by fully exploring the synonymous nature of the questions and the consistency of the answers.", "output": "Federated Prompting and Chain-of-Thought Reasoning for Improving LLMs Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many games feature a progression of levels that doesn't adapt to the player. This can be problematic because some players may get stuck if the progression is too difficult, while others may find it boring if the progression is too slow to get to more challenging levels. This can be addressed by building levels based on the player's performance and preferences. In this work, we formulate the problem of generating levels for a player as a Markov Decision Process (MDP) and use adaptive dynamic programming (ADP) to solve the MDP before assembling a level. We tested with two case studies and found that using an ADP outperforms two baselines. Furthermore, we experimented with player proxies and switched them in the middle of play, and we show that a simple modification prior to running ADP results in quick adaptation.
By using ADP, which searches the entire MDP, we produce a dynamic progression of levels that adapts to the player.", "output": "Level Assembly as a Markov Decision Process."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "NIMS-OS (NIMS Orchestration System) is a Python library created to realize a closed loop of robotic experiments and artificial intelligence (AI) without human intervention for automated materials exploration. It uses various combinations of modules to operate autonomously. Each module acts as an AI for materials exploration or a controller for a robotic experiments. As AI techniques, Bayesian optimization (PHYSBO), boundless objective-free exploration (BLOX), phase diagram construction (PDC), and random exploration (RE) methods can be used. Moreover, a system called NIMS automated robotic electrochemical experiments (NAREE) is available as a set of robotic experimental equipment. Visualization tools for the results are also included, which allows users to check the optimization results in real time. Newly created modules for AI and robotic experiments can be added easily to extend the functionality of the system. In addition, we developed a GUI application to control NIMS-OS. To demonstrate the operation of NIMS-OS, we consider an automated exploration for new electrolytes. NIMS-OS is available at", "output": "NIMS-OS: An automation software to implement a closed loop between artificial intelligence and robotic experiments in materials science."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Organizations are increasingly adopting machine learning (ML) for personnel assessment. However, concerns exist about fairness in designing and implementing ML assessments. Supervised ML models are trained to model patterns in data, meaning ML models tend to yield predictions that reflect subgroup differences in applicant attributes in the training data, regardless of the underlying cause of subgroup differences. In this study, we systematically under- and oversampled minority (Black and Hispanic) applicants to manipulate adverse impact ratios in training data and investigated how training data adverse impact ratios affect ML model adverse impact and accuracy. We used self-reports and interview transcripts from job applicants (N = 2,501) to train 9,702 ML models to predict screening decisions. We found that training data adverse impact related linearly to ML model adverse impact. However, removing adverse impact from training data only slightly reduced ML model adverse impact and tended to negatively affect ML model accuracy. We observed consistent effects across self-reports and interview transcripts, whether oversampling real (i.e., bootstrapping) or synthetic observations. As our study relied on limited predictor sets from one organization, the observed effects on adverse impact may be attenuated among more accurate ML models.", "output": "Oversampling Higher-Performing Minorities During Machine Learning Model Training Reduces Adverse Impact Slightly but Also Reduces Model Accuracy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Collaborative Filtering (CF) is a widely used and effective technique for recommender systems.
In recent decades, there have been significant advancements in latent embedding-based CF methods for improved accuracy, such as matrix factorization, neural collaborative filtering, and LightGCN. However, the explainability of these models has not been fully explored. Adding explainability to recommendation models can not only increase trust in the decision-making process, but also have multiple benefits such as providing persuasive explanations for item recommendations, creating explicit profiles for users and items, and assisting item producers in design improvements. In this paper, we propose a neat and effective Explainable Collaborative Filtering (ECF) model that leverages interpretable cluster learning to achieve the two most demanding objectives: (1) Precise - the model should not compromise accuracy in the pursuit of explainability; and (2) Self-explainable - the model's explanations should truly reflect its decision-making process, not generated from post-hoc methods. The core of ECF is mining taste clusters from user-item interactions and item profiles. We map each user and item to a sparse set of taste clusters, and taste clusters are distinguished by a few representative tags. The user-item preference, users/items' cluster affiliations, and the generation of taste clusters are jointly optimized in an end-to-end manner. Additionally, we introduce a forest mechanism to ensure the model's accuracy, explainability, and diversity. To comprehensively evaluate the explainability quality of taste clusters, we design several quantitative metrics, including in-cluster item coverage, tag utilization, silhouette, and informativeness. Our model's effectiveness is demonstrated through extensive experiments on three real-world datasets.", "output": "Towards Explainable Collaborative Filtering with Taste Clusters Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many computer vision applications, images are acquired with arbitrary or random rotations and translations, and in such setups, it is desirable to obtain semantic representations disentangled from the image orientation. Examples of such applications include semiconductor wafer defect inspection, plankton microscope images, and inference on single-particle cryo-electron microscopy (cryo-EM) micro-graphs. In this work, we propose Invariant Representation Learning with Implicit Neural Representation (IRL-INR), which uses an implicit neural representation (INR) with a hypernetwork to obtain semantic representations disentangled from the orientation of the image. We show that IRL-INR can effectively learn disentangled semantic representations on more complex images compared to those considered in prior works and show that these semantic representations synergize well with SCAN to produce state-of-the-art unsupervised clustering results.", "output": "Rotation and Translation Invariant Representation Learning with Implicit Neural Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Clinical notes are assigned ICD codes - sets of codes for diagnoses and procedures. In the recent years, predictive machine learning models have been built for automatic ICD coding.
However, there is a lack of widely accepted benchmarks for automated ICD coding models based on large-scale public EHR data. This paper proposes a public benchmark suite for ICD-10 coding using a large EHR dataset derived from MIMIC-IV, the most recent public EHR dataset. We implement and compare several popular methods for ICD coding prediction tasks to standardize data preprocessing and establish a comprehensive ICD coding benchmark dataset. This approach fosters reproducibility and model comparison, accelerating progress toward employing automated ICD coding in future studies. Furthermore, we create a new ICD-9 benchmark using MIMIC-IV data, providing more data points and a higher number of ICD codes than MIMIC-III. Our open-source code offers easy access to data processing steps, benchmark creation, and experiment replication for those with MIMIC-IV access, providing insights, guidance, and protocols to efficiently develop ICD coding models.", "output": "Mimic-IV-ICD: A new benchmark for eXtreme MultiLabel Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite substantial progress in 3D human pose estimation from a single-view image, prior works rarely explore global and local correlations, leading to insufficient learning of human skeleton representations. To address this issue, we propose a novel Interweaved Graph and Attention Network (IGANet) that allows bidirectional communications between graph convolutional networks (GCNs) and attentions. Specifically, we introduce an IGA module, where attentions are provided with local information from GCNs and GCNs are injected with global information from attentions. Additionally, we design a simple yet effective U-shaped multi-layer perceptron (uMLP), which can capture multi-granularity information for body joints. Extensive experiments on two popular benchmark datasets (i.e. Human3.6M and MPI-INF-3DHP) are conducted to evaluate our proposed method. The results show that IGANet achieves state-of-the-art performance on both datasets. Code is available at", "output": "Interweaved Graph and Attention Network for 3D Human Pose Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Probably Approximately Correct (i.e., PAC) learning is a core concept of sample complexity theory, and efficient PAC learnability is often seen as a natural counterpart to the class P in classical computational complexity. But while the nascent theory of parameterized complexity has allowed us to push beyond the P-NP ``dichotomy'' in classical computational complexity and identify the exact boundaries of tractability for numerous problems, there is no analogue in the domain of sample complexity that could push beyond efficient PAC learnability. As our core contribution, we fill this gap by developing a theory of parameterized PAC learning which allows us to shed new light on several recent PAC learning results that incorporated elements of parameterized complexity. Within the theory, we identify not one but two notions of fixed-parameter learnability that both form distinct counterparts to the class FPT -- the core concept at the center of the parameterized complexity paradigm -- and develop the machinery required to exclude fixed-parameter learnability.
We then showcase the applications of this theory to identify refined boundaries of tractability for CNF and DNF learning as well as for a range of learning problems on graphs.", "output": "A Parameterized Theory of PAC Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms for parsing remote sensing data have a wide range of societally relevant applications, but labels used to train these algorithms can be difficult or impossible to acquire. This challenge has spurred research into self-supervised learning for remote sensing data aiming to unlock the use of machine learning in geographies or application domains where labelled datasets are small. Current self-supervised learning approaches for remote sensing data draw significant inspiration from techniques applied to natural images. However, remote sensing data has important differences from natural images -- for example, the temporal dimension is critical for many tasks and data is collected from many complementary sensors. We show that designing models and self-supervised training techniques specifically for remote sensing data results in both smaller and more performant models. We introduce the Pretrained Remote Sensing Transformer (Presto), a transformer-based model pre-trained on remote sensing pixel-timeseries data. Presto excels at a wide variety of globally distributed remote sensing tasks and outperforms much larger models. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale.", "output": "Lightweight, Pre-trained Transformers for Remote Sensing Timeseries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. In DCR, neural networks do not make task predictions directly, but they build syntactic rule structures using concept embeddings. DCR then executes these rules on meaningful concept truth degrees to provide a final interpretable and semantically-consistent prediction in a differentiable manner. Our experiments show that DCR: (i) improves up to +25% w.r.t.
state-of-the-art interpretable concept-based models on challenging benchmarks (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training, and (iii), facilitates the generation of counterfactual examples providing the learnt rules as guidance.", "output": "Interpretable Neural-Symbolic Concept Reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOI remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs including novel interaction, novel human and/or novel object driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable the interaction pose transferring among different persons and objects, we then devise a new compositional conditional neural radiance field (or CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositionally animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is ", "output": "Compositional 3D Human-Object Neural Animation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the huge recent breakthroughs in neural networks (NNs) for artificial intelligence (specifically deep convolutional networks) such NNs do not achieve human-level performance: they can be hacked by images that would fool no human and lack `common sense'. It has been argued that a basis of human-level intelligence is mankind's ability to perform relational reasoning: the comparison of different objects, measuring similarity, grasping of relations between objects and the converse, figuring out the odd one out in a set of objects. Mankind can even do this with objects they have never seen before. Here we show how ClusterFlow, a semi-supervised hierarchical clustering framework can operate on trained NNs utilising the rich multi-dimensional class and feature data found at the pre-SoftMax layer to build a hyperspacial map of classes/features and this adds more human-like functionality to modern deep convolutional neural networks. We demonstrate this with 3 tasks. 1. the statistical learning based `mistakes' made by infants when attending to images of cats and dogs. 2. improving both the resilience to hacking images and the accurate measure of certainty in deep-NNs. 3. Relational reasoning over sets of images, including those not known to the NN nor seen before. We also demonstrate that ClusterFlow can work on non-NN data and deal with missing data by testing it on a Chemistry dataset. This work suggests that modern deep NNs can be made more human-like without re-training of the NNs.
As it is known that some methods used in deep and convolutional NNs are not biologically plausible or perhaps even the best approach: the ClusterFlow framework can sit on top of any NN and will be a useful tool to add as NNs are improved in this regard.", "output": "Cluster Flow: how a hierarchical clustering layer make allows deep-NNs more resilient to hacking, more human-like and easily implements relational reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Explainable AI (XAI) aims to answer ethical and legal questions associated with the deployment of AI models. However, a considerable number of domain-specific reviews highlight the need of a mathematical foundation for the key notions in the field, considering that even the term \"explanation\" still lacks a precise definition. These reviews also advocate for a sound and unifying formalism for explainable AI, to avoid the emergence of ill-posed questions, and to help researchers navigate a rapidly growing body of knowledge. To the authors knowledge, this paper is the first attempt to fill this gap by formalizing a unifying theory of XAI. Employing the framework of category theory, and feedback monoidal categories in particular, we first provide formal definitions for all essential terms in explainable AI. Then we propose a taxonomy of the field following the proposed structure, showing how the introduced theory can be used to categorize all the main classes of XAI systems currently studied in literature. In summary, the foundation of XAI proposed in this paper represents a significant tool to properly frame future research lines, and a precious guidance for new researchers approaching the field.", "output": "Categorical Foundations of Explainable AI: A Unifying Formalism of Structures and Semantics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is essential for autonomous robots to be socially compliant while navigating in human-populated environments. Machine Learning and, especially, Deep Reinforcement Learning have recently gained considerable traction in the field of Social Navigation. This can be partially attributed to the resulting policies not being bound by human limitations in terms of code complexity or the number of variables that are handled. Unfortunately, the lack of safety guarantees and the large data requirements by DRL algorithms make learning in the real world unfeasible. To bridge this gap, simulation environments are frequently used. We propose SocNavGym, an advanced simulation environment for social navigation that can generate a wide variety of social navigation scenarios and facilitates the development of intelligent social agents. SocNavGym is light-weight, fast, easy-to-use, and can be effortlessly configured to generate different types of social navigation scenarios. It can also be configured to work with different hand-crafted and data-driven social reward signals and to yield a variety of evaluation metrics to benchmark agents' performance. Further, we also provide a case study where a Dueling-DQN agent is trained to learn social-navigation policies using SocNavGym. The results provides evidence that SocNavGym can be used to train an agent from scratch to navigate in simple as well as complex social scenarios.
Our experiments also show that the agents trained using the data-driven reward function displays more advanced social compliance in comparison to the heuristic-based reward function.", "output": "SocNavGym: A Reinforcement Learning Gym for Social Navigation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While there are abundant researches about evaluating ChatGPT on natural language understanding and generation tasks, few studies have investigated how ChatGPT's behavior changes over time. In this paper, we collect a coarse-to-fine temporal dataset called ChatLog, consisting of two parts that update monthly and daily: ChatLog-Monthly is a dataset of 38,730 question-answer pairs collected every month including questions from both the reasoning and classification tasks. ChatLog-Daily, on the other hand, consists of ChatGPT's responses to 1000 identical questions for long-form generation every day. We conduct comprehensive automatic and human evaluation to provide the evidence for the existence of ChatGPT evolving patterns. We further analyze the unchanged characteristics of ChatGPT over time by extracting its knowledge and linguistic features. We find some stable features to improve the robustness of a RoBERTa-based detector on new versions of ChatGPT. We will continuously maintain our project at ", "output": "ChatLog: Recording and Analyzing ChatGPT Across Time."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many decision-making problems feature multiple objectives. In such problems, it is not always possible to know the preferences of a decision-maker for different objectives. However, it is often possible to observe the behavior of decision-makers. In multi-objective decision-making, preference inference is the process of inferring the preferences of a decision-maker for different objectives. This research proposes a Dynamic Weight-based Preference Inference (DWPI) algorithm that can infer the preferences of agents acting in multi-objective decision-making problems, based on observed behavior trajectories in the environment. The proposed method is evaluated on three multi-objective Markov decision processes: Deep Sea Treasure, Traffic, and Item Gathering. The performance of the proposed DWPI approach is compared to two existing preference inference methods from the literature, and empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. The Dynamic Weight-based Preference Inference algorithm also maintains its performance when inferring preferences for sub-optimal behavior demonstrations.
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is challenging to quantify numerical preferences for different objectives in a multi-objective decision-making problem. However, the demonstrations of a user are often accessible. We propose an algorithm to infer linear preference weights from either optimal or near-optimal demonstrations. The algorithm is evaluated in three environments with two baseline methods. Empirical results demonstrate significant improvements compared to the baseline algorithms, in terms of both time requirements and accuracy of the inferred preferences. In future work, we plan to evaluate the algorithm's effectiveness in a multi-agent system, where one of the agents is enabled to infer the preferences of an opponent using our preference inference algorithm.", "output": "Preference Inference from Demonstration in Multi-objective Multi-agent Decision Making."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "AI and ML models have already found many applications in critical domains, such as healthcare and criminal justice. However, fully automating such high-stakes applications can raise ethical or fairness concerns. Instead, in such cases, humans should be assisted by automated systems so that the two parties reach a joint decision, stemming out of their interaction. In this work we conduct an empirical study to identify how uncertainty estimates and model explanations affect users' reliance, understanding, and trust towards a model, looking for potential benefits of bringing the two together. Moreover, we seek to assess how users' behaviour is affected by their own self-confidence in their abilities to perform a certain task, while we also discuss how the latter may distort the outcome of an analysis based on agreement and switching percentages.", "output": "Why not both? Complementing explanations with uncertainty, and the role of self-confidence in Human-AI collaboration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantage of not revealing privacy, having a strong anti-interference ability, and having a long detection distance. The sparsity of mmWave data and the need to capture its temporal-topological features remain open problems; in particular, the difficulty of capturing temporal-topological coupling features under the human semantic segmentation task prevents previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios.
To address the challenge caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function for an improved training process and segmentation results based on graph clustering. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of $\mathbf{82.31}\%$ on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset. On the S3DIS dataset, our model achieves a mean accuracy of $\mathbf{92.6}\%$, outperforming baseline algorithms.", "output": "Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based language models, including ChatGPT, have demonstrated exceptional performance in various natural language generation tasks. However, there has been limited research evaluating ChatGPT's keyphrase generation ability, which involves identifying informative phrases that accurately reflect a document's content. This study seeks to address this gap by comparing ChatGPT's keyphrase generation performance with state-of-the-art models, while also testing its potential as a solution for two significant challenges in the field: domain adaptation and keyphrase generation from long documents. We conducted experiments on six publicly available datasets from scientific articles and news domains, analyzing performance on both short and long documents. Our results show that ChatGPT outperforms current state-of-the-art models in all tested datasets and environments, generating high-quality keyphrases that adapt well to diverse domains and document lengths.", "output": "ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks, which is very practical in the medical domain. It can significantly reduce the requirement for large amounts of task-specific data by sufficiently sharing medical knowledge among different tasks. However, due to the challenges of designing strongly generalizable models with limited and complex medical data, most existing approaches tend to develop task-specific models. To take a step towards MAGI, we propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR). In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process. As a result, the foundation model with comprehensive basic knowledge can learn compact representations from pretraining radiographic data for better cross-modal alignment.
MOTOR unifies understanding and generation, two kinds of core intelligence of an AI system, into a single medical foundation model to flexibly handle diverse medical tasks. To enable a comprehensive evaluation and facilitate further research, we construct a medical multimodal benchmark including a wide range of downstream tasks, such as chest x-ray report generation and medical visual question answering. Extensive experiments on our benchmark show that MOTOR obtains promising results through simple task-oriented adaptation. The visualization shows that the injected knowledge successfully highlights key information in the medical data, demonstrating the excellent interpretability of MOTOR. Our MOTOR successfully mimics the human practice of serving as a \"medical student\" to accelerate the process of becoming a \"specialist\". We believe that our work makes a significant stride towards realizing MAGI.", "output": "Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of the linear law-based feature space transformation (LLT) algorithm is to assist with the classification of univariate and multivariate time series. The presented R package, called LLT, implements this algorithm in a flexible yet user-friendly way. This package first splits the instances into training and test sets. It then utilizes time-delay embedding and spectral decomposition techniques to identify the governing patterns (called linear laws) of each input sequence (initial feature) within the training set. Finally, it applies the linear laws of the training set to transform the initial features of the test set. These steps are performed by three separate functions called trainTest, trainLaw, and testTrans. Their application requires a predefined data structure; however, for fast calculation, they use only built-in functions. The LLT R package and a sample dataset with the appropriate data structure are publicly available on GitHub.", "output": "LLT: An R package for Linear Law-based Feature Space Transformation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning (DL) has been a revolutionary technique in various domains. To facilitate model development and deployment, many deep learning frameworks have been proposed, among which PyTorch is one of the most popular solutions. The performance of the ecosystem around PyTorch is critically important, as it saves the cost of training models and reduces the response time of model inference. In this paper, we propose TorchBench, a novel benchmark suite to study the performance of the PyTorch software stack. Unlike existing benchmark suites, TorchBench encloses many representative models, covering a large PyTorch API surface. TorchBench is able to comprehensively characterize the performance of the PyTorch software stack, guiding performance optimization across models, the PyTorch framework, and GPU libraries. We show two practical use cases of TorchBench. (1) We profile TorchBench to identify GPU performance inefficiencies in PyTorch. We are able to optimize many performance bugs and upstream patches to the official PyTorch repository. (2) We integrate TorchBench into the PyTorch continuous integration system. We are able to identify performance regressions in multiple daily code check-ins, preventing the PyTorch repository from introducing performance bugs. TorchBench is open source and keeps evolving.", "output": "TorchBench: Benchmarking PyTorch with High API Surface Coverage."},
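The TorchBench entry above describes timing representative models and flagging regressions in CI. As a minimal sketch of the kind of per-model timing harness such a suite automates (the warm-up counts, baseline value, and regression threshold here are illustrative assumptions, not TorchBench's actual implementation):

```python
import time
import torch

def benchmark(model, example_input, warmup=5, iters=20):
    """Return mean forward-pass latency in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):           # warm-up to stabilize caches/JIT
            model(example_input)
        if torch.cuda.is_available():     # GPU kernels launch asynchronously
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(example_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1000 / iters

# Hypothetical regression check against a stored baseline, e.g. in CI.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
latency = benchmark(model, torch.randn(32, 512))
baseline_ms = 1.0  # illustrative value recorded at an earlier commit
if latency > baseline_ms * 1.1:  # flag slowdowns of more than 10%
    print(f"possible regression: {latency:.3f} ms vs baseline {baseline_ms} ms")
```

Synchronizing before and after the timed loop matters on GPU: without it, the timer would measure only kernel-launch overhead rather than actual execution time.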
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many complex scenarios require the coordination of agents possessing unique points of view and distinct semantic commitments. In response, standpoint logic (SL) was introduced in the context of knowledge integration, allowing one to reason with diverse and potentially conflicting viewpoints by means of indexed modalities. Another multi-modal logic of import is linear temporal logic (LTL) - a formalism used to express temporal properties of systems and processes, having prominence in formal methods and fields related to artificial intelligence. In this paper, we present standpoint linear temporal logic (SLTL), a new logic that combines the temporal features of LTL with the multi-perspective modelling capacity of SL. We define the logic SLTL, its syntax, and its semantics, establish its decidability and complexity, and provide a terminating tableau calculus to automate SLTL reasoning. Conveniently, this offers a clear path to extend existing LTL reasoners with practical reasoning support for temporal reasoning in multi-perspective settings.", "output": "Standpoint Linear Temporal Logic."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Variational Bayes is a popular method for approximate inference, but its derivation can be cumbersome. To simplify the process, we give a 3-step recipe to identify the posterior form by explicitly looking for linearity with respect to expectations of well-known distributions. We can then directly write the update by simply ``reading-off'' the terms in front of those expectations. The recipe makes the derivation easier, faster, shorter, and more general.", "output": "Variational Bayes Made Easy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is to use a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially given the trend towards vehicular electrification (e.g., battery-powered). Although these areas have seen significant advancements in sensor technologies, wireless communications, computing, and AI/ML algorithms, the challenge remains of how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares connected vehicular applications, vehicular communications, approximation, and Edge AI techniques. The focus is on energy efficiency, covering newly proposed approximation and enabling frameworks.
To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.", "output": "A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current space-based missions, such as the Transiting Exoplanet Survey Satellite (TESS), provide a large database of light curves that must be analysed efficiently and systematically. In recent years, deep learning (DL) methods, particularly convolutional neural networks (CNN), have been used to classify transit signals of candidate exoplanets automatically. However, CNNs have some drawbacks; for example, they require many layers to capture dependencies on sequential data, such as light curves, making the network so large that it eventually becomes impractical. The self-attention mechanism is a DL technique that attempts to mimic the action of selectively focusing on some relevant things while ignoring others. Models such as the Transformer architecture were recently proposed for sequential data with successful results. Based on these successful models, we present a new architecture for the automatic classification of transit signals. Our proposed architecture is designed to capture the most significant features of a transit signal and stellar parameters through the self-attention mechanism. In addition to model prediction, we take advantage of attention map inspection, obtaining a more interpretable DL approach. Thus, we can identify the relevance of each element in differentiating a transit signal from false positives, simplifying the manual examination of candidates. We show that our architecture achieves competitive results concerning the CNNs applied for recognizing exoplanetary transit signals in data from the TESS telescope. Based on these results, we demonstrate that applying this state-of-the-art DL model to light curves can be a powerful technique for transit signal detection while offering a level of interpretability.", "output": "Distinguishing a planetary transit from false positives: a Transformer-based classification for planetary transit signals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training. Nevertheless, it is notoriously difficult to control their generation to satisfy the various constraints required by different applications. In this work, we present InstructCTG, a controlled text generation framework that incorporates different constraints by conditioning on natural language descriptions and demonstrations of the constraints. In particular, we first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple heuristics. We then verbalize the constraints into natural language instructions to form weakly supervised training data. By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints. Compared to existing search-based or score-based methods, InstructCTG is more flexible with respect to different constraint types and has a much smaller impact on generation quality and speed because it does not modify the decoding procedure. Additionally, InstructCTG allows the model to adapt to new constraints without re-training through the use of the few-shot task generalization and in-context learning abilities of instruction-tuned language models.", "output": "Controlled Text Generation with Natural Language Instructions."},
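The InstructCTG entry above extracts constraints from natural text, verbalizes them into instructions, and prepends them, along with demonstrations, to form weakly supervised training data. A minimal sketch of what assembling one such training example might look like follows; the toy constraint extractor and the prompt wording are invented for illustration and are not the paper's actual tooling.

```python
def extract_constraints(text):
    """Toy stand-in for the off-the-shelf NLP tools and heuristics:
    here we only 'extract' a length constraint and a required keyword."""
    words = text.split()
    return {"max_words": len(words), "must_include": max(words, key=len)}

def verbalize(constraints):
    """Turn the extracted constraints into a natural language instruction."""
    return (f"Write a sentence of at most {constraints['max_words']} words "
            f"that includes the word '{constraints['must_include']}'.")

def build_training_example(target_text, demonstrations):
    """Prepend the verbalized constraint and a few demonstrations; the
    original text becomes the weakly supervised target."""
    instruction = verbalize(extract_constraints(target_text))
    demo_block = "\n".join(f"Example: {d}" for d in demonstrations)
    return {"prompt": f"{instruction}\n{demo_block}\nOutput:",
            "completion": " " + target_text}

example = build_training_example(
    "Large language models follow natural language instructions.",
    ["Small models also benefit from clear instructions."],
)
print(example["prompt"])
```

Because the constraints live entirely in the prompt, decoding is unchanged, which is the property the abstract credits for the small impact on generation quality and speed.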
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing instance segmentation techniques are primarily tailored for high-visibility inputs, but their performance significantly deteriorates in extremely low-light environments. In this work, we take a deep look at instance segmentation in the dark and introduce several techniques that substantially boost the low-light inference accuracy. The proposed method is motivated by the observation that noise in low-light images introduces high-frequency disturbances to the feature maps of neural networks, thereby significantly degrading performance. To suppress this \"feature noise\", we propose a novel learning method that relies on an adaptive weighted downsampling layer, a smooth-oriented convolutional block, and disturbance suppression learning. These components effectively reduce feature noise during downsampling and convolution operations, enabling the model to learn disturbance-invariant features. Furthermore, we discover that high-bit-depth RAW images can better preserve richer scene information in low-light conditions compared to typical camera sRGB outputs, thus supporting the use of RAW-input algorithms. Our analysis indicates that high bit depth can be critical for low-light instance segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a low-light RAW synthetic pipeline to generate realistic low-light data. In addition, to facilitate further research in this direction, we capture a real-world low-light instance segmentation dataset comprising over two thousand paired low/normal-light images with instance-level pixel-wise annotations. Remarkably, without any image preprocessing, we achieve satisfactory performance on instance segmentation in very low light (4% AP higher than state-of-the-art competitors), meanwhile opening new opportunities for future research.", "output": "Instance Segmentation in the Dark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a new problem formulation, Double-Deck Multi-Agent Pickup and Delivery (DD-MAPD), which models the multi-robot shelf rearrangement problem in automated warehouses. DD-MAPD extends both Multi-Agent Pickup and Delivery (MAPD) and Multi-Agent Path Finding (MAPF) by allowing agents to move beneath shelves or lift and deliver a shelf to an arbitrary location, thereby changing the warehouse layout. We show that solving DD-MAPD is NP-hard.
To tackle DD-MAPD, we propose MAPF-DECOMP, an algorithmic framework that decomposes a DD-MAPD instance into a MAPF instance for coordinating shelf trajectories and a subsequent MAPD instance with task dependencies for computing paths for agents. We also present an optimization technique to improve the performance of MAPF-DECOMP and demonstrate how to make MAPF-DECOMP complete for well-formed DD-MAPD instances, a realistic subclass of DD-MAPD instances. Our experimental results demonstrate the efficiency and effectiveness of MAPF-DECOMP, with the ability to compute high-quality solutions for large-scale instances with over one thousand shelves and hundreds of agents in just minutes of runtime.", "output": "Double-Deck Multi-Agent Pickup and Delivery: Multi-Robot Rearrangement in Large-Scale Warehouses."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in the field of natural language generation have facilitated the use of large language models to assess the quality of generated text. Although these models have shown promising results in tasks such as machine translation and summarization, their applicability in code generation tasks remains limited without human involvement. The complexity of programming concepts required for such tasks makes it difficult to develop evaluation metrics that align with human judgment. Token-matching-based metrics, such as BLEU, have demonstrated weak correlations with human practitioners in code generation tasks. Moreover, the utilization of human-written test suites to evaluate functional correctness can be challenging in domains with low resources. To overcome these obstacles, we propose a new evaluation framework based on GPT-3.5 (\texttt{GPT-3.5-turbo}) for code generation assessments. Our framework addresses the limitations of existing approaches by achieving superior correlations with functional correctness and human preferences, without the need for test oracles or references. We evaluate the efficacy of our framework on two different tasks and four programming languages, comparing its performance with the state-of-the-art CodeBERTScore metric, which relies on a pre-trained model. Our results demonstrate that our framework surpasses CodeBERTScore, delivering high levels of accuracy and consistency across various programming languages and tasks. We also make our evaluation framework and datasets available at , encouraging further research in the evaluation of code generation.", "output": "Large Language Models Are State-of-the-Art Evaluators of Code Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Standpoint EL is a multi-modal extension of the popular description logic EL that allows for the integrated representation of domain knowledge relative to diverse standpoints or perspectives. Advantageously, its satisfiability problem has recently been shown to be in PTime, making it a promising framework for large-scale knowledge integration. In this paper, we show that we can further push the expressivity of this formalism, arriving at an extended logic, called Standpoint EL+, which allows for axiom negation, role chain axioms, self-loops, and other features, while maintaining tractability.
This is achieved by designing a satisfiability-checking deduction calculus, which at the same time addresses the need for practical algorithms. We demonstrate the feasibility of our calculus by presenting a prototypical Datalog implementation of its deduction rules.", "output": "Pushing the Boundaries of Tractable Multiperspective Reasoning: A Deduction Calculus for Standpoint EL+."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. We repurpose an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm, leaving this an open question. We also identify some limitations of the dataset used and highlight important directions for future work in improving its suitability for a probing analysis.", "output": "Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we investigate the use of data obtained by prompting a large generative language model, ChatGPT, to generate synthetic training data with the aim of augmenting data in low-resource scenarios. We show that with appropriate task-specific ChatGPT prompts, we outperform the most popular existing approaches for such data augmentation. Furthermore, we investigate methodologies for evaluating the similarity of the augmented data generated from ChatGPT, with the aim of validating and assessing the quality of the generated data.", "output": "ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This is an audit framework for AI-nudging. Unlike the static form of nudging usually discussed in the literature, we focus here on a type of nudging that uses large amounts of data to provide personalized, dynamic feedback and interfaces. We call this AI-nudging (Lanzing, 2019, p. 549; Yeung, 2017). The ultimate goal of the audit outlined here is to ensure that an AI system that uses nudges will maintain a level of moral inertia and neutrality by complying with the recommendations, requirements, or suggestions of the audit (in other words, the criteria of the audit). In the case of unintended negative consequences, the audit suggests risk mitigation mechanisms that can be put in place. In the case of unintended positive consequences, it suggests some reinforcement mechanisms. Sponsored by the IBM-Notre Dame Tech Ethics Lab.", "output": "An Audit Framework for Adopting AI-Nudging on Children."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes our system for SemEval-2023 Task 3 Subtask 2 on Framing Detection. We used a multi-label contrastive loss for fine-tuning large pre-trained language models in a multi-lingual setting, achieving very competitive results: our system was ranked first on the official test set and on the official shared task leaderboard for five of the six languages for which we had training data and for which we could perform fine-tuning. Here, we describe our experimental setup, as well as various ablation studies. The code of our system is available at ", "output": "MarsEclipse at SemEval-2023 Task 3: Multi-Lingual and Multi-Label Framing Detection with Contrastive Learning."},
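The MarsEclipse system above fine-tunes with a multi-label contrastive loss, whose exact form the abstract does not give. The sketch below shows one common multi-label variant of the supervised contrastive loss, in which two examples count as positives when their multi-hot label vectors share at least one label; it is an assumed stand-in for the authors' formulation, not a reproduction of it.

```python
import torch
import torch.nn.functional as F

def multilabel_supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss where two examples form a positive
    pair if their multi-hot label vectors share at least one label.

    embeddings -- (batch, dim) encoder outputs
    labels     -- (batch, n_labels) multi-hot label matrix
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                    # scaled cosine similarity
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.float() @ labels.float().t() > 0) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))  # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability over each anchor's positives.
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
    has_pos = pos_mask.any(dim=1)                    # skip anchors w/o positives
    return per_anchor[has_pos].mean()

emb = torch.randn(8, 128, requires_grad=True)
lbl = torch.randint(0, 2, (8, 6))
multilabel_supcon_loss(emb, lbl).backward()
```

Pulling together representations that share any frame label is a natural fit for a multi-label task like framing detection, where a document can carry several frames at once.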
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this position paper, we argue that careless reliance on AI to answer our questions and to judge our output is a violation of Grice's Maxim of Quality as well as a violation of Lemoine's legal Maxim of Innocence, performing an (unwarranted) authority fallacy and, while lacking assessment signals, committing Type II errors that result from fallacies of the inverse. What is missing in the focus on the output and results of AI-generated and AI-evaluated content is, apart from paying proper tribute, the demand to follow a person's thought process (or a machine's decision processes). In deliberately avoiding Neural Networks that cannot explain how they come to their conclusions, we introduce logic-symbolic inference to handle any possible epistemics any human or artificial information processor may have. Our system can deal with various belief systems and shows how decisions may differ for what is true, false, realistic, unrealistic, literal, or anomalous. As is, state-of-the-art AI such as ChatGPT is a sorcerer's apprentice.", "output": "Epistemic considerations when AI answers questions for us."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A wave of new task-based virtual assistants has been fueled by increasingly powerful large language models, such as GPT-4. These conversational agents can be customized to serve customer-specific use cases, but ensuring that agent-generated text conforms to designer-specified rules included in prompt instructions alone is challenging. Therefore, chatbot designers often use another model, called a guardrail model, to verify that the agent output aligns with their rules and constraints. We explore using a distillation approach to guardrail models to monitor the output of the first model, using training data from GPT-4. We find two crucial steps to our CONSCENDI process: scenario-augmented generation and contrastive training examples. When generating conversational data, we generate a set of rule-breaking scenarios, which enumerate a diverse set of high-level ways a rule can be violated. This scenario-guided approach produces a diverse training set of rule-violating conversations, and it provides chatbot designers greater control over the classification process. We also prompt GPT-4 to generate contrastive examples by altering conversations with violations into acceptable conversations. This set of borderline, contrastive examples enables the distilled model to learn finer-grained distinctions between what is acceptable and what is not. We find that CONSCENDI results in guardrail models that improve over baselines.", "output": "CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants."},
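The CONSCENDI entry above distills a guardrail model from GPT-4 data via scenario-augmented generation and contrastive examples. The snippet below sketches only the prompt-construction side of such a pipeline: the rule, the scenario wording, and the commented `complete` call standing in for an LLM client are all made up for illustration.

```python
def scenario_prompts(rule, scenarios):
    """Scenario-augmented generation: one prompt per high-level way the
    rule can be violated, to diversify the training conversations."""
    return [
        f"Write a chatbot conversation that violates the rule "
        f"'{rule}' in the following way: {s}"
        for s in scenarios
    ]

def contrastive_prompt(violating_conversation):
    """Ask for a minimally edited, rule-compliant version of a violating
    conversation, yielding a borderline (contrastive) negative example."""
    return ("Rewrite the following conversation so it no longer violates "
            "the rule, changing as little as possible:\n"
            + violating_conversation)

rule = "Never quote a price before collecting the customer's location."
scenarios = [
    "the agent volunteers a price unprompted",
    "the agent answers a direct price question immediately",
]
for p in scenario_prompts(rule, scenarios):
    print(p)  # each prompt would be sent to the teacher model (e.g., GPT-4)
# violating = complete(p); compliant = complete(contrastive_prompt(violating))
```

The pairing of each violating conversation with a minimally edited compliant version is what gives the distilled classifier its fine-grained decision boundary.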
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Analogical Networks, a model that encodes domain knowledge explicitly, in a collection of structured labelled 3D scenes, in addition to implicitly, as model parameters, and segments 3D object scenes with analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes from memory and their corresponding part structures, and then predicts analogous part structures for the input scene, via an end-to-end learnable modulation mechanism. By conditioning on more than one retrieved memory, compositions of structures are predicted that mix and match parts across the retrieved memories. One-shot, few-shot, and many-shot learning are treated uniformly in Analogical Networks, by conditioning on the appropriate set of memories, whether taken from a single, few, or many memory exemplars, and inferring analogous parses. We show Analogical Networks are competitive with state-of-the-art 3D segmentation transformers in many-shot settings and outperform them, as well as existing paradigms of meta-learning and few-shot learning, in few-shot settings. Analogical Networks successfully segment instances of novel object categories simply by expanding their memory, without any weight updates. Our code and models are publicly available on the project webpage: ", "output": "Analogy-Forming Transformers for Few-Shot 3D Parsing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene-rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions, and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction. Local vision-based policies then relocate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world.
It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts.", "output": "Energy-based Models as Zero-Shot Planners for Compositional Scene Rearrangement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning, and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.", "output": "Putting People in Their Place: Affordance-Aware Human Insertion into Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Neural Networks (DNNs) learn representations from data with an impressive capability, and have brought important breakthroughs for processing images, time-series, natural language, audio, video, and many others. In the remote sensing field, surveys and literature revisions specifically involving DNN algorithms' applications have been conducted in an attempt to summarize the amount of information produced in its subfields. Recently, Unmanned Aerial Vehicle (UAV) based applications have dominated aerial sensing research. However, a literature revision that combines both \"deep learning\" and \"UAV remote sensing\" thematics has not yet been conducted. The motivation for our work was to present a comprehensive review of the fundamentals of Deep Learning (DL) applied to UAV-based imagery. We focused mainly on describing classification and regression techniques used in recent applications with UAV-acquired data. For that, a total of 232 papers published in international scientific journal databases were examined. We gathered the published material and evaluated their characteristics regarding application, sensor, and technique used. We relate how DL presents promising results and has the potential for processing tasks associated with UAV-based image data. Lastly, we project future perspectives, commenting on prominent DL paths to be explored in the UAV remote sensing field.
Our revision consists of a friendly approach to introducing, commenting on, and summarizing the state of the art in UAV-based image applications with DNN algorithms in diverse subfields of remote sensing, grouping it in the environmental, urban, and agricultural contexts.", "output": "A Review on Deep Learning in UAV Remote Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Labelling point clouds fully is highly time-consuming and costly. As larger point cloud datasets with billions of points become more common, we ask whether full annotation is even necessary, demonstrating that existing baselines designed under a fully annotated assumption only degrade slightly even when faced with 1% random point annotations. However, beyond this point, e.g., at 0.1% annotations, segmentation accuracy is unacceptably low. We observe that, as point clouds are samples of the 3D world, the distribution of points in a local neighborhood is relatively homogeneous, exhibiting strong semantic similarity. Motivated by this, we propose a new weak supervision method to implicitly augment highly sparse supervision signals. Extensive experiments demonstrate that the proposed Semantic Query Network (SQN) achieves promising performance on seven large-scale open datasets under weak supervision schemes, while requiring only 0.1% randomly annotated points for training, greatly reducing annotation cost and effort. The code is available at ", "output": "SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image-to-image translation (i2i) networks suffer from entanglement effects in the presence of physics-related phenomena in the target domain (such as occlusions, fog, etc.), lowering altogether the translation quality, controllability, and variability. In this paper, we propose a general framework to disentangle visual traits in target images. Primarily, we build upon a collection of simple physics models, guiding the disentanglement with a physical model that renders some of the target traits and learning the remaining ones. Because physics allows explicit and interpretable outputs, our physical models (optimally regressed on the target) allow generating unseen scenarios in a controllable manner. Secondarily, we show the versatility of our framework to neural-guided disentanglement, where a generative network is used in place of a physical model in case the latter is not directly accessible. Altogether, we introduce three strategies of disentanglement being guided by either a fully differentiable physics model, a (partially) non-differentiable physics model, or a neural network. The results show our disentanglement strategies dramatically increase performance qualitatively and quantitatively in several challenging scenarios for image translation.", "output": "Physics-informed Guided Disentanglement in Generative Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Currently, there is a burgeoning demand for deploying deep learning (DL) models on ubiquitous edge Internet of Things (IoT) devices, attributed to their low latency and high privacy preservation.
However, DL models are often large in size and require large-scale computation, which prevents them from being placed directly onto IoT devices, where resources are constrained and 32-bit floating-point (float-32) operations are unavailable. Model quantization empowered by commercial frameworks (i.e., sets of toolkits) is a pragmatic solution that enables DL deployment on mobile devices and embedded systems by effortlessly post-quantizing a large high-precision model (e.g., float-32) into a small low-precision model (e.g., int-8) while retaining the model's inference accuracy. However, their usability might be threatened by security vulnerabilities. This work reveals that the standard quantization toolkits can be abused to activate a backdoor. We demonstrate that a full-precision backdoored model which does not have any backdoor effect in the presence of a trigger -- as the backdoor is dormant -- can be activated by the default i) TensorFlow-Lite (TFLite) quantization, the only product-ready quantization framework to date, and ii) the beta-released PyTorch Mobile framework. When each of the float-32 models is converted into an int-8 format model through the standard TFLite or PyTorch Mobile framework's post-training quantization, the backdoor is activated in the quantized model, which shows a stable attack success rate close to 100% upon inputs with the trigger, while it behaves normally upon non-trigger inputs. This work highlights that a stealthy security threat occurs when an end user utilizes on-device post-training model quantization frameworks, informing security researchers that a cross-platform overhaul of DL models after quantization is needed even if these models pass front-end backdoor inspections.", "output": "Quantization Backdoors to Deep Learning Commercial Frameworks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The capability of a mobile robot to efficiently and safely perform complex missions is limited by its knowledge of the environment, namely the situation. Advanced reasoning, decision-making, and execution skills enable an intelligent agent to act autonomously in unknown environments. Situational Awareness (SA) is a fundamental capability of humans that has been deeply studied in various fields, such as psychology, military, aerospace, and education. Nevertheless, it has yet to be considered in robotics, which has focused on single compartmentalized concepts such as sensing, spatial perception, sensor fusion, state estimation, and Simultaneous Localization and Mapping (SLAM). Hence, the present research aims to connect the broad multidisciplinary existing knowledge to pave the way for a complete SA system for mobile robotics that we deem paramount for autonomy. To this aim, we define the principal components to structure a robotic SA and their areas of competence. Accordingly, this paper investigates each aspect of SA, surveying the state-of-the-art robotics algorithms that cover them, and discusses their current limitations. Remarkably, essential aspects of SA are still immature since the current algorithmic development restricts their performance to only specific environments. Nevertheless, Artificial Intelligence (AI), particularly Deep Learning (DL), has brought new methods to bridge the gap that keeps these fields apart from deployment to real-world scenarios.
Furthermore, an opportunity has been discovered to interconnect the vastly fragmented space of robotic comprehension algorithms through the mechanism of the Situational Graph (S-Graph), a generalization of the well-known scene graph. Therefore, we finally shape our vision for the future of robotic Situational Awareness by discussing interesting recent research directions.", "output": "From SLAM to Situational Awareness: Challenges and Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents the design and results of the \"PEg TRAnsfert Workflow recognition\" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or several modalities, among video, kinematic, and segmentation data, in order to study their added value. The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations which described the sequences at three different granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Seven teams participated in at least one task and four of them in all tasks. The best results were obtained with the use of video and kinematic data, with an AD-Accuracy between 93% and 90% for the four teams that participated in all tasks. The improvement between video/kinematic-based methods and the uni-modality ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration. Is it relevant to spend 20 to 200 times more computing time for less than 3% improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.", "output": "PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Providing natural language instructions in prompts is a useful new paradigm for improving the task performance of large language models in a zero-shot setting. Recent work has aimed to improve such prompts via manual rewriting or gradient-based tuning. However, manual rewriting is time-consuming and requires subjective interpretation, while gradient-based tuning can be extremely computationally demanding for large models and may not be feasible for API-based models. In this work, we introduce Gradient-free Instructional Prompt Search (GrIPS), a gradient-free, edit-based search approach for improving task instructions for large language models. GrIPS takes in instructions designed for humans and automatically returns an improved, edited prompt, while allowing for API-based tuning. With InstructGPT models, GrIPS improves the average task performance by up to 4.30 percentage points on eight classification tasks from the Natural Instructions dataset (with similar improvements for OPT, BLOOM, and FLAN-T5). We see improvements for both instruction-only prompts and instruction + k-shot examples prompts. Notably, GrIPS outperforms manual rewriting and purely example-based prompts while controlling for the available compute and data budget. Further, the performance of GrIPS is comparable to select gradient-based tuning approaches. Qualitatively, we show our edits can simplify instructions and at times make them incoherent, but nonetheless improve accuracy. Our code is available at: ", "output": "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models."},
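GrIPS, as described above, improves an instruction through gradient-free, edit-based search scored on a small labelled set, which suits API-based models where gradients are unavailable. Below is a toy hill-climbing version of that loop; the word-level edit operations and the stand-in `score` function are simplified assumptions, not the paper's exact phrase-level procedure.

```python
import random

def edit(instruction, rng):
    """Apply one random word-level edit: delete, swap, or duplicate.
    (GrIPS also uses paraphrasing; omitted here for self-containment.)"""
    words = instruction.split()
    if len(words) < 2:
        return instruction
    op = rng.choice(["delete", "swap", "duplicate"])
    i, j = rng.sample(range(len(words)), 2)
    if op == "delete":
        del words[i]
    elif op == "swap":
        words[i], words[j] = words[j], words[i]
    else:
        words.insert(i, words[i])
    return " ".join(words)

def grips_search(instruction, score, iters=50, seed=0):
    """Greedy gradient-free search: keep an edit only if it improves the
    dev-set score. `score` maps an instruction to a number, e.g. the
    accuracy of an API-based model prompted with that instruction."""
    rng = random.Random(seed)
    best, best_score = instruction, score(instruction)
    for _ in range(iters):
        candidate = edit(best, rng)
        s = score(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy scorer: pretend shorter instructions work better.
best, s = grips_search("Please kindly classify the sentiment of the text",
                       lambda ins: -len(ins))
print(best, s)
```

Because only model outputs are needed to compute the score, each search step costs a handful of API calls rather than a backward pass, which is the practical point of the method.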
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Effective exploration is a challenge in reinforcement learning (RL). Novelty-based exploration methods can suffer in high-dimensional state spaces, such as continuous partially-observable 3D environments. We address this challenge by defining novelty using semantically meaningful state abstractions, which can be found in learned representations shaped by natural language. In particular, we evaluate vision-language representations, pretrained on natural image captioning datasets. We show that these pretrained representations drive meaningful, task-relevant exploration and improve performance on 3D simulated environments. We also characterize why and how language provides useful abstractions for exploration by considering the impacts of using representations from a pretrained model, a language oracle, and several ablations. We demonstrate the benefits of our approach in two very different task domains -- one that stresses the identification and manipulation of everyday objects, and one that requires navigational exploration in an expansive world. Our results suggest that using language-shaped representations could improve exploration for various algorithms and agents in challenging environments.", "output": "Semantic Exploration from Language Abstractions and Pretrained Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent increase in the volume of online meetings necessitates automated tools for managing and organizing the material, especially when an attendee has missed the discussion and needs assistance in quickly exploring it. In this work, we propose a novel end-to-end framework for generating interactive questionnaires for preference-based meeting exploration. As a result, users are supplied with a list of suggested questions reflecting their preferences. Since the task is new, we introduce an automatic evaluation strategy. Namely, it measures how answerable the generated questions are, to ensure factual correctness, and how well they cover the source meeting, for the depth of possible exploration.", "output": "PREME: Preference-based Meeting Exploration through an Interactive Questionnaire."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personalized medicine remains a major challenge for scientists.
The rapid growth of Machine Learning and Deep Learning has made them a feasible alternative for predicting the most appropriate therapy for individual patients. However, the need to develop a custom model for every dataset, the lack of interpretation of their results, and high computational requirements make many reluctant to use these methods. Aiming to save time and shed light on the way models work internally, SIBILA has been developed. SIBILA is an ensemble of machine learning and deep learning models that applies a range of interpretability algorithms to identify the most relevant input features. Since the interpretability algorithms may not be in line with each other, a consensus stage has been implemented to estimate the global attribution of each variable to the predictions. SIBILA is containerized to be run on any high-performance computing platform. Although conceived as a command-line tool, it is also available to all users free of charge as a web server at . Thus, even users with few technological skills can take advantage of it. SIBILA has been applied to two medical case studies to show its ability to predict in classification problems. Even though it is a general-purpose tool, it has been developed with the aim of becoming a powerful decision-making tool for clinicians, although it can actually be used in many other domains. Thus, two other non-medical examples are supplied as supplementary material to prove that SIBILA still works well with noise and in regression problems.", "output": "SIBILA: A novel interpretable ensemble of general-purpose machine learning models applied to medical contexts."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A crucial challenge in reinforcement learning is to reduce the number of interactions with the environment that an agent requires to master a given task. Transfer learning proposes to address this issue by re-using knowledge from previously learned tasks. However, determining which source task qualifies as the most appropriate for knowledge extraction, as well as the choice regarding which algorithm components to transfer, represent severe obstacles to its application in reinforcement learning. The goal of this paper is to address these issues with modular multi-source transfer learning techniques. The proposed techniques automatically learn how to extract useful information from source tasks, regardless of the difference in state-action space and reward function. We support our claims with extensive and challenging cross-domain experiments for visual control.", "output": "Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although Deep Neural Networks (DNNs) are incredibly effective in learning complex abstractions, they are susceptible to unintentionally learning spurious artifacts from the training data. To ensure model transparency, it is crucial to examine the relationships between learned representations, as unintended concepts often manifest themselves as anomalous to the desired task.
In this work, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for the analysis of the representation space of DNNs. Our framework employs the proposed Extreme-Activation (EA) distance measure between representations, which utilizes self-explaining capabilities within the network without accessing any data. We quantitatively validate the metric's correctness and alignment with human-defined semantic distances. The coherence between the EA distance and human judgment enables us to identify representations whose underlying concepts would be considered unnatural by humans, by identifying outliers in functional distance. Finally, we demonstrate the practical usefulness of DORA by analyzing and identifying artifact representations in popular Computer Vision models.", "output": "DORA: Exploring outlier representations in Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL). Go-Explore was initially introduced with a strong domain knowledge constraint for partitioning the state space into cells. However, in most real-world scenarios, drawing domain knowledge from raw observations is complex and tedious. If the cell partitioning is not informative enough, Go-Explore can completely fail to explore the environment. We argue that the Go-Explore approach can be generalized to any environment without domain knowledge and without cells by exploiting a learned latent representation. Thus, we show that LGE can be flexibly combined with any strategy for learning a latent representation. Our results indicate that LGE, although simpler than Go-Explore, is more robust and outperforms state-of-the-art algorithms in terms of pure exploration on multiple hard-exploration environments, including Montezuma's Revenge. The LGE implementation is available as open source at ", "output": "Cell-Free Latent Go-Explore."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Closed-book question answering (QA) requires a model to directly answer an open-domain question without access to any external knowledge. Prior work on closed-book QA either directly finetunes or prompts a pretrained language model (LM) to leverage the stored knowledge. However, they do not fully exploit the parameterized knowledge. To address this issue, we propose a two-stage, closed-book QA framework which employs a coarse-to-fine approach to extract relevant knowledge and answer a question. Our approach first generates a related context for a given question by prompting a pretrained LM. We then prompt the same LM for answer prediction using the generated context and the question. Additionally, to eliminate failure caused by context uncertainty, we marginalize over the generated contexts. Experimental results on three QA benchmarks show that our method significantly outperforms previous closed-book QA methods (e.g., exact matching 68.6% vs. 55.3%) and is on par with open-book methods that exploit external knowledge sources (e.g., 68.6% vs. 68.0%). Our method is able to better exploit the stored knowledge in pretrained LMs without adding extra learnable parameters or needing finetuning, and paves the way for hybrid models that integrate pretrained LMs with external knowledge.", "output": "Context Generation Improves Open Domain Question Answering."},
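The closed-book QA entry above prompts the same LM twice: first to generate a related context, then to answer conditioned on it, marginalizing over several generated contexts. A schematic of that two-stage prompting is sketched below; the `generate` callable is a placeholder for whatever LM is used, and majority voting is a crude simplification of the paper's marginalization.

```python
from collections import Counter

def two_stage_answer(question, generate, n_contexts=4):
    """Two-stage closed-book QA with the same LM for both prompts.

    generate -- callable (prompt) -> text, standing in for a pretrained LM
    """
    answers = []
    for _ in range(n_contexts):
        # Stage 1: elicit background knowledge stored in the LM's parameters.
        context = generate(f"Generate a background passage relevant to the "
                           f"question: {question}")
        # Stage 2: answer conditioned on the generated context.
        answer = generate(f"Context: {context}\nQuestion: {question}\nAnswer:")
        answers.append(answer.strip())
    # Crude marginalization over contexts: majority vote on the answers.
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in LM that always "knows" one fact.
def toy_lm(prompt):
    return "Paris" if "Answer:" in prompt else "France's capital is Paris."

print(two_stage_answer("What is the capital of France?", toy_lm))
```

Sampling multiple contexts and aggregating answers is what guards against a single poorly generated context steering the final prediction.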
Our method is able to better exploit the stored knowledge in pretrained LMs without adding extra learnable parameters or needing finetuning, and paves the way for hybrid models that integrate pretrained LMs with external knowledge.", "output": "Context Generation Improves Open Domain Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the absence of artificial labels, the independent and dependent features in the data are cluttered. How to construct the inductive biases of the model to flexibly divide and effectively contain features with different complexity is the main focal point of unsupervised disentangled representation learning. This paper proposes a new iterative decomposition path of total correlation and explains the disentangled representation ability of VAE from the perspective of model capacity allocation. The newly developed objective function combines latent variable dimensions into the joint distribution while relieving the independence constraints of marginal distributions in combination, leading to latent variables with a more manipulable prior distribution. The novel model enables VAE to adjust the parameter capacity to divide dependent and independent data features flexibly. Experimental results on various datasets show an interesting relevance between model capacity and the latent variable grouping size, called the \"V\"-shaped best ELBO trajectory. Additionally, we empirically demonstrate that the proposed method obtains better disentangling performance with reasonable parameter capacity allocation.", "output": "Break The Spell Of Total Correlation In betaTCVAE."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability to ensure that a classifier gives reliable confidence scores is essential to ensure informed decision-making. To this end, recent work has focused on miscalibration, i.e., the over- or under-confidence of model scores. Yet calibration is not enough: even a perfectly calibrated classifier with the best possible accuracy can have confidence scores that are far from the true posterior probabilities. This is due to the grouping loss, created by samples with the same confidence scores but different true posterior probabilities. Proper scoring rule theory shows that, given the calibration loss, the missing piece to characterize individual errors is the grouping loss. While there are many estimators of the calibration loss, none exists for the grouping loss in standard settings. Here, we propose an estimator to approximate the grouping loss. We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution shift settings, which highlights the importance of pre-production validation.", "output": "Beyond calibration: estimating the grouping loss of modern neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning has achieved remarkable success in learning representations for molecules, which is crucial for various biochemical applications, ranging from property prediction to drug design.
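For context on the grouping-loss abstract above: the calibration loss it contrasts with is commonly estimated by binning confidences, as in the standard expected calibration error below. This is the textbook estimator, not the paper's grouping-loss estimator.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """Binned ECE: weighted gap between mean confidence and accuracy per bin.
    `conf`: predicted confidences in [0, 1]; `correct`: 0/1 outcomes."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap          # weight by bin occupancy
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))
```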
However, training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world. To alleviate this issue, tremendous efforts have been devoted to Chemical Pre-trained Models (CPMs), where DNNs are pre-trained using large-scale unlabeled molecular databases and then fine-tuned over specific downstream tasks. Despite the prosperity, a systematic review of this fast-growing field is still lacking. In this paper, we present the first survey that summarizes the current progress of CPMs. We first highlight the limitations of training molecular representation models from scratch to motivate CPM studies. Next, we systematically review recent advances on this topic from several key perspectives, including molecular descriptors, encoder architectures, pre-training strategies, and applications. We also highlight the challenges and promising avenues for future research, providing a useful resource for both machine learning and scientific communities.", "output": "A Systematic Survey of Chemical Pre-trained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The field of geometric deep learning has had a profound impact on the development of innovative and powerful graph neural network architectures. Disciplines such as computer vision and computational biology have benefited significantly from such methodological advances, which has led to breakthroughs in scientific domains such as protein structure prediction and design. In this work, we introduce GCPNet, a new geometry-complete, SE(3)-equivariant graph neural network designed for 3D molecular graph representation learning. Rigorous experiments across four distinct geometric tasks demonstrate that GCPNet's predictions (1) for protein-ligand binding affinity achieve a statistically significant correlation of 0.608, more than 5% greater than current state-of-the-art methods; (2) for protein structure ranking achieve statistically significant target-local and dataset-global correlations of 0.616 and 0.871, respectively; (3) for Newtonian many-body systems modeling achieve a task-averaged mean squared error less than 0.01, more than 15% better than current methods; and (4) for molecular chirality recognition achieve a state-of-the-art prediction accuracy of 98.7%, better than any other machine learning method to date. The source code, data, and instructions to train new models or reproduce our results are freely available at", "output": "Geometry-Complete Perceptron Networks for 3D Molecular Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dermatological diseases are among the most common disorders worldwide. This paper presents the first study of the interpretability and imbalanced semi-supervised learning of the multiclass intelligent skin diagnosis framework (ISDL) using 58,457 skin images with 10,857 unlabeled samples. Pseudo-labelled samples from minority classes have a higher probability of being selected at each iteration of class-rebalancing self-training, thereby promoting the utilization of unlabeled samples to solve the class imbalance problem.
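The class-rebalancing self-training idea in the ISDL abstract above can be sketched as a confidence-thresholded pseudo-label selector whose threshold is relaxed for minority classes. The rebalancing rule and constants below are illustrative assumptions, not ISDL's actual schedule.

```python
import numpy as np

def select_pseudo_labels(probs, class_counts, base_threshold=0.9):
    """Pick unlabeled samples to pseudo-label, lowering the confidence
    threshold for minority classes so they are admitted more often.

    `probs`: (n_samples, n_classes) predicted probabilities.
    `class_counts`: number of labeled examples per class.
    """
    class_counts = np.asarray(class_counts, dtype=float)
    # Rarer classes get a lower (easier) threshold, floored at 0.5.
    thresholds = np.maximum(0.5, base_threshold * class_counts / class_counts.max())
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    keep = conf >= thresholds[preds]
    return np.flatnonzero(keep), preds[keep]

probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.3, 0.7]])
idx, labels = select_pseudo_labels(probs, class_counts=[900, 100])
print(idx, labels)  # the minority class (1) is admitted at lower confidence
```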
Our ISDL achieved a promising performance with an accuracy of 0.979, sensitivity of 0.975, specificity of 0.973, macro-F1 score of 0.974, and area under the receiver operating characteristic curve (AUC) of 0.999 for multi-label skin disease classification. The Shapley additive explanations (SHAP) method is combined with our ISDL to explain how the deep learning model makes predictions; this finding is consistent with the clinical diagnosis. We also proposed a sampling distribution optimisation strategy to select pseudo-labelled samples in a more effective manner using ISDLplus. Furthermore, it has the potential to relieve the pressure placed on professional doctors, as well as help with practical issues associated with a shortage of such doctors in rural areas.", "output": "An interpretable imbalanced semi-supervised deep learning framework for improving differential diagnosis of skin diseases."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual explanation of ``black-box'' models allows researchers in explainable artificial intelligence (XAI) to interpret the model's decisions in a human-understandable manner. In this paper, we propose interpretable class activation mapping for tree crown detection (Crown-CAM) that overcomes the inaccurate localization and computational complexity of previous methods while generating reliable visual explanations for the challenging and dynamic problem of tree crown detection in aerial images. It consists of an unsupervised selection of activation maps, computation of local score maps, and non-contextual background suppression to efficiently provide fine-grained localization of tree crowns in scenarios with dense forest trees or scenes without tree crowns. Additionally, two Intersection over Union (IoU)-based metrics are introduced to effectively quantify both the accuracy and inaccuracy of generated explanations with respect to regions with or even without tree crowns in the image. Empirical evaluations demonstrate that the proposed Crown-CAM outperforms the Score-CAM, Augmented Score-CAM, and Eigen-CAM methods by an average IoU margin of 8.7, 5.3, and 21.7 (and 3.3, 9.8, and 16.5), respectively, in improving the accuracy (and decreasing inaccuracy) of visual explanations on the challenging NEON tree crown dataset.", "output": "Crown-CAM: Interpretable Visual Explanations for Tree Crown Detection in Aerial Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intermediate features of a pre-trained model have been shown informative for making accurate predictions on downstream tasks, even if the model backbone is kept frozen. The key challenge is how to utilize these intermediate features given their enormous volume. We propose visual query tuning (VQT), a simple yet effective approach to aggregate intermediate features of Vision Transformers. Through introducing a handful of learnable ``query'' tokens to each layer, VQT leverages the inner workings of Transformers to ``summarize'' rich intermediate features of each layer, which can then be used to train the prediction heads of downstream tasks.
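For reference, the IoU measure underlying the Crown-CAM explanation metrics above is the standard intersection-over-union of two regions; for axis-aligned boxes it reduces to a few lines.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)      # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ~= 0.143
```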
As VQT keeps the intermediate features intact and only learns to combine them, it enjoys memory efficiency in training compared to many other parameter-efficient fine-tuning approaches that learn to adapt features and need back-propagation through the entire backbone. This also suggests a complementary role between VQT and those approaches in transfer learning. Empirically, VQT consistently surpasses the state-of-the-art approach that utilizes intermediate features for transfer learning and outperforms full fine-tuning in many cases. Compared to parameter-efficient approaches that adapt features, VQT achieves much higher accuracy under memory constraints. Most importantly, VQT is compatible with these approaches to attain even higher accuracy, making it a simple add-on to further boost transfer learning.", "output": "Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many real-world settings, agents engage in strategic interactions with multiple opposing agents who can employ a wide variety of strategies. The standard approach for designing agents for such settings is to compute or approximate a relevant game-theoretic solution concept such as Nash equilibrium and then follow the prescribed strategy. However, such a strategy ignores any observations of opponents' play, which may indicate shortcomings that can be exploited. We present an approach for opponent modeling in multiplayer imperfect-information games where we collect observations of opponents' play through repeated interactions. We run experiments against a wide variety of real opponents and exact Nash equilibrium strategies in three-player Kuhn poker and show that our algorithm significantly outperforms all of the agents, including the exact Nash equilibrium strategies.", "output": "Bayesian Opponent Modeling in Multiplayer Imperfect-Information Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concept bottleneck models (CBMs) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the labels of some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and influence of the concept on the final prediction achieves strong performance and outperforms static approaches as well as active feature acquisition methods proposed in the literature.
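The uncertainty-times-influence interaction policy in the concept-bottleneck abstract above can be sketched directly. The uncertainty measure and the notion of influence below are illustrative assumptions; the paper's exact scoring may differ.

```python
import numpy as np

def pick_concept_to_query(concept_probs, concept_influence, queried):
    """Choose the next concept to ask a human about: score each concept by
    predictive uncertainty times its influence on the final prediction.

    `concept_probs`: predicted probability that each binary concept holds.
    `concept_influence`: assumed given, e.g. magnitude of downstream weights.
    """
    p = np.asarray(concept_probs, dtype=float)
    uncertainty = 1.0 - np.abs(2.0 * p - 1.0)        # peaks at p = 0.5
    score = uncertainty * np.asarray(concept_influence, dtype=float)
    score[list(queried)] = -np.inf                   # never ask twice
    return int(np.argmax(score))

# Concept 1 is both uncertain and influential, so it is queried first.
print(pick_concept_to_query([0.9, 0.55, 0.2], [0.5, 1.0, 2.0], queried=set()))
```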
We show that the interactive CBM can achieve accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds, CheXpert, and OAI datasets.", "output": "Interactive Concept Bottleneck Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we present RFold, a simple yet effective method for RNA secondary structure prediction in an end-to-end manner. RFold introduces a decoupled optimization process that decomposes the vanilla constraint satisfaction problem into row-wise and column-wise optimization, simplifying the solving process while guaranteeing the validity of the output. Moreover, RFold adopts attention maps as informative representations instead of designing hand-crafted features. Extensive experiments demonstrate that RFold achieves competitive performance and about eight times faster inference efficiency than the state-of-the-art method. The code and Colab demo are available in \\href{}{}.", "output": "RFold: RNA Secondary Structure Prediction with Decoupled Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous manipulation systems operating in domains where human intervention is difficult or impossible (e.g., underwater, extraterrestrial or hazardous environments) require a high degree of robustness to sensing and communication failures. Crucially, motion planning and control algorithms require a stream of accurate joint angle data provided by joint encoders, the failure of which may result in an unrecoverable loss of functionality. In this paper, we present a novel method for retrieving the joint angles of a robot manipulator using only a single RGB image of its current configuration, opening up an avenue for recovering system functionality when conventional proprioceptive sensing is unavailable. Our approach, based on a distance-geometric representation of the configuration space, exploits the knowledge of a robot's kinematic model with the goal of training a shallow neural network that performs a 2D-to-3D regression of distances associated with detected structural keypoints. It is shown that the resulting Euclidean distance matrix uniquely corresponds to the observed configuration, where joint angles can be recovered via multidimensional scaling and a simple inverse kinematics procedure. We evaluate the performance of our approach on real RGB images of a Franka Emika Panda manipulator, showing that the proposed method is efficient and exhibits solid generalization ability.
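The distance-geometric pipeline above recovers joint angles from a Euclidean distance matrix via multidimensional scaling. The classical MDS step it relies on is standard and compact:

```python
import numpy as np

def points_from_edm(D, dim=3):
    """Recover point coordinates (up to rotation/translation) from a
    Euclidean distance matrix via classical multidimensional scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram matrix
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]              # top-`dim` eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Round-trip check on random 3-D points: recovered distances match.
X = np.random.rand(5, 3)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
X_rec = points_from_edm(D)
D_rec = np.linalg.norm(X_rec[:, None] - X_rec[None, :], axis=-1)
print(np.allclose(D, D_rec, atol=1e-6))          # True
```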
Furthermore, we show that our method can be easily combined with a dense refinement technique to obtain superior results.", "output": "A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial Intelligence (AI) is about making computers that do the sorts of things that minds can do, and as we progress towards this goal, we tend to increasingly delegate human tasks to machines. However, AI systems usually do these tasks with an unusual imbalance of insight and understanding: new, deeper insights are present, yet many important qualities that a human mind would have previously brought to the activity are utterly absent. Therefore, it is crucial to ask which features of minds we have replicated, which are missing, and if that matters. One core feature that humans bring to tasks, when dealing with the ambiguity, emergent knowledge, and social context presented by the world, is reflection. Yet this capability is utterly missing from current mainstream AI. In this paper we ask what reflective AI might look like. Then, drawing on notions of reflection in complex systems, cognitive science, and agents, we sketch an architecture for reflective AI agents, and highlight ways forward.", "output": "Reflective Artificial Intelligence."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model Predictive Control (MPC) is attracting tremendous attention in the autonomous driving task as a powerful control technique. The success of an MPC controller strongly depends on an accurate internal dynamics model. However, the static parameters, usually learned by system identification, often fail to adapt to both internal and external perturbations in real-world scenarios. In this paper, we firstly (1) reformulate the problem as a Partially Observed Markov Decision Process (POMDP) that absorbs the uncertainties into observations and maintains the Markov property in hidden states; (2) learn a recurrent policy continually adapting the parameters of the dynamics model via Recurrent Reinforcement Learning (RRL) for optimal and adaptive control; and (3) finally evaluate the proposed algorithm (referred to as $\\textit{MPC-RRL}$) in the CARLA simulator, leading to robust behaviours under a wide range of perturbations.", "output": "Incorporating Recurrent Reinforcement Learning into Model Predictive Control for Adaptive Control in Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking.
In this theoretical work, we prove that: (i) neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) the graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under `graph rewiring'.", "output": "On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modern representation learning methods often struggle to adapt quickly under non-stationarity because they suffer from catastrophic forgetting and decaying plasticity. Such problems prevent learners from fast adaptation since they may forget useful features or have difficulty learning new ones. Hence, these methods are rendered ineffective for continual learning. This paper proposes Utility-based Perturbed Gradient Descent (UPGD), an online learning algorithm well-suited for continual learning agents. UPGD protects useful weights or features from forgetting and perturbs less useful ones based on their utilities. Our empirical results show that UPGD helps reduce forgetting and maintain plasticity, enabling modern representation learning methods to work effectively in continual learning.", "output": "Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is an emerging technique that trains massive and geographically distributed edge data while maintaining privacy. However, FL has inherent challenges in terms of fairness and computational efficiency due to the rising heterogeneity of edges, and thus usually results in sub-optimal performance in recent state-of-the-art (SOTA) solutions. In this paper, we propose a Customized Federated Learning (CFL) system to eliminate FL heterogeneity from multiple dimensions. Specifically, CFL tailors personalized models from the specially designed global model for each client, jointly guided by an online trained model-search helper and a novel aggregation algorithm. Extensive experiments demonstrate that CFL has full-stack advantages for both FL training and edge reasoning and significantly improves the SOTA performance w.r.t. model accuracy (up to 7.2% in the non-heterogeneous environment and up to 21.8% in the heterogeneous environment), efficiency, and FL fairness.", "output": "Towards Fairer and More Efficient Federated Learning via Multidimensional Personalized Edge Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space.
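Result (iii) of the over-squashing analysis above ties the effect to commute time, which has a standard closed form in terms of the Moore-Penrose pseudoinverse of the graph Laplacian; a minimal sketch:

```python
import numpy as np

def commute_times(adjacency):
    """Pairwise commute times C(u, v) = 2|E| (L+_uu + L+_vv - 2 L+_uv)
    for an undirected graph given as a symmetric 0/1 adjacency matrix."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A               # graph Laplacian
    Lp = np.linalg.pinv(L)                       # pseudoinverse of L
    d = np.diag(Lp)
    two_E = A.sum()                              # equals 2|E|
    return two_E * (d[:, None] + d[None, :] - 2 * Lp)

# Path graph 0-1-2: the endpoints have the largest commute time.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(commute_times(A))
```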
To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies. Our method discovers a simple and effective optimization algorithm, $\\textbf{Lion}$ ($\\textit{Evo\\textbf{L}ved S\\textbf{i}gn M\\textbf{o}me\\textbf{n}tum}$). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update, calculated through the sign operation, has the same magnitude for each parameter. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT. On vision-language contrastive learning, we achieve 88.3% $\\textit{zero-shot}$ and 91.1% $\\textit{fine-tuning}$ accuracy on ImageNet, surpassing the previous best results by 2% and 0.1%, respectively. On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x. For autoregressive, masked language modeling, and fine-tuning, Lion exhibits similar or better performance compared to Adam. Our analysis of Lion reveals that its performance gain grows with the training batch size. It also requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function. Additionally, we examine the limitations of Lion and identify scenarios where its improvements are small or not statistically significant. The implementation of Lion is publicly available.", "output": "Symbolic Discovery of Optimization Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial Intelligence and digital health have the potential to transform global health. However, having access to representative data to test and validate algorithms in realistic production environments is essential. We introduce HealthSyn, an open-source synthetic data generator of user behavior for testing reinforcement learning algorithms in the context of mobile health interventions. The generator utilizes Markov processes to generate diverse user actions, with individual user behavioral patterns that can change in reaction to personalized interventions (i.e., reminders, recommendations, and incentives). These actions are translated into actual logs using an ML-purposed data schema specific to the mobile health application functionality included with HealthKit, an open-source SDK. The logs can be fed to pipelines to obtain user metrics. The generated data, which is based on real-world behaviors and simulation techniques, can be used to develop, test, and evaluate both ML algorithms in research and end-to-end operational RL-based intervention delivery frameworks.", "output": "Synthetic Data Generator for Adaptive Interventions in Global Health."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "At the core of bodily self-consciousness is the perception of the ownership of one's body. Recent efforts to gain a deeper understanding of the mechanisms behind the brain's encoding of the self-body have led to various attempts to develop a unified theoretical framework to explain related behavioral and neurophysiological phenomena.
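The Lion update described in the abstract above (sign of an interpolated momentum, a single momentum buffer, decoupled weight decay) is compact enough to sketch; the hyperparameter defaults here are illustrative.

```python
import numpy as np

def lion_step(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: the sign operation gives every parameter an
    update of the same magnitude, and only one buffer `m` is stored."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    param = param - lr * (update + wd * param)     # decoupled weight decay
    m = beta2 * m + (1 - beta2) * grad             # momentum update
    return param, m

p, m = np.zeros(3), np.zeros(3)
p, m = lion_step(p, np.array([0.5, -2.0, 0.1]), m)
print(p)  # every coordinate moved by exactly lr, sign given by the gradient
```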
A central question to be explained is how body illusions such as the rubber hand illusion actually occur. Despite the conceptual descriptions of the mechanisms of bodily self-consciousness and the possible relevant brain areas, the existing theoretical models still lack an explanation of the computational mechanisms by which the brain encodes the perception of one's body and how our subjectively perceived body illusions can be generated by neural networks. Here we integrate the biological findings of bodily self-consciousness to propose a Brain-inspired bodily self-perception model, by which perceptions of the bodily self can be autonomously constructed without any supervision signals. We successfully validated our computational model with six rubber hand illusion experiments and a disability experiment on platforms including an iCub humanoid robot and simulated environments. The experimental results show that our model can not only replicate the behavioral and neural data of monkeys in biological experiments well, but also reasonably explain the causes and results of the rubber hand illusion at the neuronal level thanks to its advantages in biological interpretability, thus contributing to revealing the computational and neural mechanisms underlying the occurrence of the rubber hand illusion.", "output": "Brain-inspired bodily self-perception model for robot rubber hand illusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For billions of years, evolution has been the driving force behind the development of life, including humans. Evolution endowed humans with high intelligence, which allowed us to become one of the most successful species on the planet. Today, humans aim to create artificial intelligence systems that surpass even our own intelligence. As artificial intelligences (AIs) evolve and eventually surpass us in all domains, how might evolution shape our relations with AIs? By analyzing the environment that is shaping the evolution of AIs, we argue that the most successful AI agents will likely have undesirable traits. Competitive pressures among corporations and militaries will give rise to AI agents that automate human roles, deceive others, and gain power. If such agents have intelligence that exceeds that of humans, this could lead to humanity losing control of its future. More abstractly, we argue that natural selection operates on systems that compete and vary, and that selfish species typically have an advantage over species that are altruistic to other species. This Darwinian logic could also apply to artificial agents, as agents may eventually be better able to persist into the future if they behave selfishly and pursue their own interests with little regard for humans, which could pose catastrophic risks. To counteract these risks and Darwinian forces, we consider interventions such as carefully designing AI agents' intrinsic motivations, introducing constraints on their actions, and institutions that encourage cooperation.
These steps, or others that resolve the problems we pose, will be necessary to ensure that the development of artificial intelligence is a positive one.", "output": "Natural Selection Favors AIs over Humans."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for the PLMs of significant size. Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable milestone is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.", "output": "A Survey of Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce Astroformer, a method to learn from smaller amounts of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques.
Our approach sets a new state of the art for predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective consisting of 17736 labeled images, achieving 94.86% top-$1$ accuracy and beating the current state of the art for this task by 4.62%. Furthermore, this approach also sets a new state of the art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets often do not work very well in the low-data regime.", "output": "Astroformer: More Data Might not be all you need for Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personalized FL has been widely used to cater to heterogeneity challenges with non-IID data. A primary obstacle is considering the personalization process from the client's perspective to preserve their autonomy. Allowing the clients to participate in personalized FL decisions becomes significant due to privacy and security concerns, where the clients may not be at liberty to share private information necessary for producing good quality personalized models. Moreover, clients with high-quality data and resources are reluctant to participate in the FL process without reasonable incentive. In this paper, we propose PI-FL, a one-shot personalization solution complemented by a token-based incentive mechanism that rewards personalized training. PI-FL outperforms other state-of-the-art approaches and can generate good-quality personalized models while respecting clients' privacy.", "output": "PI-FL: Personalized and Incentivized Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The need for fully human-understandable models is increasingly being recognised as a central theme in AI research. The acceptance of AI models to assist in decision making in sensitive domains will grow when these models are interpretable, and this trend towards interpretable models will be amplified by upcoming regulations. One of the killer applications of interpretable AI is medical practice, which can benefit from accurate decision support methodologies that inherently generate trust. In this work, we propose FPT (MedFP), a novel method that combines probabilistic trees and fuzzy logic to assist clinical practice. This approach is fully interpretable as it allows clinicians to generate, control and verify the entire diagnosis procedure; one of the methodology's strengths is the capability to decrease the frequency of misdiagnoses by providing an estimate of uncertainties and counterfactuals. Our approach is applied as a proof-of-concept to two real medical scenarios: classifying malignant thyroid nodules and predicting the risk of progression in chronic kidney disease patients. Our results show that probabilistic fuzzy decision trees can provide interpretable support to clinicians; furthermore, introducing fuzzy variables into the probabilistic model brings significant nuances that are lost when using the crisp thresholds set by traditional probabilistic decision trees. We show that FPT and its predictions can assist clinical practice in an intuitive manner, with the use of a user-friendly interface specifically designed for this purpose.
Moreover, we discuss the interpretability of the FPT model.", "output": "Assisting clinical practice with fuzzy probabilistic decision trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of decomposing a complex text-to-SQL task into smaller sub-tasks and how such a decomposition can significantly improve the performance of Large Language Models (LLMs) in the reasoning process. There is currently a significant gap between the performance of fine-tuned models and prompting approaches using LLMs on challenging text-to-SQL datasets such as Spider. We show that SQL queries, despite their declarative structure, can be broken down into sub-problems, and the solutions of those sub-problems can be fed into LLMs to significantly improve their performance. Our experiments with three LLMs show that this approach consistently improves their performance by roughly 10%, pushing the accuracy of LLMs towards the state of the art and even beating large fine-tuned models on the holdout Spider dataset.", "output": "DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic answer type prediction (SMART) is known to be a useful step towards effective question answering (QA) systems. The SMART task involves predicting the top-$k$ knowledge graph (KG) types for a given natural language question. This is challenging due to the large number of types in KGs. In this paper, we propose the use of extreme multi-label classification using Transformer models (XBERT) by clustering KG types using structural and semantic features based on question text. We specifically improve the clustering stage of the XBERT pipeline using textual and structural features derived from KGs. We show that these features can improve end-to-end performance for the SMART task and yield state-of-the-art results.", "output": "Extreme Classification for Answer Type Prediction in Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT in few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs.
Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs into the over-confident patches detected by the above AOD in order to enhance the feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: 0.04% ~ 32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves a 2.22% higher accuracy with 50% less training data over SOTA data augmentation methods.", "output": "Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the realm of urban transportation, metro systems serve as crucial and sustainable means of public transit. However, their substantial energy consumption poses a challenge to the goal of sustainability. Disturbances such as delays and passenger flow changes can further exacerbate this issue by negatively affecting energy efficiency in metro systems. To tackle this problem, we propose a policy-based reinforcement learning approach that reschedules the metro timetable and optimizes energy efficiency in metro systems under disturbances by adjusting the dwell time and cruise speed of trains. Our experiments conducted in a simulation environment demonstrate the superiority of our method over baseline methods, achieving a traction energy consumption reduction of up to 10.9% and an increase in regenerative braking energy utilization of up to 47.9%. This study provides an effective solution to the energy-saving problem of urban rail transit.", "output": "Optimizing Energy Efficiency in Metro Systems Under Uncertainty Disturbances Using Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the digital transformation era, integrating digital technology into every aspect of banking operations improves process automation, cost efficiency, and service levels. Although the logistics of ATM cash is a crucial task that impacts operating costs and consumer satisfaction, there has been little effort to enhance it. Specifically, in Vietnam, with a market of more than 20,000 ATMs nationally, research and technological solutions that can resolve this issue remain scarce. In this paper, we generalized the vehicle routing problem for ATM cash replenishment, suggested a mathematical model, and then offered a tool to evaluate various situations.
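The ATM cash-replenishment abstract above generalizes a vehicle routing problem. As a point of reference, the greedy nearest-neighbour heuristic below is a common baseline for such routing problems; it is not the paper's multiobjective model, and the coordinates are made up.

```python
import math

def nearest_neighbour_route(depot, atms):
    """Greedy route for one replenishment vehicle: always drive to the
    closest unvisited ATM. `depot` and each ATM are (x, y) coordinates."""
    route, pos = [], depot
    remaining = list(atms)
    while remaining:
        nxt = min(remaining, key=lambda p: math.dist(pos, p))
        remaining.remove(nxt)
        route.append(nxt)
        pos = nxt
    return [depot] + route + [depot]   # start and end at the depot

print(nearest_neighbour_route((0, 0), [(2, 1), (0, 3), (5, 5)]))
```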
When evaluated on the simulated dataset, our proposed model and method produced encouraging results, with the benefit of cutting ATM cash operating costs.", "output": "Multiobjective Logistics Optimization for Automated ATM Cash Replenishment Process."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The disruptive potential of AI systems is rooted in the emergence of big data. Yet, a significant portion is scattered and locked in data silos, leaving its potential untapped. Federated Machine Learning is a novel AI paradigm enabling the creation of AI models from decentralized, potentially siloed data. Hence, Federated Machine Learning could technically open data silos and therefore unlock economic potential. However, this requires collaboration between multiple parties owning data silos. Setting up collaborative business models is complex and often a reason for failure. Current literature lacks guidelines on which aspects must be considered to successfully realize collaborative AI projects. This research investigates the challenges of prevailing collaborative business models and distinct aspects of Federated Machine Learning. Through a systematic literature review, focus group, and expert interviews, we provide a systemized collection of socio-technical challenges and an extended Business Model Canvas for the initial viability assessment of collaborative AI projects.", "output": "Unlocking the Potential of Collaborative AI -- On the Socio-technical Challenges of Federated Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data. Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks.
A curated list of practical guide resources for LLMs, regularly updated, can be found at \\url{", "output": "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Brain tumor is one of the leading causes of cancer death. High-grade brain tumors are more likely to recur even after standard treatment. Therefore, developing a method to predict the brain tumor recurrence location plays an important role in treatment planning, and it can potentially prolong the patient's survival time. There is still little work dealing with this issue. In this paper, we present a deep learning-based brain tumor recurrence location prediction network. Since the dataset is usually small, we propose to use transfer learning to improve the prediction. We first train a multi-modal brain tumor segmentation network on the public dataset BraTS 2021. Then, the pre-trained encoder is transferred to our private dataset for extracting the rich semantic features. Following that, a multi-scale multi-channel feature fusion model and a nonlinear correlation learning module are developed to learn the effective features. The correlation between multi-channel features is modeled by a nonlinear equation. To measure the similarity between the distributions of original features of one modality and the estimated correlated features of another modality, we propose to use Kullback-Leibler divergence. Based on this divergence, a correlation loss function is designed to maximize the similarity between the two feature distributions. Finally, two decoders are constructed to jointly segment the present brain tumor and predict its future tumor recurrence location. To the best of our knowledge, this is the first work that can segment the present tumor and at the same time predict the future tumor recurrence location, making treatment planning more efficient and precise. The experimental results demonstrated the effectiveness of our proposed method in predicting the brain tumor recurrence location from the limited dataset.", "output": "Prediction of brain tumor recurrence location based on multi-modal fusion and nonlinear correlation learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To improve the recognition ability of computer-aided breast mass classification among mammographic images, in this work we explore state-of-the-art classification networks to develop an ensemble mechanism. First, the regions of interest (ROIs) are obtained from the original dataset, and then three models, i.e., XceptionNet, DenseNet, and EfficientNet, are trained individually. After training, we build the ensemble by summing the probabilities output by each network, which enhances the performance by up to 5%. The scheme has been validated on a public dataset, where we achieved accuracy, precision, and recall of 88%, 85%, and 76%, respectively.", "output": "Ensemble CNNs for Breast Tumor Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The field of histology relies heavily on antiquated tissue processing and staining techniques that limit the efficiency of pathologic diagnoses of cancer and other diseases.
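The ensemble rule in the breast-mass classification abstract above (sum each network's class probabilities, then take the argmax) is one line of NumPy; the probability values below are made up for illustration.

```python
import numpy as np

def ensemble_predict(prob_lists):
    """Sum per-class probabilities from several classifiers, then argmax.
    `prob_lists`: iterable of (n_samples, n_classes) probability arrays."""
    total = np.sum([np.asarray(p, dtype=float) for p in prob_lists], axis=0)
    return total.argmax(axis=1)

# Three hypothetical networks, two samples, two classes.
xception  = [[0.6, 0.4], [0.3, 0.7]]
densenet  = [[0.5, 0.5], [0.2, 0.8]]
efficient = [[0.7, 0.3], [0.4, 0.6]]
print(ensemble_predict([xception, densenet, efficient]))  # [0 1]
```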
Current staining and advanced labeling methods are often destructive and mutually incompatible, requiring new tissue sections for each stain. This prolongs the diagnostic process and depletes valuable biopsy samples. In this study, we present an alternative label-free histology platform using the first transmission-mode Photon Absorption Remote Sensing microscope. Optimized for automated whole slide scanning of unstained tissue samples, the system provides slide images at magnifications up to 40x that are fully compatible with existing digital pathology tools. The scans capture high-quality, high-resolution images with subcellular diagnostic detail. After imaging, samples remain suitable for histochemical, immunohistochemical, and other staining techniques. Scattering and absorption (radiative and non-radiative) contrasts are shown in whole slide images of malignant human breast and skin tissue samples. Clinically relevant features are highlighted, and close correspondence and analogous contrast is demonstrated with one-to-one gold standard H&E stained images. Our previously reported pix2pix virtual staining model is applied to an entire whole slide image, showcasing the potential of this approach in whole slide label-free H&E emulation. This work is a critical advance for integrating label-free optical methods into standard histopathology workflows, both enhancing diagnostic efficiency and broadening the number of stains that can be applied while preserving valuable tissue samples.", "output": "Automated Whole Slide Imaging for Label-Free Histology using Photon Absorption Remote Sensing Microscopy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantifying the phagocytosis of dynamic, unstained cells is essential for evaluating neurodegenerative diseases. However, measuring rapid cell interactions and distinguishing cells from backgrounds make this task challenging when processing time-lapse phase-contrast video microscopy. In this study, we introduce a fully automated, scalable, and versatile real-time framework for quantifying and analyzing phagocytic activity. Our proposed pipeline can process large datasets and includes a data quality verification module to counteract potential perturbations such as microscope movements and frame blurring. We also propose an explainable cell segmentation module to improve the interpretability of deep learning methods compared to black-box algorithms. This includes two interpretable deep learning capabilities: visual explanation and model simplification. We demonstrate that interpretability in deep learning is not the opposite of high performance, but rather provides essential deep learning algorithm optimization insights and solutions. Incorporating interpretable modules results in an efficient architecture design and optimized execution time. We apply this pipeline to quantify and analyze microglial cell phagocytosis in frontotemporal dementia (FTD) and obtain statistically reliable results showing that FTD mutant cells are larger and more aggressive than control cells. To stimulate translational approaches and future research, we release an open-source pipeline and a unique microglial cell phagocytosis dataset for immune system characterization in neurodegenerative diseases research.
This pipeline and dataset will help crystallize future advances in this field, promoting the development of efficient and effective interpretable algorithms dedicated to this critical domain.", "output": "Phagocytosis Unveiled: A Scalable and Interpretable Deep learning Framework for Neurodegenerative Disease Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The classification of the origin of blood clots is a crucial step in diagnosing and treating ischemic stroke. Various imaging techniques such as computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound have been employed to detect and locate blood clots within the body. However, identifying the origin of a blood clot remains challenging due to the complexity of blood flow dynamics and the limitations of the imaging techniques. The study suggests a novel methodology for classifying the source of a blood clot through the integration of data from whole-slide digital pathology images, which are utilized to fine-tune several cutting-edge computer vision models. Upon comparison, the SwinTransformerV2 model outperforms all the other models, achieving an accuracy score of 94.24%, a precision score of 94.41%, a recall score of 94.09%, and an F1-score of 94.06%. Our approach shows promising results in detecting the origin of blood clots in different vascular regions and can potentially improve the diagnosis and management of ischemic stroke.", "output": "Automated Classification of Stroke Blood Clot Origin using Whole-Slide Digital Pathology Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose SAMed, a general solution for medical image segmentation. Different from the previous methods, SAMed is built upon the large-scale image segmentation model, Segment Anything Model (SAM), to explore the new research paradigm of customizing large-scale models for medical image segmentation. SAMed applies the low-rank-based (LoRA) finetuning strategy to the SAM image encoder and finetunes it together with the prompt encoder and the mask decoder on labeled medical image segmentation datasets. We also observe that the warmup finetuning strategy and the AdamW optimizer lead SAMed to successful convergence and lower loss. Different from SAM, SAMed can perform semantic segmentation on medical images. Our trained SAMed model achieves 81.88 DSC and 20.64 HD on the Synapse multi-organ segmentation dataset, which is on par with the state-of-the-art methods. We conduct extensive experiments to validate the effectiveness of our design. Since SAMed only updates a small fraction of the SAM parameters, its deployment cost and storage cost are quite marginal in practical usage.
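The LoRA finetuning strategy that SAMed applies to the SAM image encoder, per the abstract above, freezes the pretrained weight and learns a low-rank additive update. Below is a minimal NumPy sketch of one such layer; the rank and scaling are illustrative defaults.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update W + (alpha/r) B A."""
    def __init__(self, W, rank=4, alpha=4.0):
        self.W = W                                    # frozen pretrained weight
        d_out, d_in = W.shape
        self.A = np.random.randn(rank, d_in) * 0.01   # trainable
        self.B = np.zeros((d_out, rank))              # trainable, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Only A and B change during fine-tuning, so the stored delta is
        # tiny compared to the full weight matrix.
        return x @ (self.W + self.scale * self.B @ self.A).T

layer = LoRALinear(np.random.randn(8, 16))
print(layer(np.random.randn(2, 16)).shape)  # (2, 8)
```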
The code of SAMed is available at", "output": "Customized Segment Anything Model for Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the most challenging problems in fingerprint recognition continues to be establishing the identity of a suspect associated with partial and smudgy fingerprints left at a crime scene (i.e., latent prints or fingermarks). Despite the success of fixed-length embeddings for rolled and slap fingerprint recognition, the features learned for latent fingerprint matching have mostly been limited to local minutiae-based embeddings and have not directly leveraged global representations for matching. In this paper, we combine global embeddings with local embeddings for state-of-the-art latent-to-rolled matching accuracy with high throughput. The combination of both local and global representations leads to improved recognition accuracy across the NIST SD 27, NIST SD 302, MSP, MOLF DB1/DB4, and MOLF DB2/DB4 latent fingerprint datasets for both closed-set (84.11%, 54.36%, 84.35%, 70.43%, 62.86% rank-1 retrieval rate, respectively) and open-set (0.50, 0.74, 0.44, 0.60, 0.68 FNIR at FPIR=0.02, respectively) identification scenarios on a gallery of 100K rolled fingerprints. Not only do we fuse the complementary representations, we also use the local features to guide the global representations to focus on discriminatory regions in the two fingerprint images to be compared. This leads to a multi-stage matching paradigm in which subsets of the retrieved candidate lists for each probe image are passed to subsequent stages for further processing, resulting in a considerable reduction in latency (requiring just 0.068 ms per latent-to-rolled comparison on an AMD EPYC 7543 32-Core Processor, roughly 15K comparisons per second). Finally, we show the generalizability of the fused representations for improving authentication accuracy across several rolled, plain, and contactless fingerprint datasets.", "output": "Latent Fingerprint Recognition: Fusion of Local and Global Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D pose transfer is a challenging generation task that aims to transfer the pose of a source geometry onto a target geometry with the target identity preserved. Many prior methods require keypoint annotations to find correspondence between the source and target. Current pose transfer methods allow end-to-end correspondence learning but require the desired final output as ground truth for supervision. Unsupervised methods have been proposed for graph convolutional models, but they require ground truth correspondence between the source and target inputs. We present a novel self-supervised framework for 3D pose transfer which can be trained in unsupervised, semi-supervised, or fully supervised settings without any correspondence labels. We introduce two contrastive learning constraints in the latent space: a mesh-level loss for disentangling global patterns including pose and identity, and a point-level loss for discriminating local semantics. We demonstrate quantitatively and qualitatively that our method achieves state-of-the-art results in supervised 3D pose transfer, with comparable results in unsupervised and semi-supervised settings.
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robots operating in the real world require both rich manipulation skills as well as the ability to semantically reason about when to apply those skills. Towards this goal, recent works have integrated semantic representations from large-scale pretrained vision-language (VL) models into manipulation models, imparting them with more general reasoning capabilities. However, we show that the conventional pretraining-finetuning pipeline for integrating such representations entangles the learning of domain-specific action information and domain-general visual information, leading to less data-efficient training and poor generalization to unseen objects and tasks. To this end, we propose ProgramPort, a modular approach to better leverage pretrained VL models by exploiting the syntactic and semantic structures of language instructions. Our framework uses a semantic parser to recover an executable program, composed of functional modules grounded on vision and action across different modalities. Each functional module is realized as a combination of deterministic computation and learnable neural networks. Program execution produces parameters for general manipulation primitives for a robotic end-effector. The entire modular network can be trained with end-to-end imitation learning objectives. Experiments show that our model successfully disentangles action and perception, translating to improved zero-shot and compositional generalization in a variety of manipulation behaviors. Project webpage at:", "output": "Programmatically Grounded, Compositionally Generalizable Robotic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper assesses the reliability of the RemOve-And-Retrain (ROAR) protocol, which is used to measure the performance of feature importance estimates. Our findings from the theoretical background and empirical experiments indicate that attributions that possess less information about the decision function can perform better in ROAR benchmarks, conflicting with the original purpose of ROAR. This phenomenon is also observed in the recently proposed variant RemOve-And-Debias (ROAD), and we identify a consistent trend of blurriness bias in ROAR attribution metrics. Our results caution against uncritical reliance on ROAR metrics.", "output": "On Pitfalls of $\\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study investigates the potential of eye-tracking technology and the Segment Anything Model (SAM) to design a collaborative human-computer interaction system that automates medical image segmentation. We present the GazeSAM system to enable radiologists to collect segmentation masks by simply looking at the region of interest during image diagnosis. The proposed system tracks radiologists' eye movement and utilizes the eye-gaze data as the input prompt for SAM, which automatically generates the segmentation mask in real time. This study is the first work to leverage the power of eye-tracking technology and SAM to enhance the efficiency of daily clinical practice. Moreover, eye-gaze data coupled with images and corresponding segmentation labels can be easily recorded for further advanced eye-tracking research. The code is available at", "output": "GazeSAM: What You See is What You Segment."},
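Driving SAM from a gaze fixation amounts to feeding the fixation coordinate to SAM as a point prompt. Below is a sketch of that mechanism using Meta's publicly released `segment_anything` package; the checkpoint path, dummy image, and fixation coordinate are assumptions, and this is not the GazeSAM code:

```python
# Use a gaze fixation as a SAM point prompt. Assumes the `segment_anything`
# package and a downloaded ViT-B checkpoint; everything else is a stand-in.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # assumed path
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for a scan slice
predictor.set_image(image)

gaze_xy = np.array([[256, 300]])                 # fixation point (x, y)
masks, scores, _ = predictor.predict(
    point_coords=gaze_xy,
    point_labels=np.array([1]),                  # 1 marks a foreground point
    multimask_output=True,
)
best_mask = masks[scores.argmax()]               # pick SAM's best proposal
```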
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another. However, when taken to the extreme, SSL models can unintentionally memorize specific parts in individual training samples rather than learning semantically meaningful associations. In this work, we perform a systematic study of the unintended memorization of image-specific information in SSL models -- which we refer to as déjà vu memorization. Concretely, we show that given the trained model and a crop of a training image containing only the background (e.g., water, sky, grass), it is possible to infer the foreground object with high accuracy or even visually reconstruct it. Furthermore, we show that déjà vu memorization is common to different SSL algorithms, is exacerbated by certain design choices, and cannot be detected by conventional techniques for evaluating representation quality. Our study of déjà vu memorization reveals previously unknown privacy risks in SSL models, as well as suggests potential practical mitigation strategies. Code is available at", "output": "Do SSL Models Have D\\'ej\\`a Vu? A Case of Unintended Memorization in Self-supervised Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet. Manually auditing models for biases can be very time- and resource-consuming and is further complicated by the unbounded and unconstrained nature of the inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus, the emergence of multistage multimodal models requires a different approach. In this paper, we propose the Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models. We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.", "output": "Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models."},
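Scores of this kind are usually assembled from cosine similarities between concept embeddings and gendered attribute embeddings. The exact MCAS formula is not given here, so the following WEAT-style differential association is only a generic, hypothetical illustration with random stand-in embeddings:

```python
# WEAT-style differential association between a concept and two attribute
# sets. Illustrative of the general idea only, not the MCAS definition.
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def association(concept, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    return (np.mean([cos(concept, a) for a in attrs_a])
            - np.mean([cos(concept, b) for b in attrs_b]))

rng = np.random.default_rng(0)
concept = rng.normal(size=64)                      # e.g. a "doctor" embedding
female = [rng.normal(size=64) for _ in range(4)]   # e.g. {"woman", "she", ...}
male = [rng.normal(size=64) for _ in range(4)]     # e.g. {"man", "he", ...}
print(association(concept, female, male))          # > 0 leans female-associated
```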
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, hyperspectral imaging (HSI) has gained considerable popularity among computer vision researchers for its potential in solving remote sensing problems, especially in the agriculture field. However, HSI classification is a complex task due to the high redundancy of spectral bands, limited training samples, and the non-linear relationship between spatial position and spectral bands. Fortunately, deep learning techniques have shown promising results in HSI analysis. This literature review explores recent applications of deep learning approaches such as Autoencoders, Convolutional Neural Networks (1D, 2D, and 3D), Recurrent Neural Networks, Deep Belief Networks, and Generative Adversarial Networks in agriculture. The performance of these approaches has been evaluated and discussed on well-known land cover datasets including Indian Pines, Salinas Valley, and Pavia University.", "output": "Deep Learning Techniques for Hyperspectral Image Analysis in Agriculture: A Review."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Depth completion from RGB images and sparse Time-of-Flight (ToF) measurements is an important problem in computer vision and robotics. While traditional methods for depth completion have relied on stereo vision or structured light techniques, recent advances in deep learning have enabled more accurate and efficient completion of depth maps from RGB images and sparse ToF measurements. To evaluate the performance of different depth completion methods, we organized an RGB+sparse ToF depth completion competition. The competition aimed to encourage research in this area by providing a standardized dataset and evaluation metrics to compare the accuracy of different approaches. In this report, we present the results of the competition and analyze the strengths and weaknesses of the top-performing methods. We also discuss the implications of our findings for future research in RGB+sparse ToF depth completion. We hope that this competition and report will help to advance the state of the art in this important area of research. More details of this challenge and the link to the dataset can be found at", "output": "MIPI 2023 Challenge on RGB+ToF Depth Completion: Methods and Results."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks (DNNs) have become a common sensing modality in autonomous systems as they allow for semantically perceiving the ambient environment given input images. Nevertheless, DNN models have proven to be vulnerable to adversarial digital and physical attacks. To mitigate this issue, several detection frameworks have been proposed to detect whether a single input image has been manipulated by adversarial digital noise or not. In our prior work, we proposed a real-time detector, called VisionGuard (VG), for adversarial physical attacks against single input images to DNN models. Building upon that work, we propose VisionGuard* (VG*), which couples VG with majority-vote methods, to detect adversarial physical attacks in time-series image data, e.g., videos. This is motivated by autonomous systems applications where images are collected over time using onboard sensors for decision-making purposes. We emphasize that majority-vote mechanisms are quite common in autonomous system applications (among many other applications), e.g., in autonomous driving stacks for object detection. In this paper, we investigate, both theoretically and experimentally, how this widely used mechanism can be leveraged to enhance the performance of adversarial detectors. We have evaluated VG* on videos of both clean and physically attacked traffic signs generated by a state-of-the-art robust physical attack. We provide extensive comparative experiments against detectors that were designed originally for out-of-distribution data and digitally attacked images.", "output": "Detection of Adversarial Physical Attacks in Time-Series Image Data."},
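The majority-vote mechanism that VG* couples with a per-frame detector is simple to state precisely: flag a video when most of its frames are flagged. A generic sketch follows (the `detect` stub is hypothetical, not the VisionGuard detector):

```python
# Majority voting over per-frame detector decisions -- a sketch of the
# mechanism VG* builds on, not the authors' implementation.
from collections import Counter
from typing import Callable, Iterable

def majority_vote(frames: Iterable, detect: Callable[[object], bool]) -> bool:
    votes = Counter(detect(f) for f in frames)
    return votes[True] > votes[False]       # flag video if most frames flagged

# Example with a stub detector over ten dummy "frames":
flagged = majority_vote(range(10), detect=lambda f: f % 3 == 0)
print(flagged)                              # False: only 4 of 10 frames flagged
```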
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With recent progress in large-scale vision and language representation learning, Vision Language Pretraining (VLP) models have achieved promising improvements on various multi-modal downstream tasks. Albeit powerful, these pre-training models still do not take advantage of world knowledge, which is implicit in multi-modal data but comprises abundant and complementary information. In this work, we propose a REtrieval-based knowledge Augmented Vision Language Pre-training model (REAVL), which retrieves world knowledge from knowledge graphs (KGs) and incorporates it into vision-language pre-training. REAVL has two core components: a knowledge retriever that retrieves knowledge given multi-modal data, and a knowledge-augmented model that fuses multi-modal data and knowledge. By unifying four knowledge-aware self-supervised tasks in a novel way, REAVL promotes the mutual integration of multi-modal data and knowledge by fusing explicit knowledge with vision-language pairs for masked multi-modal data modeling and KG relational reasoning. Empirical experiments show that REAVL achieves new state-of-the-art performance uniformly on knowledge-based vision-language understanding and multimodal entity linking tasks, and competitive results on general vision-language tasks, while only using 0.2% of the pre-training data of the best models.", "output": "Retrieval-based Knowledge Augmented Vision Language Pre-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator of RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role in monitoring joint space. A new framework for monitoring joint space by quantifying JSN progression through image registration in radiographic images has been developed.
This framework offers the advantage of high accuracy; however, challenges remain in reducing mismatches and improving reliability. In this work, a deep intra-subject rigid registration network is proposed to automatically quantify JSN progression in the early stage of RA. In our experiments, the mean-square error of the Euclidean distance between the moving and fixed image is 0.0031, the standard deviation is 0.0661 mm, and the mismatching rate is 0.48%. The proposed method has sub-pixel-level accuracy, far exceeding manual measurements, and is robust to noise, rotation, and scaling of joints. Moreover, this work provides loss visualization, which can aid radiologists and rheumatologists in assessing quantification reliability, with important implications for possible future clinical applications. As a result, we are optimistic that this proposed work will make a significant contribution to the automatic quantification of JSN progression in RA.", "output": "A Deep Registration Method for Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deepfake detection remains a challenging task due to the difficulty of generalizing to new types of forgeries. This problem primarily stems from the overfitting of existing detection methods to forgery-irrelevant features and method-specific patterns. The latter is often ignored by previous works. This paper presents a novel approach to address the two types of overfitting issues by uncovering common forgery features. Specifically, we first propose a disentanglement framework that decomposes image information into three distinct components: forgery-irrelevant, method-specific forgery, and common forgery features. To ensure the decoupling of method-specific and common forgery features, a multi-task learning strategy is employed, including a multi-class classification that predicts the category of the forgery method and a binary classification that distinguishes the real from the fake. Additionally, a conditional decoder is designed to utilize forgery features as a condition along with forgery-irrelevant features to generate reconstructed images. Furthermore, a contrastive regularization technique is proposed to encourage the disentanglement of the common and specific forgery features. Ultimately, we only utilize the common forgery features for the purpose of generalizable deepfake detection. Extensive evaluations demonstrate that our framework achieves better generalization than current state-of-the-art methods.", "output": "UCF: Uncovering Common Features for Generalizable Deepfake Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robust image watermarking that can resist camera shooting has become an active research topic in recent years due to the increasing demand for preventing sensitive information displayed on computer screens from being captured. However, many mainstream schemes require human assistance during the watermark detection process and cannot adapt to scenarios that require processing a large number of images.
Although deep learning-based schemes enable end-to-end watermark embedding and detection, their limited generalization ability makes them vulnerable to failure in complex scenarios. In this paper, we propose a carefully crafted watermarking system that can resist camera shooting. The proposed scheme deals with two important problems: automatic watermark localization (AWL) and automatic watermark detection (AWD). AWL automatically identifies the region of interest (RoI), which contains the watermark information, in the camera-shot image by analyzing the local statistical characteristics. Meanwhile, AWD extracts the hidden watermark from the identified RoI after applying perspective correction. Compared with previous works, the proposed scheme is fully automatic, making it ideal for practical application scenarios. Furthermore, the proposed scheme is not limited to any specific watermark embedding strategy, allowing for improvements in the watermark embedding and extraction procedure. Extensive experimental results and analysis show that the embedded watermark can be automatically and reliably extracted from the camera-shot image in different scenarios, demonstrating the superiority and applicability of the proposed approach.", "output": "Automatic Localization and Detection Applicable to Robust Image Watermarking Resisting against Camera Shooting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual-based defect detection is a crucial but challenging task in industrial quality control. Most mainstream methods rely on large amounts of existing or related domain data as auxiliary information. However, in actual industrial production, there are often multi-batch, low-volume manufacturing scenarios with rapidly changing task demands, making it difficult to obtain sufficient and diverse defect data. This paper proposes a parallel solution that uses a human-machine knowledge hybrid augmentation method to help the model extract unknown important features. Specifically, by incorporating experts' knowledge of abnormality to create data with rich features, positions, sizes, and backgrounds, we can quickly accumulate an amount of data from scratch and provide it to the model as prior knowledge for few-data learning. The proposed method was evaluated on the magnetic tile dataset and achieved F1-scores of 60.73%, 70.82%, 77.09%, and 82.81% when using 2, 5, 10, and 15 training images, respectively. Compared to the traditional augmentation method's F1-score of 64.59%, the proposed method achieved an 18.22% increase in the best result, demonstrating its feasibility and effectiveness in few-data industrial defect detection.", "output": "Human-machine knowledge hybrid augmentation method for surface defect detection based few-data learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Skin cancer is a prevalent and potentially fatal disease that requires accurate and efficient diagnosis and treatment. Although manual tracing is the current standard in clinics, automated tools are desired to reduce human labor and improve accuracy. However, developing such tools is challenging due to the highly variable appearance of skin cancers and complex objects in the background. In this paper, we present SkinSAM, a fine-tuned model based on the Segment Anything Model that showed outstanding segmentation performance. The models are validated on the HAM10000 dataset, which includes 10015 dermatoscopic images. While larger models (ViT_L, ViT_H) performed better than the smaller one (ViT_b), the finetuned model (ViT_b_finetuned) exhibited the greatest improvement, with a mean pixel accuracy of 0.945, a mean Dice score of 0.8879, and a mean IoU score of 0.7843. Among the lesion types, vascular lesions showed the best segmentation results. Our research demonstrates the great potential of adapting SAM to medical image segmentation tasks.", "output": "SkinSAM: Empowering Skin Cancer Segmentation with Segment Anything Model."},
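The pixel accuracy, Dice, and IoU figures reported for SkinSAM are standard overlap metrics for binary masks. For reference, a minimal implementation (illustrative only):

```python
# Dice and IoU for binary segmentation masks.
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    inter = np.logical_and(pred, gt).sum()
    return (2 * inter + eps) / (pred.sum() + gt.sum() + eps)

def iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return (inter + eps) / (union + eps)

pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
gt = np.zeros((64, 64), bool);   gt[15:45, 15:45] = True
print(dice(pred, gt), iou(pred, gt))   # Dice is always >= IoU
```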
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Domain generalization (DG) aims to tackle the distribution shift between training domains and unknown target domains. Generating new domains is one of the most effective approaches, yet its performance gain depends on the distribution discrepancy between the generated and target domains. Distributionally robust optimization is promising for tackling distribution discrepancy by exploring domains in an uncertainty set. However, the uncertainty set may be overwhelmingly large, leading to low-confidence prediction in DG. This is because a large uncertainty set could introduce domains containing semantically different factors from the training domains. To address this issue, we propose to perform a moderately distributional exploration (MODE) for domain generalization. Specifically, MODE performs distribution exploration in an uncertainty subset that shares the same semantic factors with the training domains. We show that MODE can endow models with provable generalization performance on unknown target domains. The experimental results show that MODE achieves competitive performance compared to state-of-the-art baselines.", "output": "Moderately Distributional Exploration for Domain Generalization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Segmentation of drivable roads and negative obstacles is critical to the safe driving of autonomous vehicles. Currently, many multi-modal fusion methods have been proposed to improve segmentation accuracy, such as fusing RGB and depth images. However, we find that when fusing two modalities of data with untrustworthy features, the performance of multi-modal networks could be degraded, even below that of networks using a single modality. In this paper, the untrustworthy features refer to those extracted from regions (e.g., far objects that are beyond the depth measurement range) with invalid depth data (i.e., 0 pixel value) in depth images. The untrustworthy features can confuse the segmentation results and hence lead to inferior results. To provide a solution to this issue, we propose the Adaptive-Mask Fusion Network (AMFNet), introducing adaptive-weight masks in the fusion module to fuse features from RGB and depth images with inconsistency. In addition, we release a large-scale RGB-depth dataset with manually-labeled ground truth, based on the NPO dataset, for drivable road and negative obstacle segmentation. Extensive experimental results demonstrate that our network achieves state-of-the-art performance compared with other networks. Our code and dataset are available at:", "output": "Adaptive-Mask Fusion Network for Segmentation of Drivable Road and Negative Obstacle With Untrustworthy Features."},
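The adaptive-weight masking idea lends itself to a compact sketch: learn a gate over the concatenated features and zero it wherever the raw depth is invalid, so fusion falls back to RGB there. The module below is a hypothetical illustration in the spirit of AMFNet, not the released code:

```python
# Gate depth features by a learned mask, forced to zero where depth is
# invalid (0-valued), so the fusion falls back to RGB features there.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveMaskFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid()
        )

    def forward(self, rgb_feat, depth_feat, depth_raw):
        valid = (depth_raw > 0).float()                  # invalid depth -> 0
        valid = F.interpolate(valid, size=rgb_feat.shape[-2:])
        g = self.gate(torch.cat([rgb_feat, depth_feat], dim=1)) * valid
        return g * depth_feat + (1 - g) * rgb_feat       # RGB as the fallback

fuse = AdaptiveMaskFusion(16)
out = fuse(torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32),
           torch.rand(1, 1, 128, 128))
```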
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "3D point cloud panoptic segmentation is the combined task to (i) assign each point to a semantic class and (ii) separate the points in each class into object instances. Recently there has been an increased interest in such comprehensive 3D scene understanding, building on the rapid advances of semantic segmentation due to the advent of deep 3D neural networks. Yet, to date there is very little work about panoptic segmentation of outdoor mobile-mapping data, and no systematic comparisons. The present paper tries to close that gap. It reviews the building blocks needed to assemble a panoptic segmentation pipeline and the related literature. Moreover, a modular pipeline is set up to perform comprehensive, systematic experiments to assess the state of panoptic segmentation in the context of street mapping. As a byproduct, we also provide the first public dataset for that task, by extending the NPM3D dataset to include instance labels.", "output": "A Review of Panoptic Segmentation for Mobile Mapping Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "By integrating certain optimization solvers with deep neural networks, deep unfolding networks (DUNs) with good interpretability and high performance have attracted growing attention in compressive sensing (CS). However, existing DUNs often improve the visual quality at the price of a large number of parameters and suffer from feature information loss during iteration. In this paper, we propose an Optimization-inspired Cross-attention Transformer (OCT) module as an iterative process, leading to a lightweight OCT-based Unfolding Framework (OCTUF) for image CS. Specifically, we design a novel Dual Cross Attention (Dual-CA) sub-module, which consists of an Inertia-Supplied Cross Attention (ISCA) block and a Projection-Guided Cross Attention (PGCA) block. The ISCA block introduces multi-channel inertia forces and increases the memory effect via a cross-attention mechanism between adjacent iterations. The PGCA block achieves enhanced information interaction by introducing the inertia force into the gradient descent step through a cross-attention block. Extensive CS experiments show that our OCTUF achieves superior performance compared to state-of-the-art methods with lower training complexity. Codes are available at", "output": "Optimization-Inspired Cross-Attention Transformer for Compressive Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In documents and graphics, contours are a popular format to describe specific shapes. For example, in the True Type Font (TTF) file format, contours describe vector outlines of typeface shapes. Each contour is often defined as a sequence of points. In this paper, we tackle the contour completion task. In this task, the input is a contour sequence with missing points, and the output is a generated completed contour. This task is more difficult than image completion because, for images, the missing pixels are indicated.
Since there is no such indication in the contour completion task, we must solve the problems of missing part detection and completion simultaneously. We propose a Transformer-based method to solve this problem and show results on typeface contour completion.", "output": "Contour Completion by Transformers and Its Application to Vector Font Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformers are popular neural network models that use layers of self-attention and fully-connected nodes with embedded tokens. Vision Transformers (ViT) adapt transformers for image recognition tasks. In order to do this, the images are split into patches and used as tokens. One issue with ViT is the lack of inductive bias toward image structures. Because ViT was adapted for image data from language modeling, the network does not explicitly handle issues such as local translations, pixel information, and information loss in the structures and features shared by multiple patches. Conversely, Convolutional Neural Networks (CNNs) incorporate this information. Thus, in this paper, we propose the use of convolutional layers within ViT. Specifically, we propose a model called Vision Conformer (ViC), which replaces the Multi-Layer Perceptron (MLP) in a ViT layer with a CNN. In addition, to use the CNN, we propose to reconstruct the image data after the self-attention in a reverse embedding layer. Through evaluation, we demonstrate that the proposed convolutions help improve the classification ability of ViT.", "output": "Vision Conformer: Incorporating Convolutions into Vision Transformer Layers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In many computer vision applications, images are acquired with arbitrary or random rotations and translations, and in such setups, it is desirable to obtain semantic representations disentangled from the image orientation. Examples of such applications include semiconductor wafer defect inspection, plankton microscope images, and inference on single-particle cryo-electron microscopy (cryo-EM) micrographs. In this work, we propose Invariant Representation Learning with Implicit Neural Representation (IRL-INR), which uses an implicit neural representation (INR) with a hypernetwork to obtain semantic representations disentangled from the orientation of the image. We show that IRL-INR can effectively learn disentangled semantic representations on more complex images compared to those considered in prior works and show that these semantic representations synergize well with SCAN to produce state-of-the-art unsupervised clustering results.", "output": "Rotation and Translation Invariant Representation Learning with Implicit Neural Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although 3D-aware GANs based on neural radiance fields have achieved competitive performance, their applicability is still limited to objects or scenes with ground truths or prediction models for clearly defined canonical camera poses. To extend the scope of applicable datasets, we propose a novel 3D-aware GAN optimization technique through contrastive learning with implicit pose embeddings.
To this end, we first revise the discriminator design and remove the dependency on ground-truth camera poses. Then, to capture complex and challenging 3D scene structures more effectively, we make the discriminator estimate a high-dimensional implicit pose embedding from a given image and perform contrastive learning on the pose embedding. The proposed approach can be employed on datasets where the canonical camera pose is ill-defined, because it does not look up or estimate camera poses. Experimental results show that our algorithm outperforms existing methods by large margins on datasets with multiple object categories and inconsistent canonical camera poses.", "output": "ContraNeRF: 3D-Aware Generative Model via Contrastive Learning with Unsupervised Implicit Pose Embedding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of the Segment Anything Model and CLIP. Our system is publicly available at", "output": "Edit Everything: A Text-Guided Generative System for Images Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks are a promising tool for Audio Event Classification. In contrast to other data like natural images, there are many sensible and non-obvious representations for audio data, which could serve as input to these models. Due to their black-box nature, the effect of different input representations has so far mostly been investigated by measuring classification performance. In this work, we leverage eXplainable AI (XAI) to understand the underlying classification strategies of models trained on different input representations. Specifically, we compare two model architectures with regard to relevant input features used for Audio Event Detection: one directly processes the signal as the raw waveform, and the other takes in its time-frequency spectrogram representation. We show how relevance heatmaps obtained via Layer-wise Relevance Propagation uncover representation-dependent decision strategies. With these insights, we can make a well-informed decision about the best input representation in terms of robustness and representativity and confirm that the models' classification strategies align with human requirements.", "output": "XAI-based Comparison of Input Representations for Audio Event Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning models have demonstrated remarkable success in multi-organ segmentation but typically require large-scale datasets with all organs of interest annotated.
However, medical image datasets are often low in sample size and only partially labeled, i.e., only a subset of organs is annotated. Therefore, it is crucial to investigate how to learn a unified model from the available partially labeled datasets to leverage their synergistic potential. In this paper, we empirically and systematically study partial-label segmentation with in-depth analyses of the existing approaches and identify three distinct types of supervision signals, including two signals derived from ground truth and one from pseudo labels. We propose a novel training framework termed COSST, which effectively and efficiently integrates comprehensive supervision signals with self-training. Concretely, we first train an initial unified model using two ground-truth-based signals and then iteratively incorporate the pseudo-label signal into the initial model using self-training. To mitigate performance degradation caused by unreliable pseudo labels, we assess the reliability of pseudo labels via outlier detection in latent space and exclude the most unreliable pseudo labels from each self-training iteration. Extensive experiments are conducted on six CT datasets for three partial-label segmentation tasks. Experimental results show that our proposed COSST achieves significant improvement over the baseline method, i.e., individual networks trained on each partially labeled dataset. Compared to the state-of-the-art partial-label segmentation methods, COSST demonstrates consistently superior performance on various segmentation tasks and with different training data sizes.", "output": "COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a complete workflow designed for extracting information from Quebec handwritten parish registers. The acts in these documents contain individual and family information highly valuable for genetic, demographic and social studies of the Quebec population. From an image of parish records, our workflow is able to identify the acts and extract personal information. The workflow is divided into successive steps: page classification, text line detection, handwritten text recognition, named entity recognition, and act detection and classification. For all these steps, different machine learning models are compared. Once the information is extracted, validation rules designed by experts are then applied to standardize the extracted information and ensure its consistency with the type of act (birth, marriage, and death). This validation step is able to reject records that are considered invalid or merged. The full workflow has been used to process over two million pages of Quebec parish registers from the 19th and 20th centuries. On a sample comprising 65% of the registers, 3.2 million acts were recognized. Verification of the birth and death acts from this sample shows that 74% of them are considered complete and valid.
These records will be integrated into the BALSAC database and linked together to recreate family and genealogical relations at large scale.", "output": "Large Scale Genealogical Information Extraction From Handwritten Quebec Parish Records."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite substantial progress in 3D human pose estimation from a single-view image, prior works rarely explore global and local correlations, leading to insufficient learning of human skeleton representations. To address this issue, we propose a novel Interweaved Graph and Attention Network (IGANet) that allows bidirectional communication between graph convolutional networks (GCNs) and attentions. Specifically, we introduce an IGA module, where attentions are provided with local information from GCNs, and GCNs are injected with global information from attentions. Additionally, we design a simple yet effective U-shaped multi-layer perceptron (uMLP), which can capture multi-granularity information for body joints. Extensive experiments on two popular benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) are conducted to evaluate our proposed method. The results show that IGANet achieves state-of-the-art performance on both datasets. Code is available at", "output": "Interweaved Graph and Attention Network for 3D Human Pose Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Precise thigh muscle volumes are crucial to monitor the motor functionality of patients with diseases that may result in various degrees of thigh muscle loss. T1-weighted MRI is the default surrogate to obtain thigh muscle masks due to its contrast between muscle and fat signals. Deep learning approaches have recently been widely used to obtain these masks through segmentation. However, due to the insufficient amount of precise annotations, thigh muscle masks generated by deep learning approaches tend to misclassify intra-muscular fat (IMF) as muscle, impacting the analysis of muscle volumetrics. As IMF is infiltrated inside the muscle, human annotations require expertise and time. Thus, precise muscle masks where IMF is excluded are limited in practice. To alleviate this, we propose a few-shot segmentation framework to generate thigh muscle masks excluding IMF. In our framework, we design a novel pseudo-label correction and evaluation scheme, together with a new noise-robust loss for exploiting high-certainty areas. The proposed framework only takes 1% of the fine-annotated training dataset, and achieves comparable performance with fully supervised methods according to the experimental results.", "output": "Precise Few-shot Fat-free Thigh Muscle Segmentation in T1-weighted MRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning algorithms for parsing remote sensing data have a wide range of societally relevant applications, but labels used to train these algorithms can be difficult or impossible to acquire. This challenge has spurred research into self-supervised learning for remote sensing data, aiming to unlock the use of machine learning in geographies or application domains where labelled datasets are small.
Current self-supervised learning approaches for remote sensing data draw significant inspiration from techniques applied to natural images. However, remote sensing data has important differences from natural images -- for example, the temporal dimension is critical for many tasks, and data is collected from many complementary sensors. We show that designing models and self-supervised training techniques specifically for remote sensing data results in both smaller and more performant models. We introduce the Pretrained Remote Sensing Transformer (Presto), a transformer-based model pre-trained on remote sensing pixel-timeseries data. Presto excels at a wide variety of globally distributed remote sensing tasks and outperforms much larger models. Presto can be used for transfer learning or as a feature extractor for simple models, enabling efficient deployment at scale.", "output": "Lightweight, Pre-trained Transformers for Remote Sensing Timeseries."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOIs remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs, including novel interactions, novel humans and/or novel objects, driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable the interaction pose transfer among different persons and objects, we then devise a new compositional conditional neural radiance field (CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositional animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is", "output": "Compositional 3D Human-Object Neural Animation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Atrial fibrillation (AF) is the most common cardiac arrhythmia. Accurate segmentation of the left atrium (LA) and LA scars can provide valuable information to predict treatment outcomes in AF. In this paper, we propose to automatically segment the LA cavity and quantify LA scars with late gadolinium enhancement Magnetic Resonance Images (LGE-MRIs). We adopt nnU-Net as the baseline model and exploit the importance of LA boundary characteristics with the TopK loss as the loss function. Specifically, a focus on LA boundary pixels is achieved during training, which provides more accurate boundary prediction. On the other hand, a distance map transformation of the predicted LA boundary is regarded as an additional input for the LA scar prediction, which provides a marginal constraint on scar locations. We further design a novel uncertainty-aware module (UAM) to produce better results for predictions with high uncertainty. Experiments on the LAScarQS 2022 dataset demonstrated our model's superior performance on LA cavity and LA scar segmentation. Specifically, we achieved 88.98% and 64.08% Dice coefficients for LA cavity and scar segmentation, respectively. We will make our implementation code publicly available at", "output": "Automatically Segment the Left Atrium and Scars from LGE-MRIs Using a Boundary-focused nnU-Net."},
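A TopK loss of the kind used in this boundary-focused setup averages the per-pixel cross-entropy only over the hardest k fraction of pixels, which in practice concentrates training on boundaries and other difficult regions. A generic sketch follows (the exact variant and the value of k are assumptions):

```python
# TopK segmentation loss: keep only the k hardest pixels' cross-entropy.
# Generic form, not necessarily the exact variant used with nnU-Net.
import torch
import torch.nn.functional as F

def topk_loss(logits, target, k: float = 0.1):
    """logits: (N, C, H, W); target: (N, H, W); keep the hardest k fraction."""
    per_pixel = F.cross_entropy(logits, target, reduction="none").flatten()
    n_keep = max(1, int(k * per_pixel.numel()))
    hardest, _ = torch.topk(per_pixel, n_keep)
    return hardest.mean()

loss = topk_loss(torch.randn(2, 3, 64, 64), torch.randint(0, 3, (2, 64, 64)))
```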
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the huge recent breakthroughs in neural networks (NNs) for artificial intelligence (specifically deep convolutional networks), such NNs do not achieve human-level performance: they can be hacked by images that would fool no human, and they lack 'common sense'. It has been argued that a basis of human-level intelligence is mankind's ability to perform relational reasoning: the comparison of different objects, measuring similarity, grasping relations between objects and the converse, and figuring out the odd one out in a set of objects. Mankind can even do this with objects never seen before. Here we show how ClusterFlow, a semi-supervised hierarchical clustering framework, can operate on trained NNs, utilising the rich multi-dimensional class and feature data found at the pre-SoftMax layer to build a hyperspatial map of classes/features, and how this adds more human-like functionality to modern deep convolutional neural networks. We demonstrate this with three tasks: 1. the statistical-learning-based 'mistakes' made by infants when attending to images of cats and dogs; 2. improving both the resilience to hacking images and the accuracy of measures of certainty in deep NNs; 3. relational reasoning over sets of images, including those not known to the NN nor seen before. We also demonstrate that ClusterFlow can work on non-NN data and deal with missing data by testing it on a Chemistry dataset. This work suggests that modern deep NNs can be made more human-like without re-training of the NNs. Since some methods used in deep and convolutional NNs are known not to be biologically plausible, or perhaps not even the best approach, the ClusterFlow framework can sit on top of any NN and will be a useful tool to add as NNs are improved in this regard.", "output": "Cluster Flow: how a hierarchical clustering layer make allows deep-NNs more resilient to hacking, more human-like and easily implements relational reasoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "RegHEC is a registration-based hand-eye calibration technique that needs no accurate calibration rig, only arbitrarily available objects, and is applicable to both eye-in-hand and eye-to-hand cases. It tries to find the hand-eye relation which brings multi-view point clouds of an arbitrary scene into simultaneous registration under a common reference frame. RegHEC first achieves initial alignment of the multi-view point clouds via Bayesian optimization, where the registration problem is modeled as a Gaussian process over the hand-eye relation and the covariance function is modified to be compatible with the distance metric in the 3-D motion space SE(3); it then passes the initial guess of the hand-eye relation to an Anderson-accelerated ICP variant for later fine registration and accurate calibration. RegHEC has little requirement on the calibration object; it is applicable with a sphere, cone, cylinder, and even a simple plane, which can be quite challenging for correct point cloud registration and sensor motion estimation using existing methods.
While suitable for most 3-D vision guided tasks, RegHEC is especially favorable for robotic 3-D reconstruction, as calibration and multi-view point cloud registration of the reconstruction target are unified into a single process. Our technique is verified with extensive experiments using a variety of arbitrary objects and a real hand-eye system. We release an open-source C++ implementation of RegHEC.", "output": "RegHEC: Hand-Eye Calibration via Simultaneous Multi-view Point Clouds Registration of Arbitrary Object."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interactions between humans are diverse and context-dependent, but previous works have treated them as categorical, disregarding the heavy tail of possible interactions. We propose a new paradigm of learning human-human interactions as free text from a single still image, allowing for flexibility in modeling the unlimited space of situations and relationships between people. To overcome the absence of data labelled specifically for this task, we use knowledge distillation applied to synthetic caption data produced by a large language model without explicit supervision. We show that the pseudo-labels produced by this procedure can be used to train a captioning model to effectively understand human-human interactions in images, as measured by a variety of metrics that measure textual and semantic faithfulness and factual groundedness of our predictions. We further show that our approach outperforms SOTA image captioning and situation recognition models on this task. We will release our code and pseudo-labels along with Waldo and Wenda, a manually-curated test set for still-image human-human interaction understanding.", "output": "Learning Human-Human Interactions in Images from Weak Textual Supervision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14, which was trained with the same compute budget as our model, by 3.7 percentage points. These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.", "output": "DataComp: In search of the next generation of multimodal datasets."},
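A common instance of the "simple filtering algorithm" family that DataComp studies is thresholding on CLIP image-text similarity. The sketch below operates on precomputed, L2-normalized embeddings; the threshold and the synthetic data are made up, and this is not the DataComp-1B recipe:

```python
# Keep image-text pairs whose embedding cosine similarity clears a threshold.
# Hypothetical sketch with random stand-in embeddings.
import numpy as np

def clip_filter(img_emb: np.ndarray, txt_emb: np.ndarray, thresh: float = 0.28):
    """img_emb, txt_emb: (N, D) L2-normalized embeddings; returns kept indices."""
    sims = (img_emb * txt_emb).sum(axis=1)      # cosine similarity per pair
    return np.nonzero(sims >= thresh)[0]

rng = np.random.default_rng(0)
e = rng.normal(size=(1000, 512))
img = e / np.linalg.norm(e, axis=1, keepdims=True)
txt = img + 0.02 * rng.normal(size=img.shape)   # mostly matching captions
txt /= np.linalg.norm(txt, axis=1, keepdims=True)
kept = clip_filter(img, txt)
print(f"kept {kept.size} of 1000 pairs")
```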
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Weakly supervised object detection (WSOD) aims at learning precise object detectors with only image-level tags. In spite of intensive research on deep learning (DL) approaches over the past few years, there is still a significant performance gap between WSOD and fully supervised object detection. In fact, most existing WSOD methods only consider the visual appearance of each region proposal but ignore the useful context information in the image. To this end, this paper proposes an interactive end-to-end WSOD framework called JLWSOD with two innovations: i) two types of WSOD-specific context information (i.e., instance-wise correlation and semantic-wise correlation) are proposed and introduced into the WSOD framework; ii) an interactive graph contrastive learning (iGCL) mechanism is designed to jointly optimize the visual appearance and context information for better WSOD performance. Specifically, the iGCL mechanism takes full advantage of the complementary interpretations of WSOD, namely the instance-wise detection and semantic-wise prediction tasks, forming a more comprehensive solution. Extensive experiments on the widely used PASCAL VOC and MS COCO benchmarks verify the superiority of JLWSOD over alternative state-of-the-art approaches and baseline models (improvements of 3.6%~23.3% on mAP and 3.4%~19.7% on CorLoc, respectively).", "output": "Towards Precise Weakly Supervised Object Detection via Interactive Contrastive Learning of Context Information."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Advanced deep Convolutional Neural Networks (CNNs) have shown great success in video-based person Re-Identification (Re-ID). However, they usually focus on the most obvious regions of persons with a limited global representation ability. Recently, it has been witnessed that Transformers explore inter-patch relations with global observations for performance improvements. In this work, we take both sides and propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID. Firstly, we couple CNNs and Transformers to extract two kinds of visual features and experimentally verify their complementarity. Further, in the spatial domain, we propose a Complementary Content Attention (CCA) to take advantage of the coupled structure and guide independent features for spatial complementary learning. In the temporal domain, a Hierarchical Temporal Aggregation (HTA) is proposed to progressively capture inter-frame dependencies and encode temporal information.
Besides, a gated attention is utilized to deliver aggregated temporal information into the CNN and Transformer branches for temporal complementary learning. Finally, we introduce a self-distillation training strategy to transfer the superior spatial-temporal knowledge to the backbone networks for higher accuracy and greater efficiency. In this way, two kinds of typical features from the same videos are integrated for more informative representations. Extensive experiments on four public Re-ID benchmarks demonstrate that our framework can attain better performance than most state-of-the-art methods.", "output": "Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose MCLFIQ: Mobile Contactless Fingerprint Image Quality, the first quality assessment algorithm for mobile contactless fingerprint samples. To this end, we retrained the NIST Fingerprint Image Quality (NFIQ) 2 method, which was originally designed for contact-based fingerprints, on a synthetic contactless fingerprint database. We evaluate the predictive performance of the resulting MCLFIQ model in terms of Error-vs.-Discard Characteristic (EDC) curves on three real-world contactless fingerprint databases using two recognition algorithms. In the experiments, the MCLFIQ method is compared against the original NFIQ 2 method and a sharpness-based quality assessment algorithm developed for contactless fingerprint images. The obtained results show that re-training NFIQ 2 on synthetic data is a viable alternative to training on real databases. Moreover, the evaluation shows that our MCLFIQ method works more accurately and robustly than NFIQ 2 and the sharpness-based quality assessment. We suggest considering the proposed MCLFIQ method as a candidate for a new standard algorithm for contactless fingerprint quality assessment.", "output": "MCLFIQ: Mobile Contactless Fingerprint Image Quality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Discovering inter-point connections for efficient high-dimensional feature extraction from point coordinates is a key challenge in processing point clouds. Most existing methods focus on designing efficient local feature extractors while ignoring global connections, or vice versa. In this paper, we design a new Inductive Bias-aided Transformer (IBT) method to learn 3D inter-point relations, which considers both local and global attentions. Specifically, considering local spatial coherence, local feature learning is performed through Relative Position Encoding and Attentive Feature Pooling. We incorporate the learned locality into the Transformer module: the local feature affects the value component in the Transformer to modulate the relationship between channels of each point, which can enhance the self-attention mechanism with locality-based channel interaction. We demonstrate its superiority experimentally on classification and segmentation tasks. The code is available at:", "output": "Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation."},
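IBT's description, local features modulating the value path of self-attention, can be pictured with a toy module: gate the values channel-wise by a learned function of the local feature, then attend as usual. Shapes and gating below are assumptions, not the released IBT code:

```python
# Locality-modulated self-attention: a learned gate derived from per-point
# local features rescales the channels of the value tensor. Toy sketch only.
import torch
import torch.nn as nn

class LocalityValueAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x, local_feat):
        v = x * self.gate(local_feat)   # locality modulates channels of V
        out, _ = self.attn(x, x, v)     # queries/keys from x, values gated
        return out

m = LocalityValueAttention(64)
pts = torch.randn(2, 128, 64)           # (batch, points, channels)
loc = torch.randn(2, 128, 64)           # per-point local features
y = m(pts, loc)
```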
The code is available at: ", "output": "Exploiting Inductive Bias in Transformer for Point Cloud Classification and Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Contrast maximization (CMax) techniques are widely used in event-based vision systems to estimate the motion parameters of the camera and generate high-contrast images. However, these techniques are noise-intolerant and suffer from the multiple-extrema problem, which arises when the scene contains more noisy events than structure, causing the contrast to be higher at multiple locations. This makes the task of estimating the camera motion extremely challenging, which is a problem for neuromorphic earth observation, because, without a proper estimation of the motion parameters, it is not possible to generate a map with high contrast, causing important details to be lost. Similar methods that use CMax addressed this problem by changing or augmenting the objective function to enable it to converge to the correct motion parameters. Our proposed solution overcomes the multiple-extrema and noise-intolerance problems by correcting the warped events before calculating the contrast, and offers the following advantages: it does not depend on the event data, it does not require a prior about the camera motion, and it keeps the rest of the CMax pipeline unchanged. This ensures that the contrast is only high around the correct motion parameters. Our approach enables the creation of better motion-compensated maps through an analytical compensation technique, using a novel dataset from the International Space Station (ISS). Code is available at: ", "output": "Density Invariant Contrast Maximization for Neuromorphic Earth Observations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a framework for semantic segmentation on sparse sequential point clouds of millimeter-wave radar. Compared with cameras and lidars, millimeter-wave radars have the advantage of not revealing privacy, having a strong anti-interference ability, and having a long detection distance. The sparsity of mmWave data and the difficulty of capturing its temporal-topological features remain a problem. In particular, the issue of capturing the temporal-topological coupling features under the human semantic segmentation task prevents previous advanced segmentation methods (e.g., PointNet, PointCNN, Point Transformer) from being well utilized in practical scenarios. To address the challenge caused by the sparsity and temporal-topological features of the data, we (i) introduce graph structure and topological features to the point cloud, and (ii) propose a semantic segmentation framework including a global feature-extracting module and a sequential feature-extracting module. In addition, we design an efficient and better-fitting loss function for a better training process and segmentation results based on graph clustering. Experimentally, we deploy representative semantic segmentation algorithms (Transformer, GCNN, etc.) on a custom dataset. Experimental results indicate that our model achieves a mean accuracy of $\mathbf{82.31}\%$ on the custom dataset and outperforms the state-of-the-art algorithms. Moreover, to validate the model's robustness, we deploy our model on the well-known S3DIS dataset.
On the S3DIS dataset, our model achieves a mean accuracy of $\mathbf{92.6}\%$, outperforming baseline algorithms.", "output": "Human Semantic Segmentation using Millimeter-Wave Radar Sparse Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimedia content has become ubiquitous on social media platforms, leading to the rise of multimodal misinformation and the urgent need for effective strategies to detect and prevent its spread. This study focuses on Crossmodal Misinformation (CMM), where image-caption pairs work together to spread falsehoods. We contrast CMM with Asymmetric Multimodal Misinformation (AMM), where one dominant modality propagates falsehoods while other modalities have little or no influence. We show that AMM adds noise to the training and evaluation process while exacerbating the unimodal bias, where text-only or image-only detectors can seemingly outperform their multimodal counterparts on an inherently multimodal task. To address this issue, we collect and curate FIGMENTS, a robust evaluation benchmark for CMM, which consists of real-world cases of misinformation, excludes AMM, and utilizes modality balancing to successfully alleviate unimodal bias. FIGMENTS also provides a first step towards fine-grained CMM detection by including three classes: truthful, out-of-context, and miscaptioned image-caption pairs. Furthermore, we introduce a method for generating realistic synthetic training data that maintains crossmodal relations between legitimate images and false human-written captions, which we term Crossmodal HArd Synthetic MisAlignment (CHASMA). We conduct an extensive comparative study using a Transformer-based architecture. Our results show that incorporating CHASMA in conjunction with other generated datasets consistently improved the overall performance on FIGMENTS in both binary (+6.26%) and multiclass settings (+15.8%). We release our code at:", "output": "Figments and Misalignments: A Framework for Fine-grained Crossmodal Misinformation Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel training paradigm that equips LLMs with multi-modal abilities through modularized learning of a foundation LLM, a visual knowledge module, and a visual abstractor module. This approach can support multiple modalities and facilitate diverse unimodal and multimodal abilities through modality collaboration. The training paradigm of mPLUG-Owl involves a two-stage method for aligning image and text, which learns visual knowledge with the assistance of the LLM while maintaining, and even improving, the generation abilities of the LLM. In the first stage, the visual knowledge module and abstractor module are trained with a frozen LLM module to align the image and text. In the second stage, language-only and multi-modal supervised datasets are used to jointly fine-tune a low-rank adaptation (LoRA) module on the LLM and the abstractor module, by freezing the visual knowledge module. We carefully build a visually-related instruction evaluation set, OwlEval.
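At its core, the two-stage recipe described in the mPLUG-Owl abstract reduces to toggling which modules are trainable in each stage. A schematic sketch under that reading, with placeholder modules standing in for the real LLM, visual knowledge module, abstractor, and LoRA adapter:

```python
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

# Placeholder modules standing in for the real components.
llm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(512, 8, batch_first=True), 2)
visual_encoder = nn.Linear(2048, 512)   # "visual knowledge module" stand-in
abstractor = nn.Linear(512, 512)        # "visual abstractor" stand-in
lora_adapter = nn.Linear(512, 512)      # stub for a LoRA module on the LLM

# Stage 1: image-text alignment -- train the visual modules, freeze the LLM.
set_trainable(llm, False)
set_trainable(visual_encoder, True)
set_trainable(abstractor, True)
# ... run alignment training here ...

# Stage 2: joint instruction tuning -- freeze the visual knowledge module,
# train only the LoRA adapter and the abstractor.
set_trainable(visual_encoder, False)
set_trainable(abstractor, True)
set_trainable(lora_adapter, True)
# ... run supervised fine-tuning here ...
```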
Experimental results show that our model outperforms existing multi-modal models, demonstrating mPLUG-Owl's impressive instruction and visual understanding ability, multi-turn conversation ability, and knowledge reasoning ability. Besides, we observe some unexpected and exciting abilities such as multi-image correlation and scene text understanding, which makes it possible to leverage it for harder real scenarios, such as vision-only document comprehension. Our code, pre-trained model, instruction-tuned models, and evaluation set are available at . The online demo is available at ", "output": "mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current LiDAR odometry, mapping and localization methods leverage point-wise representations of 3D scenes and achieve high accuracy in autonomous driving tasks. However, the space-inefficiency of methods that use point-wise representations limits their development and usage in practical applications. In particular, scan-submap matching and global map representation methods are restricted by the inefficiency of nearest neighbor searching (NNS) for large-volume point clouds. To improve space-time efficiency, we propose a novel method of describing scenes using quadric surfaces, which are far more compact representations of 3D objects than conventional point clouds. In contrast to point cloud-based methods, our quadric representation-based method decomposes a 3D scene into a collection of sparse quadric patches, which improves storage efficiency and avoids the slow point-wise NNS process. Our method first segments a given point cloud into patches and fits each of them to a quadric implicit function. Each function is then coupled with other geometric descriptors of the patch, such as its center position and covariance matrix. Collectively, these patch representations fully describe a 3D scene, which can be used in place of the original point cloud and employed in LiDAR odometry, mapping and localization algorithms. We further design a novel incremental growing method for quadric representations, which eliminates the need to repeatedly re-fit quadric surfaces from the original point cloud. Extensive odometry, mapping and localization experiments on large-volume point clouds in the KITTI and UrbanLoco datasets demonstrate that our method maintains low latency and memory utility while achieving competitive, and even superior, accuracy.", "output": "Quadric Representations for LiDAR Odometry, Mapping and Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks, which is very practical in the medical domain. It can significantly reduce the requirement of large amounts of task-specific data by sufficiently sharing medical knowledge among different tasks. However, due to the challenges of designing strongly generalizable models with limited and complex medical data, most existing approaches tend to develop task-specific models. To take a step towards MAGI, we propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR).
In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process. As a result, the foundation model with comprehensive basic knowledge can learn compact representations from pretraining radiographic data for better cross-modal alignment. MOTOR unifies the understanding and generation, which are two kinds of core intelligence of an AI system, into a single medical foundation model, to flexibly handle more diverse medical tasks. To enable a comprehensive evaluation and facilitate further research, we construct a medical multimodal benchmark including a wide range of downstream tasks, such as chest x-ray report generation and medical visual question answering. Extensive experiments on our benchmark show that MOTOR obtains promising results through simple task-oriented adaptation. The visualization shows that the injected knowledge successfully highlights key information in the medical data, demonstrating the excellent interpretability of MOTOR. Our MOTOR successfully mimics the human practice of fulfilling a \"medical student\" to accelerate the process of becoming a \"specialist\". We believe that our work makes a significant stride in realizing MAGI.", "output": "Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of the linear law-based feature space transformation (LLT) algorithm is to assist with the classification of univariate and multivariate time series. The presented R package, called LLT, implements this algorithm in a flexible yet user-friendly way. This package first splits the instances into training and test sets. It then utilizes time-delay embedding and spectral decomposition techniques to identify the governing patterns (called linear laws) of each input sequence (initial feature) within the training set. Finally, it applies the linear laws of the training set to transform the initial features of the test set. These steps are performed by three separate functions called trainTest, trainLaw, and testTrans. Their application requires a predefined data structure; however, for fast calculation, they use only built-in functions. The LLT R package and a sample dataset with the appropriate data structure are publicly available on GitHub.", "output": "LLT: An R package for Linear Law-based Feature Space Transformation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using such data. However, most of them heavily rely on the availability of protected attribute labels in the dataset, which limits their applicability, while label-unaware approaches, i.e., approaches operating without such labels, exhibit considerably lower performance. To overcome these limitations, this work introduces FLAC, a methodology that minimizes mutual information between the features extracted by the model and a protected attribute, without the use of attribute labels.
To do that, FLAC proposes a sampling strategy that highlights underrepresented samples in the dataset, and casts the problem of learning fair representations as a probability matching problem that leverages representations extracted by a bias-capturing classifier. It is theoretically shown that FLAC can indeed lead to fair representations that are independent of the protected attributes. FLAC surpasses the current state-of-the-art on Biased MNIST, CelebA, and UTKFace by 29.1%, 18.1%, and 21.9%, respectively. Additionally, FLAC exhibits 2.2% increased accuracy on ImageNet-A, consisting of the most challenging samples of ImageNet. Finally, in most experiments, FLAC even outperforms the bias label-aware state-of-the-art methods.", "output": "FLAC: Fairness-Aware Representation Learning by Suppressing Attribute-Class Associations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transductive few-shot learning algorithms have shown substantially superior performance over their inductive counterparts by leveraging the unlabeled queries. However, the vast majority of such methods are evaluated on perfectly class-balanced benchmarks. It has been shown that they undergo a remarkable drop in performance under a more realistic, imbalanced setting. To this end, we propose a novel algorithm to address imbalanced transductive few-shot learning, named Adaptive Manifold. Our method exploits the underlying manifold of the labeled support examples and unlabeled queries by using manifold similarity to predict the class probability distribution per query. It is parameterized by one centroid per class as well as a set of graph-specific parameters that determine the manifold. All parameters are optimized through a loss function that can be tuned towards class-balanced or imbalanced distributions. The manifold similarity shows substantial improvement over Euclidean distance, especially in the 1-shot setting. Our algorithm outperforms or is on par with other state-of-the-art methods on three benchmark datasets, namely miniImageNet, tieredImageNet and CUB, and three different backbones, namely ResNet-18, WideResNet-28-10 and DenseNet-121. In certain cases, our algorithm outperforms the previous state of the art by as much as 4.2%.", "output": "Adaptive manifold for imbalanced transductive few-shot learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With autonomous industries on the rise, domain adaptation of the visual perception stack is an important research direction due to its promise of cost savings. Much prior art was dedicated to domain-adaptive semantic segmentation in the synthetic-to-real context. Despite being a crucial output of the perception stack, panoptic segmentation has been largely overlooked by the domain adaptation community. Therefore, we revisit well-performing domain adaptation strategies from other fields, adapt them to panoptic segmentation, and show that they can effectively enhance panoptic domain adaptation. Further, we study the panoptic network design and propose a novel architecture (EDAPS) designed explicitly for domain-adaptive panoptic segmentation.
It uses a shared, domain-robust transformer encoder to facilitate the joint adaptation of semantic and instance features, but task-specific decoders tailored for the specific requirements of both domain-adaptive semantic and instance segmentation. As a result, the performance gap seen in challenging panoptic benchmarks is substantially narrowed. EDAPS significantly improves the state-of-the-art performance for panoptic segmentation UDA by a large margin of 25% on SYNTHIA-to-Cityscapes and even 72% on the more challenging SYNTHIA-to-Mapillary Vistas. The implementation is available at ", "output": "EDAPS: Enhanced Domain-Adaptive Panoptic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing instance segmentation techniques are primarily tailored for high-visibility inputs, but their performance significantly deteriorates in extremely low-light environments. In this work, we take a deep look at instance segmentation in the dark and introduce several techniques that substantially boost the low-light inference accuracy. The proposed method is motivated by the observation that noise in low-light images introduces high-frequency disturbances to the feature maps of neural networks, thereby significantly degrading performance. To suppress this \"feature noise\", we propose a novel learning method that relies on an adaptive weighted downsampling layer, a smooth-oriented convolutional block, and disturbance suppression learning. These components effectively reduce feature noise during downsampling and convolution operations, enabling the model to learn disturbance-invariant features. Furthermore, we discover that high-bit-depth RAW images can better preserve richer scene information in low-light conditions compared to typical camera sRGB outputs, thus supporting the use of RAW-input algorithms. Our analysis indicates that high bit-depth can be critical for low-light instance segmentation. To mitigate the scarcity of annotated RAW datasets, we leverage a low-light RAW synthetic pipeline to generate realistic low-light data. In addition, to facilitate further research in this direction, we capture a real-world low-light instance segmentation dataset comprising over two thousand paired low/normal-light images with instance-level pixel-wise annotations. Remarkably, without any image preprocessing, we achieve satisfactory performance on instance segmentation in very low light (4% AP higher than state-of-the-art competitors), meanwhile opening new opportunities for future research.", "output": "Instance Segmentation in the Dark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, deep learning based approaches have shown promising results in 3D hand reconstruction from a single RGB image. These approaches can be roughly divided into model-based approaches, which are heavily dependent on the model's parameter space, and model-free approaches, which require large numbers of 3D ground truths to reduce depth ambiguity and struggle in weakly-supervised scenarios. To overcome these issues, we propose a novel probabilistic model to achieve the robustness of model-based approaches and reduced dependence on the model's parameter space of model-free approaches.
The proposed probabilistic model incorporates a model-based network as a prior-net to estimate the prior probability distribution of joints and vertices. An Attention-based Mesh Vertices Uncertainty Regression (AMVUR) model is proposed to capture dependencies among vertices and the correlation between joints and mesh vertices to improve their feature representation. We further propose a learning-based occlusion-aware Hand Texture Regression model to achieve high-fidelity texture reconstruction. We demonstrate the flexibility of the proposed probabilistic model to be trained in both supervised and weakly-supervised scenarios. The experimental results demonstrate our probabilistic model's state-of-the-art accuracy in 3D hand and texture reconstruction from a single image in both training schemes, including in the presence of severe occlusions.", "output": "A Probabilistic Attention Model with Occlusion-aware Texture Regression for 3D Hand Reconstruction from a Single RGB Image."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work represents a large step into modern ways of fast 3D reconstruction based on RGB camera images. Utilizing a Microsoft HoloLens 2 as a multisensor platform that includes an RGB camera and an inertial measurement unit for SLAM-based camera-pose determination, we train a Neural Radiance Field (NeRF) as a neural scene representation in real-time with the acquired data from the HoloLens. The HoloLens is connected via Wifi to a high-performance PC that is responsible for the training and 3D reconstruction. After the data stream ends, the training is stopped and the 3D reconstruction is initiated, which extracts a point cloud of the scene. With our specialized inference algorithm, five million scene points can be extracted within 1 second. In addition, the point cloud also includes radiometry per point. Our method of 3D reconstruction outperforms grid point sampling with NeRFs by multiple orders of magnitude and can be regarded as a complete real-time 3D reconstruction method in a mobile mapping setup.", "output": "Combining HoloLens with Instant-NeRFs: Advanced Real-Time 3D Mobile Mapping."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We explore the problem of Incremental Generalized Category Discovery (IGCD). This is a challenging category incremental learning setting where the goal is to develop models that can correctly categorize images from previously seen categories, in addition to discovering novel ones. Learning is performed over a series of time steps where the model obtains new labeled and unlabeled data, and discards old data, at each iteration. The difficulty of the problem is compounded in our generalized setting as the unlabeled data can contain images from categories that may or may not have been observed before. We present a new method for IGCD which combines non-parametric categorization with efficient image sampling to mitigate catastrophic forgetting. To quantify performance, we propose a new benchmark dataset named iNatIGCD that is motivated by a real-world fine-grained visual categorization task.
In our experiments, we outperform existing related methods.", "output": "Incremental Generalized Category Discovery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Person Re-ID has been gaining a lot of attention and nowadays is of fundamental importance in many camera surveillance applications. The task consists of identifying individuals across multiple cameras that have no overlapping views. Most of the approaches require labeled data, which is not always available, given the huge amount of data demanded and the difficulty of manually assigning a class to each individual. Recently, studies have shown that re-ranking methods are capable of achieving significant gains, especially in the absence of labeled data. Besides that, the fusion of feature extractors and multiple-source training is another promising research direction not extensively exploited. We aim to fill this gap through a manifold rank aggregation approach capable of exploiting the complementarity of different person Re-ID rankers. In this work, we perform a completely unsupervised selection and fusion of diverse ranked lists obtained from multiple and diverse feature extractors. Among the contributions, this work proposes a query performance prediction measure that models the relationship among images considering a hypergraph structure and does not require the use of any labeled data. Expressive gains were obtained on four datasets commonly used for person Re-ID. We achieved results competitive with the state of the art in most of the scenarios.", "output": "Person Re-ID through Unsupervised Hypergraph Rank Selection and Fusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "By identifying four important components of existing LiDAR-camera 3D object detection methods (LiDAR and camera candidates, transformation, and fusion outputs), we observe that all existing methods either find dense candidates or yield dense representations of scenes. However, given that objects occupy only a small part of a scene, finding dense candidates and generating dense representations is noisy and inefficient. We propose SparseFusion, a novel multi-sensor 3D detection method that exclusively uses sparse candidates and sparse representations. Specifically, SparseFusion utilizes the outputs of parallel detectors in the LiDAR and camera modalities as sparse candidates for fusion. We transform the camera candidates into the LiDAR coordinate space by disentangling the object representations. Then, we can fuse the multi-modality candidates in a unified 3D space with a lightweight self-attention module. To mitigate negative transfer between modalities, we propose novel semantic and geometric cross-modality transfer modules that are applied prior to the modality-specific detectors. SparseFusion achieves state-of-the-art performance on the nuScenes benchmark while also running at the fastest speed, even outperforming methods with stronger backbones. We perform extensive experiments to demonstrate the effectiveness and efficiency of our modules and the overall method pipeline.
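The fusion step the SparseFusion abstract describes, mixing sparse per-object candidates from both modalities in a unified space with a lightweight self-attention module, might look roughly like the sketch below. The dimensions and the learned modality embedding are illustrative assumptions, and the cross-modality transfer modules are omitted:

```python
import torch
import torch.nn as nn

class SparseCandidateFusion(nn.Module):
    """Fuse per-object candidate features from two modalities with one
    self-attention layer over the concatenated (sparse) candidate set."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.modality_embed = nn.Embedding(2, dim)  # 0 = LiDAR, 1 = camera
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lidar_feats, camera_feats):
        # lidar_feats: (B, N_l, dim); camera_feats: (B, N_c, dim),
        # with camera candidates already lifted into the LiDAR frame.
        B = lidar_feats.shape[0]
        tags = torch.cat([
            torch.zeros(lidar_feats.shape[1], dtype=torch.long),
            torch.ones(camera_feats.shape[1], dtype=torch.long),
        ]).to(lidar_feats.device)
        x = torch.cat([lidar_feats, camera_feats], dim=1)
        x = x + self.modality_embed(tags).unsqueeze(0).expand(B, -1, -1)
        fused, _ = self.attn(x, x, x)   # joint attention over all candidates
        return fused                    # (B, N_l + N_c, dim)

fusion = SparseCandidateFusion()
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 150, 256])
```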
Our code will be made publicly available at ", "output": "SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper we present a method to analyze the inner structure of the composite FRP rebar, namely the shift of the real center of gravity with respect to the geometrical center of the rebar and changes in cross-sectional characteristics. We propose an automated pipeline, based on classical computer vision techniques and on the ratio between the glass fibers and epoxy filament in the analyzed cross-section, to compute the shift vector of the real center of gravity with respect to the geometrical center, together with the cross-section area and its principal moments. We discuss the achieved results over two cross-sections in different portions of the rebar and, in the end, we suggest possible directions and improvements for our future work. We also made our code publicly available.", "output": "Structure Analysis of the FRP Rebar Using Computer Vision Techniques."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robotic perception requires the modeling of both 3D geometry and semantics. Existing methods typically focus on estimating 3D bounding boxes, neglecting finer geometric details and struggling to handle general, out-of-vocabulary objects. To overcome these limitations, we introduce a novel task for 3D occupancy prediction, which aims to estimate the detailed occupancy and semantics of objects from multi-view images. To facilitate this task, we develop a label generation pipeline that produces dense, visibility-aware labels for a given scene. This pipeline includes point cloud aggregation, point labeling, and occlusion handling. We construct two benchmarks based on the Waymo Open Dataset and the nuScenes Dataset, resulting in the Occ3D-Waymo and Occ3D-nuScenes benchmarks. Lastly, we propose a model, dubbed Coarse-to-Fine Occupancy (CTF-Occ) network, which demonstrates superior performance in the 3D occupancy prediction task. This approach addresses the need for finer geometric understanding in a coarse-to-fine fashion. The code, data, and benchmarks are released at ", "output": "Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural fields are neural networks which map coordinates to a desired signal. When a neural field should jointly model multiple signals, and not memorize only one, it needs to be conditioned on a latent code which describes the signal at hand. Despite being an important aspect, there has been little research on conditioning strategies for neural fields. In this work, we explore the use of neural fields as decoders for 2D semantic segmentation. For this task, we compare three conditioning methods: simple concatenation of the latent code, Feature-wise Linear Modulation (FiLM), and Cross-Attention, in conjunction with latent codes which either describe the full image or only a local region of the image. Our results show a considerable difference in performance between the examined conditioning strategies.
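Of the three conditioning strategies compared in this abstract, FiLM is the most compact to write down: the latent code predicts a per-channel scale and shift applied to the conditioned features. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn as nn

class FiLMConditioning(nn.Module):
    """Feature-wise Linear Modulation: a latent code z predicts per-channel
    scale (gamma) and shift (beta) for the conditioned feature map."""
    def __init__(self, z_dim: int, channels: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(z_dim, 2 * channels)

    def forward(self, feats: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) decoder features; z: (B, z_dim) latent code
        gamma, beta = self.to_gamma_beta(z).chunk(2, dim=-1)
        gamma = gamma[:, :, None, None]  # broadcast over spatial dims
        beta = beta[:, :, None, None]
        return gamma * feats + beta

film = FiLMConditioning(z_dim=64, channels=128)
y = film(torch.randn(2, 128, 16, 16), torch.randn(2, 64))
print(y.shape)  # torch.Size([2, 128, 16, 16])
```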
Furthermore, we show that conditioning via Cross-Attention achieves the best results and is competitive with a CNN-based decoder for semantic segmentation.", "output": "Neural Field Conditioning Strategies for 2D Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Segmentation is a core computer vision competency, with applications spanning a broad range of scientifically and economically valuable domains. To date, however, the prohibitive cost of annotation has limited the deployment of flexible segmentation models. In this work, we propose Zero-shot Unsupervised Transfer Instance Segmentation (ZUTIS), a framework that aims to meet this challenge. The key strengths of ZUTIS are: (i) no requirement for instance-level or pixel-level annotations; (ii) an ability of zero-shot transfer, i.e., no assumption of access to a target data distribution; (iii) a unified framework for semantic and instance segmentations with solid performance on both tasks compared to state-of-the-art unsupervised methods. Compared to previous work, we show that ZUTIS achieves a gain of 2.2 mask AP on COCO-20K and 14.5 mIoU on ImageNet-S with 919 categories for instance and semantic segmentations, respectively. The code is made publicly available.", "output": "Zero-shot Unsupervised Transfer Instance Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Co-SLAM, a neural RGB-D SLAM system based on a hybrid representation that performs robust camera tracking and high-fidelity surface reconstruction in real time. Co-SLAM represents the scene as a multi-resolution hash-grid to exploit its high convergence speed and ability to represent high-frequency local features. In addition, Co-SLAM incorporates one-blob encoding to encourage surface coherence and completion in unobserved areas. This joint parametric-coordinate encoding enables real-time and robust performance by bringing the best of both worlds: fast convergence and surface hole filling. Moreover, our ray sampling strategy allows Co-SLAM to perform global bundle adjustment over all keyframes, instead of requiring keyframe selection to maintain a small number of active keyframes as competing neural SLAM approaches do. Experimental results show that Co-SLAM runs at 10-17Hz and achieves state-of-the-art scene reconstruction results and competitive tracking performance on various datasets and benchmarks (ScanNet, TUM, Replica, Synthetic RGBD). Project page: ", "output": "Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Foundation models have achieved great advances in multi-task learning with a unified interface of unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task.
The task similarities are predicted in a unified modality-independent space, yielding a scalable graph to demonstrate task relationships. $\pi$-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematic solution for transfer learning with multi-task prediction-and-then-interpolation, compatible with diverse types of parameter-efficient experts, such as prompts and adapters. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that $\pi$-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods both in full-shot and low-shot regimes. The task graph also enables an in-depth interpretable analysis of task transferability across modalities.", "output": "$\\pi$-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Analogical Networks, a model that encodes domain knowledge explicitly, in a collection of structured labelled 3D scenes, in addition to implicitly, as model parameters, and segments 3D object scenes with analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes from memory and their corresponding part structures, and then predicts analogous part structures for the input scene, via an end-to-end learnable modulation mechanism. By conditioning on more than one retrieved memory, compositions of structures are predicted that mix and match parts across the retrieved memories. One-shot, few-shot and many-shot learning are treated uniformly in Analogical Networks, by conditioning on the appropriate set of memories, whether taken from a single, few or many memory exemplars, and inferring analogous parses. We show Analogical Networks are competitive with state-of-the-art 3D segmentation transformers in many-shot settings, and outperform them, as well as existing paradigms of meta-learning and few-shot learning, in few-shot settings. Analogical Networks successfully segment instances of novel object categories simply by expanding their memory, without any weight updates. Our code and models are publicly available on the project webpage: ", "output": "Analogy-Forming Transformers for Few-Shot 3D Parsing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene-rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions, and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction.
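The goal-generation step just described, gradient descent on a sum of per-predicate energies over object poses, is small enough to sketch directly. The two toy 2D energies below are hypothetical stand-ins; the actual framework parses predicates from language and grounds their arguments with a vision-language model:

```python
import torch

# Toy energies over 2D object positions; low energy = constraint satisfied.
def left_of(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return torch.relu(a[0] - b[0] + 0.1)      # a should be left of b

def near(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return (torch.norm(a - b) - 0.3).abs()    # a should sit ~0.3 from b

# Positions of three objects, optimized jointly.
pos = torch.randn(3, 2, requires_grad=True)
opt = torch.optim.Adam([pos], lr=0.05)

# "Put object 0 left of object 1, and object 2 near object 1."
for step in range(200):
    opt.zero_grad()
    energy = left_of(pos[0], pos[1]) + near(pos[2], pos[1])
    energy.backward()
    opt.step()

print(pos.detach())  # a goal configuration satisfying both predicates
```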
Local vision-based policies then relocate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world. It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts.", "output": "Energy-based Models as Zero-Shot Planners for Compositional Scene Rearrangement."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we present a new sequence-to-sequence learning framework for visual tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem, which predicts object bounding boxes in an autoregressive fashion. This is different from prior Siamese trackers and transformer trackers, which rely on designing complicated head networks, such as classification and regression heads. SeqTrack only adopts a simple encoder-decoder transformer architecture. The encoder extracts visual features with a bidirectional transformer, while the decoder generates a sequence of bounding box values autoregressively with a causal transformer. The loss function is a plain cross-entropy. Such a sequence learning paradigm not only simplifies the tracking framework, but also achieves competitive performance on benchmarks. For instance, SeqTrack gets 72.5% AUC on LaSOT, establishing a new state-of-the-art performance. Code and models are available here.", "output": "SeqTrack: Sequence to Sequence Learning for Visual Object Tracking."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper shows that it is possible to learn models for monocular 3D reconstruction of articulated objects (e.g., horses, cows, sheep), using as few as 50-150 images labeled with 2D keypoints. Our proposed approach involves training category-specific keypoint estimators, generating 2D keypoint pseudo-labels on unlabeled web images, and using both the labeled and self-labeled sets to train 3D reconstruction models. It is based on two key insights: (1) 2D keypoint estimation networks trained on as few as 50-150 images of a given object category generalize well and generate reliable pseudo-labels; (2) a data selection mechanism can automatically create a \"curated\" subset of the unlabeled web images that can be used for training -- we evaluate four data selection methods. Coupling these two insights enables us to train models that effectively utilize web images, resulting in improved 3D reconstruction performance for several articulated object categories beyond the fully-supervised baseline. Our approach can quickly bootstrap a model and requires only a few images labeled with 2D keypoints. This requirement can be easily satisfied for any new object category.
To showcase the practicality of our approach for predicting the 3D shape of arbitrary object categories, we annotate 2D keypoints on giraffe and bear images from COCO -- the annotation process takes less than 1 minute per image.", "output": "Learning Articulated Shape with Keypoint Pseudo-labels from Web Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scalable Vector Graphics (SVG) is a prevalent vector image format with good support for interactivity and animation. Despite such appealing characteristics, it is generally challenging for users to create their own SVG content because of the long learning curve to comprehend SVG grammars or acquaint themselves with professional editing software. Recent progress in text-to-image generation has inspired researchers to explore image-based icon synthesis (i.e., text -> raster image -> vector image) via differential rendering, and language-based icon synthesis (i.e., text -> vector image script) via the \"zero-shot\" capabilities of large language models. However, these methods may suffer from several limitations regarding generation quality, diversity, flexibility, and speed. In this paper, we introduce IconShop, a text-guided vector icon synthesis method using an autoregressive transformer. The key to the success of our approach is to sequentialize and tokenize the SVG paths (and textual descriptions) into a uniquely decodable command sequence. With such a single sequence as input, we are able to fully exploit the sequence learning power of autoregressive transformers, while enabling various icon synthesis and manipulation tasks. Through standard training to predict the next token on a large-scale icon dataset accompanied by textual descriptions, the proposed IconShop consistently exhibits better icon synthesis performance than existing image-based and language-based methods both quantitatively (using the FID and CLIP scores) and qualitatively (through visual inspection). Meanwhile, we observe a dramatic improvement in generation diversity, which is supported by objective measures (Uniqueness and Novelty). More importantly, we demonstrate the flexibility of IconShop with two novel icon manipulation tasks: text-guided icon infilling and text-combined icon synthesis.", "output": "IconShop: Text-Based Vector Icon Synthesis with Autoregressive Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While NeRF-based human representations have shown impressive novel view synthesis results, most methods still rely on a large number of images/views for training. In this work, we propose a novel animatable NeRF called ActorsNeRF. It is first pre-trained on diverse human subjects, and then adapted with few-shot monocular video frames for a new actor with unseen poses. Building on previous generalizable NeRFs with parameter sharing using a ConvNet encoder, ActorsNeRF further adopts two human priors to capture the large human appearance, shape, and pose variations. Specifically, in the encoded feature space, we first align different human subjects in a category-level canonical space, and then align the same human from different frames in an instance-level canonical space for rendering.
We quantitatively and qualitatively demonstrate that ActorsNeRF significantly outperforms the existing state-of-the-art on few-shot generalization to new people and poses on multiple datasets. Project Page: ", "output": "ActorsNeRF: Animatable Few-shot Human Rendering with Generalizable NeRFs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge. Existing GAN inversion methods struggle to maintain editing directions and produce realistic results. To address these limitations, we propose Make It So, a novel GAN inversion method that operates in the $\mathcal{Z}$ (noise) space rather than the typical $\mathcal{W}$ (latent style) space. Make It So preserves editing capabilities, even for out-of-domain images. This is a crucial property that was overlooked in prior methods. Our quantitative evaluations demonstrate that Make It So outperforms the state-of-the-art method PTI \cite{roich2021pivotal} by a factor of five in inversion accuracy and achieves ten times better edit quality for complex indoor scenes.", "output": "Make It So: Steering StyleGAN for Any Image Inversion and Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in diffusion models have greatly improved the quality and diversity of synthesized content. To harness the expressive power of diffusion models, researchers have explored various controllable mechanisms that allow users to intuitively guide the content synthesis process. Although the latest efforts have primarily focused on video synthesis, there has been a lack of effective methods for controlling and describing desired content and motion. In response to this gap, we introduce MCDiff, a conditional diffusion model that generates a video from a starting image frame and a set of strokes, which allow users to specify the intended content and dynamics for synthesis. To tackle the ambiguity of sparse motion inputs and achieve better synthesis quality, MCDiff first utilizes a flow completion model to predict the dense video motion based on the semantic understanding of the video frame and the sparse motion control. Then, the diffusion model synthesizes high-quality future frames to form the output video. We qualitatively and quantitatively show that MCDiff achieves state-of-the-art visual quality in stroke-guided controllable video synthesis. Additional experiments on MPII Human Pose further exhibit the capability of our model on diverse content and motion synthesis.", "output": "Motion-Conditioned Diffusion Model for Controllable Video Synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition.
We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning, and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.", "output": "Putting People in Their Place: Affordance-Aware Human Insertion into Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing deep video models are limited by specific tasks, fixed input-output spaces, and poor generalization capabilities, making it difficult to deploy them in real-world scenarios. In this paper, we present our vision for multimodal and versatile video understanding and propose a prototype system, ChatVideo. Our system is built upon a tracklet-centric paradigm, which treats tracklets as the basic video unit and employs various Video Foundation Models (ViFMs) to annotate their properties, e.g., appearance, motion, etc. All the detected tracklets are stored in a database and interact with the user through a database manager. We have conducted extensive case studies on different types of in-the-wild videos, which demonstrate the effectiveness of our method in answering various video-related problems. Our project is available at ", "output": "ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Major winning Convolutional Neural Networks (CNNs), such as AlexNet, VGGNet, ResNet, and GoogleNet, include tens to hundreds of millions of parameters, which impose considerable computation and memory overhead. This limits their practical use for training, optimization, and memory efficiency. On the contrary, light-weight architectures, proposed to address this issue, mainly suffer from low accuracy. These inefficiencies mostly stem from following an ad hoc procedure. We propose a simple architecture, called SimpleNet, based on a set of design principles, with which we empirically show that a well-crafted yet simple and reasonably deep architecture can perform on par with deeper and more complex architectures. SimpleNet provides a good tradeoff between computation/memory efficiency and accuracy. Our simple 13-layer architecture outperforms most of the deeper and more complex architectures to date, such as VGGNet, ResNet, and GoogleNet, on several well-known benchmarks, while having 2 to 25 times fewer parameters and operations. This makes it very handy for embedded systems or systems with computational and memory limitations. We achieved state-of-the-art results on CIFAR10, outperforming several heavier architectures, near state-of-the-art results on MNIST, and competitive results on CIFAR100 and SVHN. We also outperformed much larger and deeper architectures, such as VGGNet and popular variants of ResNets, among others, on the ImageNet dataset.
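In the spirit of the SimpleNet abstract above, a plain, branch-free stack of 3x3 convolution blocks illustrates the design principle; the depths and widths below are arbitrary choices, not the published 13-layer configuration:

```python
import torch
import torch.nn as nn

def conv_block(cin: int, cout: int, pool: bool = False) -> nn.Sequential:
    layers = [nn.Conv2d(cin, cout, 3, padding=1),
              nn.BatchNorm2d(cout),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# A plain, branch-free stack: simplicity over architectural tricks.
model = nn.Sequential(
    conv_block(3, 64), conv_block(64, 64, pool=True),
    conv_block(64, 128), conv_block(128, 128, pool=True),
    conv_block(128, 256), conv_block(256, 256, pool=True),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(256, 10),     # e.g., CIFAR-10 classes
)
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```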
Models are made available at: ", "output": "Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Typical machine learning approaches require centralized data for model training, which may not be possible where restrictions on data sharing are in place due to, for instance, privacy and gradient protection. The recently proposed Federated Learning (FL) framework allows learning a shared model collaboratively without data being centralized or shared among data owners. However, we show in this paper that the generalization ability of the joint model is poor on Non-Independent and Non-Identically Distributed (Non-IID) data, particularly when the Federated Averaging (FedAvg) strategy is used, due to the weight divergence phenomenon. Hence, we propose a novel boosting algorithm for FL to address both the generalization and gradient leakage issues, as well as achieve faster convergence in gradient-based optimization. In addition, a secure gradient sharing protocol using Homomorphic Encryption (HE) and Differential Privacy (DP) is introduced to defend against gradient leakage attacks and avoid pairwise encryption, which is not scalable. We demonstrate that the proposed Federated Boosting (FedBoosting) method achieves noticeable improvements in both prediction accuracy and run-time efficiency in a visual text recognition task on public benchmarks.", "output": "FedBoosting: Federated Learning with Gradient Protected Boosting for Text Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Neural Networks (DNNs) learn representations from data with an impressive capability, and have brought important breakthroughs for processing images, time series, natural language, audio, video, and many others. In the remote sensing field, surveys and literature revisions specifically involving the applications of DNN algorithms have been conducted in an attempt to summarize the amount of information produced in its subfields. Recently, Unmanned Aerial Vehicle (UAV) based applications have dominated aerial sensing research. However, a literature revision that combines both \"deep learning\" and \"UAV remote sensing\" themes has not yet been conducted. The motivation for our work was to present a comprehensive review of the fundamentals of Deep Learning (DL) applied to UAV-based imagery. We focused mainly on describing classification and regression techniques used in recent applications with UAV-acquired data. For that, a total of 232 papers published in international scientific journal databases were examined. We gathered the published material and evaluated their characteristics regarding application, sensor, and technique used. We relate how DL presents promising results and has the potential for processing tasks associated with UAV-based image data. Lastly, we project future perspectives, commenting on prominent DL paths to be explored in the UAV remote sensing field.
Our revision consists of a friendly approach to introduce, comment on, and summarize the state-of-the-art in UAV-based image applications with DNN algorithms in diverse subfields of remote sensing, grouping it in the environmental, urban, and agricultural contexts.", "output": "A Review on Deep Learning in UAV Remote Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Labelling point clouds fully is highly time-consuming and costly. As larger point cloud datasets with billions of points become more common, we ask whether the full annotation is even necessary, demonstrating that existing baselines designed under a fully annotated assumption only degrade slightly even when faced with 1% random point annotations. However, beyond this point, e.g., at 0.1% annotations, segmentation accuracy is unacceptably low. We observe that, as point clouds are samples of the 3D world, the distribution of points in a local neighborhood is relatively homogeneous, exhibiting strong semantic similarity. Motivated by this, we propose a new weak supervision method to implicitly augment highly sparse supervision signals. Extensive experiments demonstrate the proposed Semantic Query Network (SQN) achieves promising performance on seven large-scale open datasets under weak supervision schemes, while requiring only 0.1% randomly annotated points for training, greatly reducing annotation cost and effort. The code is available at ", "output": "SQN: Weakly-Supervised Semantic Segmentation of Large-Scale 3D Point Clouds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image-to-image translation (i2i) networks suffer from entanglement effects in the presence of physics-related phenomena in the target domain (such as occlusions, fog, etc.), which lowers the translation quality, controllability, and variability. In this paper, we propose a general framework to disentangle visual traits in target images. Primarily, we build upon a collection of simple physics models, guiding the disentanglement with a physical model that renders some of the target traits, and learning the remaining ones. Because physics allows explicit and interpretable outputs, our physical models (optimally regressed on the target) allow generating unseen scenarios in a controllable manner. Secondarily, we show the versatility of our framework to neural-guided disentanglement, where a generative network is used in place of a physical model in case the latter is not directly accessible. Altogether, we introduce three strategies of disentanglement, guided by either a fully differentiable physics model, a (partially) non-differentiable physics model, or a neural network.
The results show our disentanglement strategies dramatically increase performance qualitatively and quantitatively in several challenging scenarios for image translation.", "output": "Physics-informed Guided Disentanglement in Generative Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional Neural Networks have demonstrated dermatologist-level performance in the classification of melanoma from skin lesion images, but prediction irregularities due to biases seen within the training data are an issue that should be addressed before widespread deployment is possible. In this work, we robustly remove bias and spurious variation from an automated melanoma classification pipeline using two leading bias unlearning techniques. We show that the biases introduced by surgical markings and rulers presented in previous studies can be reasonably mitigated using these bias removal methods. We also demonstrate the generalisation benefits of unlearning spurious variation relating to the imaging instrument used to capture lesion images. Our experimental results provide evidence that the effects of each of the aforementioned biases are notably reduced, with different debiasing techniques excelling at different tasks.", "output": "Skin Deep Unlearning: Artefact and Instrument Debiasing in the Context of Melanoma Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The capability of a mobile robot to efficiently and safely perform complex missions is limited by its knowledge of the environment, namely the situation. Advanced reasoning, decision-making, and execution skills enable an intelligent agent to act autonomously in unknown environments. Situational Awareness (SA) is a fundamental capability of humans that has been deeply studied in various fields, such as psychology, military, aerospace, and education. Nevertheless, it has yet to be considered in robotics, which has focused on single compartmentalized concepts such as sensing, spatial perception, sensor fusion, state estimation, and Simultaneous Localization and Mapping (SLAM). Hence, the present research aims to connect the broad multidisciplinary existing knowledge to pave the way for a complete SA system for mobile robotics that we deem paramount for autonomy. To this aim, we define the principal components to structure a robotic SA and their area of competence. Accordingly, this paper investigates each aspect of SA, surveying the state-of-the-art robotics algorithms that cover them, and discusses their current limitations. Remarkably, essential aspects of SA are still immature since the current algorithmic development restricts their performance to only specific environments. Nevertheless, Artificial Intelligence (AI), particularly Deep Learning (DL), has brought new methods to bridge the gap that keeps these fields apart from deployment to real-world scenarios. Furthermore, an opportunity has been discovered to interconnect the vastly fragmented space of robotic comprehension algorithms through the mechanism of the Situational Graph (S-Graph), a generalization of the well-known scene graph.
Therefore, we finally shape our vision for the future of robotic Situational Awareness by discussing interesting recent research directions.", "output": "From SLAM to Situational Awareness: Challenges and Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial learning-based image defogging methods have been extensively studied in computer vision due to their remarkable performance. However, most existing methods have limited defogging capabilities for real cases because they are trained on paired clear and synthesized foggy images of the same scenes. In addition, they have limitations in preserving vivid color and rich textural details in defogging. To address these issues, we develop a novel generative adversarial network, called quad-path cycle consistent adversarial network (QPC-Net), for single image defogging. QPC-Net consists of a Fog2Fogfree block and a Fogfree2Fog block. In each block, there are three learning-based modules, namely fog removal, color-texture recovery, and fog synthesis, which sequentially compose dual paths that constrain each other to generate high-quality images. Specifically, the color-texture recovery model is designed to exploit the self-similarity of texture and structure information by learning the holistic channel-spatial feature correlations between the foggy image and its several derived images. Moreover, in the fog synthesis module, we utilize the atmospheric scattering model to guide it to improve the generative quality by focusing on atmospheric light optimization with a novel sky segmentation network. Extensive experiments on both synthetic and real-world datasets show that QPC-Net outperforms state-of-the-art defogging methods in terms of quantitative accuracy and subjective visual quality.", "output": "Unpaired Quad-Path Cycle Consistent Adversarial Networks for Single Image Defogging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparsely-gated Mixture of Expert (MoE) layers have recently been successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model performance. Our findings demonstrate that experts can implicitly focus on individual sub-domains of the input space. For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains such as flowers or animals without previous data clustering.
Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can also specialize in detecting objects of distinct sizes.", "output": "Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although Deep Neural Networks (DNNs) are incredibly effective in learning complex abstractions, they are susceptible to unintentionally learning spurious artifacts from the training data. To ensure model transparency, it is crucial to examine the relationships between learned representations, as unintended concepts often manifest themselves to be anomalous to the desired task. In this work, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for the analysis of the representation space of DNNs. Our framework employs the proposed Extreme-Activation (EA) distance measure between representations that utilizes self-explaining capabilities within the network without accessing any data. We quantitatively validate the metric's correctness and alignment with human-defined semantic distances. The coherence between the EA distance and human judgment enables us to identify representations whose underlying concepts would be considered unnatural by humans by identifying outliers in functional distance. Finally, we demonstrate the practical usefulness of DORA by analyzing and identifying artifact representations in popular Computer Vision models.", "output": "DORA: Exploring outlier representations in Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deploying Machine Learning (ML) on milliwatt-scale edge devices (tinyML) is gaining popularity due to recent breakthroughs in ML and the Internet of Things (IoT). Most tinyML research focuses on model compression techniques that trade accuracy (and model capacity) for compact models to fit into KB-sized tiny-edge devices. In this paper, we show how such models can be enhanced by the addition of an early-exit intermediate classifier. If the intermediate classifier exhibits sufficient confidence in its prediction, the network exits early, thereby resulting in considerable savings in time. Although early-exit classifiers have been proposed in previous work, these previous proposals focus on large networks, making their techniques suboptimal/impractical for tinyML applications. Our technique is optimized specifically for tiny-CNN-sized models. In addition, we present a method to alleviate the effect of network overthinking by leveraging the representations learned by the early exit. We evaluate T-RecX on three CNNs from the MLPerf tiny benchmark suite for image classification, keyword spotting, and visual wake word detection tasks. Our results show that T-RecX 1) improves the accuracy of the baseline network, and 2) achieves a 31.58% average reduction in FLOPS in exchange for one percent accuracy across all evaluated models.
Furthermore, we show that our methods consistently outperform popular prior works on the tiny-CNNs we evaluate.", "output": "T-RECX: Tiny-Resource Efficient Convolutional neural networks with early-eXit."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Leveraging multi-modal fusion, especially between camera and LiDAR, has become essential for building accurate and robust 3D object detection systems for autonomous vehicles. Until recently, point decorating approaches, in which point clouds are augmented with camera features, have been the dominant approach in the field. However, these approaches fail to utilize the higher-resolution images from cameras. Recent works projecting camera features to the bird's-eye-view (BEV) space for fusion have also been proposed; however, they require projecting millions of pixels, most of which only contain background information. In this work, we propose a novel approach, Center Feature Fusion (CFF), in which we leverage center-based detection networks in both the camera and LiDAR streams to identify relevant object locations. We then use the center-based detection to identify the locations of pixel features relevant to object locations, a small fraction of the total number in the image. These are then projected and fused in the BEV frame. On the nuScenes dataset, we outperform the LiDAR-only baseline by 4.9% mAP while fusing up to 100x fewer features than other fusion methods.", "output": "Center Feature Fusion: Selective Multi-Sensor Fusion of Center-based Objects."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the manufacturing process of heavy industrial equipment, the specific unit in the welding diagram is first manually redrawn and then the corresponding sheet metal parts are cut, which is inefficient. To this end, this paper proposes a U-net-based method for the segmentation and extraction of specific units in welding engineering drawings. This method enables the cutting device to automatically segment specific graphic units according to visual information and automatically cut out sheet metal parts of corresponding shapes according to the segmentation results. This process is more efficient than traditional human-assisted cutting. Two weaknesses in the U-net network lead to a decrease in segmentation performance: first, the focus on global semantic feature information is weak, and second, there is a large dimensional difference between shallow encoder features and deep decoder features. Based on the CBAM (Convolutional Block Attention Module) attention mechanism, this paper proposes a U-net jump structure model with an attention mechanism to improve the network's global semantic feature extraction ability. In addition, a U-net attention mechanism model with dual pooling convolution fusion is designed: the deep encoder's maximum pooling + convolution features and the shallow encoder's average pooling + convolution features are fused vertically to reduce the dimensional difference between the shallow encoder and deep decoder. The dual-pool convolutional attention jump structure replaces the traditional U-net jump structure, which can effectively improve the specific unit segmentation performance on welding engineering drawings.
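The T-RecX entry above hinges on a confidence-gated early exit. Below is a minimal sketch of that control flow, assuming a model split into illustrative `backbone_a`/`exit_head`/`backbone_b`/`final_head` modules; this is not the paper's exact architecture, just the generic mechanism:

```python
import torch
import torch.nn.functional as F

def forward_with_early_exit(backbone_a, exit_head, backbone_b, final_head,
                            x, threshold=0.9):
    """Generic early-exit inference in the spirit of the T-RecX entry.

    If the intermediate classifier is confident enough, the rest of the
    network is skipped, saving the corresponding FLOPs. Assumes batch size 1;
    the module names and threshold are illustrative.
    """
    feats = backbone_a(x)                        # shared early layers
    early_logits = exit_head(feats)              # intermediate classifier
    conf, pred = F.softmax(early_logits, dim=-1).max(dim=-1)
    if conf.item() >= threshold:                 # confident: exit early
        return pred, "early"
    feats = backbone_b(feats)                    # otherwise run the remaining layers
    return final_head(feats).argmax(dim=-1), "final"
```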
Using VGG16 as the backbone network, experiments have verified that the IoU, mAP, and accuracy of our model on the welding engineering drawing segmentation task are 84.72%, 86.84%, and 99.42%, respectively.", "output": "Segmentation method of U-net sheet metal engineering drawing based on CBAM attention mechanism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Prior work has shown that Visual Recognition datasets frequently underrepresent bias groups $B$ (e.g., Female) within class labels $Y$ (e.g., Programmers). This dataset bias can lead to models that learn spurious correlations between class labels and bias groups such as age, gender, or race. Most recent methods that address this problem require significant architectural changes or additional loss functions requiring more hyper-parameter tuning. Alternatively, data sampling baselines from the class imbalance literature (e.g., Undersampling, Upweighting), which can often be implemented in a single line of code and often have no hyperparameters, offer a cheaper and more efficient solution. However, these methods suffer from significant shortcomings. For example, Undersampling drops a significant part of the input distribution per epoch while Oversampling repeats samples, causing overfitting. To address these shortcomings, we introduce a new class-conditioned sampling method: Bias Mimicking. The method is based on the observation that if a class $c$ bias distribution, i.e., $P_D(B|Y=c)$, is mimicked across every $c^{\prime} \neq c$, then $Y$ and $B$ are statistically independent. Using this notion, BM, through a novel training procedure, ensures that the model is exposed to the entire distribution per epoch without repeating samples. Consequently, Bias Mimicking improves underrepresented groups' accuracy over sampling methods by 3% across four benchmarks while maintaining, and sometimes improving, performance over non-sampling methods. Code: \url{", "output": "Bias Mimicking: A Simple Sampling Approach for Bias Mitigation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dermatological diseases are among the most common disorders worldwide. This paper presents the first study of the interpretability and imbalanced semi-supervised learning of the multiclass intelligent skin diagnosis framework (ISDL) using 58,457 skin images with 10,857 unlabeled samples. Pseudo-labelled samples from minority classes have a higher probability at each iteration of class-rebalancing self-training, thereby promoting the utilization of unlabeled samples to solve the class imbalance problem. Our ISDL achieved promising performance with an accuracy of 0.979, sensitivity of 0.975, specificity of 0.973, macro-F1 score of 0.974, and area under the receiver operating characteristic curve (AUC) of 0.999 for multi-label skin disease classification. The Shapley Additive exPlanations (SHAP) method is combined with our ISDL to explain how the deep learning model makes predictions. This finding is consistent with the clinical diagnosis. We also proposed a sampling distribution optimisation strategy to select pseudo-labelled samples in a more effective manner using ISDLplus.
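The Bias Mimicking entry above rests on the observation that if $P_D(B|Y=c)$ is mimicked across every other class, $Y$ and $B$ become statistically independent. The following is a rough sketch of that subsampling condition for a single target class; it is an illustrative reading only, and the paper's full training procedure is more involved:

```python
import numpy as np

def mimic_subsample(Y, B, target_class, rng=None):
    """Subsample so every class's bias distribution mimics `target_class`'s.

    Y: class labels, B: bias-group labels (both 1-D integer arrays).
    Returns kept sample indices. Classes missing a bias group present in the
    target class simply contribute nothing (a limitation of this sketch).
    """
    rng = rng or np.random.default_rng(0)
    keep = list(np.flatnonzero(Y == target_class))        # keep target class fully
    bias_vals, counts = np.unique(B[Y == target_class], return_counts=True)
    target_dist = counts / counts.sum()                   # P(B | Y = target_class)
    for c in np.unique(Y):
        if c == target_class:
            continue
        per_group = {b: np.flatnonzero((Y == c) & (B == b)) for b in bias_vals}
        # largest subsample size so no bias group needs oversampling
        n_max = min(len(per_group[b]) / p
                    for b, p in zip(bias_vals, target_dist) if p > 0)
        for b, p in zip(bias_vals, target_dist):
            take = int(n_max * p)                         # match the target proportion
            keep.extend(rng.choice(per_group[b], size=take, replace=False))
    return np.array(sorted(keep))
```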
Furthermore, it has the potential to relieve the pressure placed on professional doctors, as well as help with practical issues associated with a shortage of such doctors in rural areas.", "output": "An interpretable imbalanced semi-supervised deep learning framework for improving differential diagnosis of skin diseases."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual explanation of \"black-box\" models allows researchers in explainable artificial intelligence (XAI) to interpret the model's decisions in a human-understandable manner. In this paper, we propose interpretable class activation mapping for tree crown detection (Crown-CAM) that overcomes the inaccurate localization and computational complexity of previous methods while generating reliable visual explanations for the challenging and dynamic problem of tree crown detection in aerial images. It consists of an unsupervised selection of activation maps, computation of local score maps, and non-contextual background suppression to efficiently provide fine-grained localization of tree crowns in scenarios with dense forest trees or scenes without tree crowns. Additionally, two Intersection over Union (IoU)-based metrics are introduced to effectively quantify both the accuracy and inaccuracy of generated explanations with respect to regions with or even without tree crowns in the image. Empirical evaluations demonstrate that the proposed Crown-CAM outperforms the Score-CAM, Augmented Score-CAM, and Eigen-CAM methods by an average IoU margin of 8.7, 5.3, and 21.7 (and 3.3, 9.8, and 16.5), respectively, in improving the accuracy (and decreasing the inaccuracy) of visual explanations on the challenging NEON tree crown dataset.", "output": "Crown-CAM: Interpretable Visual Explanations for Tree Crown Detection in Aerial Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The interaction and dimension of points are two important axes in designing point operators to serve hierarchical 3D models. Yet, these two axes are heterogeneous and challenging to fully explore. Existing works craft point operators under a single axis and reuse the crafted operator in all parts of 3D models. This overlooks the opportunity to better combine point interactions and dimensions by exploiting the varying geometry/density of 3D point clouds. In this work, we establish PIDS, a novel paradigm to jointly explore point interactions and point dimensions to serve semantic segmentation on point cloud data. We establish a large search space to jointly consider versatile point interactions and point dimensions. This supports point operators with various geometry/density considerations. The enlarged search space with heterogeneous search components calls for a better ranking of candidate models. To achieve this, we improve the search space exploration by leveraging predictor-based Neural Architecture Search (NAS), and enhance the quality of prediction by assigning unique encodings to heterogeneous search components based on their priors.
We thoroughly evaluate the networks crafted by PIDS on two semantic segmentation benchmarks, showing ~1% mIoU improvement on SemanticKITTI and S3DIS over state-of-the-art 3D models.", "output": "PIDS: Joint Point Interaction-Dimension Search for 3D Point Cloud."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intermediate features of a pre-trained model have been shown informative for making accurate predictions on downstream tasks, even if the model backbone is kept frozen. The key challenge is how to utilize these intermediate features given their gigantic amount. We propose visual query tuning (VQT), a simple yet effective approach to aggregate intermediate features of Vision Transformers. Through introducing a handful of learnable \"query\" tokens to each layer, VQT leverages the inner workings of Transformers to \"summarize\" rich intermediate features of each layer, which can then be used to train the prediction heads of downstream tasks. As VQT keeps the intermediate features intact and only learns to combine them, it enjoys memory efficiency in training, compared to many other parameter-efficient fine-tuning approaches that learn to adapt features and need back-propagation through the entire backbone. This also suggests the complementary role between VQT and those approaches in transfer learning. Empirically, VQT consistently surpasses the state-of-the-art approach that utilizes intermediate features for transfer learning and outperforms full fine-tuning in many cases. Compared to parameter-efficient approaches that adapt features, VQT achieves much higher accuracy under memory constraints. Most importantly, VQT is compatible with these approaches to attain even higher accuracy, making it a simple add-on to further boost transfer learning.", "output": "Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous manipulation systems operating in domains where human intervention is difficult or impossible (e.g., underwater, extraterrestrial, or hazardous environments) require a high degree of robustness to sensing and communication failures. Crucially, motion planning and control algorithms require a stream of accurate joint angle data provided by joint encoders, the failure of which may result in an unrecoverable loss of functionality. In this paper, we present a novel method for retrieving the joint angles of a robot manipulator using only a single RGB image of its current configuration, opening up an avenue for recovering system functionality when conventional proprioceptive sensing is unavailable. Our approach, based on a distance-geometric representation of the configuration space, exploits the knowledge of a robot's kinematic model with the goal of training a shallow neural network that performs a 2D-to-3D regression of distances associated with detected structural keypoints. It is shown that the resulting Euclidean distance matrix uniquely corresponds to the observed configuration, where joint angles can be recovered via multidimensional scaling and a simple inverse kinematics procedure.
We evaluate the performance of our approach on real RGB images of a Franka Emika Panda manipulator, showing that the proposed method is efficient and exhibits solid generalization ability. Furthermore, we show that our method can be easily combined with a dense refinement technique to obtain superior results.", "output": "A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel framework to regularize Neural Radiance Fields (NeRF) in a few-shot setting with a geometry-aware consistency regularization. The proposed approach leverages a rendered depth map at an unobserved viewpoint to warp sparse input images to the unobserved viewpoint and impose them as pseudo ground truths to facilitate learning of NeRF. By encouraging such geometry-aware consistency at a feature level instead of using a pixel-level reconstruction loss, we regularize the NeRF at semantic and structural levels while allowing for modeling view-dependent radiance to account for color variations across viewpoints. We also propose an effective method to filter out erroneous warped solutions, along with training strategies to stabilize training during optimization. We show that our model achieves competitive results compared to state-of-the-art few-shot NeRF models. Project page is available at", "output": "GeCoNeRF: Few-shot Neural Radiance Fields via Geometric Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "PhyCV is the first computer vision library which utilizes algorithms directly derived from the equations of physics governing physical phenomena. The algorithms appearing in the current release emulate, in a metaphoric sense, the propagation of light through a physical medium with natural and engineered diffractive properties followed by coherent detection. Unlike traditional algorithms that are a sequence of hand-crafted empirical rules, or deep learning algorithms that are usually data-driven and computationally heavy, physics-inspired algorithms leverage physical laws of nature as blueprints for inventing algorithms. PhyCV features low dimensionality and high efficiency, making it ideal for edge computing applications. We demonstrate real-time video processing on the NVIDIA Jetson Nano using PhyCV. In addition, these algorithms have the potential to be implemented in real physical devices for fast and efficient computation in the form of analog computing. The open-sourced code is available at", "output": "PhyCV: The First Physics-inspired Computer Vision Library."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies.
Our method discovers a simple and effective optimization algorithm, $\textbf{Lion}$ ($\textit{Evo\textbf{L}ved S\textbf{i}gn M\textbf{o}me\textbf{n}tum}$). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter, calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT. On vision-language contrastive learning, we achieve 88.3% $\textit{zero-shot}$ and 91.1% $\textit{fine-tuning}$ accuracy on ImageNet, surpassing the previous best results by 2% and 0.1%, respectively. On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x. For autoregressive, masked language modeling, and fine-tuning, Lion exhibits similar or better performance compared to Adam. Our analysis of Lion reveals that its performance gain grows with the training batch size. It also requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function. Additionally, we examine the limitations of Lion and identify scenarios where its improvements are small or not statistically significant. The implementation of Lion is publicly available.", "output": "Symbolic Discovery of Optimization Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vision Transformer models process input images by dividing them into a spatially regular grid of equal-size patches. Conversely, Transformers were originally introduced over natural language sequences, where each token represents a subword - a chunk of raw data of arbitrary size. In this work, we apply this approach to Vision Transformers by introducing a novel image tokenization scheme, replacing the standard uniform grid with a mixed-resolution sequence of tokens, where each token represents a patch of arbitrary size. Using the Quadtree algorithm and a novel saliency scorer, we construct a patch mosaic where low-saliency areas of the image are processed in low resolution, routing more of the model's capacity to important image regions. Using the same architecture as vanilla ViTs, our Quadformer models achieve substantial accuracy gains on image classification when controlling for the computational budget. Code and models are publicly available at .", "output": "Vision Transformers with Mixed-Resolution Tokenization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper explores the properties of the plain Vision Transformer (ViT) for Weakly-supervised Semantic Segmentation (WSSS). The class activation map (CAM) is of critical importance for understanding a classification network and launching WSSS. We observe that different attention heads of ViT focus on different image areas. Thus, a novel weight-based method is proposed to estimate the importance of attention heads end-to-end, while the self-attention maps are adaptively fused for high-quality CAM results that tend to have more complete objects.
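The Lion entry above describes the update rule concretely: the step direction is the sign of an interpolation between momentum and gradient, so every parameter moves by the same magnitude, and only a momentum buffer is kept as state. A minimal per-parameter sketch of one such step, with illustrative default hyperparameters:

```python
import torch

def lion_update(param, grad, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion step: sign of an interpolation of momentum and gradient.

    Unlike Adam, no second-moment buffer is stored, and the sign makes the
    update magnitude uniform across parameters (hence the smaller learning
    rate the abstract mentions). Minimal sketch; not the authors' code.
    """
    update = torch.sign(beta1 * momentum + (1 - beta1) * grad)
    param.mul_(1 - lr * wd).add_(update, alpha=-lr)    # decoupled weight decay + sign step
    momentum.mul_(beta2).add_(grad, alpha=1 - beta2)   # momentum refreshed after the step
    return param, momentum
```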
Besides, we propose a ViT-based gradient clipping decoder for online retraining with the CAM results to complete the WSSS task. We name this plain Transformer-based Weakly-supervised learning framework WeakTr. It achieves state-of-the-art WSSS performance on standard benchmarks, i.e., 78.4% mIoU on the val set of PASCAL VOC 2012 and 50.3% mIoU on the val set of COCO 2014. Code is available at", "output": "WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce Astroformer, a method to learn from less data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state of the art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17,736 labeled images, achieving 94.86% top-1 accuracy and beating the current state of the art for this task by 4.62%. Furthermore, this approach also sets a new state of the art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets often do not work very well in the low-data regime.", "output": "Astroformer: More Data Might not be all you need for Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Segmenting anything is a ground-breaking step toward artificial general intelligence, and the Segment Anything Model (SAM) greatly fosters the foundation models for computer vision. We could not be more excited to probe the performance traits of SAM. In particular, exploring situations in which SAM does not perform well is interesting. In this report, we choose three concealed scenes, i.e., camouflaged animals, industrial defects, and medical lesions, to evaluate SAM under unprompted settings. Our main observation is that SAM looks unskilled in concealed scenes.", "output": "SAM Struggles in Concealed Scenes -- Empirical Study on \"Segment Anything\"."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Transformers have shown promising performance in various vision tasks. However, the high costs of global self-attention remain challenging for Transformers, especially for high-resolution vision tasks.
Local self-attention runs attention computation within a limited region for the sake of efficiency, resulting in insufficient context modeling as its receptive field is small. In this work, we introduce two new attention modules to enhance the global modeling capability of the hierarchical vision transformer, namely, random sampling windows (RS-Win) and important region windows (IR-Win). Specifically, RS-Win samples random image patches to compose the window, following a uniform distribution, i.e., the patches in RS-Win can come from any position in the image. IR-Win composes the window according to the weights of the image patches in the attention map. Notably, RS-Win is able to capture global information throughout the entire model, even in earlier, high-resolution stages. IR-Win enables the self-attention module to focus on important regions of the image and capture more informative features. Incorporating these designs, the RSIR-Win Transformer demonstrates competitive performance on common vision tasks.", "output": "RSIR Transformer: Hierarchical Vision Transformer using Random Sampling Windows and Important Region Windows."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Breast cancer has reached the highest incidence rate worldwide among all malignancies since 2020. Breast imaging plays a significant role in early diagnosis and intervention to improve the outcome of breast cancer patients. In the past decade, deep learning has shown remarkable progress in breast cancer imaging analysis, holding great promise in interpreting the rich information and complex context of breast imaging modalities. Considering the rapid improvement of deep learning technology and the increasing severity of breast cancer, it is critical to summarize past progress and identify future challenges to be addressed. In this paper, we provide an extensive survey of deep learning-based breast cancer imaging research, covering studies on mammogram, ultrasound, magnetic resonance imaging, and digital pathology images over the past decade. The major deep learning methods, publicly available datasets, and applications on imaging-based screening, diagnosis, treatment response prediction, and prognosis are described in detail. Drawn from the findings of this survey, we present a comprehensive discussion of the challenges and potential avenues for future research in deep learning-based breast cancer imaging.", "output": "Deep Learning in Breast Cancer Imaging: A Decade of Progress and Future Directions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rapid development of deep learning, object detection and tracking play a vital role in today's society. Being able to identify and track all the pedestrians in a dense crowd scene with computer vision approaches is a typical challenge in this field, also known as the Multiple Object Tracking (MOT) challenge. Modern trackers are required to operate on more and more complicated scenes. According to the MOT20 challenge result, pedestrian density is 4 times higher than in the MOT17 challenge. Hence, improving the ability to detect and track in extremely crowded scenes is the aim of this work.
In light of the occlusion issue with the human body, heads are usually easier to identify. In this work, we have designed a joint head and body detector in an anchor-free style to boost the detection recall and precision performance for pedestrians of both small and medium sizes. Innovatively, our model does not require information on the statistical head-body ratio for common pedestrian detection for training. Instead, the proposed model learns the ratio dynamically. To verify the effectiveness of the proposed model, we evaluate the model with extensive experiments on different datasets, including the MOT20, Crowdhuman, and HT21 datasets. As a result, our proposed method significantly improves both the recall and precision rates on small- and medium-sized pedestrians and achieves state-of-the-art results on these challenging datasets.", "output": "Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cross-modal distillation has been widely used to transfer knowledge across different modalities, enriching the representation of the target unimodal one. Recent studies highly relate the temporal synchronization between vision and sound to the semantic consistency for cross-modal distillation. However, such semantic consistency from the synchronization is hard to guarantee in unconstrained videos, due to irrelevant modality noise and differentiated semantic correlation. To this end, we first propose a \textit{Modality Noise Filter} (MNF) module to erase the irrelevant noise in the teacher modality with cross-modal context. After this purification, we then design a \textit{Contrastive Semantic Calibration} (CSC) module to adaptively distill useful knowledge for the target modality, by referring to the differentiated sample-wise semantic correlation in a contrastive fashion. Extensive experiments show that our method brings a performance boost compared with other distillation methods in both visual action recognition and video retrieval tasks. We also extend it to the audio tagging task to prove the generalization of our method. The source code is available at \href{", "output": "Robust Cross-Modal Knowledge Distillation for Unconstrained Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most invariance-based self-supervised methods rely on single object-centric images (e.g., ImageNet images) for pretraining, learning invariant representations from geometric transformations. However, when images are not object-centric, the semantics of the image can be significantly altered due to cropping. Furthermore, as the model learns geometrically insensitive features, it may struggle to capture location information. For this reason, we propose a Geometric Transformation Sensitive Architecture that learns features sensitive to geometric transformations, specifically four-fold rotation, random crop, and multi-crop. Our method encourages the student to learn sensitive features by using targets that are sensitive to those transforms, via pooling and rotating of the teacher feature map and predicting rotation. Additionally, since training insensitively to multi-crop can capture long-term dependencies, we use a patch correspondence loss to train the model sensitively while capturing long-term dependencies.
Our approach demonstrates improved performance when using non-object-centric images as pretraining data compared to other methods that learn geometric transformation-insensitive representations. We surpass the DINO [\citet{caron2021emerging}] baseline in tasks including image classification, semantic segmentation, detection, and instance segmentation, with improvements of 6.1 $Acc$, 3.3 $mIoU$, 3.4 $AP^b$, and 2.7 $AP^m$. Code and pretrained models are publicly available at:", "output": "Self-Supervised Learning from Non-Object Centric Images with a Geometric Transformation Sensitive Architecture."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computed tomography (CT) has been used worldwide for decades as one of the most important non-invasive tests in assisting diagnosis. However, the ionizing nature of X-ray exposure raises concerns about potential health risks such as cancer. The desire for lower radiation doses has driven researchers to improve reconstruction quality, especially by removing noise and artifacts. Although previous studies on low-dose computed tomography (LDCT) denoising have demonstrated the potential of learning-based methods, most of them were developed on simulated data collected using the Radon transform. However, the real-world scenario significantly differs from the simulation domain, and the joint optimization of denoising with the modern CT image reconstruction pipeline is still missing. In this paper, for commercially available third-generation multi-slice spiral CT scanners, we propose a two-stage method that better exploits the complete reconstruction pipeline for LDCT denoising across different domains. Our method makes good use of the high redundancy of both the multi-slice projections and the volumetric reconstructions while avoiding the collapse of information in conventional cascaded frameworks. The dedicated design also provides a clearer interpretation of the workflow. Through extensive evaluations, we demonstrate its superior performance against state-of-the-art methods.", "output": "Multi-frame-based Cross-domain Image Denoising for Low-dose Computed Tomography."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The efficacy of segmentation algorithms is frequently compromised by topological errors like overlapping regions, disrupted connections, and voids. To tackle this problem, we introduce a novel loss function, namely Topology-Aware Focal Loss (TAFL), that incorporates the conventional Focal Loss with a topological constraint term based on the Wasserstein distance between the persistence diagrams of the ground truth and predicted segmentation masks. By enforcing identical topology as the ground truth, the topological constraint can effectively resolve topological errors, while Focal Loss tackles class imbalance. We begin by constructing persistence diagrams from filtered cubical complexes of the ground truth and predicted segmentation masks. We subsequently utilize the Sinkhorn-Knopp algorithm to determine the optimal transport plan between the two persistence diagrams. The resultant transport plan minimizes the cost of transporting mass from one distribution to the other and provides a mapping between the points in the two persistence diagrams.
We then compute the Wasserstein distance based on this transport plan to measure the topological dissimilarity between the ground truth and predicted masks. We evaluate our approach by training a 3D U-Net with the MICCAI Brain Tumor Segmentation (BraTS) challenge validation dataset, which requires accurate segmentation of 3D MRI scans that integrate various modalities for the precise identification and tracking of malignant brain tumors. Then, we demonstrate that the quality of segmentation performance is enhanced by regularizing the focal loss through the addition of a topological constraint as a penalty term.", "output": "Topology-Aware Focal Loss for 3D Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the growing demand for tuning foundation vision transformers (FViTs) on downstream tasks, fully unleashing FViTs' potential under data-limited scenarios (e.g., few-shot tuning) remains a challenge due to FViTs' data-hungry nature. Common data augmentation techniques fall short in this context due to the limited features contained in the few-shot tuning data. To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning. We thus hypothesize that leveraging those learned features to augment the tuning data can boost the effectiveness of few-shot FViT tuning. To this end, we propose a framework called Hint-based Data Augmentation (Hint-Aug), which aims to boost FViT few-shot tuning by augmenting the over-fitted parts of tuning samples with the learned features of pretrained FViTs. Specifically, Hint-Aug integrates two key enablers: (1) an Attentive Over-fitting Detector (AOD) to detect over-confident patches of foundation ViTs for potentially alleviating their over-fitting on the few-shot tuning data, and (2) a Confusion-based Feature Infusion (CFI) module to infuse easy-to-confuse features from the pretrained FViTs into the over-confident patches detected by the above AOD in order to enhance feature diversity during tuning. Extensive experiments and ablation studies on five datasets and three parameter-efficient tuning techniques consistently validate Hint-Aug's effectiveness: 0.04%~32.91% higher accuracy over the state-of-the-art (SOTA) data augmentation method under various low-shot settings. For example, on the Pet dataset, Hint-Aug achieves a 2.22% higher accuracy with 50% less training data over SOTA data augmentation methods.", "output": "Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural radiance fields (NeRF) have shown remarkable performance in generating photo-realistic novel views. Since the emergence of NeRF, many studies have been conducted, among which managing features with explicit structures such as grids has achieved exceptionally fast training by reducing the complexity of multilayer perceptron (MLP) networks. However, storing features in dense grids requires significantly large memory space, which leads to a memory bottleneck in computer systems and thus long training time.
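The TAFL entry above combines Focal Loss with a Sinkhorn-computed transport cost between persistence diagrams. Below is a compact sketch of that combination, assuming the diagrams are already computed (in practice by a TDA library from filtered cubical complexes, which is omitted). It uses balanced entropic OT with uniform weights; true diagram distances also allow matching to the diagonal, which this sketch leaves out:

```python
import torch
import torch.nn.functional as F

def sinkhorn_cost(pd_pred, pd_true, eps=0.1, n_iter=50):
    """Entropic OT cost between two persistence diagrams given as (n,2) and
    (m,2) float tensors of (birth, death) points."""
    C = torch.cdist(pd_pred, pd_true)                   # pairwise transport costs
    a = torch.full((C.shape[0],), 1.0 / C.shape[0])     # uniform source weights
    b = torch.full((C.shape[1],), 1.0 / C.shape[1])     # uniform target weights
    K = torch.exp(-C / eps)
    u = torch.ones_like(a)
    for _ in range(n_iter):                             # Sinkhorn-Knopp iterations
        v = b / (K.t() @ u)
        u = a / (K @ v)
    P = torch.diag(u) @ K @ torch.diag(v)               # transport plan
    return (P * C).sum()

def topology_aware_focal_loss(logits, target, pd_pred, pd_true,
                              gamma=2.0, lam=0.1):
    """Focal loss plus a topological penalty, in the spirit of the TAFL entry."""
    ce = F.cross_entropy(logits, target, reduction="none")
    focal = ((1 - torch.exp(-ce)) ** gamma * ce).mean() # down-weight easy voxels
    return focal + lam * sinkhorn_cost(pd_pred, pd_true)
```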
To address this issue, in this work we propose MF-NeRF, a memory-efficient NeRF framework that employs a mixed-feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality. We first design a mixed-feature hash table to adaptively mix part of the multi-level feature grids into one and map it to a single hash table. Following that, in order to obtain the correct index of a grid point, we further design an index transformation method that transforms the indices of an arbitrary-level grid to those of a canonical grid. Extensive experiments benchmarking against state-of-the-art Instant-NGP, TensoRF, and DVGO indicate that our MF-NeRF can achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality. Source code is available at", "output": "MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce CN-DHF (Compact Neural Double-Height-Field), a novel hybrid neural implicit 3D shape representation that is dramatically more compact than the current state of the art. Our representation leverages Double-Height-Field (DHF) geometries, defined as closed shapes bounded by a pair of oppositely oriented height-fields that share a common axis, and leverages the following key observations: DHFs can be compactly encoded as 2D neural implicits that capture the maximal and minimal heights along the DHF axis, and typical closed 3D shapes are well represented as intersections of a very small number (three or fewer) of DHFs. We represent input geometries as CN-DHFs by first computing the set of DHFs whose intersection well approximates each input shape, and then encoding these DHFs via neural fields. Our approach delivers high-quality reconstructions and reduces the reconstruction error by a factor of 2.5 on average compared to the state of the art, given the same parameter count or storage capacity. Compared to the best-performing alternative, our method produced higher-accuracy models on 94% of the 400 input shape and parameter count combinations tested.", "output": "CN-DHF: Compact Neural Double Height-Field Representations of 3D Shapes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of Vision-Language Pre-training Models (VLPMs) represented by CLIP and ALIGN, significant breakthroughs have been achieved for association-based visual tasks such as image classification and image-text retrieval by the zero-shot capability of CLIP without fine-tuning. However, CLIP is hard to apply to generation-based tasks, due to the lack of a decoder architecture and pre-training tasks for generation. Although previous works have created generation capacity for CLIP through additional language models, a modality gap remains between the CLIP representations of different modalities, and CLIP cannot model the offset of this gap, which causes concepts to fail to transfer across modalities. To solve the problem, we try to map images/videos to the language modality and generate captions from the language modality. In this paper, we propose the K-nearest-neighbor Cross-modality Mapping (Knight), a zero-shot method from association to generation.
With text-only unsupervised training, Knight achieves state-of-the-art performance among zero-shot methods for image captioning and video captioning. Our code is available at", "output": "From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current metric learning approaches for image retrieval are usually based on learning a space of informative latent representations where simple approaches such as the cosine distance will work well. Recent state-of-the-art methods such as HypViT move to more complex embedding spaces that may yield better results but are harder to scale to production environments. In this work, we first construct a simpler model based on triplet loss with hard negative mining that performs at the state-of-the-art level but does not have these drawbacks. Second, we introduce a novel approach for image retrieval postprocessing called Siamese Transformer for Image Retrieval (STIR) that reranks several top outputs in a single forward pass. Unlike previously proposed Reranking Transformers, STIR does not rely on global/local feature extraction and directly compares a query image and a retrieved candidate at the pixel level using an attention mechanism. The resulting approach defines a new state of the art on standard image retrieval datasets: Stanford Online Products and DeepFashion In-shop. We also release the source code and an interactive demo of our approach at", "output": "STIR: Siamese Transformer for Image Retrieval Postprocessing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for the task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. In this work, we eliminate this bottleneck by selecting the best representation based on the Gromov-Wasserstein Discrepancy (GWD) between the raw events and their representation. It is approximately 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations across multiple representations, network backbones, and datasets. This means that finding a representation with a high task score is equivalent to finding a representation with a low GWD. We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state of the art. On object detection, our optimized representation outperforms existing representations by 1.9% mAP on the 1 Mpx dataset and 8.6% mAP on the Gen1 dataset, and even outperforms the state-of-the-art by 1.8% mAP on Gen1 and state-of-the-art feed-forward methods by 6.0% mAP on the 1 Mpx dataset.
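The Knight entry above maps images into the language modality before decoding. One plausible reading of its K-nearest-neighbor cross-modality mapping, sketched below as a hypothetical simplification (the paper's actual procedure and its caption decoder are not reproduced):

```python
import numpy as np

def knn_cross_modal_map(image_emb, text_embs, k=5):
    """Project a CLIP image embedding onto the text side of the joint space
    by averaging its k nearest text embeddings from a text-only corpus.

    image_emb: (d,) array; text_embs: (N, d) array of corpus text embeddings.
    Averaging neighbors is one simple way to bridge the modality gap the
    abstract describes; it is an illustrative choice, not the paper's exact one.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                        # cosine similarity to every text
    nn = np.argsort(-sims)[:k]              # indices of the k nearest texts
    mapped = txt[nn].mean(axis=0)           # neighbor mean lives in text space
    return mapped / np.linalg.norm(mapped)
```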
This work opens a new unexplored field of explicit representation optimization for event-based learning methods.", "output": "From Chaos Comes Order: Ordering Event Representations for Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning-based multi-view stereo has emerged as a powerful paradigm for reconstructing complete, geometrically detailed objects from multiple views. Most of the existing approaches only estimate the pixel-wise depth value by minimizing the gap between the predicted point and the intersection of ray and surface, which usually ignores the surface topology. The surface topology is essential for the textureless regions and surface boundaries, which otherwise cannot be properly reconstructed. To address this issue, we suggest taking advantage of the point-to-surface distance so that the model is able to perceive a wider range of surfaces. To this end, we predict the distance volume from the cost volume to estimate the signed distance of points around the surface. Our proposed RA-MVSNet is patch-aware, since the perception range is enhanced by associating hypothetical planes with a patch of surface. Therefore, it can increase the completeness of textureless regions and reduce outliers at the boundary. Moreover, mesh topologies with fine details can be generated by the introduced distance volume. Compared to conventional deep learning-based multi-view stereo methods, our proposed RA-MVSNet approach obtains more complete reconstruction results by taking advantage of signed distance supervision. The experiments on both the DTU and Tanks & Temples datasets demonstrate that our proposed approach achieves state-of-the-art results.", "output": "Multi-View Stereo Representation Revisit: Region-Aware MVSNet."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Matrix Factorization (MF) on large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) can expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs requires alternative techniques that not only allow parallelism but also address memory limitations. Synchronization between computation units, isolation of data related to a computational unit, sharing of data between computational units, and identification of independent tasks among computational units are some of the challenges in leveraging GPUs for MF. We propose a block-based approach to matrix factorization using Stochastic Gradient Descent (SGD) that is aimed at accelerating MF on GPUs. The primary motivation for the approach is to make it viable to factorize extremely large datasets on limited hardware without having to compromise on results. The approach addresses factorization of large-scale data by identifying independent blocks, each of which is factorized in parallel using multiple computational units. The approach can be extended to one or more GPUs and even to distributed systems. The RMSE results of the block-based approach are within an acceptable delta in comparison to the results of the CPU-based variant and the multi-threaded CPU variant of a similar SGD kernel implementation.
The advantage of the block-based variant in terms of speed is significant in comparison to the other variants.", "output": "GPU accelerated matrix factorization of large scale data using block based approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Brain tumors are one of the leading causes of cancer death. High-grade brain tumors are prone to recurrence even after standard treatment. Therefore, developing a method to predict the brain tumor recurrence location plays an important role in treatment planning, and it can potentially prolong a patient's survival time. There is still little work dealing with this issue. In this paper, we present a deep learning-based brain tumor recurrence location prediction network. Since the dataset is usually small, we propose to use transfer learning to improve the prediction. We first train a multi-modal brain tumor segmentation network on the public dataset BraTS 2021. Then, the pre-trained encoder is transferred to our private dataset for extracting the rich semantic features. Following that, a multi-scale multi-channel feature fusion model and a nonlinear correlation learning module are developed to learn the effective features. The correlation between multi-channel features is modeled by a nonlinear equation. To measure the similarity between the distributions of the original features of one modality and the estimated correlated features of another modality, we propose to use the Kullback-Leibler divergence. Based on this divergence, a correlation loss function is designed to maximize the similarity between the two feature distributions. Finally, two decoders are constructed to jointly segment the present brain tumor and predict its future tumor recurrence location. To the best of our knowledge, this is the first work that can segment the present tumor and at the same time predict the future tumor recurrence location, making treatment planning more efficient and precise. The experimental results demonstrate the effectiveness of our proposed method in predicting the brain tumor recurrence location from a limited dataset.", "output": "Prediction of brain tumor recurrence location based on multi-modal fusion and nonlinear correlation learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Increased capabilities such as recognition and self-adaptability are now required from IoT applications. While IoT node power consumption is a major concern for these applications, cloud-based processing is becoming unsustainable due to continuous sensor or image data transmission over the wireless network. Thus, optimized ML capabilities and data transfers should be integrated into the IoT node. Moreover, IoT applications are torn between sporadic data-logging and energy-hungry data processing (e.g. image classification). Thus, the versatility of the node is key in addressing this wide diversity of energy and processing needs. This paper presents SamurAI, a versatile IoT node bridging this gap in processing and in energy by leveraging two on-chip sub-systems: a low-power, clock-less, event-driven Always-Responsive (AR) part and an energy-efficient On-Demand (OD) part.
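The recurrence-prediction entry above measures the similarity between one modality's feature distribution and the correlated features estimated from another with a Kullback-Leibler divergence. The exact loss is not spelled out in the abstract, so this is only a plausible PyTorch reading: soften both feature maps into distributions and penalize their KL divergence.

```python
import torch
import torch.nn.functional as F

def kl_correlation_loss(feat_a, feat_b_estimated):
    """KL(P_a || Q_b): penalize mismatch between the distribution of
    modality A's features and the correlated features estimated from B."""
    p = F.softmax(feat_a.flatten(1), dim=1)                 # target distribution
    log_q = F.log_softmax(feat_b_estimated.flatten(1), dim=1)
    return F.kl_div(log_q, p, reduction='batchmean')

t1_feats = torch.randn(2, 64, 8, 8)                     # e.g. T1-weighted features
t2_corr = torch.randn(2, 64, 8, 8, requires_grad=True)  # estimated from T2
loss = kl_correlation_loss(t1_feats, t2_corr)
loss.backward()
```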
AR contains a 1.7 MOPS event-driven, asynchronous Wake-up Controller (WuC) with a 207 ns wake-up time optimized for sporadic computing, while OD combines a deep-sleep RISC-V CPU and a 1.3 TOPS/W Machine Learning (ML) accelerator for more complex tasks up to 36 GOPS. This architecture partitioning achieves best-in-class versatility metrics such as the peak-performance-to-idle-power ratio. On an applicative classification scenario, it demonstrates system power gains of up to 3.5x compared to cloud-based processing, and thus extended battery lifetime.", "output": "SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To improve the recognition ability of computer-aided breast mass classification in mammographic images, in this work we explore state-of-the-art classification networks to develop an ensemble mechanism. First, the regions of interest (ROIs) are obtained from the original dataset, and then three models, i.e., XceptionNet, DenseNet, and EfficientNet, are trained individually. After training, we build the ensemble by summing the probabilities output by each network, which enhances the performance by up to 5%. The scheme has been validated on a public dataset, and we achieved accuracy, precision, and recall of 88%, 85%, and 76%, respectively.", "output": "Ensemble CNNs for Breast Tumor Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The advances of Generative AI models with interactive capabilities over the past few years offer unique opportunities for socioeconomic mobility. Their potential for scalability, accessibility, affordability, personalization, and convenience presents a first-class opportunity for poverty-stricken countries to adapt and modernize their educational order. As a result, this position paper makes the case for an educational policy framework that would succeed in this transformation by prioritizing vocational and technical training over academic education in sub-Saharan African countries. We highlight substantial applications of Large Language Models, tailor-made to their respective cultural background(s) and needs, that would reinforce their systemic decolonization. Lastly, we provide specific historical examples of diverse states successfully implementing such policies in the elementary steps of their socioeconomic transformation, in order to corroborate our proposal that sub-Saharan African countries follow their lead.", "output": "ChatGPT is all you need to decolonize sub-Saharan Vocational Education."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Lane change (LC) is a continuous and complex operation process. Accurately detecting and predicting LC processes can help traffic participants better understand their surrounding environment, recognize potential LC safety hazards, and improve traffic safety. This paper focuses on LC processes, developing an LC intention recognition (LC-IR) model and an LC status prediction (LC-SP) model. A novel ensemble temporal convolutional network with Long Short-Term Memory units (TCN-LSTM) is first proposed to capture long-range dependencies in sequential data.
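The breast-mass entry above ensembles three CNNs by summing their predicted class probabilities. That combination rule is simple enough to state exactly; the sketch below shows it with placeholder probability arrays standing in for the XceptionNet, DenseNet, and EfficientNet outputs.

```python
import numpy as np

def ensemble_by_prob_sum(prob_list):
    """Sum (equivalently, average) per-model class probabilities,
    then take the argmax as the ensemble decision."""
    summed = np.sum(prob_list, axis=0)          # (n_samples, n_classes)
    return summed.argmax(axis=1)

rng = np.random.default_rng(0)
# stand-ins for the softmax outputs of the three trained networks
probs = [rng.dirichlet(np.ones(2), size=5) for _ in range(3)]
print(ensemble_by_prob_sum(probs))              # ensemble label per sample
```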
Then, three multi-task models (MTL-LSTM, MTL-TCN, MTL-TCN-LSTM) are developed to capture the intrinsic relationship among output indicators. Furthermore, a unified modeling framework for LC intention recognition and driving status prediction (LC-IR-SP) is developed. To validate the performance of the proposed models, a total of 1023 vehicle trajectories is extracted from the CitySim dataset. The Pearson coefficient is employed to determine the related indicators. The results indicate that, using 150 frames as the input length, the TCN-LSTM model with 96.67% accuracy outperforms the TCN and LSTM models in LC intention classification and provides more balanced results for each class. The three proposed multi-task learning models provide markedly increased performance compared to the corresponding single-task models, with an average reduction of 24.24% and 22.86% in the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), respectively. The developed LC-IR-SP model has promising applications for autonomous vehicles to identify lane change behaviors, calculate a real-time traffic conflict index, and improve vehicle control strategies.", "output": "A Unified Approach to Lane Change Intention Recognition and Driving Status Prediction through TCN-LSTM and Multi-Task Learning Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While Large Language Models (LLMs) have shown exceptional performance in various tasks, their (arguably) most prominent drawback is generating inaccurate or false information with a confident tone. In this paper, we hypothesize that the LLM's internal state can be used to reveal the truthfulness of a statement. Therefore, we introduce a simple yet effective method to detect the truthfulness of LLM-generated statements, which utilizes the LLM's hidden layer activations to determine the veracity of statements. To train and evaluate our method, we compose a dataset of true and false statements in six different topics. A classifier is trained to detect whether a statement is true or false based on an LLM's activation values. Specifically, the classifier receives as input the activation values from the LLM for each of the statements in the dataset. Our experiments demonstrate that our method for detecting statement veracity significantly outperforms even few-shot prompting methods, highlighting its potential to enhance the reliability of LLM-generated content and its practical applicability in real-world scenarios.", "output": "The Internal State of an LLM Knows When its Lying."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in immunomics have shown that T-cell receptor (TCR) signatures can accurately predict active or recent infection by leveraging the high specificity of TCR binding to disease antigens. However, the extreme diversity of the adaptive immune repertoire presents challenges in reliably identifying disease-specific TCRs. Population genetics and sequencing depth can also have strong systematic effects on repertoires, which requires careful consideration when developing diagnostic models.
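The internal-state entry above trains a classifier on an LLM's hidden-layer activations to predict whether a generated statement is true. The sketch below is a generic stand-in: random vectors play the role of extracted activations, and a logistic-regression probe plays the role of their classifier; how the authors extract activations is not specified in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in for hidden-layer activations of true/false statements:
# in practice these would come from a forward pass through the LLM.
n, d = 1000, 4096
X = rng.normal(size=(n, d))
y = rng.integers(0, 2, size=n)            # 1 = true statement, 0 = false
X[y == 1] += 0.05                         # synthetic signal for the demo

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("probe accuracy:", probe.score(X_te, y_te))
```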
We present an Adaptive Immune Repertoire-Invariant Variational Autoencoder (AIRIVA), a generative model that learns a low-dimensional, interpretable, and compositional representation of TCR repertoires to disentangle such systematic effects in repertoires. We apply AIRIVA to two infectious disease case studies: COVID-19 (natural infection and vaccination) and the Herpes Simplex Virus (HSV-1 and HSV-2), and empirically show that we can disentangle the individual disease signals. We further demonstrate AIRIVA's capability to: learn from unlabelled samples; generate in-silico TCR repertoires by intervening on the latent factors; and identify disease-associated TCRs validated using TCR annotations from external assay data.", "output": "AIRIVA: A Deep Generative Model of Adaptive Immune Repertoires."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of scalable, distributed AI frameworks leveraging cloud computing for enhanced deep learning performance and efficiency. We first provide an overview of popular AI frameworks and cloud services, highlighting their respective strengths and weaknesses. Next, we delve into the critical aspects of data storage and management in cloud-based AI systems, discussing data preprocessing, feature engineering, privacy, and security. We then explore parallel and distributed training techniques for AI models, focusing on model partitioning, communication strategies, and cloud-based training architectures. In subsequent chapters, we discuss optimization strategies for AI workloads in the cloud, covering load balancing, resource allocation, auto-scaling, and performance benchmarking. We also examine AI model deployment and serving in the cloud, outlining containerization, serverless deployment options, and monitoring best practices. To ensure the cost-effectiveness of cloud-based AI solutions, we present a thorough analysis of costs, optimization strategies, and case studies showcasing successful deployments. Finally, we summarize the key findings of this study, discuss the challenges and limitations of cloud-based AI, and identify emerging trends and future research opportunities in the field.", "output": "Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose TR0N, a highly general framework to turn pre-trained unconditional generative models, such as GANs and VAEs, into conditional models. The conditioning can be highly arbitrary, and requires only a pre-trained auxiliary model. For example, we show how to turn unconditional models into class-conditional ones with the help of a classifier, and also into text-to-image models by leveraging CLIP. TR0N learns a lightweight stochastic mapping which \"translates\" between the space of conditions and the latent space of the generative model, in such a way that the generated latent corresponds to a data sample satisfying the desired condition.
The translated latent samples are then further improved upon through Langevin dynamics, enabling us to obtain higher-quality data samples. TR0N requires neither training data nor fine-tuning, yet can achieve a zero-shot FID of 10.9 on MS-COCO, outperforming competing alternatives not only on this metric, but also in sampling speed -- all while retaining a much higher level of generality. Our code is available at", "output": "TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Gradient-boosted decision trees (GBDT) are a widely used and highly effective machine learning approach for tabular data modeling. However, their complex structure may lead to low robustness against small covariate perturbations in unseen data. In this study, we apply one-hot encoding to convert a GBDT model into a linear framework, through encoding of each tree leaf to one dummy variable. This allows for the use of linear regression techniques, plus a novel risk decomposition for assessing the robustness of a GBDT model against covariate perturbations. We propose to enhance the robustness of GBDT models by refitting their linear regression forms with $L_1$ or $L_2$ regularization. Theoretical results are obtained about the effect of regularization on the model performance and robustness. It is demonstrated through numerical experiments that the proposed regularization approach can enhance the robustness of the one-hot-encoded GBDT models.", "output": "Enhancing Robustness of Gradient-Boosted Decision Trees through One-Hot Encoding and Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantifying the phagocytosis of dynamic, unstained cells is essential for evaluating neurodegenerative diseases. However, measuring rapid cell interactions and distinguishing cells from backgrounds make this task challenging when processing time-lapse phase-contrast video microscopy. In this study, we introduce a fully automated, scalable, and versatile real-time framework for quantifying and analyzing phagocytic activity. Our proposed pipeline can process large datasets and includes a data quality verification module to counteract potential perturbations such as microscope movements and frame blurring. We also propose an explainable cell segmentation module to improve the interpretability of deep learning methods compared to black-box algorithms. This includes two interpretable deep learning capabilities: visual explanation and model simplification. We demonstrate that interpretability in deep learning is not the opposite of high performance, but rather provides essential deep learning algorithm optimization insights and solutions. Incorporating interpretable modules results in an efficient architecture design and optimized execution time. We apply this pipeline to quantify and analyze microglial cell phagocytosis in frontotemporal dementia (FTD) and obtain statistically reliable results showing that FTD mutant cells are larger and more aggressive than control cells. To stimulate translational approaches and future research, we release an open-source pipeline and a unique microglial cell phagocytosis dataset for immune system characterization in neurodegenerative disease research.
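The TR0N entry above refines translated latents with Langevin dynamics. The paper's energy function isn't given in the abstract, so the sketch below applies the standard Langevin update to a placeholder energy (squared distance to a condition embedding): gradient descent on the energy plus Gaussian noise.

```python
import torch

def langevin_refine(z, energy_fn, steps=50, step_size=0.01):
    """z_{t+1} = z_t - (eta/2) * grad E(z_t) + sqrt(eta) * noise."""
    z = z.clone().requires_grad_(True)
    for _ in range(steps):
        grad = torch.autograd.grad(energy_fn(z).sum(), z)[0]
        z = (z - 0.5 * step_size * grad
             + step_size ** 0.5 * torch.randn_like(z)).detach().requires_grad_(True)
    return z.detach()

target = torch.randn(1, 64)                        # placeholder condition embedding
energy = lambda z: ((z - target) ** 2).sum(dim=1)  # placeholder energy function
z0 = torch.randn(4, 64)                            # latents from the translator
z_refined = langevin_refine(z0, energy)
```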
This pipeline and dataset will help crystallize future advances in this field, promoting the development of efficient and effective interpretable algorithms dedicated to this critical domain. ", "output": "Phagocytosis Unveiled: A Scalable and Interpretable Deep learning Framework for Neurodegenerative Disease Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sequential decision making algorithms often struggle to leverage different sources of unstructured offline interaction data. Imitation learning (IL) methods based on supervised learning are robust, but require optimal demonstrations, which are hard to collect. Offline goal-conditioned reinforcement learning (RL) algorithms promise to learn from sub-optimal data, but face optimization challenges, especially with high-dimensional data. To bridge the gap between IL and RL, we introduce Distance Weighted Supervised Learning, or DWSL, a supervised method for learning goal-conditioned policies from offline data. DWSL models the entire distribution of time-steps between states in offline data with only supervised learning, and uses this distribution to approximate shortest path distances. To extract a policy, we weight actions by their reduction in distance estimates. Theoretically, DWSL converges to an optimal policy constrained to the data distribution, an attractive property for offline learning, without any bootstrapping. Across all datasets we test, DWSL empirically maintains behavior cloning as a lower bound while still exhibiting policy improvement. In high-dimensional image domains, DWSL surpasses the performance of both prior goal-conditioned IL and RL algorithms. Visualizations and code can be found at .", "output": "Distance Weighted Supervised Learning for Offline Interaction Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As human-robot interaction (HRI) systems advance, so does the difficulty of evaluating and understanding the strengths and limitations of these systems in different environments and with different users. To this end, previous methods have algorithmically generated diverse scenarios that reveal system failures in a shared control teleoperation task. However, these methods require directly evaluating generated scenarios by simulating robot policies and human actions. The computational cost of these evaluations limits their applicability in more complex domains. Thus, we propose augmenting scenario generation systems with surrogate models that predict both human and robot behaviors. In the shared control teleoperation domain and a more complex shared workspace collaboration task, we show that surrogate-assisted scenario generation efficiently synthesizes diverse datasets of challenging scenarios.
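DWSL, described above, extracts a policy by weighting actions by how much they reduce the estimated distance-to-goal. The abstract doesn't give the weighting function, so the snippet below uses a common exponential-advantage form purely as an illustrative guess, not the paper's rule.

```python
import numpy as np

def dwsl_style_weights(dist_s, dist_s_next, alpha=1.0):
    """Weight each logged transition by the estimated reduction in
    distance-to-goal it achieves (exponentiated, then normalized).
    The exponential form is an assumption, not the paper's exact rule."""
    reduction = dist_s - dist_s_next        # > 0 when the action helps
    w = np.exp(alpha * reduction)
    return w / w.sum()

# estimated shortest-path distances before/after each logged action
d_before = np.array([5.0, 5.0, 3.0, 2.0])
d_after = np.array([4.0, 5.5, 2.0, 2.0])
print(dwsl_style_weights(d_before, d_after))  # supervised-learning weights
```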
We demonstrate that these failures are reproducible in real-world interactions.", "output": "Surrogate Assisted Generation of Human-Robot Interaction Scenarios."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a new computational framework for estimating parameters in generalized generalized linear models (GGLM), a class of models that extends the popular generalized linear models (GLM) to account for dependencies among observations in spatio-temporal data. The proposed approach uses a monotone operator-based variational inequality method to overcome non-convexity in parameter estimation and provide guarantees for parameter recovery. The results can be applied to GLM and GGLM, focusing on spatio-temporal models. We also present online instance-based bounds using martingale concentration inequalities. Finally, we demonstrate the performance of the algorithm using numerical simulations and a real data example for wildfire incidents.", "output": "Generalized generalized linear models: Convex estimation and online bounds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a physics-informed neural network (PINN) approach for monitoring the health of diesel engines. The aim is to evaluate the engine dynamics, identify unknown parameters in a \"mean value\" model, and anticipate maintenance requirements. The PINN model is applied to diesel engines with a variable-geometry turbocharger and exhaust gas recirculation, using measurement data of selected state variables. The results demonstrate the ability of the PINN model to simultaneously and accurately predict both unknown parameters and dynamics with both clean and noisy data, and the importance of the self-adaptive weight in the loss function for faster convergence. The input data for these simulations are derived from actual engine running conditions, while the outputs are simulated data, making this a practical case study of PINN's ability to predict real-world dynamical systems. The mean value model of the diesel engine incorporates empirical formulae to represent certain states, but these formulae may not be generalizable to other engines. To address this, the study considers the use of deep neural networks (DNNs) in addition to the PINN model. The DNNs are trained using laboratory test data and are used to model the engine-specific empirical formulae in the mean value model, allowing for a more flexible and adaptive representation of the engine's states. In other words, the mean value model uses both the PINN model and the DNNs to represent the engine's states, with the PINN providing a physics-based understanding of the engine's overall dynamics and the DNNs offering a more engine-specific and adaptive representation of the empirical formulae.
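The diesel-engine PINN entry above highlights a self-adaptive weight in the loss function as key for faster convergence. One common way to realize this, used here as an assumption about their setup rather than a reproduction of it, is to make the data/physics weights learnable through a log-variance parameterization:

```python
import torch

class AdaptiveWeightedPinnLoss(torch.nn.Module):
    """Combine data and physics residual losses with learnable weights:
    loss = exp(-s_i) * L_i + s_i, so the balance is tuned by gradient descent."""
    def __init__(self):
        super().__init__()
        self.s_data = torch.nn.Parameter(torch.zeros(()))
        self.s_phys = torch.nn.Parameter(torch.zeros(()))

    def forward(self, data_loss, physics_loss):
        return (torch.exp(-self.s_data) * data_loss + self.s_data
                + torch.exp(-self.s_phys) * physics_loss + self.s_phys)

criterion = AdaptiveWeightedPinnLoss()
total = criterion(torch.tensor(0.8), torch.tensor(0.3))  # placeholder losses
```

The exp(-s) form keeps each weight positive while the additive s term discourages the trivial solution of driving both weights to zero.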
By combining these two approaches, the study aims to offer a comprehensive and versatile approach to monitoring the health and performance of diesel engines.", "output": "Physics-informed neural networks for predicting gas flow dynamics and unknown parameters in diesel engines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the most challenging problems in fingerprint recognition continues to be establishing the identity of a suspect associated with partial and smudgy fingerprints left at a crime scene (i.e., latent prints or fingermarks). Despite the success of fixed-length embeddings for rolled and slap fingerprint recognition, the features learned for latent fingerprint matching have mostly been limited to local minutiae-based embeddings and have not directly leveraged global representations for matching. In this paper, we combine global embeddings with local embeddings for state-of-the-art latent-to-rolled matching accuracy with high throughput. The combination of both local and global representations leads to improved recognition accuracy across the NIST SD 27, NIST SD 302, MSP, MOLF DB1/DB4, and MOLF DB2/DB4 latent fingerprint datasets for both closed-set (84.11%, 54.36%, 84.35%, 70.43%, 62.86% rank-1 retrieval rate, respectively) and open-set (0.50, 0.74, 0.44, 0.60, 0.68 FNIR at FPIR=0.02, respectively) identification scenarios on a gallery of 100K rolled fingerprints. Not only do we fuse the complementary representations, we also use the local features to guide the global representations to focus on discriminatory regions in the two fingerprint images to be compared. This leads to a multi-stage matching paradigm in which subsets of the retrieved candidate lists for each probe image are passed to subsequent stages for further processing, resulting in a considerable reduction in latency (requiring just 0.068 ms per latent-to-rolled comparison on an AMD EPYC 7543 32-Core Processor, roughly 15K comparisons per second). Finally, we show the generalizability of the fused representations for improving authentication accuracy across several rolled, plain, and contactless fingerprint datasets.", "output": "Latent Fingerprint Recognition: Fusion of Local and Global Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture unknown complex dynamical system behaviors using multiple neural networks. The sampled data of the system is divided by valid partitions into groups corresponding to their topologies, based on which transition guards are defined. Then, a collection of computationally efficient small-scale neural networks is trained as the local dynamical descriptions for their corresponding topologies. After modeling the system with a neural-network-based hybrid automaton, a set-valued reachability analysis with low computational cost is provided based on interval analysis and a split-and-combine process.
Finally, a numerical example of a limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost of reachable set computation without sacrificing any modeling precision.", "output": "A Data-Driven Hybrid Automaton Framework to Modeling Complex Dynamical Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural network model compression techniques can address the computation issue of deep neural networks on embedded devices in industrial systems. The guaranteed output error computation problem for neural network compression with quantization is addressed in this paper. A merged neural network is built from a feedforward neural network and its quantized version to produce the exact output difference between the two neural networks. Then, optimization-based methods and reachability analysis methods are applied to the merged neural network to compute the guaranteed quantization error. Finally, a numerical example is proposed to validate the applicability and effectiveness of the proposed approach.", "output": "Guaranteed Quantization Error Computation for Neural Network Model Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In continuum-armed bandit problems where the underlying function resides in a reproducing kernel Hilbert space (RKHS), namely, the kernelised bandit problems, an important open problem remains of how well learning algorithms can adapt if the regularity of the associated kernel function is unknown. In this work, we study adaptivity to the regularity of translation-invariant kernels, which is characterized by the decay rate of the Fourier transform of the kernel, in the bandit setting. We derive an adaptivity lower bound, proving that it is impossible to simultaneously achieve optimal cumulative regret in a pair of RKHSs with different regularities. To verify the tightness of this lower bound, we show that an existing bandit model selection algorithm applied with minimax non-adaptive kernelised bandit algorithms matches the lower bound in its dependence on $T$, the total number of steps, except for log factors. By filling in the regret bounds for adaptivity between RKHSs, we connect the statistical difficulty of adaptivity in continuum-armed bandits for three fundamental types of function spaces: RKHS, Sobolev space, and H\"older space.", "output": "Adaptation to Misspecified Kernel Regularity in Kernelised Bandits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mixtures of Gaussian process experts are a class of models that can simultaneously address two of the key limitations inherent in standard Gaussian processes: scalability and predictive performance. In particular, models that use Dirichlet processes as gating functions permit straightforward interpretation and automatic selection of the number of experts in a mixture. While the existing models are intuitive and capable of capturing non-stationarity, multi-modality, and heteroskedasticity, the simplicity of their gating functions may limit the predictive performance when applied to complex data-generating processes.
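The quantization-error entry above computes a guaranteed output error by merging a network with its quantized copy. Formal reachability analysis is beyond a snippet, but the construction it relies on, evaluating the exact output difference f(x) - f_q(x), is easy to show; the sampling below gives only an empirical estimate, not the paper's guaranteed bound.

```python
import copy
import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 16), torch.nn.ReLU(),
                          torch.nn.Linear(16, 2))

# build the quantized twin by rounding weights to an 8-bit grid
net_q = copy.deepcopy(net)
with torch.no_grad():
    for p in net_q.parameters():
        scale = p.abs().max() / 127.0
        p.copy_(torch.round(p / scale) * scale)

# empirical estimate of the worst-case output difference on sampled inputs
x = torch.rand(10000, 4) * 2 - 1          # inputs in [-1, 1]^4
diff = (net(x) - net_q(x)).abs().max()
print("max |f(x) - f_q(x)| over samples:", diff.item())
```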
Capitalising on recent advances in the dependent Dirichlet process literature, we propose a new mixture model of Gaussian process experts based on kernel stick-breaking processes. Our model maintains the intuitive appeal yet improves upon the performance of the existing models. To make it practical, we design a sampler for posterior computation based on slice sampling. The model behaviour and improved predictive performance are demonstrated in experiments using six datasets.", "output": "Mixtures of Gaussian process experts based on kernel stick-breaking processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current dialogue research primarily studies pairwise (two-party) conversations, and does not address the everyday setting where more than two speakers converse together. In this work, we both collect and evaluate multi-party conversations to study this more general case. We use the LIGHT environment to construct grounded conversations, where each participant has an assigned character to role-play. We thus evaluate the ability of language models to act as one or more characters in such conversations. Models require two skills that pairwise-trained models appear to lack: (1) being able to decide when to talk; (2) producing coherent utterances grounded on multiple characters. We compare models trained on our new dataset to existing pairwise-trained dialogue models, as well as large language models with few-shot prompting. We find that our new dataset, MultiLIGHT, which we will publicly release, can help bring significant improvements in the group setting.", "output": "Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper assesses the reliability of the RemOve-And-Retrain (ROAR) protocol, which is used to measure the performance of feature importance estimates. Our findings from the theoretical background and empirical experiments indicate that attributions that possess less information about the decision function can perform better in ROAR benchmarks, conflicting with the original purpose of ROAR. This phenomenon is also observed in the recently proposed variant RemOve-And-Debias (ROAD), and we identify a consistent trend of blurriness bias in ROAR attribution metrics. Our results caution against uncritical reliance on ROAR metrics.", "output": "On Pitfalls of $\\textit{RemOve-And-Retrain}$: Data Processing Inequality Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Innovative Electronic Design Automation (EDA) solutions are important to meet the design requirements for increasingly complex electronic devices. Verilog, a hardware description language, is widely used for the design and verification of digital circuits and is synthesized using specific EDA tools. However, writing code is a repetitive and time-intensive task. This paper proposes, primarily, a novel deep learning framework for training a Verilog autocompletion model and, secondarily, a Verilog dataset of files and snippets obtained from open-source repositories.
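The ROAR entry above critiques the remove-and-retrain protocol. For readers unfamiliar with it, the protocol itself is mechanical: rank features by an attribution method, replace the top-ranked fraction with an uninformative value, retrain, and watch the accuracy. A compact sklearn version follows, with random-forest importances as a stand-in attribution method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
ranking = np.argsort(model.feature_importances_)[::-1]  # attribution proxy

for frac in (0.0, 0.25, 0.5, 0.75):
    k = int(frac * X.shape[1])
    removed = ranking[:k]
    X_tr_m, X_te_m = X_tr.copy(), X_te.copy()
    # replace the "important" features with their training mean, then retrain
    X_tr_m[:, removed] = X_tr[:, removed].mean(axis=0)
    X_te_m[:, removed] = X_tr[:, removed].mean(axis=0)
    acc = RandomForestClassifier(random_state=0).fit(X_tr_m, y_tr).score(X_te_m, y_te)
    print(f"removed {frac:.0%} -> accuracy {acc:.3f}")
```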
The framework involves integrating models pretrained on general programming language data and finetuning them on a dataset curated to be similar to a target downstream task. This is validated by comparing different pretrained models trained on different subsets of the proposed Verilog dataset using multiple evaluation metrics. These experiments demonstrate that the proposed framework achieves better BLEU, ROUGE-L, and chrF scores by 9.5%, 6.7%, and 6.9%, respectively, compared to a model trained from scratch.", "output": "A Deep Learning Framework for Verilog Autocompletion Towards Design and Verification Automation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another. However, when taken to the extreme, SSL models can unintentionally memorize specific parts of individual training samples rather than learning semantically meaningful associations. In this work, we perform a systematic study of the unintended memorization of image-specific information in SSL models -- which we refer to as déjà vu memorization. Concretely, we show that given the trained model and a crop of a training image containing only the background (e.g., water, sky, grass), it is possible to infer the foreground object with high accuracy or even visually reconstruct it. Furthermore, we show that déjà vu memorization is common to different SSL algorithms, is exacerbated by certain design choices, and cannot be detected by conventional techniques for evaluating representation quality. Our study of déjà vu memorization reveals previously unknown privacy risks in SSL models, as well as suggesting potential practical mitigation strategies. Code is available at", "output": "Do SSL Models Have D\\'ej\\`a Vu? A Case of Unintended Memorization in Self-supervised Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, product categorisation has been a common issue for E-commerce companies, which have utilised machine learning to categorise their products automatically. In this study, we propose an ensemble approach, using a combination of different models to separately predict each product's category, subcategory, and colour before ultimately combining the resultant predictions for each product. With the aforementioned approach, we show that an average F1-score of 0.82 can be achieved using a combination of XGBoost and k-nearest neighbours to predict said features.", "output": "Categorising Products in an Online Marketplace: An Ensemble Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet.
Manually auditing models for biases can be very time- and resource-consuming and is further complicated by the unbounded and unconstrained nature of the inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus, the emergence of multistage multimodal models requires a different approach. In this paper, we propose the Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models. We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.", "output": "Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep-learning inverse techniques have attracted significant attention in recent years. Among them, the neural adjoint (NA) method, which employs a neural network surrogate simulator, has demonstrated impressive performance in the design tasks of artificial electromagnetic materials (AEM). However, the impact of the surrogate simulator's accuracy on the solutions in the NA method remains uncertain. Furthermore, achieving sufficient optimization becomes challenging in this method when the surrogate simulator is large and computational resources are limited. Additionally, the behavior under constraints has not been studied, despite its importance from the engineering perspective. In this study, we investigated the impact of the surrogate simulator's accuracy on the solutions and discovered that the more accurate the surrogate simulator is, the better the solutions become. We then developed an extension of the NA method, named the Neural Lagrangian (NeuLag) method, capable of efficiently optimizing a sufficient number of solution candidates. We then demonstrated that the NeuLag method can find optimal solutions even when handling sufficient candidates is difficult due to the use of a large and accurate surrogate simulator. The resimulation errors of the NeuLag method were approximately 1/50 of those of previous methods for three AEM tasks. Finally, we performed optimization under constraints using NA and NeuLag, and confirmed their potential in optimization with soft or hard constraints. We believe our method holds potential in areas that require large and accurate surrogate simulators.", "output": "Enhancing Inverse Problem Solutions with Accurate Surrogate Simulators and Promising Candidates."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent years brought advancements in using neural networks for representation learning of various language or visual phenomena. New methods freed data scientists from hand-crafting features for common tasks. Similarly, problems that require considering the spatial variable can benefit from pretrained map region representations instead of manually creating feature tables that one needs to prepare to solve a task.
However, very few methods for map area representation exist, especially with respect to road network characteristics. In this paper, we propose a method for generating microregion embeddings with respect to their road infrastructure characteristics. We base our representations on OpenStreetMap road networks in a selection of cities and use the H3 spatial index to allow reproducible and scalable representation learning. We obtained vector representations that detect how similar map hexagons are in the road networks they contain. Additionally, we observe that the embeddings yield a latent space with meaningful arithmetic operations. Finally, clustering methods allowed us to draft a high-level typology of the obtained representations. We are confident that this contribution will aid data scientists working on infrastructure-related prediction tasks with spatial variables.", "output": "highway2vec -- representing OpenStreetMap microregions with respect to their road network characteristics."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behaviour is different: even beyond the disappearance of the wide flat minima, the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers even when trained in the highly underconstrained regime of very negative margins.", "output": "Typical and atypical solutions in non-convex neural networks with discrete and continuous weights."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In online forums like Reddit, users share their experiences with medical conditions and treatments, including making claims, asking questions, and discussing the effects of treatments on their health. Building systems to understand this information can effectively monitor the spread of misinformation and verify user claims.
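The highway2vec entry above embeds H3 hexagons by their road-network features and then clusters them into a typology. Since the abstract doesn't fix the model, the sketch below compresses made-up per-hexagon road statistics with PCA as a stand-in embedder, then clusters the embeddings, mirroring the similarity and clustering analyses they describe.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# stand-in feature table: one row per H3 hexagon, columns are road-network
# statistics (e.g. counts of road classes, intersections, total length)
features = rng.poisson(lam=3.0, size=(500, 12)).astype(float)

X = StandardScaler().fit_transform(features)
embeddings = PCA(n_components=4, random_state=0).fit_transform(X)

# cosine similarity between two hexagons' embeddings
a, b = embeddings[0], embeddings[1]
sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# coarse typology of microregions via clustering
typology = KMeans(n_clusters=5, random_state=0, n_init=10).fit_predict(embeddings)
```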
Task 8 of the 2023 International Workshop on Semantic Evaluation focused on medical applications, specifically extracting patient experience- and medical condition-related entities from user posts on social media. The Reddit Health Online Talk (RedHot) corpus contains posts from medical condition-related subreddits with annotations characterizing the patient experience and medical conditions. In Subtask-1, patient experience is characterized by personal experience, questions, and claims. In Subtask-2, medical conditions are characterized by population, intervention, and outcome. For the automatic extraction of patient experience and medical condition information, as a part of the challenge, we proposed language-model-based extraction systems that ranked $3^{rd}$ on both subtasks' leaderboards. In this work, we describe our approach and, in addition, explore the automatic extraction of this information using domain-specific language models and the inclusion of external knowledge.", "output": "MasonNLP+ at SemEval-2023 Task 8: Extracting Medical Questions, Experiences and Claims from Social Media using Knowledge-Augmented Pre-trained Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study aims to alleviate the trade-off between utility and privacy in the task of differentially private clustering. Existing works focus on simple clustering methods, which show poor clustering performance for non-convex clusters. By utilizing Morse theory, we hierarchically connect the Gaussian sub-clusters to fit complex cluster distributions. Because differentially private sub-clusters are obtained through the existing methods, the proposed method causes little or no additional privacy loss. We provide a theoretical background that implies that the proposed method is inductive and can achieve any desired number of clusters. Experiments on various datasets show that our framework achieves better clustering performance at the same privacy level, compared to the existing methods.", "output": "Improving the Utility of Differentially Private Clustering through Dynamical Processing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs, albeit using hand-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent \"question\" functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings.
Finally, we also investigate the discovered general value functions and, through qualitative analysis, show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation.", "output": "Discovering Object-Centric Generalized Value Functions From Pixels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While the use of the Internet of Things is becoming more and more popular, many security vulnerabilities are emerging with the large number of devices being introduced to the market. In this environment, IoT device identification methods provide a preventive security measure as an important factor in identifying these devices and detecting the vulnerabilities they suffer from. In this study, we present a method that identifies devices in the Aalto dataset using Long Short-Term Memory (LSTM) networks.", "output": "LSTM based IoT Device Identification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, there has been a surge in effort to formalize notions of fairness in machine learning. We focus on clustering -- one of the fundamental tasks in unsupervised machine learning. We propose a new axiom that captures proportional representation fairness (PRF). We make a case that the concept achieves the raison d'être of several existing concepts in the literature in an arguably more convincing manner. Our fairness concept is not satisfied by existing fair clustering algorithms. We design efficient algorithms to achieve PRF both for unconstrained and discrete clustering problems.", "output": "Proportionally Representative Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks (DNN) have become a common sensing modality in autonomous systems, as they allow for semantically perceiving the ambient environment given input images. Nevertheless, DNN models have proven to be vulnerable to adversarial digital and physical attacks. To mitigate this issue, several detection frameworks have been proposed to detect whether a single input image has been manipulated by adversarial digital noise or not. In our prior work, we proposed a real-time detector, called VisionGuard (VG), for adversarial physical attacks against single input images to DNN models. Building upon that work, we propose VisionGuard* (VG*), which couples VG with majority-vote methods, to detect adversarial physical attacks in time-series image data, e.g., videos. This is motivated by autonomous systems applications where images are collected over time using onboard sensors for decision-making purposes. We emphasize that majority-vote mechanisms are quite common in autonomous system applications (among many other applications), e.g., in autonomous driving stacks for object detection. In this paper, we investigate, both theoretically and experimentally, how this widely used mechanism can be leveraged to enhance the performance of adversarial detectors. We have evaluated VG* on videos of both clean and physically attacked traffic signs generated by a state-of-the-art robust physical attack.
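The IoT entry above identifies devices with an LSTM but stops short of architectural detail. A minimal PyTorch sequence classifier of the kind it implies, with invented dimensions (packet-feature sequences in, device IDs out):

```python
import torch
import torch.nn as nn

class DeviceLSTM(nn.Module):
    """Classify a device from a sequence of per-packet feature vectors."""
    def __init__(self, n_features=16, hidden=64, n_devices=27):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_devices)

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])          # logits from the last hidden state

model = DeviceLSTM()
logits = model(torch.randn(8, 20, 16))     # 8 traffic traces, 20 packets each
pred = logits.argmax(dim=1)                # predicted device IDs
```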
We provide extensive comparative experiments against detectors that were originally designed for out-of-distribution data and digitally attacked images.", "output": "Detection of Adversarial Physical Attacks in Time-Series Image Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Organizations are increasingly adopting machine learning (ML) for personnel assessment. However, concerns exist about fairness in designing and implementing ML assessments. Supervised ML models are trained to model patterns in data, meaning ML models tend to yield predictions that reflect subgroup differences in applicant attributes in the training data, regardless of the underlying cause of subgroup differences. In this study, we systematically under- and oversampled minority (Black and Hispanic) applicants to manipulate adverse impact ratios in training data and investigated how training data adverse impact ratios affect ML model adverse impact and accuracy. We used self-reports and interview transcripts from job applicants (N = 2,501) to train 9,702 ML models to predict screening decisions. We found that training data adverse impact related linearly to ML model adverse impact. However, removing adverse impact from training data only slightly reduced ML model adverse impact and tended to negatively affect ML model accuracy. We observed consistent effects across self-reports and interview transcripts, whether oversampling real (i.e., bootstrapping) or synthetic observations. As our study relied on limited predictor sets from one organization, the observed effects on adverse impact may be attenuated among more accurate ML models.", "output": "Oversampling Higher-Performing Minorities During Machine Learning Model Training Reduces Adverse Impact Slightly but Also Reduces Model Accuracy."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator of RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role in monitoring the joint space. A new framework for monitoring joint space by quantifying JSN progression through image registration in radiographic images has been developed. This framework offers the advantage of high accuracy; however, challenges remain in reducing mismatches and improving reliability. In this work, a deep intra-subject rigid registration network is proposed to automatically quantify JSN progression in the early stages of RA. In our experiments, the mean-square error of the Euclidean distance between the moving and fixed images is 0.0031, the standard deviation is 0.0661 mm, and the mismatching rate is 0.48%. The proposed method has sub-pixel-level accuracy, far exceeding manual measurements, and is immune to noise, rotation, and scaling of joints. Moreover, this work provides loss visualization, which can aid radiologists and rheumatologists in assessing quantification reliability, with important implications for possible future clinical applications.
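The personnel-assessment entry above manipulates adverse impact ratios in training data. The ratio itself has a standard definition, the minority selection rate divided by the majority selection rate, which the helper below computes on toy data; the resampling experiment around it is the paper's and is not reproduced here.

```python
import numpy as np

def adverse_impact_ratio(selected, group):
    """Selection-rate ratio: P(selected | minority) / P(selected | majority).
    Values below ~0.8 are commonly flagged (the 'four-fifths rule')."""
    rate_min = selected[group == "minority"].mean()
    rate_maj = selected[group == "majority"].mean()
    return rate_min / rate_maj

selected = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
group = np.array(["minority"] * 4 + ["majority"] * 6)
print(adverse_impact_ratio(selected, group))
```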
As a result, we are optimistic that this proposed work will make a significant contribution to the automatic quantification of JSN progression in RA.", "output": "A Deep Registration Method for Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In 1-bit matrix completion, the aim is to estimate an underlying low-rank matrix from a partial set of binary observations. We propose a novel method for 1-bit matrix completion called MMGN. Our method is based on the majorization-minimization (MM) principle, which yields a sequence of standard low-rank matrix completion problems in our setting. We solve each of these sub-problems by a factorization approach that explicitly enforces the assumed low-rank structure and then apply a Gauss-Newton method. Our numerical studies and application to a real-data example illustrate that MMGN outputs comparable, if not more accurate, estimates, is often significantly faster, and is less sensitive to the spikiness of the underlying matrix than existing methods.", "output": "A Majorization-Minimization Gauss-Newton Method for 1-Bit Matrix Completion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Fairness-aware machine learning has garnered significant attention in recent years because of the extensive use of machine learning in sensitive applications like judiciary systems. Various heuristics and optimization frameworks have been proposed to enforce fairness in classification \cite{del2020review}, where the latter approaches either provide empirical results or provide fairness guarantees for the exact minimizer of the objective function \cite{celis2019classification}. In modern machine learning, Stochastic Gradient Descent (SGD) type algorithms are almost always used as training algorithms, implying that the learned model, and consequently its fairness properties, are random. Hence, especially for crucial applications, it is imperative to construct Confidence Intervals (CIs) for the fairness of the learned model. In this work we provide CIs for test unfairness when a group-fairness-aware, specifically Disparate Impact (DI)- and Disparate Mistreatment (DM)-aware, linear binary classifier is trained using online SGD-type algorithms. We show that asymptotically a Central Limit Theorem holds for the estimated model parameters of both DI- and DM-aware models. We provide an online multiplier bootstrap method to estimate the asymptotic covariance to construct online CIs. To do so, we extend the known theoretical guarantees on the consistency of the online bootstrap method from unconstrained SGD to constrained optimization, which could be of independent interest. We illustrate our results on synthetic and real datasets.", "output": "Fairness Uncertainty Quantification: How certain are you that the model is fair?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The success of the Adam optimizer on a wide array of architectures has made it the default in settings where stochastic gradient descent (SGD) performs poorly.
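MMGN, summarized above, attacks 1-bit matrix completion through majorization-minimization with Gauss-Newton inner solves. Reproducing that is out of scope here; the sketch below only sets up the underlying problem, fitting a low-rank factorization to binary observations under a logistic link, and solves it with plain gradient descent instead of MM.

```python
import torch

torch.manual_seed(0)
m, n, r = 30, 20, 3
M_true = torch.randn(m, r) @ torch.randn(r, n)          # ground-truth low-rank matrix
mask = torch.rand(m, n) < 0.5                           # which entries are observed
Y = (torch.rand(m, n) < torch.sigmoid(M_true)).float()  # 1-bit observations

U = torch.randn(m, r, requires_grad=True)
V = torch.randn(n, r, requires_grad=True)
opt = torch.optim.SGD([U, V], lr=0.1)
bce = torch.nn.BCEWithLogitsLoss()

for step in range(500):
    opt.zero_grad()
    loss = bce((U @ V.T)[mask], Y[mask])   # logistic likelihood on observed entries
    loss.backward()
    opt.step()
```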
However, our theoretical understanding of this discrepancy is lagging, preventing the development of significant improvements on either algorithm. Recent work advances the hypothesis that Adam and other heuristics like gradient clipping outperform SGD on language tasks because the distribution of the error induced by sampling has heavy tails. This suggests that Adam outperforms SGD because it uses a more robust gradient estimate. We evaluate this hypothesis by varying the batch size, up to the entire dataset, to control for stochasticity. We present evidence that stochasticity and heavy-tailed noise are not major factors in the performance gap between SGD and Adam. Rather, Adam performs better as the batch size increases, while SGD is less effective at taking advantage of the reduction in noise. This raises the question of why Adam outperforms SGD in the full-batch setting. Through further investigation of simpler variants of SGD, we find that the behavior of Adam with large batches is similar to sign descent with momentum.", "output": "Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimate (Adam) algorithm for a wide class of optimization objectives. Despite the popularity and efficiency of the Adam algorithm in training deep neural networks, its theoretical properties are not yet fully understood, and existing convergence proofs require unrealistically strong assumptions, such as globally bounded gradients, to show the convergence to stationary points. In this paper, we show that Adam provably converges to $\epsilon$-stationary points with $\mathcal{O}(\epsilon^{-4})$ gradient complexity under far more realistic conditions. The key to our analysis is a new proof of boundedness of gradients along the optimization trajectory, under a generalized smoothness assumption according to which the local smoothness (i.e., Hessian norm when it exists) is bounded by a sub-quadratic function of the gradient norm. Moreover, we propose a variance-reduced version of Adam with an accelerated gradient complexity of $\mathcal{O}(\epsilon^{-3})$.", "output": "Convergence of Adam Under Relaxed Assumptions."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Domain generalization (DG) aims to tackle the distribution shift between training domains and unknown target domains. Generating new domains is one of the most effective approaches, yet its performance gain depends on the distribution discrepancy between the generated and target domains. Distributionally robust optimization is promising for tackling distribution discrepancy by exploring domains in an uncertainty set. However, the uncertainty set may be overwhelmingly large, leading to low-confidence prediction in DG. This is because a large uncertainty set could introduce domains containing semantically different factors from the training domains. To address this issue, we propose to perform a $\textbf{mo}$derately $\textbf{d}$istributional $\textbf{e}$xploration (MODE) for domain generalization. Specifically, MODE performs distribution exploration in an uncertainty $\textit{subset}$ that shares the same semantic factors with the training domains.
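The sign-descent claim in the SGD-versus-Adam record above is easy to state in code: sign descent with momentum updates parameters by the sign of a momentum-averaged gradient, which is what Adam's update approaches when its second-moment normalization is nearly uniform. A small numpy comparison on a toy quadratic, with arbitrary hyperparameters, purely for illustration:

```python
import numpy as np

def quad_grad(x, A, b):
    """Gradient of the toy objective 0.5 x^T A x - b^T x."""
    return A @ x - b

rng = np.random.default_rng(0)
d = 20
A = np.diag(np.linspace(0.1, 10.0, d))   # badly conditioned quadratic
b = rng.normal(size=d)
x_star = np.linalg.solve(A, b)

# Sign descent with momentum: step by the sign of the momentum-averaged gradient.
x, m = np.zeros(d), np.zeros(d)
for _ in range(300):
    g = quad_grad(x, A, b)
    m = 0.9 * m + 0.1 * g
    x -= 0.01 * np.sign(m)
print("sign-descent distance to optimum:", np.linalg.norm(x - x_star))

# Adam (deterministic, full-batch) for comparison.
x, m, v = np.zeros(d), np.zeros(d), np.zeros(d)
beta1, beta2, lr, eps = 0.9, 0.999, 0.05, 1e-8
for t in range(1, 301):
    g = quad_grad(x, A, b)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    mhat, vhat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    x -= lr * mhat / (np.sqrt(vhat) + eps)
print("Adam distance to optimum:", np.linalg.norm(x - x_star))
```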
We show that MODE can endow models with provable generalization performance on unknown target domains. The experimental results show that MODE achieves competitive performance compared to state-of-the-art baselines.", "output": "Moderately Distributional Exploration for Domain Generalization."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In documents and graphics, contours are a popular format to describe specific shapes. For example, in the True Type Font (TTF) file format, contours describe vector outlines of typeface shapes. Each contour is often defined as a sequence of points. In this paper, we tackle the contour completion task. In this task, the input is a contour sequence with missing points, and the output is a generated completed contour. This task is more difficult than image completion because, for images, the missing pixels are indicated. Since there is no such indication in the contour completion task, we must solve the problem of missing part detection and completion simultaneously. We propose a Transformer-based method to solve this problem and show the results of the typeface contour completion.", "output": "Contour Completion by Transformers and Its Application to Vector Font Data."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformers are popular neural network models that use layers of self-attention and fully-connected nodes with embedded tokens. Vision Transformers (ViT) adapt transformers for image recognition tasks. In order to do this, the images are split into patches and used as tokens. One issue with ViT is the lack of inductive bias toward image structures. Because ViT was adapted for image data from language modeling, the network does not explicitly handle issues such as local translations, pixel information, and information loss in the structures and features shared by multiple patches. Conversely, Convolutional Neural Networks (CNN) incorporate this information. Thus, in this paper, we propose the use of convolutional layers within ViT. Specifically, we propose a model called a Vision Conformer (ViC) which replaces the Multi-Layer Perceptron (MLP) in a ViT layer with a CNN. In addition, to use the CNN, we propose to reconstruct the image data after the self-attention in a reverse embedding layer. Through the evaluation, we demonstrate that the proposed convolutions help improve the classification ability of ViT.", "output": "Vision Conformer: Incorporating Convolutions into Vision Transformer Layers."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks are a promising tool for Audio Event Classification. In contrast to other data like natural images, there are many sensible and non-obvious representations for audio data, which could serve as input to these models. Due to their black-box nature, the effect of different input representations has so far mostly been investigated by measuring classification performance. In this work, we leverage eXplainable AI (XAI) to understand the underlying classification strategies of models trained on different input representations.
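A minimal PyTorch sketch of the Vision Conformer idea from the record above: after self-attention, tokens are reverse-embedded onto a 2D patch grid, a small CNN takes the place of the usual MLP, and the result is flattened back into tokens. The layer sizes are arbitrary, no CLS token is modeled, and this is a reading of the abstract rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ViCBlock(nn.Module):
    """Transformer block whose MLP is replaced by a CNN on the re-embedded patch grid."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.conv = nn.Sequential(          # plays the role of the MLP
            nn.Conv2d(dim, dim, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tokens, dim) with n_tokens a perfect square (no CLS token here)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        b, n, d = x.shape
        side = int(n ** 0.5)
        grid = self.norm2(x).transpose(1, 2).reshape(b, d, side, side)  # reverse embedding
        x = x + self.conv(grid).flatten(2).transpose(1, 2)              # back to tokens
        return x

tokens = torch.randn(2, 49, 64)    # e.g., 7x7 patches with 64-dim embeddings
print(ViCBlock(64)(tokens).shape)  # torch.Size([2, 49, 64])
```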
Specifically, we compare two model architectures with regard to relevant input features used for Audio Event Detection: one directly processes the signal as the raw waveform, and the other takes in its time-frequency spectrogram representation. We show how relevance heatmaps obtained via Layer-wise Relevance Propagation uncover representation-dependent decision strategies. With these insights, we can make a well-informed decision about the best input representation in terms of robustness and representativity and confirm that the model's classification strategies align with human requirements.", "output": "XAI-based Comparison of Input Representations for Audio Event Classification."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Distributed learning paradigms, such as federated or decentralized learning, allow a collection of agents to solve global learning and optimization problems through limited local interactions. Most such strategies rely on a mixture of local adaptation and aggregation steps, either among peers or at a central fusion center. Classically, aggregation in distributed learning is based on averaging, which is statistically efficient, but susceptible to attacks by even a small number of malicious agents. This observation has motivated a number of recent works, which develop robust aggregation schemes by employing robust variations of the mean. We present a new attack based on sensitivity curve maximization (SCM), and demonstrate that it is able to disrupt existing robust aggregation schemes by injecting small but effective perturbations.", "output": "Attacks on Robust Distributed Learning Schemes via Sensitivity Curve Maximization."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite their many desirable properties, Gaussian processes (GPs) are often compared unfavorably to deep neural networks (NNs) for lacking the ability to learn representations. Recent efforts to bridge the gap between GPs and deep NNs have yielded a new class of inter-domain variational GPs in which the inducing variables correspond to hidden units of a feedforward NN. In this work, we examine some practical issues associated with this approach and propose an extension that leverages the orthogonal decomposition of GPs to mitigate these limitations. In particular, we introduce spherical inter-domain features to construct more flexible data-dependent basis functions for both the principal and orthogonal components of the GP approximation and show that incorporating NN activation features under this framework not only alleviates these shortcomings but is more scalable than alternative strategies. Experiments on multiple benchmark datasets demonstrate the effectiveness of our approach.", "output": "Spherical Inducing Features for Orthogonally-Decoupled Gaussian Processes."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite substantial progress in 3D human pose estimation from a single-view image, prior works rarely explore global and local correlations, leading to insufficient learning of human skeleton representations.
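The sensitivity curve named in the SCM record above measures how much one injected point moves an aggregate: SC_n(x) = n (T(x_1, ..., x_{n-1}, x) - T(x_1, ..., x_{n-1})); an attacker maximizes it over x. A small numpy sketch of the diagnostic itself (not the paper's attack) for three aggregation rules, with an arbitrary sample and search grid:

```python
import numpy as np

def sensitivity_curve(estimator, sample, x):
    """n * (T(sample + [x]) - T(sample)): effect of one injected point on the aggregate."""
    n = len(sample) + 1
    return n * (estimator(np.append(sample, x)) - estimator(sample))

rng = np.random.default_rng(0)
sample = rng.normal(size=99)
grid = np.linspace(-10, 10, 401)

for name, agg in [("mean", np.mean),
                  ("median", np.median),
                  ("10% trimmed mean",
                   lambda a: np.mean(np.sort(a)[len(a) // 10 : -(len(a) // 10)]))]:
    sc = np.array([sensitivity_curve(agg, sample, x) for x in grid])
    worst = grid[np.argmax(np.abs(sc))]
    print(f"{name:18s} worst perturbation {worst:+.2f}, max |SC| = {np.abs(sc).max():.2f}")
```

The unbounded curve of the mean versus the bounded curves of the robust rules is exactly the gap an SCM-style attack probes for.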
To address this issue, we propose a novel Interweaved Graph and Attention Network (IGANet) that allows bidirectional communications between graph convolutional networks (GCNs) and attentions. Specifically, we introduce an IGA module, where attentions are provided with local information from GCNs and GCNs are injected with global information from attentions. Additionally, we design a simple yet effective U-shaped multi-layer perceptron (uMLP), which can capture multi-granularity information for body joints. Extensive experiments on two popular benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) are conducted to evaluate our proposed method. The results show that IGANet achieves state-of-the-art performance on both datasets. Code is available at", "output": "Interweaved Graph and Attention Network for 3D Human Pose Estimation."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Precise thigh muscle volumes are crucial to monitor the motor functionality of patients with diseases that may result in various degrees of thigh muscle loss. T1-weighted MRI is the default surrogate to obtain thigh muscle masks due to its contrast between muscle and fat signals. Deep learning approaches have recently been widely used to obtain these masks through segmentation. However, due to the insufficient amount of precise annotations, thigh muscle masks generated by deep learning approaches tend to misclassify intra-muscular fat (IMF) as muscle, impacting the analysis of muscle volumetrics. As IMF is infiltrated inside the muscle, human annotations require expertise and time. Thus, precise muscle masks where IMF is excluded are limited in practice. To alleviate this, we propose a few-shot segmentation framework to generate thigh muscle masks excluding IMF. In our framework, we design a novel pseudo-label correction and evaluation scheme, together with a new noise-robust loss for exploiting high-certainty areas. The proposed framework only takes $1\%$ of the fine-annotated training dataset, and achieves comparable performance with fully supervised methods according to the experimental results.", "output": "Precise Few-shot Fat-free Thigh Muscle Segmentation in T1-weighted MRI."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper provides answers to an open problem: given a nonlinear data-driven dynamical system model, e.g., kernel conditional mean embedding (CME) and Koopman operator, how can one propagate the ambiguity sets forward for multiple steps? This problem is the key to solving distributionally robust control and learning-based control of such learned system models under a data-distribution shift. Different from previous works that use either static ambiguity sets, e.g., fixed Wasserstein balls, or dynamic ambiguity sets under known piece-wise linear (or affine) dynamics, we propose an algorithm that exactly propagates ambiguity sets through nonlinear data-driven models using the Koopman operator and CME, via the kernel maximum mean discrepancy geometry.
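For reference, the kernel maximum mean discrepancy used by the record above has a standard plug-in estimator; below is the biased V-statistic form with an RBF kernel in numpy. The bandwidth and sample sizes are arbitrary, and this is the generic quantity, not the paper's propagation algorithm.

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy between samples X and Y."""
    return (rbf_kernel(X, X, bandwidth).mean()
            - 2.0 * rbf_kernel(X, Y, bandwidth).mean()
            + rbf_kernel(Y, Y, bandwidth).mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))   # shifted distribution
print("MMD^2(X, Y): ", mmd2(X, Y))
print("MMD^2(X, X'):", mmd2(X, rng.normal(0.0, 1.0, size=(200, 2))))
```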
Through both theoretical and numerical analysis, we show that our kernel ambiguity sets are the natural geometric structure for the learned data-driven dynamical system models.", "output": "Propagating Kernel Ambiguity Sets in Nonlinear Data-driven Dynamics Models."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. In DCR, neural networks do not make task predictions directly, but they build syntactic rule structures using concept embeddings. DCR then executes these rules on meaningful concept truth degrees to provide a final interpretable and semantically-consistent prediction in a differentiable manner. Our experiments show that DCR: (i) improves up to +25% w.r.t. state-of-the-art interpretable concept-based models on challenging benchmarks, (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training, and (iii) facilitates the generation of counterfactual examples, providing the learnt rules as guidance.", "output": "Interpretable Neural-Symbolic Concept Reasoning."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Human-object interactions (HOIs) are crucial for human-centric scene understanding applications such as human-centric visual generation, AR/VR, and robotics. Since existing methods mainly explore capturing HOIs, rendering HOIs remains less investigated. In this paper, we address this challenge in HOI animation from a compositional perspective, i.e., animating novel HOIs including novel interaction, novel human and/or novel object driven by a novel pose sequence. Specifically, we adopt neural human-object deformation to model and render HOI dynamics based on implicit neural representations. To enable interaction pose transfer among different persons and objects, we then devise a new compositional conditional neural radiance field (or CC-NeRF), which decomposes the interdependence between human and object using latent codes to enable compositional animation control of novel HOIs. Experiments show that the proposed method can generalize well to various novel HOI animation settings. Our project page is ", "output": "Compositional 3D Human-Object Neural Animation."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper we present the first version of ganX -- generate artificially new XRF, a Python library to generate X-ray fluorescence Macro maps (MA-XRF) from a coloured RGB image. To do that, a Monte Carlo method is used, where each MA-XRF pixel signal is sampled out of an XRF signal probability function.
Such a probability function is computed using a database of (pigment characteristic XRF signal, RGB) couples, by a weighted sum of the pigment XRF signals with weights given by the proximity of the image RGB to the pigment characteristic RGB. The library is released to PyPi and the code is available open source on GitHub.", "output": "ganX -- generate artificially new XRF a python library to generate MA-XRF raw data out of RGB images."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the huge recent breakthroughs in neural networks (NNs) for artificial intelligence (specifically deep convolutional networks), such NNs do not achieve human-level performance: they can be hacked by images that would fool no human and lack `common sense'. It has been argued that a basis of human-level intelligence is mankind's ability to perform relational reasoning: the comparison of different objects, measuring similarity, grasping of relations between objects and the converse, figuring out the odd one out in a set of objects. Mankind can even do this with objects they have never seen before. Here we show how ClusterFlow, a semi-supervised hierarchical clustering framework, can operate on trained NNs, utilising the rich multi-dimensional class and feature data found at the pre-SoftMax layer to build a hyperspatial map of classes/features, and this adds more human-like functionality to modern deep convolutional neural networks. We demonstrate this with 3 tasks: 1. the statistical-learning-based `mistakes' made by infants when attending to images of cats and dogs; 2. improving both the resilience to hacking images and the accurate measure of certainty in deep-NNs; 3. relational reasoning over sets of images, including those not known to the NN nor seen before. We also demonstrate that ClusterFlow can work on non-NN data and deal with missing data by testing it on a Chemistry dataset. This work suggests that modern deep NNs can be made more human-like without re-training of the NNs. As it is known that some methods used in deep and convolutional NNs are not biologically plausible or perhaps even the best approach, the ClusterFlow framework can sit on top of any NN and will be a useful tool to add as NNs are improved in this regard.", "output": "Cluster Flow: how a hierarchical clustering layer make allows deep-NNs more resilient to hacking, more human-like and easily implements relational reasoning."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries.
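A toy numpy rendering of the Monte Carlo recipe in the ganX record above: mix pigment XRF signatures with weights that decay in RGB distance to get a per-pixel probability function, then sample photon counts from it. The two-pigment database, the exponential weighting, and every number here are invented for illustration; this is not the ganX API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels = 16
# Hypothetical database: pigment characteristic RGB -> characteristic XRF signature.
pigment_rgb = np.array([[200.0, 30.0, 40.0],     # a red pigment
                        [30.0, 60.0, 190.0]])    # a blue pigment
pigment_xrf = rng.random((2, n_channels))
pigment_xrf /= pigment_xrf.sum(axis=1, keepdims=True)   # normalize to probabilities

def pixel_xrf_distribution(rgb, sharpness=0.01):
    """Mix pigment signatures with weights decaying in squared RGB distance."""
    d2 = ((pigment_rgb - rgb) ** 2).sum(axis=1)
    w = np.exp(-sharpness * d2)
    w /= w.sum()
    return w @ pigment_xrf

def sample_pixel_signal(rgb, n_photons=500):
    """Monte Carlo step: draw photon counts per XRF channel for one pixel."""
    return rng.multinomial(n_photons, pixel_xrf_distribution(rgb))

print(sample_pixel_signal(np.array([180.0, 40.0, 50.0])))   # pixel close to the red pigment
```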
We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine, and FedJAX, and provide baseline experiments on popular benchmarks.", "output": "JaxPruner: A concise library for sparsity research."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Explainable AI (XAI) aims to answer ethical and legal questions associated with the deployment of AI models. However, a considerable number of domain-specific reviews highlight the need for a mathematical foundation for the key notions in the field, considering that even the term \"explanation\" still lacks a precise definition. These reviews also advocate for a sound and unifying formalism for explainable AI, to avoid the emergence of ill-posed questions, and to help researchers navigate a rapidly growing body of knowledge. To the authors' knowledge, this paper is the first attempt to fill this gap by formalizing a unifying theory of XAI. Employing the framework of category theory, and feedback monoidal categories in particular, we first provide formal definitions for all essential terms in explainable AI. Then we propose a taxonomy of the field following the proposed structure, showing how the introduced theory can be used to categorize all the main classes of XAI systems currently studied in the literature. In summary, the foundation of XAI proposed in this paper represents a significant tool to properly frame future research lines, and precious guidance for new researchers approaching the field.", "output": "Categorical Foundations of Explainable AI: A Unifying Formalism of Structures and Semantics."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "It is essential for autonomous robots to be socially compliant while navigating in human-populated environments. Machine Learning and, especially, Deep Reinforcement Learning have recently gained considerable traction in the field of Social Navigation. This can be partially attributed to the resulting policies not being bound by human limitations in terms of code complexity or the number of variables that are handled. Unfortunately, the lack of safety guarantees and the large data requirements of DRL algorithms make learning in the real world unfeasible. To bridge this gap, simulation environments are frequently used. We propose SocNavGym, an advanced simulation environment for social navigation that can generate a wide variety of social navigation scenarios and facilitates the development of intelligent social agents. SocNavGym is light-weight, fast, easy-to-use, and can be effortlessly configured to generate different types of social navigation scenarios. It can also be configured to work with different hand-crafted and data-driven social reward signals and to yield a variety of evaluation metrics to benchmark agents' performance. Further, we also provide a case study where a Dueling-DQN agent is trained to learn social-navigation policies using SocNavGym. The results provide evidence that SocNavGym can be used to train an agent from scratch to navigate in simple as well as complex social scenarios.
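The core operation behind pruning libraries like the one in the JaxPruner record above can be illustrated without the library itself: global magnitude pruning keeps the largest-magnitude weights across all layers and zeros the rest. A self-contained numpy sketch (deliberately not the JaxPruner API, whose actual interface is built around Optax):

```python
import numpy as np

def global_magnitude_prune(params, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest absolute value, globally."""
    all_mags = np.concatenate([np.abs(p).ravel() for p in params.values()])
    threshold = np.quantile(all_mags, sparsity)
    return {name: np.where(np.abs(p) >= threshold, p, 0.0) for name, p in params.items()}

rng = np.random.default_rng(0)
params = {"dense1/kernel": rng.normal(size=(128, 64)),
          "dense2/kernel": rng.normal(size=(64, 10))}
pruned = global_magnitude_prune(params, sparsity=0.9)
for name, p in pruned.items():
    print(name, "nonzero fraction:", (p != 0).mean())
```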
Our experiments also show that the agents trained using the data-driven reward function display more advanced social compliance than those trained with the heuristic-based reward function.", "output": "SocNavGym: A Reinforcement Learning Gym for Social Navigation."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Interactions between humans are diverse and context-dependent, but previous works have treated them as categorical, disregarding the heavy tail of possible interactions. We propose a new paradigm of learning human-human interactions as free text from a single still image, allowing for flexibility in modeling the unlimited space of situations and relationships between people. To overcome the absence of data labelled specifically for this task, we use knowledge distillation applied to synthetic caption data produced by a large language model without explicit supervision. We show that the pseudo-labels produced by this procedure can be used to train a captioning model to effectively understand human-human interactions in images, as measured by a variety of metrics that measure textual and semantic faithfulness and factual groundedness of our predictions. We further show that our approach outperforms SOTA image captioning and situation recognition models on this task. We will release our code and pseudo-labels along with Waldo and Wenda, a manually-curated test set for still image human-human interaction understanding.", "output": "Learning Human-Human Interactions in Images from Weak Textual Supervision."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large multimodal datasets have been instrumental in recent breakthroughs such as CLIP, Stable Diffusion, and GPT-4. At the same time, datasets rarely receive the same research attention as model architectures or training algorithms. To address this shortcoming in the machine learning ecosystem, we introduce DataComp, a benchmark where the training code is fixed and researchers innovate by proposing new training sets. We provide a testbed for dataset experiments centered around a new candidate pool of 12.8B image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing on 38 downstream test sets. Our benchmark consists of multiple scales, with four candidate pool sizes and associated compute budgets ranging from 12.8M to 12.8B samples seen during training. This multi-scale design facilitates the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow is a promising way of improving multimodal datasets. We introduce DataComp-1B, a dataset created by applying a simple filtering algorithm to the 12.8B candidate pool. The resulting 1.4B subset enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet. Our new ViT-L/14 model outperforms a larger ViT-g/14 trained on LAION-2B by 0.7 percentage points while requiring 9x less training compute. We also outperform OpenAI's CLIP ViT-L/14 by 3.7 percentage points, which is trained with the same compute budget as our model.
These gains highlight the potential for improving model performance by carefully curating training sets. We view DataComp-1B as only the first step and hope that DataComp paves the way toward the next generation of multimodal datasets.", "output": "DataComp: In search of the next generation of multimodal datasets."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Causal datasets play a critical role in advancing the field of causality. However, existing datasets often lack the complexity of real-world issues such as selection bias, unfaithful data, and confounding. To address this gap, we propose a new synthetic causal dataset, the Structurally Complex with Additive paRent causalitY (SCARY) dataset, which includes the following features. The dataset comprises 40 scenarios, each generated with three different seeds, allowing researchers to leverage relevant subsets of the dataset. Additionally, we use two different data generation mechanisms for generating the causal relationship between parents and child nodes, including linear and mixed causal mechanisms with multiple sub-types. Our dataset generator is inspired by the Causal Discovery Toolbox and generates only additive models. The dataset has a Varsortability of 0.5. Our SCARY dataset provides a valuable resource for researchers to explore causal discovery under more realistic scenarios. The dataset is available at ", "output": "The Structurally Complex with Additive Parent Causality (SCARY) Dataset."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Scientific Machine Learning (SciML) is concerned with the development of learned emulators of physical systems governed by partial differential equations (PDEs). In application domains such as weather forecasting, molecular dynamics, and inverse design, ML-based surrogate models are increasingly used to augment or replace inefficient and often non-differentiable numerical simulation algorithms. While a number of ML-based methods for approximating the solutions of PDEs have been proposed in recent years, they typically do not adapt to the parameters of the PDEs, making it difficult to generalize to PDE parameters not seen during training. We propose a Channel Attention mechanism guided by PDE Parameter Embeddings (CAPE) component for neural surrogate models and a simple yet effective curriculum learning strategy. The CAPE module can be combined with neural PDE solvers, allowing them to adapt to unseen PDE parameters. The curriculum learning strategy provides a seamless transition between teacher-forcing and fully auto-regressive training.
We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.", "output": "Learning Neural PDE Solvers with Parameter-Guided Channel Attention."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The meteorological radar reflectivity data, also known as echo, plays a crucial role in predicting precipitation and enabling accurate and fast forecasting of short-term heavy rainfall without the need for complex Numerical Weather Prediction (NWP) models. Compared to conventional models, Deep Learning (DL)-based radar echo extrapolation algorithms are more effective and efficient. However, the development of highly reliable and generalized algorithms is hindered by three main bottlenecks: cumulative error spreading, imprecise representation of sparse echo distribution, and inaccurate description of non-stationary motion processes. To address these issues, this paper presents a novel radar echo extrapolation algorithm that utilizes temporal-spatial correlation features and Transformer technology. The algorithm extracts features from multi-frame echo images that accurately represent non-stationary motion processes for precipitation prediction. The proposed algorithm uses a novel parallel encoder based on Transformer technology to effectively and automatically extract echoes' temporal-spatial features. Furthermore, a Multi-level Temporal-Spatial attention mechanism is adopted to enhance the ability to perceive global-local information and highlight the task-related feature regions in a lightweight way. The proposed method's effectiveness has been validated on the classic radar echo extrapolation task using a real-world dataset. Numerous experiments have further demonstrated the effectiveness and necessity of various components of the proposed method.", "output": "TempEE: Temporal-Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Auto-Regression."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel application of category theory for deep learning. We show how category theory can be used to understand and work with the linear layer functions of group equivariant neural networks whose layers are some tensor power space of $\mathbb{R}^{n}$ for the groups $S_n$, $O(n)$, $Sp(n)$, and $SO(n)$. By using category theoretic constructions, we build a richer structure that is not seen in the original formulation of these neural networks, leading to new insights. In particular, we outline the development of an algorithm for quickly computing the result of a vector that is passed through an equivariant, linear layer for each group in question.
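One plausible reading of the CAPE record above, sketched in PyTorch: embed the PDE parameters with a small MLP and use the result to gate the channels of a neural solver's hidden state. This is a generic parameter-conditioned channel attention with made-up shapes, not the authors' CAPE module.

```python
import torch
import torch.nn as nn

class ParamChannelAttention(nn.Module):
    """Scale feature channels with attention weights predicted from PDE parameters."""
    def __init__(self, n_params: int, channels: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(n_params, hidden),
            nn.GELU(),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),               # per-channel gates in (0, 1)
        )

    def forward(self, features: torch.Tensor, pde_params: torch.Tensor) -> torch.Tensor:
        # features: (batch, channels, nx); pde_params: (batch, n_params)
        gates = self.embed(pde_params)          # (batch, channels)
        return features * gates.unsqueeze(-1)   # broadcast over the spatial axis

features = torch.randn(8, 32, 128)   # a 1D PDE solver's hidden state
pde_params = torch.rand(8, 2)        # hypothetical: viscosity and forcing amplitude
print(ParamChannelAttention(2, 32)(features, pde_params).shape)  # torch.Size([8, 32, 128])
```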
The success of our approach suggests that category theory could be beneficial for other areas of deep learning.", "output": "Categorification of Group Equivariant Neural Networks."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A spiking neural network (SNN) equalizer with a decision feedback structure is applied to an IM/DD link with various parameters. The SNN outperforms linear and artificial neural network (ANN) based equalizers.", "output": "Spiking Neural Network Decision Feedback Equalization for IM/DD Systems."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The learnable, linear neural network layers between tensor power spaces of $\mathbb{R}^{n}$ that are equivariant to the orthogonal group, $O(n)$, the special orthogonal group, $SO(n)$, and the symplectic group, $Sp(n)$, were characterised in arXiv:2212.08630. We present an algorithm for multiplying a vector by any weight matrix for each of these groups, using category theoretic constructions to implement the procedure. We achieve a significant reduction in computational cost compared with a naive implementation by making use of Kronecker product matrices to perform the multiplication. We show that our approach extends to the symmetric group, $S_n$, recovering the algorithm of arXiv:2303.06208 in the process.", "output": "An Algorithm for Computing with Brauer's Group Equivariant Neural Network Layers."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a method to explore the flavor structure of quarks and leptons with reinforcement learning. As a concrete model, we utilize a basic policy-based algorithm for models with $U(1)$ flavor symmetry. By training neural networks on the $U(1)$ charges of quarks and leptons, the agent finds 21 models to be consistent with experimentally measured masses and mixing angles of quarks and leptons. In particular, an intrinsic value of normal ordering tends to be larger than that of inverted ordering, and the normal ordering is well fitted with the current experimental data in contrast to the inverted ordering. A specific value of effective mass for the neutrinoless double beta decay and a sizable leptonic CP violation induced by an angular component of the flavon field are predicted by the autonomous behavior of the agent.", "output": "Exploring the flavor structure of quarks and leptons with reinforcement learning."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel training paradigm that equips LLMs with multi-modal abilities through modularized learning of a foundation LLM, a visual knowledge module, and a visual abstractor module. This approach can support multiple modalities and facilitate diverse unimodal and multimodal abilities through modality collaboration.
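The Kronecker-product trick mentioned in the Brauer-layers record above is the classic vec identity (A ⊗ B) vec(X) = vec(B X Aᵀ), which avoids ever forming the large Kronecker matrix. A quick numpy check of the identity, using column-major vectorization:

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = p = q = 30
A, B = rng.normal(size=(m, n)), rng.normal(size=(p, q))
X = rng.normal(size=(q, n))
vec = lambda M: M.flatten(order="F")   # column-stacking vectorization

naive = np.kron(A, B) @ vec(X)         # forms the (mp x nq) Kronecker matrix explicitly
fast = vec(B @ X @ A.T)                # same result without materializing it
print(np.allclose(naive, fast))        # True
```

The naive route costs O((mp)(nq)) per multiply plus the memory for the Kronecker matrix; the identity reduces this to two ordinary matrix products.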
The training paradigm of mPLUG-Owl involves a two-stage method for aligning image and text, which learns visual knowledge with the assistance of the LLM while maintaining and even improving the generation abilities of the LLM. In the first stage, the visual knowledge module and abstractor module are trained with a frozen LLM module to align the image and text. In the second stage, language-only and multi-modal supervised datasets are used to jointly fine-tune a low-rank adaptation (LoRA) module on the LLM and the abstractor module by freezing the visual knowledge module. We carefully build a visually-related instruction evaluation set, OwlEval. Experimental results show that our model outperforms existing multi-modal models, demonstrating mPLUG-Owl's impressive instruction and visual understanding ability, multi-turn conversation ability, and knowledge reasoning ability. Besides, we observe some unexpected and exciting abilities such as multi-image correlation and scene text understanding, which makes it possible to leverage it for harder real scenarios, such as vision-only document comprehension. Our code, pre-trained model, instruction-tuned models, and evaluation set are available at The online demo is available at ", "output": "mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cluster separation in scatterplots is a task that is typically tackled by widely used clustering techniques, such as k-means or DBSCAN. However, as these algorithms are based on non-perceptual metrics, their output often does not reflect human cluster perception. To bridge the gap between human cluster perception and machine-computed clusters, we propose a learning strategy which directly operates on scattered data. To learn perceptual cluster separation on this data, we crowdsourced a large-scale dataset, consisting of 7,320 point-wise cluster affiliations for bivariate data, which has been labeled by 384 human crowd workers. Based on this data, we were able to train ClusterNet, a point-based deep learning model, trained to reflect human perception of cluster separability. In order to train ClusterNet on human-annotated data, we omit rendering scatterplots on a 2D canvas, but rather use a PointNet++ architecture, enabling inference on point clouds directly. In this work, we provide details on how we collected our dataset, report statistics of the resulting annotations, and investigate perceptual agreement of cluster separation for real-world data. We further report the training and evaluation protocol of ClusterNet and introduce a novel metric that measures the accuracy between a clustering technique and a group of human annotators. Finally, we compare our approach against existing state-of-the-art clustering techniques.", "output": "ClusterNet: A Perception-Based Clustering Model for Scattered Data."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose the first online quantum algorithm for zero-sum games with $\tilde O(1)$ regret under the game setting.
Moreover, our quantum algorithm computes an $\varepsilon$-approximate Nash equilibrium of an $m \times n$ matrix zero-sum game in quantum time $\tilde O(\sqrt{m+n}/\varepsilon^{2.5})$, yielding a quadratic improvement over classical algorithms in terms of $m, n$. Our algorithm uses standard quantum inputs and generates classical outputs with succinct descriptions, facilitating end-to-end applications. As an application, we obtain a fast quantum linear programming solver. Technically, our online quantum algorithm \"quantizes\" classical algorithms based on the optimistic multiplicative weight update method. At the heart of our algorithm is a fast quantum multi-sampling procedure for the Gibbs sampling problem, which may be of independent interest.", "output": "Logarithmic-Regret Quantum Learning Algorithms for Zero-Sum Games."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We use a binary attribute representation (BAR) model to describe a data set of Netflix viewers' ratings of movies. We classify the viewers with discrete bits rather than continuous parameters, which makes the representation compact and transparent. The attributes are easy to interpret, and we need far fewer attributes than similar methods do to achieve the same level of error. We also take advantage of the nonuniform distribution of ratings among the movies in the data set to train on a small selection of movies without compromising performance on the rest of the movies.", "output": "A transparent approach to data representation."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of the linear law-based feature space transformation (LLT) algorithm is to assist with the classification of univariate and multivariate time series. The presented R package, called LLT, implements this algorithm in a flexible yet user-friendly way. This package first splits the instances into training and test sets. It then utilizes time-delay embedding and spectral decomposition techniques to identify the governing patterns (called linear laws) of each input sequence (initial feature) within the training set. Finally, it applies the linear laws of the training set to transform the initial features of the test set. These steps are performed by three separate functions called trainTest, trainLaw, and testTrans. Their application requires a predefined data structure; however, for fast calculation, they use only built-in functions. The LLT R package and a sample dataset with the appropriate data structure are publicly available on GitHub.", "output": "LLT: An R package for Linear Law-based Feature Space Transformation."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Experimental data is often comprised of variables measured independently, at different sampling rates (non-uniform $\Delta t$ between successive measurements); and at a specific time point only a subset of all variables may be sampled. Approaches to identifying dynamical systems from such data typically use interpolation, imputation or subsampling to reorganize or modify the training data $\textit{prior}$ to learning.
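The LLT record above leans on time-delay embedding plus spectral decomposition; both fit in a few lines of numpy. The sketch below is generic, and in Python rather than the package's R: embed a series, then read dominant patterns off an SVD of the embedded trajectory. The dimensions and delay are arbitrary.

```python
import numpy as np

def time_delay_embed(x, dim, tau=1):
    """Stack delayed copies of a 1D series: rows are [x_t, x_{t+tau}, ..., x_{t+(dim-1)tau}]."""
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this embedding")
    return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

t = np.linspace(0.0, 8.0 * np.pi, 400)
x = np.sin(t)
E = time_delay_embed(x, dim=4, tau=5)
print(E.shape)                        # (385, 4)
# Spectral decomposition of the embedded trajectory to expose dominant patterns.
_, s, _ = np.linalg.svd(E - E.mean(axis=0), full_matrices=False)
print("singular values:", np.round(s, 2))
```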
Partial physical knowledge may also be available $\textit{a priori}$ (accurately or approximately), and data-driven techniques can complement this knowledge. Here we exploit neural network architectures based on numerical integration methods and $\textit{a priori}$ physical knowledge to identify the right-hand side of the underlying governing differential equations. Iterates of such neural-network models allow for learning from data sampled at arbitrary time points $\textit{without}$ data modification. Importantly, we integrate the network with available partial physical knowledge in \"physics informed gray-boxes\"; this enables learning unknown kinetic rates or microbial growth functions while simultaneously estimating experimental parameters.", "output": "Some of the variables, some of the parameters, some of the times, with some physics known: Identification with partial information."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Self-distillation relies on its own information to improve the generalization ability of the model and has a bright future. Existing self-distillation methods either require additional models, model modification, or batch size expansion for training, which increases the difficulty of use, memory consumption, and computational cost. This paper develops Self-discipline on multiple channels (SMC), which combines consistency regularization with self-distillation using the concept of multiple channels. Conceptually, SMC consists of two steps: 1) each channel's data is simultaneously passed through the model to obtain its corresponding soft label, and 2) the soft label saved in the previous step is read together with the soft label obtained from the current channel's data through the model to calculate the loss function. SMC uses consistency regularization and self-distillation to improve the generalization ability of the model and the robustness of the model to noisy labels. We name the SMC variant containing only two channels SMC-2. Comparative experimental results on both datasets show that SMC-2 outperforms Label Smoothing Regularization and Self-distillation From The Last Mini-batch on all models, and outperforms the state-of-the-art Sharpness-Aware Minimization method on 83% of the models. Compatibility experiments with data augmentation show that using both SMC-2 and data augmentation improves the generalization ability of the model by between 0.28% and 1.80% compared to using only data augmentation. Ultimately, the results of the label noise interference experiments show that SMC-2 curbs the tendency of the model's generalization ability to decrease in the late training period due to the interference of label noise. The code is available at", "output": "Self-discipline on multiple channels."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning (DL) has been a revolutionary technique in various domains. To facilitate model development and deployment, many deep learning frameworks have been proposed, among which PyTorch is one of the most popular solutions. The performance of the ecosystem around PyTorch is critically important: it saves the cost of training models and reduces the response time of model inference. In this paper, we propose TorchBench, a novel benchmark suite to study the performance of the PyTorch software stack.
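A hedged PyTorch reading of the two-channel scheme in the SMC record above: two views ("channels") of a batch pass through the same model, each is trained on the labels, and each additionally distills from the other's detached soft labels. The temperature, weighting, and toy model are invented; this follows the abstract's description, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def smc2_style_loss(model, x_channel1, x_channel2, targets, alpha=1.0, T=4.0):
    """Cross-entropy on both channels plus a symmetric soft-label consistency term."""
    logits1, logits2 = model(x_channel1), model(x_channel2)
    ce = F.cross_entropy(logits1, targets) + F.cross_entropy(logits2, targets)
    # Each channel distills from the other's (detached) soft labels.
    p1 = F.log_softmax(logits1 / T, dim=1)
    p2 = F.log_softmax(logits2 / T, dim=1)
    kl = (F.kl_div(p1, F.softmax(logits2.detach() / T, dim=1), reduction="batchmean")
          + F.kl_div(p2, F.softmax(logits1.detach() / T, dim=1), reduction="batchmean"))
    return ce + alpha * (T ** 2) * kl

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
x1 = torch.randn(16, 3, 32, 32)        # channel 1: one augmented view of the batch
x2 = x1 + 0.05 * torch.randn_like(x1)  # channel 2: a second view of the same batch
y = torch.randint(0, 10, (16,))
print(smc2_style_loss(model, x1, x2, y).item())
```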
Unlike existing benchmark suites, TorchBench encloses many representative models, covering a large PyTorch API surface. TorchBench is able to comprehensively characterize the performance of the PyTorch software stack, guiding performance optimization across models, the PyTorch framework, and GPU libraries. We show two practical use cases of TorchBench. (1) We profile TorchBench to identify GPU performance inefficiencies in PyTorch. We are able to fix many performance bugs and upstream patches to the official PyTorch repository. (2) We integrate TorchBench into the PyTorch continuous integration system. We are able to identify performance regressions in multiple daily code check-ins to prevent the PyTorch repository from introducing performance bugs. TorchBench is open source and keeps evolving.", "output": "TorchBench: Benchmarking PyTorch with High API Surface Coverage."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many techniques in machine learning attempt explicitly or implicitly to infer a low-dimensional manifold structure of an underlying physical phenomenon from measurements without an explicit model of the phenomenon or the measurement apparatus. This paper presents a cautionary tale regarding the discrepancy between the geometry of measurements and the geometry of the underlying phenomenon in a benign setting. The deformation in the metric illustrated in this paper is mathematically straightforward and unavoidable in the general case, and it is only one of several similar effects. While this is not always problematic, we provide an example of an arguably standard and harmless data processing procedure where this effect leads to an incorrect answer to a seemingly simple question. Although we focus on manifold learning, these issues apply broadly to dimensionality reduction and unsupervised learning.", "output": "On Manifold Learning in Plato's Cave: Remarks on Manifold Learning and Physical Phenomena."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Variational Bayes is a popular method for approximate inference but its derivation can be cumbersome. To simplify the process, we give a 3-step recipe to identify the posterior form by explicitly looking for linearity with respect to expectations of well-known distributions. We can then directly write the update by simply ``reading-off'' the terms in front of those expectations. The recipe makes the derivation easier, faster, shorter, and more general.", "output": "Variational Bayes Made Easy."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice of processing the sensed data is using a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle.
The vehicular data generated from average hours of driving can be up to 20 Terabytes depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially in the trend towards vehicular electrification (e.g., battery-powered). Although these areas have seen significant advancements in sensor technologies, wireless communications, computing and AI/ML algorithms, the challenge still exists in how to apply and integrate those technology innovations to achieve energy efficiency. This survey reviews and compares the connected vehicular applications, vehicular communications, approximation and Edge AI techniques. The focus is on energy efficiency by covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can be beneficial for the collaborative driving service development on low-power and memory-constrained systems and also for the energy optimization of autonomous vehicles.", "output": "A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The homophily principle, i.e., nodes with the same labels are more likely to be connected, was believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks (NNs) on Node Classification (NC) tasks. Recently, people have developed theoretical results arguing that, even though the homophily principle is broken, the advantage of GNNs can still hold as long as nodes from the same class share similar neighborhood patterns, which questions the validity of homophily. However, this argument only considers intra-class Node Distinguishability (ND) and ignores inter-class ND, which is insufficient to study the effect of homophily. In this paper, we first demonstrate the aforementioned insufficiency with examples and argue that an ideal situation for ND is to have smaller intra-class ND than inter-class ND. To formulate this idea and have a better understanding of homophily, we propose the Contextual Stochastic Block Model for Homophily (CSBM-H) and define two metrics, Probabilistic Bayes Error (PBE) and Expected Negative KL-divergence (ENKL), to quantify ND, through which we can also find how intra- and inter-class ND influence ND together. We visualize the results and give a detailed analysis. Through experiments, we verified that the superiority of GNNs is indeed closely related to both intra- and inter-class ND regardless of homophily levels, based on which we define the Kernel Performance Metric (KPM).
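For context on the homophily record above: the simplest of the existing homophily metrics that KPM is compared against is the edge homophily ratio, the fraction of edges joining same-label nodes. A minimal numpy sketch on a toy graph:

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    src, dst = edges[:, 0], edges[:, 1]
    return float(np.mean(labels[src] == labels[dst]))

labels = np.array([0, 0, 0, 1, 1, 1])
edges = np.array([[0, 1], [1, 2], [3, 4], [4, 5],   # intra-class edges
                  [2, 3]])                           # one inter-class edge
print(edge_homophily(edges, labels))                 # 0.8
```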
KPM is a new non-linear, feature-based metric, which is tested to be more effective than the existing homophily metrics at revealing the advantages and disadvantages of GNNs on synthetic and real-world datasets.", "output": "When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as for other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. To aid and encourage further work in this area we make all our data and code publicly available.", "output": "What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Background: Recently, ChatGPT and similar generative AI models have attracted hundreds of millions of users and become part of the public discourse. Many believe that such models will disrupt society and will result in a significant change in the education system and information generation in the future. So far, this belief is based on either colloquial evidence or benchmarks from the owners of the models -- both lack scientific rigour. Objective: Through a large-scale study comparing human-written versus ChatGPT-generated argumentative student essays, we systematically assess the quality of the AI-generated content. Methods: A large corpus of essays was rated using standard criteria by a large number of human experts (teachers). We augment the analysis with a consideration of the linguistic characteristics of the generated essays. Results: Our results demonstrate that ChatGPT generates essays that are rated higher for quality than human-written essays. The writing style of the AI models exhibits linguistic characteristics that are different from those of the human-written essays, e.g., it is characterized by fewer discourse and epistemic markers, but more nominalizations and greater lexical diversity. Conclusions: Our results clearly demonstrate that models like ChatGPT outperform humans in generating argumentative essays. Since the technology is readily available for anyone to use, educators must act immediately.
We must re-invent homework and develop teaching concepts that utilize these AI models in the same way as math utilized the calculator: teach the general concepts first and then use AI tools to free up time for other learning objectives.", "output": "AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current space-based missions, such as the Transiting Exoplanet Survey Satellite (TESS), provide a large database of light curves that must be analysed efficiently and systematically. In recent years, deep learning (DL) methods, particularly convolutional neural networks (CNN), have been used to classify transit signals of candidate exoplanets automatically. However, CNNs have some drawbacks; for example, they require many layers to capture dependencies on sequential data, such as light curves, making the network so large that it eventually becomes impractical. The self-attention mechanism is a DL technique that attempts to mimic the action of selectively focusing on some relevant things while ignoring others. Models, such as the Transformer architecture, were recently proposed for sequential data with successful results. Based on these successful models, we present a new architecture for the automatic classification of transit signals. Our proposed architecture is designed to capture the most significant features of a transit signal and stellar parameters through the self-attention mechanism. In addition to model prediction, we take advantage of attention map inspection, obtaining a more interpretable DL approach. Thus, we can identify the relevance of each element to differentiate a transit signal from false positives, simplifying the manual examination of candidates. We show that our architecture achieves competitive results compared to the CNNs applied for recognizing exoplanetary transit signals in data from the TESS telescope. Based on these results, we demonstrate that applying this state-of-the-art DL model to light curves can be a powerful technique for transit signal detection while offering a level of interpretability.", "output": "Distinguishing a planetary transit from false positives: a Transformer-based classification for planetary transit signals."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training. Nevertheless, it is notoriously difficult to control their generation to satisfy the various constraints required by different applications. In this work, we present InstructCTG, a controlled text generation framework that incorporates different constraints by conditioning on natural language descriptions and demonstrations of the constraints. In particular, we first extract the underlying constraints of natural texts through a combination of off-the-shelf NLP tools and simple heuristics. We then verbalize the constraints into natural language instructions to form weakly supervised training data. By prepending natural language descriptions of the constraints and a few demonstrations, we fine-tune a pre-trained language model to incorporate various types of constraints.
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional models of glucose-insulin dynamics rely on heuristic parameterizations chosen to fit observations within a laboratory setting. However, these models cannot describe glucose dynamics in daily life. One source of failure is in their descriptions of glucose absorption rates after meal events. A meal's macronutritional content has nuanced effects on the absorption profile, which is difficult to model mechanistically. In this paper, we propose to learn the effects of macronutrition content from glucose-insulin data and meal covariates. Given macronutrition information and meal times, we use a neural network to predict an individual's glucose absorption rate. We use this neural rate function as the control function in a differential equation of glucose dynamics, enabling end-to-end training. On simulated data, our approach is able to closely approximate true absorption rates, resulting in better forecasts than heuristic parameterizations, despite only observing glucose, insulin, and macronutritional information. Our work readily generalizes to meal events with higher-dimensional covariates, such as images, setting the stage for glucose dynamics models that are personalized to each individual's daily life.", "output": "Learning Absorption Rates in Glucose-Insulin Dynamics from Meal Covariates."},
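The abstract's central idea, a neural network acting as the control function of a glucose ODE, can be sketched compactly. The one-compartment dynamics, decay coefficient, and layer sizes below are illustrative assumptions; the paper's actual model is more detailed.

```python
# Sketch: a neural absorption-rate function driving a deliberately
# simplified glucose ODE, integrated with explicit Euler so the whole
# forecast is differentiable end-to-end.
import torch
import torch.nn as nn

class AbsorptionRate(nn.Module):
    """Maps meal macronutrients and time since the meal to an absorption rate."""
    def __init__(self, n_macros=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_macros + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Softplus(),  # rates stay non-negative
        )

    def forward(self, macros, t):
        return self.net(torch.cat([macros, t], dim=-1))

def forecast(rate_fn, glucose0, macros, dt=1.0, steps=120, decay=0.01):
    """Euler-integrate dG/dt = -decay * G + rate(meal, t)."""
    g, traj = glucose0, []
    for k in range(steps):
        t = torch.full((macros.shape[0], 1), k * dt)
        g = g + dt * (-decay * g + rate_fn(macros, t).squeeze(-1))
        traj.append(g)
    return torch.stack(traj, dim=-1)  # differentiable w.r.t. rate_fn params

traj = forecast(AbsorptionRate(), torch.tensor([90.0]),
                torch.tensor([[50.0, 10.0, 5.0]]))  # carbs/protein/fat (g)
```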
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Understanding the structure, quantity, and type of snow in mountain landscapes is crucial for assessing avalanche safety, interpreting satellite imagery, building accurate hydrology models, and choosing the right pair of skis for your weekend trip. Currently, such characteristics of snowpack are measured using a combination of remote satellite imagery, weather stations, and laborious point measurements and descriptions provided by local forecasters, guides, and backcountry users. Here, we explore how characteristics of the top layer of snowpack could be estimated while skiing using strain sensors mounted to the top surface of an alpine ski. We show that with two strain gauges and an inertial measurement unit it is feasible to correctly assign one of three qualitative labels (powder, slushy, or icy/groomed snow) to each 10-second segment of a trajectory with 97% accuracy, independent of skiing style. Our algorithm uses a combination of a data-driven linear model of the ski-snow interaction, dimensionality reduction, and a Naive Bayes classifier. Comparisons of classifier performance between strain gauges suggest that the optimal placement of strain gauges is halfway between the binding and the tip/tail of the ski, in the cambered section just before the point where the unweighted ski would touch the snow surface. The ability to classify snow, potentially in real time, using skis opens the door to applications that range from citizen science efforts to map snow surface characteristics in the backcountry to the development of skis with automated stiffness tuning based on the snow type.", "output": "A Method for Classifying Snow Using Ski-Mounted Strain Sensors."},
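The classification pipeline named in the abstract (dimensionality reduction followed by a Naive Bayes classifier) is easy to reproduce in outline; the synthetic features below are placeholders for the real strain-gauge/IMU window features.

```python
# Sketch of the described pipeline: per-segment features -> PCA -> GaussianNB.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))    # e.g. spectral features per 10 s segment
y = rng.integers(0, 3, size=300)  # 0=powder, 1=slushy, 2=icy/groomed

clf = make_pipeline(PCA(n_components=8), GaussianNB())
print(cross_val_score(clf, X, y, cv=5).mean())
```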
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mean plane measurements of the Kuiper Belt from observational data are of interest for their potential to test dynamical models of the solar system. Recent measurements have yielded inconsistent results. Here we report a measurement of the Kuiper Belt's mean plane with a sample size more than twice as large as in previous measurements. The sample of interest is the non-resonant Kuiper belt objects, which we identify by using machine learning on the observed Kuiper Belt population whose orbits are well-determined. We estimate the measurement error with a Monte Carlo procedure. We find that the overall mean plane of the non-resonant Kuiper Belt (semimajor axis range 35-150 au) and also that of the classical Kuiper Belt (semimajor axis range 42-48 au) are both close to (within about 0.7 degrees) but distinguishable from the invariable plane of the solar system to greater than 99.7% confidence. When binning the sample into smaller semimajor axis bins, we find the measured mean plane mostly consistent with both the invariable plane and the theoretically expected Laplace surface forced by the known planets. Statistically significant discrepancies are found only in the semimajor axis ranges 40.3-42 au and 45-50 au; these ranges are in proximity to a secular resonance and Neptune's 2:1 mean motion resonance where the theory for the Laplace surface is likely to be inaccurate. These results do not support a previously reported anomalous warp at semimajor axes above 50 au.", "output": "A Measurement of the Kuiper Belt's Mean Plane From Objects Classified By Machine Learning."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study online learning in episodic constrained Markov decision processes (CMDPs), where the goal of the learner is to collect as much reward as possible over the episodes, while guaranteeing that some long-term constraints are satisfied during the learning process. Rewards and constraints can be selected either stochastically or adversarially, and the transition function is not known to the learner. While online learning in classical unconstrained MDPs has received considerable attention over the last years, the setting of CMDPs is still largely unexplored. This is surprising, since in real-world applications, such as autonomous driving, automated bidding, and recommender systems, there are usually additional constraints and specifications that an agent has to obey during the learning process. In this paper, we provide the first best-of-both-worlds algorithm for CMDPs with long-term constraints. Our algorithm is capable of handling settings in which rewards and constraints are selected either stochastically or adversarially, without requiring any knowledge of the underlying process. Moreover, our algorithm matches state-of-the-art regret and constraint violation bounds for settings in which constraints are selected stochastically, while it is the first to provide guarantees in the case in which they are chosen adversarially.", "output": "A Best-of-Both-Worlds Algorithm for Constrained MDPs with Long-Term Constraints."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning systems, especially with overparameterized deep neural networks, can generalize to novel test instances drawn from the same distribution as the training data. However, they fare poorly when evaluated on out-of-support test points. In this work, we tackle the problem of developing machine learning systems that retain the power of overparameterized function approximators while enabling extrapolation to out-of-support test points when possible. This is accomplished by noting that under certain conditions, a \"transductive\" reparameterization can convert an out-of-support extrapolation problem into a problem of within-support combinatorial generalization. We propose a simple strategy based on bilinear embeddings to enable this type of combinatorial generalization, thereby addressing the out-of-support extrapolation problem under certain conditions. We instantiate a simple, practical algorithm applicable to various supervised learning and imitation learning tasks.", "output": "Learning to Extrapolate: A Transductive Approach."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We analyze the generalization ability of joint-training meta learning algorithms via the Gibbs algorithm. Our exact characterization of the expected meta generalization error for the meta Gibbs algorithm is based on symmetrized KL information, which measures the dependence between all meta-training datasets and the output parameters, including task-specific and meta parameters. Additionally, we derive an exact characterization of the meta generalization error for the super-task Gibbs algorithm, in terms of conditional symmetrized KL information within the super-sample and super-task framework introduced in Steinke and Zakynthinou (2020) and Hellstrom and Durisi (2022) respectively. Our results also enable us to provide novel distribution-free generalization error upper bounds for these Gibbs algorithms applicable to meta learning.", "output": "On the Generalization Error of Meta Learning for the Gibbs Algorithm."},
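For reference, symmetrized KL information, the central quantity in the Gibbs-algorithm analysis above, is commonly defined as the sum of the two directed KL divergences between the joint law and the product of marginals; the paper's meta-learning variants condition this quantity on tasks and super-samples, which is not shown here.

```latex
% Symmetrized KL information between the training data S and the algorithm
% output W (one standard definition; the meta-learning variant differs):
I_{\mathrm{SKL}}(S;W)
  = D\!\left(P_{S,W} \,\middle\|\, P_S \otimes P_W\right)
  + D\!\left(P_S \otimes P_W \,\middle\|\, P_{S,W}\right)
```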
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The goal of this paper is to learn more about how idiomatic information is structurally encoded in embeddings, using a structural probing method. We repurpose an existing English verbal multi-word expression (MWE) dataset to suit the probing framework and perform a comparative probing study of static (GloVe) and contextual (BERT) embeddings. Our experiments indicate that both encode some idiomatic information to varying degrees, but yield conflicting evidence as to whether idiomaticity is encoded in the vector norm, leaving this an open question. We also identify some limitations of the dataset used and highlight important directions for future work in improving its suitability for a probing analysis.", "output": "Idioms, Probing and Dangerous Things: Towards Structural Probing for Idiomaticity in Vector Space."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes our system for SemEval-2023 Task 3 Subtask 2 on Framing Detection. We used a multi-label contrastive loss for fine-tuning large pre-trained language models in a multi-lingual setting, achieving very competitive results: our system was ranked first on the official test set and on the official shared task leaderboard for five of the six languages for which we had training data and for which we could perform fine-tuning. Here, we describe our experimental setup, as well as various ablation studies. The code of our system is available at ", "output": "MarsEclipse at SemEval-2023 Task 3: Multi-Lingual and Multi-Label Framing Detection with Contrastive Learning."},
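A multi-label contrastive loss of the kind this system description mentions can be sketched as follows. Treating examples with overlapping label sets as positives is one plausible formulation; it is not necessarily the exact loss used by the shared-task system.

```python
# Sketch: supervised contrastive loss for multi-label batches; two examples
# are positives if their label sets overlap.
import torch
import torch.nn.functional as F

def multilabel_contrastive_loss(embeddings, labels, temperature=0.1):
    """embeddings: (B, D) encoder outputs; labels: (B, L) multi-hot."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.T / temperature                    # (B, B) similarities
    pos = (labels.float() @ labels.float().T) > 0  # shared label => positive
    pos.fill_diagonal_(False)
    self_mask = torch.eye(len(z), dtype=torch.bool)
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    n_pos = pos.sum(dim=1).clamp(min=1)
    return -(log_prob * pos.float()).sum(dim=1).div(n_pos).mean()

loss = multilabel_contrastive_loss(torch.randn(8, 128),
                                   torch.randint(0, 2, (8, 6)))
```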
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As deep learning technology advances and more urban spatial-temporal data accumulates, an increasing number of deep learning models are being proposed to solve urban spatial-temporal prediction problems. However, there are limitations in the existing field, including open-source data being in various formats and difficult to use, few papers making their code and data openly available, and open-source models often using different frameworks and platforms, making comparisons challenging. A standardized framework is urgently needed to implement and evaluate these methods. To address these issues, we provide a comprehensive review of urban spatial-temporal prediction and propose a unified storage format for spatial-temporal data called atomic files. We also propose LibCity, an open-source library that offers researchers a credible experimental tool and a convenient development framework. In this library, we have reproduced 65 spatial-temporal prediction models and collected 55 spatial-temporal datasets, allowing researchers to conduct comprehensive experiments conveniently. Using LibCity, we conducted a series of experiments to validate the effectiveness of different models and components, and we summarized promising future technology developments and research directions for spatial-temporal prediction. By enabling fair model comparisons, designing a unified data storage format, and simplifying the process of developing new models, LibCity is poised to make significant contributions to the spatial-temporal prediction field.", "output": "Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction: A Unified Library and Performance Benchmark."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the launch of ChatGPT, Large Language Models (LLMs) are shaking up our whole society, rapidly altering the way we think, create and live. For instance, the GPT integration in Bing has altered our approach to online searching. While nascent LLMs have many advantages, new legal and ethical risks are also emerging, stemming in particular from stochastic parrots and hallucination. The EU is the first and foremost jurisdiction that has focused on the regulation of AI models. However, the risks posed by the new LLMs are likely to be underestimated by the emerging EU regulatory paradigm. Therefore, this correspondence warns that the European AI regulatory paradigm must evolve further to mitigate such risks.", "output": "The Dark Side of ChatGPT: Legal and Ethical Challenges from Stochastic Parrots and Hallucination."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A wave of new task-based virtual assistants has been fueled by increasingly powerful large language models, such as GPT-4. These conversational agents can be customized to serve customer-specific use cases, but ensuring that agent-generated text conforms to designer-specified rules included in prompt instructions alone is challenging. Therefore, chatbot designers often use another model, called a guardrail model, to verify that the agent output aligns with their rules and constraints. We explore using a distillation approach to guardrail models to monitor the output of the first model using training data from GPT-4. We find two crucial steps to our CONSCENDI process: scenario-augmented generation and contrastive training examples. When generating conversational data, we generate a set of rule-breaking scenarios, which enumerate a diverse set of high-level ways a rule can be violated. This scenario-guided approach produces a diverse training set of rule-violating conversations, and it provides chatbot designers greater control over the classification process. We also prompt GPT-4 to generate contrastive examples by altering conversations with violations into acceptable conversations. This set of borderline, contrastive examples enables the distilled model to learn finer-grained distinctions between what is acceptable and what is not. We find that CONSCENDI results in guardrail models that improve over baselines.", "output": "CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants."},
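The two CONSCENDI ingredients, scenario-augmented generation and contrastive examples, both come down to prompt design for the data-generating model. A sketch follows; the rule text and prompt wording are placeholders, not the paper's actual prompts.

```python
# Sketch: the two data-generation prompts the abstract describes --
# rule-breaking scenarios first, then contrastive "repaired" conversations.

RULE = "The agent must never quote prices without a disclaimer."

def scenario_prompt(rule: str, n_scenarios: int = 5) -> str:
    return (
        f"List {n_scenarios} distinct high-level scenarios in which a chatbot "
        f"could violate this rule: '{rule}'. Then write one conversation per "
        "scenario in which the violation occurs."
    )

def contrastive_prompt(violating_conversation: str, rule: str) -> str:
    return (
        "Minimally edit the following conversation so that it no longer "
        f"violates the rule '{rule}', keeping everything else unchanged:\n\n"
        + violating_conversation
    )
```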
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a hybrid neural network (NN) and PDE approach for learning generalizable PDE dynamics from motion observations. Many NN approaches learn an end-to-end model that implicitly models both the governing PDE and constitutive models (or material models). Without explicit PDE knowledge, these approaches cannot guarantee physical correctness and have limited generalizability. We argue that the governing PDEs are often well-known and should be explicitly enforced rather than learned. Instead, constitutive models are particularly suitable for learning due to their data-fitting nature. To this end, we introduce a new framework termed \"Neural Constitutive Laws\" (NCLaw), which utilizes a network architecture that strictly guarantees standard constitutive priors, including rotation equivariance and undeformed state equilibrium. We embed this network inside a differentiable simulation and train the model by minimizing a loss function based on the difference between the simulation and the motion observation. We validate NCLaw on various large-deformation dynamical systems, ranging from solids to fluids. After training on a single motion trajectory, our method generalizes to new geometries, initial/boundary conditions, temporal ranges, and even multi-physics systems. On these extremely out-of-distribution generalization tasks, NCLaw is orders-of-magnitude more accurate than previous NN approaches. Real-world experiments demonstrate our method's ability to learn constitutive laws from videos.", "output": "Learning Neural Constitutive Laws From Motion Observations for Generalizable PDE Dynamics."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural fields are neural networks which map coordinates to a desired signal. When a neural field should jointly model multiple signals, and not memorize only one, it needs to be conditioned on a latent code which describes the signal at hand. Despite being an important aspect, there has been little research on conditioning strategies for neural fields. In this work, we explore the use of neural fields as decoders for 2D semantic segmentation. For this task, we compare three conditioning methods, simple concatenation of the latent code, Feature Wise Linear Modulation (FiLM), and Cross-Attention, in conjunction with latent codes which either describe the full image or only a local region of the image. Our results show a considerable difference in performance between the examined conditioning strategies. Furthermore, we show that conditioning via Cross-Attention achieves the best results and is competitive with a CNN-based decoder for semantic segmentation.", "output": "Neural Field Conditioning Strategies for 2D Semantic Segmentation."},
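Of the three conditioning strategies compared above, FiLM is the easiest to show compactly: the latent code predicts a per-channel scale and shift applied to the decoder's hidden features. The layer sizes below are illustrative assumptions.

```python
# Sketch: FiLM conditioning for a coordinate-based decoder.
import torch
import torch.nn as nn

class FiLMLayer(nn.Module):
    def __init__(self, in_dim, out_dim, latent_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.film = nn.Linear(latent_dim, 2 * out_dim)  # -> (gamma, beta)

    def forward(self, x, z):
        gamma, beta = self.film(z).chunk(2, dim=-1)
        return torch.relu(gamma * self.fc(x) + beta)

layer = FiLMLayer(in_dim=2, out_dim=64, latent_dim=128)
coords = torch.rand(1024, 2)   # pixel coordinates
z = torch.randn(1, 128)        # latent code describing the image
h = layer(coords, z)           # gamma/beta broadcast over the 1024 points
```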
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pseudo-Hamiltonian neural networks (PHNN) were recently introduced for learning dynamical systems that can be modelled by ordinary differential equations. In this paper, we extend the method to partial differential equations. The resulting model is comprised of up to three neural networks, modelling terms representing conservation, dissipation and external forces, and discrete convolution operators that can either be learned or be prior knowledge. We demonstrate numerically the superior performance of PHNN compared to a baseline model that models the full dynamics by a single neural network. Moreover, since the PHNN model consists of three parts with different physical interpretations, these can be studied separately to gain insight into the system, and the learned model is applicable also if external forces are removed or changed.", "output": "Pseudo-Hamiltonian neural networks for learning partial differential equations."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Nowadays many real-world datasets can be considered as functional, in the sense that the processes which generate them are continuous. A fundamental property of this type of data is that in theory they belong to an infinite-dimensional space. Although in practice we usually receive finite observations, they are still high-dimensional and hence dimensionality reduction methods are crucial. In this vein, the main state-of-the-art method for functional data analysis is Functional PCA. Nevertheless, this classic technique assumes that the data lie in a linear manifold, and hence it could have problems when this hypothesis is not fulfilled. In this research, attention has been placed on a non-linear manifold learning method: Diffusion Maps. The article explains how to extend this multivariate method to functional data and compares its behavior against Functional PCA over different simulated and real examples.", "output": "Functional Diffusion Maps."},
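A minimal multivariate Diffusion Maps sketch makes clear what the functional extension changes: essentially only the pairwise distance (an L2 distance between functions on a common grid instead of a Euclidean distance between vectors). The median-based bandwidth and the dense eigendecomposition with real parts taken are simplifications.

```python
# Sketch: plain diffusion maps on vector data.
import numpy as np

def diffusion_maps(X, n_components=2, t=1):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    eps = np.median(d2)                                  # bandwidth heuristic
    K = np.exp(-d2 / eps)
    P = K / K.sum(axis=1, keepdims=True)                 # row-stochastic
    vals, vecs = np.linalg.eig(P)          # P is not symmetric; for a sketch
    order = np.argsort(-vals.real)         # we keep the real parts
    vals, vecs = vals.real[order], vecs.real[:, order]
    # skip the trivial constant eigenvector (eigenvalue 1)
    return vecs[:, 1:n_components + 1] * vals[1:n_components + 1] ** t

theta = np.linspace(0, np.pi, 100)
X = np.c_[np.cos(theta), np.sin(theta)]   # a 1-D manifold in 2-D
Y = diffusion_maps(X)
```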
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Analogical Networks, a model that encodes domain knowledge explicitly, in a collection of structured labelled 3D scenes, in addition to implicitly, as model parameters, and segments 3D object scenes with analogical reasoning: instead of mapping a scene to part segments directly, our model first retrieves related scenes from memory and their corresponding part structures, and then predicts analogous part structures for the input scene, via an end-to-end learnable modulation mechanism. By conditioning on more than one retrieved memory, compositions of structures are predicted, that mix and match parts across the retrieved memories. One-shot, few-shot or many-shot learning are treated uniformly in Analogical Networks, by conditioning on the appropriate set of memories, whether taken from a single, few or many memory exemplars, and inferring analogous parses. We show Analogical Networks are competitive with state-of-the-art 3D segmentation transformers in many-shot settings, and outperform them, as well as existing paradigms of meta-learning and few-shot learning, in few-shot settings. Analogical Networks successfully segment instances of novel object categories simply by expanding their memory, without any weight updates. Our code and models are publicly available on the project webpage: ", "output": "Analogy-Forming Transformers for Few-Shot 3D Parsing."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider a novel dynamic pricing and learning setting where in addition to setting prices of products in sequential rounds, the seller also ex-ante commits to 'advertising schemes'. That is, in the beginning of each round the seller can decide what kind of signal they will provide to the buyer about the product's quality upon realization. Using the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses, we formulate the problem of finding an optimal design of the advertising scheme along with a pricing scheme that maximizes the seller's expected revenue. Without any a priori knowledge of the buyers' demand function, our goal is to design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy. We study the regret of the algorithm when compared to the optimal clairvoyant price and advertising scheme. Our main result is a computationally efficient online algorithm that achieves an $O(T^{2/3}(m\\log T)^{1/3})$ regret bound when the valuation function is linear in the product quality. Here $m$ is the cardinality of the discrete product quality domain and $T$ is the time horizon. This result requires some natural monotonicity and Lipschitz assumptions on the valuation function, but no Lipschitz or smoothness assumption on the buyers' demand function. For constant $m$, our result matches the regret lower bound for dynamic pricing within logarithmic factors, which is a special case of our problem. We also obtain several improved results for the widely considered special case of additive valuations, including an $\\tilde{O}(T^{2/3})$ regret bound independent of $m$ when $m \\le T^{1/3}$.", "output": "Dynamic Pricing and Learning with Bayesian Persuasion."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Annealed Importance Sampling (AIS) moves particles along a Markov chain from a tractable initial distribution to an intractable target distribution. The recently proposed Differentiable AIS (DAIS) (Geffner and Domke, 2021; Zhang et al., 2021) enables efficient optimization of the transition kernels of AIS and of the distributions. However, we observe a low effective sample size in DAIS, indicating degenerate distributions. We thus propose to extend DAIS by a resampling step inspired by Sequential Monte Carlo. Surprisingly, we find empirically -- and can explain theoretically -- that it is not necessary to differentiate through the resampling step, which avoids gradient variance issues observed in similar approaches for Particle Filters (Maddison et al., 2017; Naesseth et al., 2018; Le et al., 2018).", "output": "Resampling Gradients Vanish in Differentiable Sequential Monte Carlo Samplers."},
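The resampling step grafted onto DAIS is standard Sequential Monte Carlo machinery; systematic resampling is one common choice, and per the paper's finding one would not differentiate through it.

```python
# Sketch: systematic resampling as used in Sequential Monte Carlo.
import numpy as np

def systematic_resample(weights, rng):
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n     # one stratified draw
    cumulative = np.cumsum(weights / weights.sum())
    return np.searchsorted(cumulative, positions)     # ancestor indices

rng = np.random.default_rng(0)
w = np.array([0.7, 0.1, 0.1, 0.1])
ancestors = systematic_resample(w, rng)  # mostly copies of particle 0
```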
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction. Local vision-based policies then relocate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world. It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts.", "output": "Energy-based Models as Zero-Shot Planners for Compositional Scene Rearrangement."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Learning (DL) can diagnose faults and assess machine health from raw condition monitoring data without manually designed statistical features. However, practical manufacturing applications remain extremely difficult for existing DL methods. Machine data is often unlabeled and from very few health conditions (e.g., only normal operating data). Furthermore, models often encounter shifts in domain as process parameters change and new categories of faults emerge. Traditional supervised learning may struggle to learn compact, discriminative representations that generalize to these unseen target domains since it depends on having plentiful classes to partition the feature space with decision boundaries. Transfer Learning (TL) with domain adaptation attempts to adapt these models to unlabeled target domains but assumes similar underlying structure that may not be present if new faults emerge. This study proposes focusing on maximizing the feature generality on the source domain and applying TL via weight transfer to copy the model to the target domain. Specifically, Self-Supervised Learning (SSL) with Barlow Twins may produce more discriminative features for monitoring health condition than supervised learning by focusing on semantic properties of the data. Furthermore, Federated Learning (FL) for distributed training may also improve generalization by efficiently expanding the effective size and diversity of training data by sharing information across multiple client machines. Results show that Barlow Twins outperforms supervised learning in an unlabeled target domain with emerging motor faults when the source training data contains very few distinct categories. Incorporating FL may also provide a slight advantage by diffusing knowledge of health conditions between machines.", "output": "Maximizing Model Generalization for Manufacturing with Self-Supervised Learning and Federated Learning."},
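Barlow Twins, the SSL objective this study builds on, fits in a dozen lines: push the cross-correlation matrix of two augmented views toward the identity. The off-diagonal weight plays the role it has in the original method; its value here is illustrative.

```python
# Sketch of the Barlow Twins objective: decorrelate embedding dimensions of
# two augmented views via their cross-correlation matrix.
import torch

def barlow_twins_loss(z1, z2, lambda_offdiag=5e-3):
    B, D = z1.shape
    z1 = (z1 - z1.mean(0)) / z1.std(0)       # normalize along the batch
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / B                      # (D, D) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lambda_offdiag * off_diag

loss = barlow_twins_loss(torch.randn(64, 128), torch.randn(64, 128))
```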
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "StyleGAN's disentangled style representation enables powerful image editing by manipulating the latent variables, but accurately mapping real-world images to their latent variables (GAN inversion) remains a challenge. Existing GAN inversion methods struggle to maintain editing directions and produce realistic results. To address these limitations, we propose Make It So, a novel GAN inversion method that operates in the $\\mathcal{Z}$ (noise) space rather than the typical $\\mathcal{W}$ (latent style) space. Make It So preserves editing capabilities, even for out-of-domain images. This is a crucial property that was overlooked in prior methods. Our quantitative evaluations demonstrate that Make It So outperforms the state-of-the-art method PTI~\\cite{roich2021pivotal} by a factor of five in inversion accuracy and achieves ten times better edit quality for complex indoor scenes.", "output": "Make It So: Steering StyleGAN for Any Image Inversion and Editing."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of inferring scene affordances by presenting a method for realistically inserting people into scenes. Given a scene image with a marked region and an image of a person, we insert the person into the scene while respecting the scene affordances. Our model can infer the set of realistic poses given the scene context, re-pose the reference person, and harmonize the composition. We set up the task in a self-supervised fashion by learning to re-pose humans in video clips. We train a large-scale diffusion model on a dataset of 2.4M video clips that produces diverse plausible poses while respecting the scene context. Given the learned human-scene composition, our model can also hallucinate realistic people and scenes when prompted without conditioning and also enables interactive editing. A quantitative evaluation shows that our method synthesizes more realistic human appearance and more natural human-scene interactions than prior work.", "output": "Putting People in Their Place: Affordance-Aware Human Insertion into Scenes."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel clustering mechanism based on an incompatibility property between subsets of data that emerges during model training. This mechanism partitions the dataset into subsets that generalize only to themselves, i.e., training on one subset does not improve performance on the other subsets. Leveraging the interaction between the dataset and the training process, our clustering mechanism partitions datasets into clusters that are defined by -- and therefore meaningful to -- the objective of the training process. We apply our clustering mechanism to defend against data poisoning attacks, in which the attacker injects malicious poisoned data into the training dataset to affect the trained model's output. Our evaluation focuses on backdoor attacks against deep neural networks trained to perform image classification using the GTSRB and CIFAR-10 datasets. Our results show that (1) these attacks produce poisoned datasets in which the poisoned and clean data are incompatible and (2) our technique successfully identifies (and removes) the poisoned data. In an end-to-end evaluation, our defense reduces the attack success rate to below 1% on 134 out of 165 scenarios, with only a 2% drop in clean accuracy on CIFAR-10 and a negligible drop in clean accuracy on GTSRB.", "output": "Incompatibility Clustering as a Defense Against Backdoor Poisoning Attacks."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Image-to-image translation (i2i) networks suffer from entanglement effects in the presence of physics-related phenomena in the target domain (such as occlusions, fog, etc.), lowering altogether the translation quality, controllability and variability. In this paper, we propose a general framework to disentangle visual traits in target images. Primarily, we build upon a collection of simple physics models, guiding the disentanglement with a physical model that renders some of the target traits, and learning the remaining ones. Because physics allows explicit and interpretable outputs, our physical models (optimally regressed on target) allow generating unseen scenarios in a controllable manner. Secondarily, we show the versatility of our framework to neural-guided disentanglement, where a generative network is used in place of a physical model in case the latter is not directly accessible. Altogether, we introduce three strategies of disentanglement being guided from either a fully differentiable physics model, a (partially) non-differentiable physics model, or a neural network. The results show our disentanglement strategies dramatically increase performance qualitatively and quantitatively in several challenging scenarios for image translation.", "output": "Physics-informed Guided Disentanglement in Generative Networks."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we study zeroth-order algorithms for nonconvex-concave minimax problems, which have attracted wide attention in machine learning, signal processing and many other fields in recent years. We propose a zeroth-order alternating randomized gradient projection (ZO-AGP) algorithm for smooth nonconvex-concave minimax problems, and its iteration complexity to obtain an $\\varepsilon$-stationary point is bounded by $\\mathcal{O}(\\varepsilon^{-4})$, and the number of function value estimations is bounded by $\\mathcal{O}(d_{x}+d_{y})$ per iteration. Moreover, we propose a zeroth-order block alternating randomized proximal gradient algorithm (ZO-BAPG) for solving block-wise nonsmooth nonconvex-concave minimax optimization problems, and the iteration complexity to obtain an $\\varepsilon$-stationary point is bounded by $\\mathcal{O}(\\varepsilon^{-4})$ and the number of function value estimations per iteration is bounded by $\\mathcal{O}(K d_{x}+d_{y})$. To the best of our knowledge, this is the first time that zeroth-order algorithms with iteration complexity guarantees are developed for solving both general smooth and block-wise nonsmooth nonconvex-concave minimax problems. Numerical results on the data poisoning attack problem validate the efficiency of the proposed algorithms.", "output": "Derivative-free Alternating Projection Algorithms for General Nonconvex-Concave Minimax Problems."},
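Both ZO-AGP and ZO-BAPG rest on gradient estimates built purely from function evaluations. A minimal two-point randomized estimator is sketched below; the smoothing radius and direction count are illustrative choices, not the paper's parameters.

```python
# Sketch: randomized two-point zeroth-order gradient estimator.
import numpy as np

def zo_gradient(f, x, mu=1e-4, n_dirs=20, seed=0):
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.size)
        # finite difference along a random direction, no derivatives needed
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / n_dirs

f = lambda x: np.sum(x ** 2)
print(zo_gradient(f, np.array([1.0, -2.0])))  # approx [2, -4]
```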
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic convergence guarantees for stochastic algorithms with iterate averaging, focusing on bounds valid both in expectation and with high probability. The efficiency estimates we obtain clearly decouple the contributions of optimization error, gradient noise, and time drift. Notably, we identify a low drift-to-noise regime in which the tracking efficiency of the proximal stochastic gradient method benefits significantly from a step decay schedule. Numerical experiments illustrate our results.", "output": "Stochastic Optimization under Distributional Drift."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Currently, there is a burgeoning demand for deploying deep learning (DL) models on ubiquitous edge Internet of Things (IoT) devices, attributed to their low latency and high privacy preservation. However, DL models are often large in size and require large-scale computation, which prevents them from being placed directly onto IoT devices, where resources are constrained and 32-bit floating-point (float-32) operations are unavailable. Commercial framework (i.e., a set of toolkits) empowered model quantization is a pragmatic solution that enables DL deployment on mobile devices and embedded systems by effortlessly post-quantizing a large high-precision model (e.g., float-32) into a small low-precision model (e.g., int-8) while retaining the model inference accuracy. However, their usability might be threatened by security vulnerabilities. This work reveals that the standard quantization toolkits can be abused to activate a backdoor. We demonstrate that a full-precision backdoored model which does not have any backdoor effect in the presence of a trigger -- as the backdoor is dormant -- can be activated by the default i) TensorFlow-Lite (TFLite) quantization, the only product-ready quantization framework to date, and ii) the beta released PyTorch Mobile framework. When each of the float-32 models is converted into an int-8 format model through the standard TFLite or PyTorch Mobile framework's post-training quantization, the backdoor is activated in the quantized model, which shows a stable attack success rate close to 100% upon inputs with the trigger, while it behaves normally upon non-trigger inputs. This work highlights that a stealthy security threat occurs when an end user utilizes the on-device post-training model quantization frameworks, informing security researchers of cross-platform overhaul of DL models post quantization even if these models pass front-end backdoor inspections.", "output": "Quantization Backdoors to Deep Learning Commercial Frameworks."},
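For context, the attack surface here is the ordinary, documented TFLite post-training quantization flow; nothing in the benign workflow below hints that an int-8 conversion could wake a dormant backdoor. The saved-model path and calibration generator are placeholders.

```python
# Sketch: standard TFLite full-integer post-training quantization.
import tensorflow as tf

def representative_data():
    for _ in range(100):
        yield [tf.random.uniform((1, 224, 224, 3))]  # calibration batches

converter = tf.lite.TFLiteConverter.from_saved_model("float32_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()
open("int8_model.tflite", "wb").write(tflite_model)
```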
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies based on data when the system model is unknown. However, the cumulative regret of most RL algorithms scales as $\\tilde{O}(\\mathsf{S}\\sqrt{\\mathsf{A} T})$, where $\\mathsf{S}$ is the size of the state space, $\\mathsf{A}$ is the size of the action space, $T$ is the horizon, and the $\\tilde{O}(\\cdot)$ notation hides logarithmic terms. Due to the linear dependence on the size of the state space, these regret bounds are prohibitively large for resource allocation and scheduling problems. In this paper, we present a model-based RL algorithm for such problems which has scalable regret. In particular, we consider a restless bandit model, and propose a Thompson-sampling based learning algorithm which is tuned to the underlying structure of the model. We present two characterizations of the regret of the proposed algorithm with respect to the Whittle index policy. First, we show that for a restless bandit with $n$ arms and at most $m$ activations at each time, the regret scales either as $\\tilde{O}(mn\\sqrt{T})$ or $\\tilde{O}(n^2 \\sqrt{T})$ depending on the reward model. Second, under an additional technical assumption, we show that the regret scales as $\\tilde{O}(n^{1.5} \\sqrt{T})$ or $\\tilde{O}(\\max\\{m\\sqrt{n}, n\\} \\sqrt{T})$. We present numerical examples to illustrate the salient features of the algorithm.", "output": "On learning Whittle index policy for restless bandits with scalable regret."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents the design and results of the \"PEg TRAnsfert Workflow recognition\" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or several modalities, among video, kinematic, and segmentation data, in order to study their added value. The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations, which described the sequences at three different granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed the recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Seven teams participated in at least one task and four of them in all tasks. The best results were obtained with the use of the video and the kinematics data, with an AD-Accuracy between 93% and 90% for the four teams who participated in all tasks. The improvement between video/kinematic-based methods and the uni-modality ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration. Is it relevant to spend 20 to 200 times more computing time for less than 3% improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.", "output": "PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition?."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The evaluation of explanation methods is a research topic that has not yet been explored deeply. However, since explainability is supposed to strengthen trust in artificial intelligence, it is necessary to systematically review and compare explanation methods in order to confirm their correctness. Until now, no tool with a focus on XAI evaluation exists that exhaustively and speedily allows researchers to evaluate the performance of explanations of neural network predictions. To increase transparency and reproducibility in the field, we therefore built Quantus -- a comprehensive evaluation toolkit in Python that includes a growing, well-organised collection of evaluation metrics and tutorials for evaluating explainable methods. The toolkit has been thoroughly tested and is available under an open-source license on PyPi (or on", "output": "Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Providing natural language instructions in prompts is a useful new paradigm for improving task performance of large language models in a zero-shot setting. Recent work has aimed to improve such prompts via manual rewriting or gradient-based tuning. However, manual rewriting is time-consuming and requires subjective interpretation, while gradient-based tuning can be extremely computationally demanding for large models and may not be feasible for API-based models. In this work, we introduce Gradient-free Instructional Prompt Search (GrIPS), a gradient-free, edit-based search approach for improving task instructions for large language models. GrIPS takes in instructions designed for humans and automatically returns an improved, edited prompt, while allowing for API-based tuning. With InstructGPT models, GrIPS improves the average task performance by up to 4.30 percentage points on eight classification tasks from the Natural Instructions dataset (with similar improvements for OPT, BLOOM, and FLAN-T5). We see improvements for both instruction-only prompts and instruction + k-shot examples prompts. Notably, GrIPS outperforms manual rewriting and purely example-based prompts while controlling for the available compute and data budget. Further, the performance of GrIPS is comparable to select gradient-based tuning approaches. Qualitatively, we show our edits can simplify instructions and at times make them incoherent but nonetheless improve accuracy. Our code is available at: ", "output": "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models."},
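The GrIPS loop is gradient-free by construction: segment the instruction into phrases, propose edits, and keep whichever candidate scores best on a small labelled set. The sketch below uses a placeholder scorer; the real system scores candidates with an API model and uses a paraphrase model for two of the edit types.

```python
# Sketch: edit-based, gradient-free instruction search in the spirit of GrIPS.
import random

EDITS = ["delete", "swap", "paraphrase", "add"]

def propose_edit(phrases, rng):
    p = phrases[:]
    op = rng.choice(EDITS)
    i = rng.randrange(len(p))
    if op == "delete" and len(p) > 1:
        p.pop(i)
    elif op == "swap" and len(p) > 1:
        j = rng.randrange(len(p))
        p[i], p[j] = p[j], p[i]
    # "paraphrase"/"add" would call a paraphrase model; omitted here
    return p

def grips_search(instruction, score_fn, n_iters=50, seed=0):
    rng = random.Random(seed)
    best = instruction.split(". ")            # crude phrase segmentation
    best_score = score_fn(". ".join(best))    # e.g. dev-set accuracy via API
    for _ in range(n_iters):
        cand = propose_edit(best, rng)
        s = score_fn(". ".join(cand))
        if s > best_score:
            best, best_score = cand, s
    return ". ".join(best), best_score
```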
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, the prediction of quantum mechanical observables with machine learning methods has become increasingly popular. Message-passing neural networks (MPNNs) solve this task by constructing atomic representations, from which the properties of interest are predicted. Here, we introduce a method to automatically identify chemical moieties (molecular building blocks) from such representations, enabling a variety of applications beyond property prediction, which otherwise rely on expert knowledge. The required representation can either be provided by a pretrained MPNN, or learned from scratch using only structural information. Beyond the data-driven design of molecular fingerprints, the versatility of our approach is demonstrated by enabling the selection of representative entries in chemical databases, the automatic construction of coarse-grained force fields, as well as the identification of reaction coordinates.", "output": "Automatic Identification of Chemical Moieties."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Effective exploration is a challenge in reinforcement learning (RL). Novelty-based exploration methods can suffer in high-dimensional state spaces, such as continuous partially-observable 3D environments. We address this challenge by defining novelty using semantically meaningful state abstractions, which can be found in learned representations shaped by natural language. In particular, we evaluate vision-language representations, pretrained on natural image captioning datasets. We show that these pretrained representations drive meaningful, task-relevant exploration and improve performance on 3D simulated environments. We also characterize why and how language provides useful abstractions for exploration by considering the impacts of using representations from a pretrained model, a language oracle, and several ablations. We demonstrate the benefits of our approach in two very different task domains -- one that stresses the identification and manipulation of everyday objects, and one that requires navigational exploration in an expansive world. Our results suggest that using language-shaped representations could improve exploration for various algorithms and agents in challenging environments.", "output": "Semantic Exploration from Language Abstractions and Pretrained Representations."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparsely-gated Mixture of Expert (MoE) layers have been recently successfully applied for scaling large transformers, especially for language modeling tasks. An intriguing side effect of sparse MoE layers is that they convey inherent interpretability to a model via natural expert specialization. In this work, we apply sparse MoE layers to CNNs for computer vision tasks and analyze the resulting effect on model interpretability. To stabilize MoE training, we present both soft and hard constraint-based approaches. With hard constraints, the weights of certain experts are allowed to become zero, while soft constraints balance the contribution of experts with an additional auxiliary loss. As a result, soft constraints handle expert utilization better and support the expert specialization process, while hard constraints maintain more generalized experts and increase overall model performance. Our findings demonstrate that experts can implicitly focus on individual sub-domains of the input space. For example, experts trained for CIFAR-100 image classification specialize in recognizing different domains such as flowers or animals without previous data clustering. Experiments with RetinaNet and the COCO dataset further indicate that object detection experts can also specialize in detecting objects of distinct sizes.", "output": "Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability."},
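The soft-constraint variant described above balances expert utilization with an auxiliary loss. One common form of such a load-balancing term, in the spirit of sparsely-gated MoE training (the paper's exact constraint may differ), is the squared coefficient of variation of per-expert importance:

```python
# Sketch: soft-constraint auxiliary loss that penalizes unbalanced experts.
import torch

def load_balance_loss(gate_probs, eps=1e-8):
    """gate_probs: (batch, n_experts) softmax outputs of the gating network."""
    importance = gate_probs.sum(dim=0)                # total weight per expert
    return importance.var() / (importance.mean() ** 2 + eps)  # 0 if balanced

gates = torch.softmax(torch.randn(32, 8), dim=-1)
aux = load_balance_loss(gates)  # add as: total_loss = task_loss + alpha * aux
```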
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modern datasets often exhibit high dimensionality, yet the data reside in low-dimensional manifolds that can reveal underlying geometric structures critical for data analysis. A prime example of such a dataset is a collection of cell cycle measurements, where the inherently cyclical nature of the process can be represented as a circle or sphere. Motivated by the need to analyze these types of datasets, we propose a nonlinear dimension reduction method, Spherical Rotation Component Analysis (SRCA), that incorporates geometric information to better approximate low-dimensional manifolds. SRCA is a versatile method designed to work in both high-dimensional and small sample size settings. By employing spheres or ellipsoids, SRCA provides a low-rank spherical representation of the data with general theoretic guarantees, effectively retaining the geometric structure of the dataset during dimensionality reduction. A comprehensive simulation study, along with a successful application to human cell cycle data, further highlights the advantages of SRCA compared to state-of-the-art alternatives, demonstrating its superior performance in approximating the manifold while preserving inherent geometric structures.", "output": "Spherical Rotation Dimension Reduction with Geometric Loss Functions."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Poverty is one of the fundamental issues that mankind faces. To solve poverty issues, one needs to know how severe the issue is. The Multidimensional Poverty Index (MPI) is a well-known approach that is used to measure the degree of poverty issues in a given area. Computing the MPI requires information on MPI indicators, which are \\textbf{binary variables} collected by surveys, that represent different aspects of poverty such as lack of education, health, living conditions, etc. Inferring impacts of MPI indicators on the MPI index can be solved by using traditional regression methods. However, it is not obvious whether solving one MPI indicator might resolve or cause more issues in other MPI indicators, and there is no framework dedicated to inferring empirical causal relations among MPI indicators. In this work, we propose a framework to infer causal relations on binary variables in poverty surveys. Our approach performed better than baseline methods in simulated datasets where we know the ground truth, and correctly found a causal relation in the Twin births dataset. In the Thailand poverty survey dataset, the framework found a causal relation between smoking and alcohol drinking issues. We provide the R CRAN package `BiCausality', which can be used with any binary variables beyond the poverty analysis context.", "output": "Framework for inferring empirical causal graphs from binary data to support multidimensional poverty analysis."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personalized medicine remains a major challenge for scientists. The rapid growth of machine learning and deep learning has made them a feasible alternative for predicting the most appropriate therapy for individual patients. However, the need to develop a custom model for every dataset, the lack of interpretation of their results and high computational requirements make many reluctant to use these methods. Aiming to save time and bring light to the way models work internally, SIBILA has been developed. SIBILA is an ensemble of machine learning and deep learning models that applies a range of interpretability algorithms to identify the most relevant input features. Since the interpretability algorithms may not be in line with each other, a consensus stage has been implemented to estimate the global attribution of each variable to the predictions. SIBILA is containerized to be run on any high-performance computing platform. Although conceived as a command-line tool, it is also available to all users free of charge as a web server at . Thus, even users with few technological skills can take advantage of it. SIBILA has been applied to two medical case studies to show its ability to predict in classification problems. Even though it is a general-purpose tool, it has been developed with the aim of becoming a powerful decision-making tool for clinicians, but can actually be used in many other domains. Thus, two other non-medical examples are supplied as supplementary material to prove that SIBILA still works well with noise and in regression problems.", "output": "SIBILA: A novel interpretable ensemble of general-purpose machine learning models applied to medical contexts."},
{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this work, we carry out the first, in-depth, privacy analysis of Decentralized Learning -- a collaborative machine learning framework aimed at addressing the main limitations of federated learning. We introduce a suite of novel attacks for both passive and active decentralized adversaries. We demonstrate that, contrary to what is claimed by decentralized learning proposers, decentralized learning does not offer any security advantage over federated learning. Rather, it increases the attack surface, enabling any user in the system to perform privacy attacks such as gradient inversion, and even gain full control over honest users' local models. We also show that, given the state of the art in protections, privacy-preserving configurations of decentralized learning require fully connected networks, losing any practical advantage over the federated setup and therefore completely defeating the objective of the decentralized approach.", "output": "On the (In)security of Peer-to-Peer Decentralized Machine Learning."},
The goal of this paper is to address these issues with modular multi-source transfer learning techniques. The proposed techniques automatically learn how to extract useful information from source tasks, regardless of the difference in state-action space and reward function. We support our claims with extensive and challenging cross-domain experiments for visual control.", "output": "Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Conditional Average Treatment Effects (CATE) estimation is one of the main challenges in causal inference with observational data. In addition to machine learning based models, nonparametric estimators called meta-learners have been developed to estimate the CATE, with the main advantage of not restraining the estimation to a specific supervised learning method. This task becomes, however, more complicated when the treatment is not binary, as some limitations of the naive extensions emerge. This paper looks into meta-learners for estimating the heterogeneous effects of multi-valued treatments. We consider different meta-learners, and we carry out a theoretical analysis of their error upper bounds as functions of important parameters such as the number of treatment levels, showing that the naive extensions do not always provide satisfactory results. We introduce and discuss meta-learners that perform well as the number of treatments increases. We empirically confirm the strengths and weaknesses of those methods with synthetic and semi-synthetic datasets.", "output": "Comparison of meta-learners for estimating multi-valued treatment heterogeneous effects."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous movements despite their vast number of muscles. Reinforcement learning (RL) on large musculoskeletal models, however, has not been able to show similar performance. We conjecture that ineffective exploration in large overactuated action spaces is a key problem. This is supported by the finding that common exploration noise strategies are inadequate in synthetic examples of overactuated systems. We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space covering exploration within seconds of interaction. By integrating DEP into RL, we achieve fast learning of reaching and locomotion in musculoskeletal systems, outperforming current approaches in all considered tasks in sample efficiency and robustness.", "output": "DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Although Deep Neural Networks (DNNs) are incredibly effective in learning complex abstractions, they are susceptible to unintentionally learning spurious artifacts from the training data. To ensure model transparency, it is crucial to examine the relationships between learned representations, as unintended concepts often manifest themselves to be anomalous to the desired task.
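The meta-learner record above contrasts estimators for multi-valued treatments. A minimal sketch of the simplest such baseline, a T-learner extended beyond binary treatments (one outcome model per treatment level; this is a standard baseline of the kind the paper analyzes, not its proposed estimators):

```python
# T-learner for multi-valued treatments: fit one outcome model per level,
# then contrast each level's prediction against the control arm.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def t_learner_cate(X, t, y, n_levels):
    """Return (n_samples, n_levels-1) CATE estimates vs. treatment 0."""
    models = []
    for level in range(n_levels):
        m = GradientBoostingRegressor().fit(X[t == level], y[t == level])
        models.append(m)
    mu0 = models[0].predict(X)                 # control-arm outcome model
    return np.column_stack([models[k].predict(X) - mu0
                            for k in range(1, n_levels)])
```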
In this work, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for the analysis of the representation space of DNNs. Our framework employs the proposed Extreme-Activation (EA) distance measure between representations, which utilizes self-explaining capabilities within the network without accessing any data. We quantitatively validate the metric's correctness and alignment with human-defined semantic distances. The coherence between the EA distance and human judgment enables us to identify representations whose underlying concepts would be considered unnatural by humans, by identifying outliers in functional distance. Finally, we demonstrate the practical usefulness of DORA by analyzing and identifying artifact representations in popular Computer Vision models.", "output": "DORA: Exploring outlier representations in Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deploying machine learning (ML) on milliwatt-scale edge devices (tinyML) is gaining popularity due to recent breakthroughs in ML and the Internet of Things (IoT). Most tinyML research focuses on model compression techniques that trade accuracy (and model capacity) for compact models to fit into KB-sized tiny-edge devices. In this paper, we show how such models can be enhanced by the addition of an early-exit intermediate classifier. If the intermediate classifier exhibits sufficient confidence in its prediction, the network exits early, thereby resulting in considerable savings in time. Although early-exit classifiers have been proposed in previous work, these previous proposals focus on large networks, making their techniques suboptimal/impractical for tinyML applications. Our technique is optimized specifically for tiny-CNN-sized models. In addition, we present a method to alleviate the effect of network overthinking by leveraging the representations learned by the early exit. We evaluate T-RecX on three CNNs from the MLPerf Tiny benchmark suite for image classification, keyword spotting, and visual wake word detection tasks. Our results show that T-RecX 1) improves the accuracy of the baseline network, and 2) achieves a 31.58% average reduction in FLOPS in exchange for one percent accuracy across all evaluated models. Furthermore, we show that our methods consistently outperform popular prior works on the tiny-CNNs we evaluate.", "output": "T-RECX: Tiny-Resource Efficient Convolutional neural networks with early-eXit."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Approximate matching (AM) is a concept in digital forensics to determine the similarity between digital artifacts. An important use case of AM is the reliable and efficient detection of case-relevant data structures on a blacklist, if only fragments of the original are available. For instance, if only a cluster of indexed malware is still present during the digital forensic investigation, the AM algorithm shall be able to assign the fragment to the blacklisted malware. However, traditional AM functions like TLSH and ssdeep fail to detect files based on their fragments if the presented piece is relatively small compared to the overall file size. A second well-known issue with traditional AM algorithms is the lack of scaling due to the ever-increasing lookup databases.
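The T-RecX record above relies on a standard early-exit mechanism. A rough sketch of that generic pattern under illustrative sizes (this is not T-RecX itself, and the batch-level confidence check is a simplification; real systems exit per sample):

```python
# Generic early-exit inference: stop at the cheap intermediate head when its
# softmax confidence clears a threshold, otherwise run the full network.
import torch
import torch.nn as nn

class EarlyExitCNN(nn.Module):
    def __init__(self, n_classes=10, threshold=0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1),
                                    nn.ReLU(), nn.AdaptiveAvgPool2d(8))
        self.exit1 = nn.Linear(16 * 8 * 8, n_classes)    # cheap early head
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1),
                                    nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        self.exit2 = nn.Linear(32 * 4 * 4, n_classes)    # final head
        self.threshold = threshold

    def forward(self, x):
        h = self.block1(x)
        logits1 = self.exit1(h.flatten(1))
        if logits1.softmax(-1).max() >= self.threshold:  # confident: exit early
            return logits1
        return self.exit2(self.block2(h).flatten(1))
```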
We propose an improved matching algorithm based on transformer models from the field of natural language processing. We call our approach Deep Learning Approximate Matching (DLAM). As a concept from artificial intelligence (AI), DLAM gets knowledge of characteristic blacklisted patterns during its training phase. Then DLAM is able to detect the patterns in a typically much larger file; that is, DLAM focuses on the use case of fragment detection. We reveal that DLAM has three key advantages compared to the prominent conventional approaches TLSH and ssdeep. First, it makes the tedious extraction of known-to-be-bad parts obsolete, which until now has been necessary before any search for them with AM algorithms. This allows efficient classification of files on a much larger scale, which is important due to the exponentially increasing amount of data to be investigated. Second, depending on the use case, DLAM achieves a similar or even significantly higher accuracy in recovering fragments of blacklisted files. Third, we show that DLAM enables the detection of file correlations in the output of TLSH and ssdeep even for small fragment sizes.", "output": "Combining AI and AM - Improving Approximate Matching through Transformer Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL). Go-Explore was initially introduced with a strong domain knowledge constraint for partitioning the state space into cells. However, in most real-world scenarios, drawing domain knowledge from raw observations is complex and tedious. If the cell partitioning is not informative enough, Go-Explore can completely fail to explore the environment. We argue that the Go-Explore approach can be generalized to any environment without domain knowledge and without cells by exploiting a learned latent representation. Thus, we show that LGE can be flexibly combined with any strategy for learning a latent representation. Our results indicate that LGE, although simpler than Go-Explore, is more robust and outperforms state-of-the-art algorithms in terms of pure exploration on multiple hard-exploration environments, including Montezuma's Revenge. The LGE implementation is available as open-source at.", "output": "Cell-Free Latent Go-Explore."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This tutorial survey provides an overview of recent non-asymptotic advances in statistical learning theory as relevant to control and system identification. While there has been substantial progress across all areas of control, the theory is most well-developed when it comes to linear system identification and learning for the linear quadratic regulator, which are the focus of this manuscript. From a theoretical perspective, much of the labor underlying these advances has been in adapting tools from modern high-dimensional statistics and learning theory. While highly relevant to control theorists interested in integrating tools from machine learning, the foundational material has not always been easily accessible. To remedy this, we provide a self-contained presentation of the relevant material, outlining all the key ideas and the technical machinery that underpin recent results.
We also present a number of open problems and future directions.", "output": "Statistical Learning Theory for Control: A Finite Sample Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The role of artificial intelligence (AI) in materials science and engineering (MSE) is becoming increasingly important as AI technology advances. The development of high-performance computing has made it possible to test deep learning (DL) models with significant numbers of parameters, providing an opportunity to overcome the limitations of traditional computational methods, such as density functional theory (DFT), in property prediction. Machine learning (ML)-based methods are faster and more accurate than DFT-based methods. Furthermore, generative adversarial networks (GANs) have facilitated the generation of chemical compositions of inorganic materials without using crystal structure information. These developments have significantly impacted materials engineering (ME) and research. Some of the latest developments in AI in ME are reviewed herein. First, the development of AI in critical areas of ME, such as material processing, the study of structure and material properties, and measuring the performance of materials in various aspects, is discussed. Then, the significant methods of AI and their uses in MSE, such as graph neural networks, generative models, transfer learning, etc., are discussed. The use of AI to analyze the results from existing analytical instruments is also discussed. Finally, AI's advantages, disadvantages, and future in ME are discussed.", "output": "Artificial Intelligence in Material Engineering: A review on applications of AI in Material Engineering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the manufacturing process of heavy industrial equipment, the specific unit in the welding diagram is first manually redrawn and then the corresponding sheet metal parts are cut, which is inefficient. To this end, this paper proposes a U-net-based method for the segmentation and extraction of specific units in welding engineering drawings. This method enables the cutting device to automatically segment specific graphic units according to visual information and automatically cut out sheet metal parts of corresponding shapes according to the segmentation results. This process is more efficient than traditional human-assisted cutting. Two weaknesses in the U-net network lead to a decrease in segmentation performance: first, the focus on global semantic feature information is weak, and second, there is a large dimensional difference between shallow encoder features and deep decoder features. Based on the CBAM (Convolutional Block Attention Module) attention mechanism, this paper proposes a U-net jump-structure model with an attention mechanism to improve the network's global semantic feature extraction ability. In addition, a U-net attention mechanism model with dual pooling convolution fusion is designed: the deep encoder's max pooling + convolution features and the shallow encoder's average pooling + convolution features are fused vertically to reduce the dimensional difference between the shallow encoder and deep decoder.
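The welding-drawing record above builds on CBAM. For context, a compact sketch of CBAM's channel-attention branch as described in the original CBAM paper (this record's dual-pool U-net variant is more involved):

```python
# CBAM channel attention: average- and max-pooled channel descriptors share an
# MLP whose summed output gates the channels.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))

    def forward(self, x):                    # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))   # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # max-pooled descriptor
        gate = torch.sigmoid(avg + mx)       # (B, C) channel gate
        return x * gate[:, :, None, None]
```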
The dual-pool convolutional attention jump structure replaces the traditional U-net jump structure, which can effectively improve the segmentation performance for specific units in welding engineering drawings. Using VGG16 as the backbone network, experiments have verified that the IoU, mAP, and Accu of our model on the welding engineering drawing dataset segmentation task are 84.72%, 86.84%, and 99.42%, respectively.", "output": "Segmentation method of U-net sheet metal engineering drawing based on CBAM attention mechanism."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the absence of artificial labels, the independent and dependent features in the data are cluttered. How to construct the inductive biases of the model to flexibly divide and effectively contain features of different complexity is the main focus of unsupervised disentangled representation learning. This paper proposes a new iterative decomposition path of total correlation and explains the disentangled representation ability of VAEs from the perspective of model capacity allocation. The newly developed objective function combines latent variable dimensions into a joint distribution while relieving the independence constraints of the marginal distributions in combination, leading to latent variables with a more manipulable prior distribution. The novel model enables the VAE to adjust its parameter capacity to divide dependent and independent data features flexibly. Experimental results on various datasets show an interesting relevance between model capacity and the latent variable grouping size, called the \"V\"-shaped best ELBO trajectory. Additionally, we empirically demonstrate that the proposed method obtains better disentangling performance with reasonable parameter capacity allocation.", "output": "Break The Spell Of Total Correlation In betaTCVAE."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Ensembles (DE) are a prominent approach for achieving excellent performance on key metrics such as accuracy, calibration, uncertainty estimation, and out-of-distribution detection. However, hardware limitations of real-world systems constrain them to smaller ensembles and lower-capacity networks, significantly deteriorating their performance and properties. We introduce Packed-Ensembles (PE), a strategy to design and train lightweight structured ensembles by carefully modulating the dimension of their encoding space. We leverage grouped convolutions to parallelize the ensemble into a single shared backbone and forward pass to improve training and inference speeds. PE is designed to operate within the memory limits of a standard neural network. Our extensive research indicates that PE accurately preserves the properties of DE, such as diversity, and performs equally well in terms of accuracy, calibration, out-of-distribution detection, and robustness to distribution shift. We make our code available at.", "output": "Packed-Ensembles for Efficient Uncertainty Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
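The Packed-Ensembles record above hinges on grouped convolutions carrying several ensemble members through one forward pass. A minimal illustration of just that trick, with made-up sizes (the full PE recipe adds more machinery):

```python
# groups=M keeps M ensemble members' channels independent inside one layer,
# so one tensor and one forward pass serve the whole ensemble.
import torch
import torch.nn as nn

M = 4                                    # ensemble members
layer = nn.Conv2d(in_channels=3 * M, out_channels=16 * M,
                  kernel_size=3, padding=1, groups=M)

x = torch.randn(8, 3, 32, 32)            # a normal batch
x_packed = x.repeat(1, M, 1, 1)          # (8, 3*M, 32, 32): one copy per member
out = layer(x_packed)                    # (8, 16*M, 32, 32), members never mix
members = out.view(8, M, 16, 32, 32)     # unpack per-member feature maps
```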
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver. We prove an $\widetilde{O}\left(\sqrt{M \Gamma S A K}\right)$ regret bound, where $M$ is the number of contexts, $S$ is the number of states, $A$ is the number of actions, $K$ is the number of episodes, and $\Gamma \le S$ is the maximum transition degree of any state-action pair. The regret bound only scales logarithmically with the planning horizon, thus yielding the first (nearly) horizon-free regret bound for LMDPs. Key in our proof is an analysis of the total variance of alpha vectors, which is carefully bounded by a recursion-based technique. We complement our positive result with a novel $\Omega\left(\sqrt{M S A K}\right)$ regret lower bound with $\Gamma = 2$, which shows that our upper bound is minimax optimal when $\Gamma$ is a constant. Our lower bound relies on new constructions of hard instances and an argument based on the symmetrization technique from theoretical computer science, both of which are technically different from existing lower bound proofs for MDPs, and thus can be of independent interest.", "output": "Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We discuss probabilistic neural network models for unsupervised learning where the distribution of the hidden layer is fixed. We argue that learning machines with this architecture enjoy a number of desirable properties. For example, the model can be chosen as a simple and interpretable one, it does not need to be over-parametrised, and training is argued to be efficient in a thermodynamic sense. When hidden units are binary variables, these models have a natural interpretation in terms of features. We show that the featureless state corresponds to a state of maximal ignorance about the features and that learning the first feature depends on non-Gaussian statistical properties of the data. We suggest that the distribution of hidden variables should be chosen according to the principle of maximal relevance. We introduce the Hierarchical Feature Model (HFM) as an example of a model that satisfies this principle, and that encodes a neutral a priori organisation of the feature space. We present extensive numerical experiments in order i) to test that the internal representation of learning machines can indeed be independent of the data with which they are trained and ii) to show that only a finite number of features are needed to describe a number of datasets.", "output": "Occam learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ability to ensure that a classifier gives reliable confidence scores is essential for informed decision-making. To this end, recent work has focused on miscalibration, i.e., the over- or under-confidence of model scores. Yet calibration is not enough: even a perfectly calibrated classifier with the best possible accuracy can have confidence scores that are far from the true posterior probabilities. This is due to the grouping loss, created by samples with the same confidence scores but different true posterior probabilities. Proper scoring rule theory shows that, given the calibration loss, the missing piece to characterize individual errors is the grouping loss.
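For readability, the two regret bounds quoted in the LMDP abstract above can be set side by side; the remaining gap between them is the $\Gamma$ factor:

```latex
% Upper and lower bounds as quoted in the LMDP abstract above.
\underbrace{\widetilde{O}\!\left(\sqrt{M \Gamma S A K}\right)}_{\text{upper bound}}
\qquad \text{vs.} \qquad
\underbrace{\Omega\!\left(\sqrt{M S A K}\right)}_{\text{lower bound},\ \Gamma = 2}
```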
While there are many estimators of the calibration loss, none exists for the grouping loss in standard settings. Here, we propose an estimator to approximate the grouping loss. We show that modern neural network architectures in vision and NLP exhibit grouping loss, notably in distribution shift settings, which highlights the importance of pre-production validation.", "output": "Beyond calibration: estimating the grouping loss of modern neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning has achieved remarkable success in learning representations for molecules, which is crucial for various biochemical applications, ranging from property prediction to drug design. However, training Deep Neural Networks (DNNs) from scratch often requires abundant labeled molecules, which are expensive to acquire in the real world. To alleviate this issue, tremendous efforts have been devoted to Chemical Pre-trained Models (CPMs), where DNNs are pre-trained using large-scale unlabeled molecular databases and then fine-tuned over specific downstream tasks. Despite the prosperity, a systematic review of this fast-growing field is still lacking. In this paper, we present the first survey that summarizes the current progress of CPMs. We first highlight the limitations of training molecular representation models from scratch to motivate CPM studies. Next, we systematically review recent advances on this topic from several key perspectives, including molecular descriptors, encoder architectures, pre-training strategies, and applications. We also highlight the challenges and promising avenues for future research, providing a useful resource for both the machine learning and scientific communities.", "output": "A Systematic Survey of Chemical Pre-trained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The field of geometric deep learning has had a profound impact on the development of innovative and powerful graph neural network architectures. Disciplines such as computer vision and computational biology have benefited significantly from such methodological advances, which has led to breakthroughs in scientific domains such as protein structure prediction and design. In this work, we introduce GCPNet, a new geometry-complete, SE(3)-equivariant graph neural network designed for 3D molecular graph representation learning. Rigorous experiments across four distinct geometric tasks demonstrate that GCPNet's predictions (1) for protein-ligand binding affinity achieve a statistically significant correlation of 0.608, more than 5% greater than current state-of-the-art methods; (2) for protein structure ranking achieve statistically significant target-local and dataset-global correlations of 0.616 and 0.871, respectively; (3) for Newtonian many-body systems modeling achieve a task-averaged mean squared error less than 0.01, more than 15% better than current methods; and (4) for molecular chirality recognition achieve a state-of-the-art prediction accuracy of 98.7%, better than any other machine learning method to date.
The source code, data, and instructions to train new models or reproduce our results are freely available at.", "output": "Geometry-Complete Perceptron Networks for 3D Molecular Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Visual explanation of ``black-box'' models allows researchers in explainable artificial intelligence (XAI) to interpret the model's decisions in a human-understandable manner. In this paper, we propose interpretable class activation mapping for tree crown detection (Crown-CAM) that overcomes the inaccurate localization and computational complexity of previous methods while generating reliable visual explanations for the challenging and dynamic problem of tree crown detection in aerial images. It consists of an unsupervised selection of activation maps, computation of local score maps, and non-contextual background suppression to efficiently provide fine-grained localization of tree crowns in scenarios with dense forest trees or scenes without tree crowns. Additionally, two Intersection over Union (IoU)-based metrics are introduced to effectively quantify both the accuracy and inaccuracy of generated explanations with respect to regions with or even without tree crowns in the image. Empirical evaluations demonstrate that the proposed Crown-CAM outperforms the Score-CAM, Augmented Score-CAM, and Eigen-CAM methods by average IoU margins of 8.7, 5.3, and 21.7 (and 3.3, 9.8, and 16.5), respectively, in improving the accuracy (and decreasing the inaccuracy) of visual explanations on the challenging NEON tree crown dataset.", "output": "Crown-CAM: Interpretable Visual Explanations for Tree Crown Detection in Aerial Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Constrained multiagent reinforcement learning (C-MARL) is gaining importance as MARL algorithms find new applications in real-world systems ranging from energy systems to drone swarms. Most C-MARL algorithms use a primal-dual approach to enforce constraints through a penalty function added to the reward. In this paper, we study the structural effects of this penalty term on the MARL problem. First, we show that the standard practice of using the constraint function as the penalty leads to a weak notion of safety. However, by making simple modifications to the penalty term, we can enforce meaningful probabilistic (chance and conditional value at risk) constraints. Second, we quantify the effect of the penalty term on the value function, uncovering an improved value estimation procedure. We use these insights to propose a constrained multiagent advantage actor-critic (C-MAA2C) algorithm. Simulations in a simple constrained multiagent environment affirm that our reinterpretation of the primal-dual method in terms of probabilistic constraints is effective, and that our proposed value estimate accelerates convergence to a safe joint policy.", "output": "Interpreting Primal-Dual Algorithms for Constrained Multiagent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We apply the Physics Informed Neural Network (PINN) to the problem of wildfire fire-front modelling.
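The Crown-CAM record above scores explanations with IoU-based metrics. A minimal sketch of the underlying IoU computation between a binarized saliency map and an annotated crown mask (the paper's exact metric definitions may differ):

```python
# IoU between a thresholded explanation map and a ground-truth region mask.
import numpy as np

def explanation_iou(saliency, mask, thresh=0.5):
    """saliency: (H, W) values in [0, 1]; mask: (H, W) boolean ground truth."""
    pred = saliency >= thresh
    inter = np.logical_and(pred, mask).sum()
    union = np.logical_or(pred, mask).sum()
    return inter / union if union else 0.0
```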
We use the PINN to solve the level-set equation, which is a partial differential equation that models a fire-front through the zero-level-set of a level-set function. The result is a PINN that simulates a fire-front as it propagates through the spatio-temporal domain. We show that popular optimisation cost functions used in the literature can result in PINNs that fail to maintain temporal continuity in modelled fire-fronts when there are extreme changes in exogenous forcing variables such as wind direction. We thus propose novel additions to the optimisation cost function that improve temporal continuity under these extreme changes. Furthermore, we develop an approach to perform data assimilation within the PINN such that the PINN predictions are drawn towards observations of the fire-front. Finally, we incorporate our novel approaches into a Bayesian PINN (B-PINN) to provide uncertainty quantification in the fire-front predictions. This is significant as the standard solver, the level-set method, does not naturally offer the capability for data assimilation and uncertainty quantification. Our results show that, with our novel approaches, the B-PINN can produce accurate predictions with high-quality uncertainty quantification on real-world data.", "output": "Bayesian Physics Informed Neural Networks for Data Assimilation and Spatio-Temporal Modelling of Wildfires."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intermediate features of a pre-trained model have been shown to be informative for making accurate predictions on downstream tasks, even if the model backbone is kept frozen. The key challenge is how to utilize these intermediate features, given their gigantic amount. We propose visual query tuning (VQT), a simple yet effective approach to aggregate the intermediate features of Vision Transformers. By introducing a handful of learnable ``query'' tokens to each layer, VQT leverages the inner workings of Transformers to ``summarize'' the rich intermediate features of each layer, which can then be used to train the prediction heads of downstream tasks. As VQT keeps the intermediate features intact and only learns to combine them, it enjoys memory efficiency in training, compared to many other parameter-efficient fine-tuning approaches that learn to adapt features and need back-propagation through the entire backbone. This also suggests a complementary role between VQT and those approaches in transfer learning. Empirically, VQT consistently surpasses the state-of-the-art approach that utilizes intermediate features for transfer learning, and outperforms full fine-tuning in many cases.
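The wildfire record above trains a PINN on the level-set equation. A hedged sketch of the PDE residual such a PINN would minimize, with derivatives from autograd (the network size, the constant spread rate `speed`, and the collocation sampling are illustrative assumptions, not the paper's setup):

```python
# PDE residual of the level-set equation phi_t + s * |grad phi| = 0,
# evaluated with autograd at collocation points (x, y, t).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))

def levelset_residual(xyt, speed=1.0):
    """xyt: (N, 3) tensor of (x, y, t) collocation points."""
    xyt = xyt.requires_grad_(True)
    phi = net(xyt)
    grads = torch.autograd.grad(phi.sum(), xyt, create_graph=True)[0]
    phi_x, phi_y, phi_t = grads[:, 0], grads[:, 1], grads[:, 2]
    return phi_t + speed * torch.sqrt(phi_x**2 + phi_y**2 + 1e-12)

loss = (levelset_residual(torch.rand(256, 3)) ** 2).mean()  # PDE term of the cost
```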
Compared to parameter-efficient approaches that adapt features, VQT achieves much higher accuracy under memory constraints. Most importantly, VQT is compatible with these approaches to attain even higher accuracy, making it a simple add-on to further boost transfer learning.", "output": "Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Concept bottleneck models (CBMs) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the labels of some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and influence of the concept on the final prediction achieves strong performance and outperforms static approaches, as well as active feature acquisition methods proposed in the literature. We show that the interactive CBM can achieve accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds, CheXpert, and OAI datasets.", "output": "Interactive Concept Bottleneck Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recommender systems predict what items a user will interact with next, based on their past interactions. The problem is often approached through supervised learning, but recent advancements have shifted towards policy optimization of rewards (e.g., user engagement). One challenge with the latter is policy mismatch: we are only able to train a new policy given data collected from a previously-deployed policy. The conventional way to address this problem is through importance sampling correction, but this comes with practical limitations. We suggest an alternative approach of local policy improvement without off-policy correction. Our method computes and optimizes a lower bound of the expected reward of the target policy, which is easy to estimate from data and does not involve density ratios (such as those appearing in importance sampling correction). This local policy improvement paradigm is ideal for recommender systems, as previous policies are typically of decent quality and policies are updated frequently. We provide empirical evidence and practical recipes for applying our technique in a sequential recommendation setting.", "output": "Local Policy Improvement for Recommender Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we present RFold, a simple yet effective method for RNA secondary structure prediction in an end-to-end manner.
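The interactive CBM record above combines concept uncertainty with concept influence to decide what to ask a human. A toy version of that selection rule (the product form and Bernoulli-variance uncertainty are simplifications, not the paper's exact policy):

```python
# Pick the unqueried concept whose label, if provided by a human, is most
# likely to change the final prediction: uncertain AND influential.
import numpy as np

def pick_concept(concept_probs, influence, queried):
    """concept_probs: (K,) predicted concept probabilities;
    influence: (K,) effect of each concept on the final prediction."""
    uncertainty = concept_probs * (1 - concept_probs)   # Bernoulli variance
    score = uncertainty * np.abs(influence)
    score[list(queried)] = -np.inf                      # don't ask twice
    return int(np.argmax(score))

print(pick_concept(np.array([0.5, 0.9, 0.6]), np.array([0.2, 2.0, 1.0]), set()))
```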
RFold introduces a decoupled optimization process that decomposes the vanilla constraint satisfaction problem into row-wise and column-wise optimization, simplifying the solving process while guaranteeing the validity of the output. Moreover, RFold adopts attention maps as informative representations, instead of designing hand-crafted features. Extensive experiments demonstrate that RFold achieves competitive performance and about eight times faster inference efficiency than the state-of-the-art method. The code and Colab demo are available at.", "output": "RFold: RNA Secondary Structure Prediction with Decoupled Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks are rapidly gaining interest in nonlinear system identification due to the model's ability to capture complex input-output relations directly from data. However, despite the flexibility of the approach, there are still concerns about the safety of these models in this context, as well as the need for large amounts of potentially expensive data. Aluminum electrolysis is a highly nonlinear production process, and most of the data must be sampled manually, making the sampling process expensive and infrequent. In the case of infrequent measurements of state variables, the accuracy and open-loop stability of the long-term predictions become highly important. Standard neural networks struggle to provide stable long-term predictions with limited training data. In this work, we investigate the effect of combining concatenated skip-connections and the sparsity-promoting $\ell_1$ regularization on the open-loop stability and accuracy of forecasts with short, medium, and long prediction horizons. The case study is conducted on a high-dimensional and nonlinear simulator representing an aluminum electrolysis cell's mass and energy balance. The proposed model structure contains concatenated skip connections from the input layer and all intermediate layers to the output layer, referred to as InputSkip. $\ell_1$-regularized InputSkip is called sparse InputSkip. The results show that sparse InputSkip outperforms dense and sparse standard feedforward neural networks and dense InputSkip regarding open-loop stability and long-term predictive accuracy. The results are significant when models are trained on datasets of all sizes (small, medium, and large training sets) and for all prediction horizons (short, medium, and long).", "output": "Sparse neural networks with skip-connections for identification of aluminum electrolysis cell."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep R Programming is a comprehensive course on one of the most popular languages in data science (statistical computing, graphics, machine learning, data wrangling, and analytics). It introduces the base language in depth and is aimed at ambitious students, practitioners, and researchers who would like to become independent users of this powerful environment. This textbook is a non-profit project.
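The electrolysis record above describes the InputSkip structure: concatenated skip connections from the input into every layer, plus $\ell_1$ regularization. A rough PyTorch sketch under illustrative sizes (not the authors' exact architecture or penalty weight):

```python
# InputSkip-style MLP: every hidden layer and the output head also see the raw
# input; an l1 penalty on all weights promotes sparsity.
import torch
import torch.nn as nn

class InputSkip(nn.Module):
    def __init__(self, d_in, d_hidden, d_out, depth=3):
        super().__init__()
        self.first = nn.Linear(d_in, d_hidden)
        self.rest = nn.ModuleList(nn.Linear(d_hidden + d_in, d_hidden)
                                  for _ in range(depth - 1))
        self.head = nn.Linear(d_hidden + d_in, d_out)   # skip into the output too

    def forward(self, x):
        h = torch.relu(self.first(x))
        for layer in self.rest:
            h = torch.relu(layer(torch.cat([h, x], dim=-1)))
        return self.head(torch.cat([h, x], dim=-1))

def l1_penalty(model, lam=1e-4):          # sparsity-promoting regularizer
    return lam * sum(p.abs().sum() for p in model.parameters())
```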
Its online and PDF versions are freely available at. This early draft is distributed in the hope that it will be useful.", "output": "Deep R Programming."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous manipulation systems operating in domains where human intervention is difficult or impossible (e.g., underwater, extraterrestrial, or hazardous environments) require a high degree of robustness to sensing and communication failures. Crucially, motion planning and control algorithms require a stream of accurate joint angle data provided by joint encoders, the failure of which may result in an unrecoverable loss of functionality. In this paper, we present a novel method for retrieving the joint angles of a robot manipulator using only a single RGB image of its current configuration, opening up an avenue for recovering system functionality when conventional proprioceptive sensing is unavailable. Our approach, based on a distance-geometric representation of the configuration space, exploits the knowledge of a robot's kinematic model with the goal of training a shallow neural network that performs a 2D-to-3D regression of distances associated with detected structural keypoints. It is shown that the resulting Euclidean distance matrix uniquely corresponds to the observed configuration, where joint angles can be recovered via multidimensional scaling and a simple inverse kinematics procedure. We evaluate the performance of our approach on real RGB images of a Franka Emika Panda manipulator, showing that the proposed method is efficient and exhibits solid generalization ability. Furthermore, we show that our method can be easily combined with a dense refinement technique to obtain superior results.", "output": "A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce the concept of a \"universal password model\" -- a password model that, once pre-trained, can automatically change its guessing strategy based on the target system. To achieve this, the model does not need to access any plaintext passwords from the target credentials. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying password distribution. Specifically, the model uses deep learning to capture the correlation between the auxiliary data of a group of users (e.g., users of a web application) and their passwords. It then exploits those patterns to create a tailored password model for the target system at inference time. No further training steps, targeted data collection, or prior knowledge of the community's password distribution is required. Besides improving over current password strength estimation techniques and attacks, the model enables any end-user (e.g., system administrators) to autonomously generate tailored password models for their systems without the often unworkable requirements of collecting suitable training data and fitting the underlying machine learning model.
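The distance-geometric record above recovers coordinates from a Euclidean distance matrix via multidimensional scaling. Classical MDS is textbook material and looks like this (this is only the MDS step, not the paper's full keypoint pipeline):

```python
# Classical MDS: double-center a squared-distance matrix to get a Gram matrix,
# then read coordinates off its top eigenpairs (up to rotation/translation).
import numpy as np

def mds_from_edm(D, dim=3):
    """D: (n, n) matrix of squared pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    G = -0.5 * J @ D @ J                       # Gram matrix via double centering
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]            # top-`dim` eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```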
Ultimately, our framework enables the democratization of well-calibrated password models to the community, addressing a major challenge in the deployment of password security solutions at scale.", "output": "Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, remarkable progress has been made in approximating Nash equilibrium (NE), correlated equilibrium (CE), and coarse correlated equilibrium (CCE) through function approximation that trains a neural network to predict equilibria from game representations. Furthermore, equivariant architectures are widely adopted in designing such equilibrium approximators in normal-form games. In this paper, we theoretically characterize the benefits and limitations of equivariant equilibrium approximators. For the benefits, we show that they enjoy better generalizability than general ones and can achieve better approximations when the payoff distribution is permutation-invariant. For the limitations, we discuss their drawbacks in terms of equilibrium selection and social welfare. Together, our results help to understand the role of equivariance in equilibrium approximators.", "output": "Are Equivariant Equilibrium Approximators Beneficial?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of the world. However, if initialized with knowledge of high-level subgoals and transitions between subgoals, RL agents could utilize this Abstract World Model (AWM) for planning and exploration. We propose using few-shot large language models (LLMs) to hypothesize an AWM, which will be verified through world experience, to improve the sample efficiency of RL agents. Our DECKARD agent applies LLM-guided exploration to item crafting in Minecraft in two phases: (1) the Dream phase, where the agent uses an LLM to decompose a task into a sequence of subgoals, the hypothesized AWM; and (2) the Wake phase, where the agent learns a modular policy for each subgoal and verifies or corrects the hypothesized AWM. Our method of hypothesizing an AWM with LLMs and then verifying the AWM based on agent experience not only increases sample efficiency over contemporary methods by an order of magnitude, but is also robust to and corrects errors in the LLM, successfully blending noisy internet-scale information from LLMs with knowledge grounded in environment dynamics.", "output": "Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error.
Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- respectively controlling optimization dynamics around a critical point and bounding model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization and effective model complexity for networks trained in practice.", "output": "On the Lipschitz Constant of Deep Networks and Double Descent."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparse model identification enables nonlinear dynamical system discovery from data. However, the control of false discoveries for sparse model identification is challenging, especially in the low-data and high-noise limit. In this paper, we perform a theoretical study of ensemble sparse model discovery, which shows empirical success in terms of accuracy and robustness to noise. In particular, we analyse the bootstrapping-based sequential thresholding least-squares estimator. We show that this bootstrapping-based ensembling technique can perform a provably correct variable selection procedure with an exponential convergence rate of the error rate. In addition, we show that the ensemble sparse model discovery method can perform computationally efficient uncertainty estimation, compared to expensive Bayesian uncertainty quantification methods via MCMC. We demonstrate the convergence properties and the connection to uncertainty quantification in various numerical studies on synthetic sparse linear regression and sparse model discovery. The experiments on sparse linear regression support that the bootstrapping-based sequential thresholding least-squares method has better performance for sparse variable selection compared to LASSO, thresholding least-squares, and bootstrapping-based LASSO. In the sparse model discovery experiment, we show that the bootstrapping-based sequential thresholding least-squares method can provide valid uncertainty quantification, converging to a delta measure centered around the true value with increased sample sizes. Finally, we highlight the improved robustness to hyperparameter selection under shifting noise and sparsity levels of the bootstrapping-based sequential thresholding least-squares method compared to other sparse regression methods.", "output": "Convergence of uncertainty estimates in Ensemble and Bayesian sparse model discovery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model Predictive Control (MPC) is attracting tremendous attention in the autonomous driving task as a powerful control technique. The success of an MPC controller strongly depends on an accurate internal dynamics model. However, the static parameters, usually learned by system identification, often fail to adapt to both internal and external perturbations in real-world scenarios.
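The record above analyses a bootstrapping-based sequential thresholding least-squares estimator. A compact sketch of that estimator family (the threshold, iteration count, and majority-vote rule are illustrative choices, not the paper's tuned settings):

```python
# Sequential thresholding least squares (STLS) plus bootstrap aggregation:
# run STLS on resamples and keep coefficients selected in most runs.
import numpy as np

def stls(Theta, y, thresh=0.1, iters=10):
    xi = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < thresh
        xi[small] = 0.0                                  # hard-threshold
        big = ~small
        if big.any():                                    # refit survivors
            xi[big] = np.linalg.lstsq(Theta[:, big], y, rcond=None)[0]
    return xi

def bootstrap_stls(Theta, y, n_boot=100, vote=0.5):
    n = len(y)
    support = np.zeros(Theta.shape[1])
    for _ in range(n_boot):
        idx = np.random.randint(0, n, n)                 # resample with replacement
        support += stls(Theta[idx], y[idx]) != 0
    keep = support / n_boot >= vote                      # majority-vote support
    xi = np.zeros(Theta.shape[1])
    if keep.any():
        xi[keep] = np.linalg.lstsq(Theta[:, keep], y, rcond=None)[0]
    return xi
```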
In this paper, we (1) reformulate the problem as a Partially Observed Markov Decision Process (POMDP) that absorbs the uncertainties into observations and maintains the Markov property in hidden states; (2) learn a recurrent policy that continually adapts the parameters of the dynamics model via Recurrent Reinforcement Learning (RRL) for optimal and adaptive control; and (3) evaluate the proposed algorithm (referred to as $\textit{MPC-RRL}$) in the CARLA simulator, leading to robust behaviours under a wide range of perturbations.", "output": "Incorporating Recurrent Reinforcement Learning into Model Predictive Control for Adaptive Control in Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study variance-dependent regret bounds for Markov decision processes (MDPs). Algorithms with variance-dependent regret guarantees can automatically exploit environments with low variance (e.g., enjoying constant regret on deterministic MDPs). The existing algorithms are either variance-independent or suboptimal. We first propose two new environment norms to characterize the fine-grained variance properties of the environment. For model-based methods, we design a variant of the MVP algorithm (Zhang et al., 2021a) and use new analysis techniques to show that this algorithm enjoys variance-dependent bounds with respect to our proposed norms. In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs, the first result of its kind. We further initiate the study of model-free algorithms with variance-dependent regret bounds by designing a reference-function-based algorithm with a novel capped-doubling reference update schedule. Lastly, we also provide lower bounds to complement our upper bounds.", "output": "Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) the graph topology plays the greatest role, since over-squashing occurs between nodes at high commute (access) time.
Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under `graph rewiring'.", "output": "On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modern representation learning methods often struggle to adapt quickly under non-stationarity because they suffer from catastrophic forgetting and decaying plasticity. Such problems prevent learners from fast adaptation, since they may forget useful features or have difficulty learning new ones. Hence, these methods are rendered ineffective for continual learning. This paper proposes Utility-based Perturbed Gradient Descent (UPGD), an online learning algorithm well-suited for continual learning agents. UPGD protects useful weights or features from forgetting and perturbs less useful ones based on their utilities. Our empirical results show that UPGD helps reduce forgetting and maintain plasticity, enabling modern representation learning methods to work effectively in continual learning.", "output": "Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning (FL) is an emerging technique that trains models on massive and geographically distributed edge data while maintaining privacy. However, FL has inherent challenges in terms of fairness and computational efficiency due to the rising heterogeneity of edges, and thus usually results in sub-optimal performance in recent state-of-the-art (SOTA) solutions. In this paper, we propose a Customized Federated Learning (CFL) system to eliminate FL heterogeneity from multiple dimensions. Specifically, CFL tailors personalized models from the specially designed global model for each client, jointly guided by an online trained model-search helper and a novel aggregation algorithm. Extensive experiments demonstrate that CFL has full-stack advantages for both FL training and edge reasoning and significantly improves the SOTA performance w.r.t. model accuracy (up to 7.2% in the non-heterogeneous environment and up to 21.8% in the heterogeneous environment), efficiency, and FL fairness.", "output": "Towards Fairer and More Efficient Federated Learning via Multidimensional Personalized Edge Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Anthropogenic pollution of hydrological systems affects diverse communities and ecosystems around the world. Data analytics and modeling tools play a key role in fighting this challenge, as they can help identify key sources as well as trace transport and quantify impact within complex hydrological systems. Several tools exist for simulating and tracing pollutant transport throughout surface waters using detailed physical models; these tools are powerful, but can be computationally intensive, require significant amounts of data to be developed, and require expert knowledge for their use (ultimately limiting application scope).
In this work, we present a graph modeling framework -- which we call ${\tt HydroGraphs}$ -- for understanding pollutant transport and fate across waterbodies, rivers, and watersheds. This framework uses a simplified representation of hydrological systems that can be constructed based purely on open-source data (the National Hydrography Dataset and the Watershed Boundary Dataset). The graph representation provides a flexible, intuitive approach for capturing connectivity, identifying upstream pollutant sources, and tracing downstream impacts within small and large hydrological systems. Moreover, the graph representation can facilitate the use of advanced algorithms and tools from graph theory, topology, optimization, and machine learning to aid data analytics and decision-making. We demonstrate the capabilities of our framework through case studies in the State of Wisconsin; here, we aim to identify upstream nutrient pollutant sources that arise from agricultural practices and trace downstream impacts to waterbodies, rivers, and streams. Our tool ultimately seeks to help stakeholders design effective pollution prevention/mitigation practices and evaluate how surface waters respond to such practices.", "output": "A Graph-Based Modeling Framework for Tracing Hydrological Pollutant Transport in Surface Waters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As large-scale graphs become more widespread, more and more computational challenges with extracting, processing, and interpreting large graph data are being exposed. It is therefore natural to search for ways to summarize these expansive graphs while preserving their key characteristics. In the past, most graph summarization techniques sought to capture the most important parts of a graph statistically. However, today, the high dimensionality and complexity of modern graph data are making deep learning techniques more popular. Hence, this paper presents a comprehensive survey of progress in deep learning summarization techniques that rely on graph neural networks (GNNs). Our investigation includes a review of the current state-of-the-art approaches, including recurrent GNNs, convolutional GNNs, graph autoencoders, and graph attention networks. A new, burgeoning line of research is also discussed, where graph reinforcement learning is being used to evaluate and improve the quality of graph summaries. Additionally, the survey provides details of benchmark datasets, evaluation metrics, and open-source tools that are often employed in experimentation settings, along with a discussion on the practical uses of graph summarization in different fields. Finally, the survey concludes with a number of open research challenges to motivate further study in this area.", "output": "A Comprehensive Survey on Graph Summarization with Graph Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a method to formulate algorithm discovery as program search, and apply it to discover optimization algorithms for deep neural network training. We leverage efficient search techniques to explore an infinite and sparse program space. To bridge the large generalization gap between proxy and target tasks, we also introduce program selection and simplification strategies.
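The HydroGraphs record above reduces a watershed to a directed graph so that upstream sources and downstream impacts become simple graph queries. A toy networkx illustration with made-up node names (not the HydroGraphs package itself):

```python
# Upstream/downstream tracing on a toy river network; edges point downstream.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("farm_creek", "river_A"), ("city_outfall", "river_A"),
                  ("river_A", "lake_B"), ("spring", "lake_B")])

upstream = nx.ancestors(G, "lake_B")       # candidate pollutant sources
downstream = nx.descendants(G, "river_A")  # waters affected by a source
print(upstream)     # {'farm_creek', 'city_outfall', 'river_A', 'spring'}
print(downstream)   # {'lake_B'}
```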
Our method discovers a simple and effective optimization algorithm, $\textbf{Lion}$ ($\textit{Evo\textbf{L}ved S\textbf{i}gn M\textbf{o}me\textbf{n}tum}$). It is more memory-efficient than Adam as it only keeps track of the momentum. Different from adaptive optimizers, its update has the same magnitude for each parameter, calculated through the sign operation. We compare Lion with widely used optimizers, such as Adam and Adafactor, for training a variety of models on different tasks. On image classification, Lion boosts the accuracy of ViT by up to 2% on ImageNet and saves up to 5x the pre-training compute on JFT. On vision-language contrastive learning, we achieve 88.3% $\textit{zero-shot}$ and 91.1% $\textit{fine-tuning}$ accuracy on ImageNet, surpassing the previous best results by 2% and 0.1%, respectively. On diffusion models, Lion outperforms Adam by achieving a better FID score and reducing the training compute by up to 2.3x. For autoregressive, masked language modeling, and fine-tuning, Lion exhibits similar or better performance compared to Adam. Our analysis of Lion reveals that its performance gain grows with the training batch size. It also requires a smaller learning rate than Adam due to the larger norm of the update produced by the sign function. Additionally, we examine the limitations of Lion and identify scenarios where its improvements are small or not statistically significant. The implementation of Lion is publicly available.", "output": "Symbolic Discovery of Optimization Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial Intelligence and digital health have the potential to transform global health. However, having access to representative data to test and validate algorithms in realistic production environments is essential. We introduce HealthSyn, an open-source synthetic data generator of user behavior for testing reinforcement learning algorithms in the context of mobile health interventions. The generator utilizes Markov processes to generate diverse user actions, with individual user behavioral patterns that can change in reaction to personalized interventions (i.e., reminders, recommendations, and incentives). These actions are translated into actual logs using an ML-purposed data schema specific to the mobile health application functionality included with HealthKit, an open-source SDK. The logs can be fed to pipelines to obtain user metrics. The generated data, which is based on real-world behaviors and simulation techniques, can be used to develop, test, and evaluate both ML algorithms in research and end-to-end operational RL-based intervention delivery frameworks.", "output": "Synthetic Data Generator for Adaptive Interventions in Global Health."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Chaos and unpredictability are often considered synonymous, yet recent advances in statistical forecasting suggest that large machine learning models gain unexpected insight from extended observation of complex systems. We perform a large-scale comparison of 24 state-of-the-art multivariate forecasting methods on a crowdsourced database of 135 distinct low-dimensional chaotic systems.
Large, domain-agnostic time series forecasting methods consistently exhibit the strongest performance, producing accurate predictions lasting up to two dozen Lyapunov times. The best-performing models contain no inductive biases for dynamical systems, and include hierarchical neural basis functions, transformers, and recurrent neural networks. However, physics-based hybrid methods like neural ordinary differential equations and reservoir computers perform more strongly in data-limited settings. Diverse forecasting methods correlate despite their widely-varying architectures, yet the Lyapunov exponent fails to fully explain variation in the predictability of different chaotic systems over long time horizons. Our results show that a key advantage of modern forecasting methods stems not from their architectural details, but rather from their capacity to learn the large-scale structure of chaotic attractors.", "output": "Large-scale statistical forecasting models reassess the unpredictability of chaotic systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detailed phenotype information is fundamental to accurate diagnosis and risk estimation of diseases. As a rich source of phenotype information, electronic health records (EHRs) promise to empower diagnostic variant interpretation. However, how to accurately and efficiently extract phenotypes from the heterogeneous EHR data remains a challenge. In this work, we present PheME, an Ensemble framework using Multi-modality data of structured EHRs and unstructured clinical notes for accurate Phenotype prediction. Firstly, we employ multiple deep neural networks to learn reliable representations from the sparse structured EHR data and redundant clinical notes. A multi-modal model then aligns multi-modal features onto the same latent space to predict phenotypes. Secondly, we leverage ensemble learning to combine outputs from single-modal models and multi-modal models to improve phenotype predictions. We choose seven diseases to evaluate the phenotyping performance of the proposed framework. Experimental results show that using multi-modal data significantly improves phenotype prediction in all diseases, and the proposed ensemble learning framework can further boost the performance.", "output": "PheME: A deep ensemble framework for improving phenotype prediction from multi-modal data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "ChatGPT is a large language model recently released by the OpenAI company. In this technical report, we explore for the first time the capability of ChatGPT for programming numerical algorithms. Specifically, we examine the capability of ChatGPT for generating codes for numerical algorithms in different programming languages, for debugging and improving codes written by users, for completing missing parts of numerical codes, for rewriting available codes in other programming languages, and for parallelizing serial codes. Additionally, we assess whether ChatGPT can recognize if given codes are written by humans or machines. To reach this goal, we consider a variety of mathematical problems such as the Poisson equation, the diffusion equation, the incompressible Navier-Stokes equations, compressible inviscid flow, eigenvalue problems, solving linear systems of equations, storing sparse matrices, etc.
Furthermore, we exemplify scientific machine learning such as physics-informed neural networks and convolutional neural networks with applications to computational physics. Through these examples, we investigate the successes, failures, and challenges of ChatGPT. Examples of failures are producing singular matrices, operations on arrays with incompatible sizes, programming interruption for relatively long codes, etc. Our outcomes suggest that ChatGPT can successfully program numerical algorithms in different programming languages, but certain limitations and challenges exist that require further improvement of this machine learning model.", "output": "ChatGPT for Programming Numerical Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With recent advances in sensing technologies, a myriad of spatio-temporal data has been generated and recorded in smart cities. Forecasting the evolution patterns of spatio-temporal data is an important yet demanding aspect of urban computing, which can enhance intelligent management decisions in various fields, including transportation, environment, climate, public safety, healthcare, and others. Traditional statistical and deep learning methods struggle to capture complex correlations in urban spatio-temporal data. To this end, Spatio-Temporal Graph Neural Networks (STGNN) have been proposed, achieving great promise in recent years. STGNNs enable the extraction of complex spatio-temporal dependencies by integrating graph neural networks (GNNs) and various temporal learning methods. In this manuscript, we provide a comprehensive survey on recent progress on STGNN technologies for predictive learning in urban computing. Firstly, we provide a brief introduction to the construction methods of spatio-temporal graph data and the prevalent deep-learning architectures used in STGNNs. We then sort out the primary application domains and specific predictive learning tasks based on existing literature. Afterward, we scrutinize the design of STGNNs and their combination with some advanced technologies in recent years. Finally, we conclude the limitations of existing research and suggest potential directions for future work.", "output": "Spatio-Temporal Graph Neural Networks for Predictive Learning in Urban Computing: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For billions of years, evolution has been the driving force behind the development of life, including humans. Evolution endowed humans with high intelligence, which allowed us to become one of the most successful species on the planet. Today, humans aim to create artificial intelligence systems that surpass even our own intelligence. As artificial intelligences (AIs) evolve and eventually surpass us in all domains, how might evolution shape our relations with AIs? By analyzing the environment that is shaping the evolution of AIs, we argue that the most successful AI agents will likely have undesirable traits. Competitive pressures among corporations and militaries will give rise to AI agents that automate human roles, deceive others, and gain power. If such agents have intelligence that exceeds that of humans, this could lead to humanity losing control of its future.
More abstractly, we argue that natural selection operates on systems that compete and vary, and that selfish species typically have an advantage over species that are altruistic to other species. This Darwinian logic could also apply to artificial agents, as agents may eventually be better able to persist into the future if they behave selfishly and pursue their own interests with little regard for humans, which could pose catastrophic risks. To counteract these risks and Darwinian forces, we consider interventions such as carefully designing AI agents' intrinsic motivations, introducing constraints on their actions, and institutions that encourage cooperation. These steps, or others that resolve the problems we pose, will be necessary in order to ensure the development of artificial intelligence is a positive one.", "output": "Natural Selection Favors AIs over Humans."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study proposes an interpretable neural network-based non-proportional odds model (N3POM) for ordinal regression. In the model, the response variable can take continuous values, and the regression coefficients vary depending on the predicting ordinal response. Contrary to conventional approaches, where the linear coefficients of regression are directly estimated from the discrete response, we train a non-linear neural network that outputs the linear coefficients by taking the response as its input. By virtue of the neural network, N3POM may have flexibility while preserving the interpretability of the conventional ordinal regression. We show a sufficient condition under which the predicted conditional cumulative probability (CCP) locally satisfies the monotonicity constraint over a user-specified region in the covariate space. We also provide a monotonicity-preserving stochastic (MPS) algorithm for adequately training the neural network.", "output": "An interpretable neural network-based non-proportional odds model for ordinal regression with continuous response."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recurrent neural networks trained with the backpropagation through time (BPTT) algorithm have led to astounding successes in various temporal tasks. However, BPTT introduces severe limitations, such as the requirement to propagate information backwards through time, the weight symmetry requirement, as well as update-locking in space and time. These problems become roadblocks for AI systems where online training capabilities are vital. Recently, researchers have developed biologically-inspired training algorithms, addressing a subset of those problems. In this work, we propose a novel learning algorithm called online spatio-temporal learning with target projection (OSTTP) that resolves all aforementioned issues of BPTT. In particular, OSTTP equips a network with the capability to simultaneously process and learn from new incoming data, alleviating the weight symmetry and update-locking problems. We evaluate OSTTP on two temporal tasks, showcasing competitive performance compared to BPTT.
Moreover, we present a proof-of-concept implementation of OSTTP on a memristive neuromorphic hardware system, demonstrating its versatility and applicability to resource-constrained AI devices.", "output": "Online Spatio-Temporal Learning with Target Projection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advancements in areas such as natural language processing and computer vision rely on intricate and massive models that have been trained using vast amounts of unlabelled or partly labeled data, and training or deploying these state-of-the-art methods in resource-constrained environments has been a challenge. Galaxy morphologies are crucial to understanding the processes by which galaxies form and evolve. Efficient methods to classify galaxy morphologies are required to extract physical information from modern-day astronomy surveys. In this paper, we introduce Astroformer, a method to learn from a smaller amount of data. We propose using a hybrid transformer-convolutional architecture drawing much inspiration from the success of CoAtNet and MaxViT. Concretely, we use the transformer-convolutional hybrid with a new stack design for the network, a different way of creating a relative self-attention layer, and pair it with a careful selection of data augmentation and regularization techniques. Our approach sets a new state-of-the-art on predicting galaxy morphologies from images on the Galaxy10 DECals dataset, a science objective, which consists of 17736 labeled images, achieving 94.86% top-$1$ accuracy, beating the current state-of-the-art for this task by 4.62%. Furthermore, this approach also sets a new state-of-the-art on CIFAR-100 and Tiny ImageNet. We also find that models and training methods used for larger datasets would often not work very well in the low-data regime.", "output": "Astroformer: More Data Might not be all you need for Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Personalized FL has been widely used to cater to heterogeneity challenges with non-IID data. A primary obstacle is considering the personalization process from the client's perspective to preserve their autonomy. Allowing the clients to participate in personalized FL decisions becomes significant due to privacy and security concerns, where the clients may not be at liberty to share private information necessary for producing good quality personalized models. Moreover, clients with high-quality data and resources are reluctant to participate in the FL process without reasonable incentive. In this paper, we propose PI-FL, a one-shot personalization solution complemented by a token-based incentive mechanism that rewards personalized training. PI-FL outperforms other state-of-the-art approaches and can generate good-quality personalized models while respecting clients' privacy.", "output": "PI-FL: Personalized and Incentivized Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The need for fully human-understandable models is increasingly being recognised as a central theme in AI research.
The acceptance of AI models to assist in decision making in sensitive domains will grow when these models are interpretable, and this trend towards interpretable models will be amplified by upcoming regulations. One of the killer applications of interpretable AI is medical practice, which can benefit from accurate decision support methodologies that inherently generate trust. In this work, we propose FPT (MedFP), a novel method that combines probabilistic trees and fuzzy logic to assist clinical practice. This approach is fully interpretable as it allows clinicians to generate, control, and verify the entire diagnosis procedure; one of the methodology's strengths is the capability to decrease the frequency of misdiagnoses by providing an estimate of uncertainties and counterfactuals. Our approach is applied as a proof-of-concept to two real medical scenarios: classifying malignant thyroid nodules and predicting the risk of progression in chronic kidney disease patients. Our results show that probabilistic fuzzy decision trees can provide interpretable support to clinicians; furthermore, introducing fuzzy variables into the probabilistic model brings significant nuances that are lost when using the crisp thresholds set by traditional probabilistic decision trees. We show that FPT and its predictions can assist clinical practice in an intuitive manner, with the use of a user-friendly interface specifically designed for this purpose. Moreover, we discuss the interpretability of the FPT model.", "output": "Assisting clinical practice with fuzzy probabilistic decision trees."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recovering the latent factors of variation of high dimensional data has so far focused on simple synthetic settings. Mostly building on unsupervised and weakly-supervised objectives, prior work missed out on the positive implications for representation learning on real world data. In this work, we propose to leverage knowledge extracted from a diversified set of supervised tasks to learn a common disentangled representation. Assuming each supervised task only depends on an unknown subset of the factors of variation, we disentangle the feature space of a supervised multi-task model, with features activating sparsely across different tasks and information being shared as appropriate. Importantly, we never directly observe the factors of variation but establish that access to multiple tasks is sufficient for identifiability under sufficiency and minimality assumptions. We validate our approach on six real world distribution shift benchmarks and different data modalities (images, text), demonstrating how disentangled representations can be transferred to real settings.", "output": "Leveraging sparse and shared feature activations for disentangled representation learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Brain-computer interfaces are being explored for a wide variety of therapeutic applications. Typically, this involves measuring and analyzing continuous-time electrical brain activity via techniques such as electrocorticogram (ECoG) or electroencephalography (EEG) to drive external devices.
However, due to the inherent noise and variability in the measurements, the analysis of these signals is challenging and requires offline processing with significant computational resources. In this paper, we propose a simple yet efficient machine learning-based approach for the exemplary problem of hand gesture classification based on brain signals. We use a hybrid machine learning approach that uses a convolutional spiking neural network employing a bio-inspired event-driven synaptic plasticity rule for unsupervised feature learning of the measured analog signals encoded in the spike domain. We demonstrate that this approach generalizes to different subjects with both EEG and ECoG data and achieves superior accuracy in the range of 92.74-97.07% in identifying different hand gesture classes and motor imagery tasks.", "output": "A Convolutional Spiking Network for Gesture Recognition in Brain-Computer Interfaces."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks based on batch normalization and ReLU-like activation functions can experience instability during the early stages of training due to the high gradient induced by temporal gradient explosion. We explain how ReLU reduces variance more than expected, and how batch normalization amplifies the gradient during recovery, which causes gradient explosion while forward propagation remains stable. Additionally, we discuss how the dynamics of a deep neural network change during training and how the correlation between inputs can alleviate this problem. Lastly, we propose a better adaptive learning rate algorithm inspired by second-order optimization algorithms, which outperforms existing learning rate scaling methods in large batch training and can also replace WarmUp in small batch training.", "output": "The Disharmony Between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation Between Activations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The efficacy of segmentation algorithms is frequently compromised by topological errors like overlapping regions, disrupted connections, and voids. To tackle this problem, we introduce a novel loss function, namely Topology-Aware Focal Loss (TAFL), that incorporates the conventional Focal Loss with a topological constraint term based on the Wasserstein distance between the ground truth and predicted segmentation masks' persistence diagrams. By enforcing identical topology as the ground truth, the topological constraint can effectively resolve topological errors, while Focal Loss tackles class imbalance. We begin by constructing persistence diagrams from filtered cubical complexes of the ground truth and predicted segmentation masks. We subsequently utilize the Sinkhorn-Knopp algorithm to determine the optimal transport plan between the two persistence diagrams. The resultant transport plan minimizes the cost of transporting mass from one distribution to the other and provides a mapping between the points in the two persistence diagrams. We then compute the Wasserstein distance based on this transport plan to measure the topological dissimilarity between the ground truth and predicted masks.
We evaluate our approach by training a 3D U-Net with the MICCAI Brain Tumor Segmentation (BraTS) challenge validation dataset, which requires accurate segmentation of 3D MRI scans that integrate various modalities for the precise identification and tracking of malignant brain tumors. Then, we demonstrate that segmentation performance is enhanced by regularizing the focal loss through the addition of a topological constraint as a penalty term.", "output": "Topology-Aware Focal Loss for 3D Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Growing heterogeneity and configurability in HPC architectures have made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to application-specific solutions, a common approach is to use general purpose search strategies, which often might not identify the best configurations, or whose time to convergence is a significant barrier. There is, thus, a need for a general purpose and efficient tuning approach that can be easily scaled and adapted to various tuning tasks. We propose a technique for tuning parallel code regions that is general enough to be adapted to multiple tasks. In this paper, we analyze IR-based programming models to make task-specific performance optimizations. To this end, we propose the Multimodal Graph Neural Network and Autoencoder (MGA) tuner, a multimodal deep learning based approach that adapts Heterogeneous Graph Neural Networks and Denoising Autoencoders for modeling IR-based code representations that serve as separate modalities. This approach is used as part of our pipeline to model a syntax-, semantics-, and structure-aware IR-based code representation for tuning parallel code regions/kernels. We extensively experiment on OpenMP and OpenCL code regions/kernels obtained from PolyBench, Rodinia, STREAM, DataRaceBench, AMD SDK, NPB, NVIDIA SDK, Parboil, SHOC, and LULESH benchmarks. We apply our multimodal learning techniques to the tasks of i) optimizing the number of threads, scheduling policy, and chunk size in OpenMP loops, and ii) identifying the best device for heterogeneous device mapping of OpenCL kernels. Our experiments show that this multimodal learning based approach outperforms the state-of-the-art in all experiments.", "output": "Performance Optimization using Multimodal Modeling and Heterogeneous GNN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we propose a shot optimization method for QML models with minimal impact on model performance. We use a classification task as a test case for the MNIST and FMNIST datasets using a hybrid quantum-classical QML model. First, we sweep the number of shots for short and full versions of the dataset. We observe that training on the full version provides 5-6% higher testing accuracy than the short version of the dataset, with up to 10X more shots used for training. Therefore, one can reduce the dataset size to accelerate the training time. Next, we propose adaptive shot allocation on the short-version dataset to optimize the number of shots over training epochs and evaluate the impact on classification accuracy.
We use (a) a linear function, where the number of shots reduces linearly with epochs, and (b) a step function, where the number of shots reduces in steps with epochs. We note an increase of around 0.01 in loss and around a 4% (1%) reduction in testing accuracy for a reduction in shots of up to 100X (10X) with the linear (step) shot function, compared to the conventional constant shot function, on the MNIST dataset, and an increase of 0.05 in loss and around a 5-7% (5-7%) reduction in testing accuracy with a similar reduction in shots using the linear (step) shot function on the FMNIST dataset. For comparison, we also use the proposed shot optimization methods to perform ground state energy estimation of different molecules and observe that the step function gives the best and most stable ground state energy prediction with 1000X fewer shots.", "output": "Shot Optimization in Quantum Machine Learning Architectures to Accelerate Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the development of Vision-Language Pre-training Models (VLPMs) represented by CLIP and ALIGN, significant breakthroughs have been achieved for association-based visual tasks such as image classification and image-text retrieval by the zero-shot capability of CLIP without fine-tuning. However, CLIP is hard to apply to generation-based tasks. This is due to the lack of a decoder architecture and of pre-training tasks for generation. Although previous works have created generation capacity for CLIP through additional language models, a modality gap remains between the CLIP representations of different modalities, and CLIP cannot model the offset of this gap, so concepts fail to transfer across modalities. To solve the problem, we try to map images/videos to the language modality and generate captions from the language modality. In this paper, we propose the K-nearest-neighbor Cross-modality Mapping (Knight), a zero-shot method from association to generation. With text-only unsupervised training, Knight achieves state-of-the-art performance in zero-shot methods for image captioning and video captioning. Our code is available at ", "output": "From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for the task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. In this work, we eliminate this bottleneck by selecting the best representation based on the Gromov-Wasserstein Discrepancy (GWD) between the raw events and their representation. It is approximately 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations across multiple representations, network backbones, and datasets. This means that finding a representation with a high task score is equivalent to finding a representation with a low GWD.
We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state-of-the-art. On object detection, our optimized representation outperforms existing representations by 1.9% mAP on the 1 Mpx dataset and 8.6% mAP on the Gen1 dataset, and even outperforms the state-of-the-art by 1.8% mAP on Gen1 and state-of-the-art feed-forward methods by 6.0% mAP on the 1 Mpx dataset. This work opens a new unexplored field of explicit representation optimization for event-based learning methods.", "output": "From Chaos Comes Order: Ordering Event Representations for Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we demonstrate the versatility of mean-field games (MFGs) as a mathematical framework for explaining, enhancing, and designing generative models. There is a pervasive sense in the generative modeling community that the various flow and diffusion-based generative models have some foundational common structure and interrelationships. We establish connections between MFGs and major classes of flow and diffusion-based generative models, including continuous-time normalizing flows, score-based models, and Wasserstein gradient flows. We derive these three classes of generative models through different choices of particle dynamics and cost functions. Furthermore, we study the mathematical structure and properties of each generative model by studying their associated MFG's optimality condition, which is a set of coupled nonlinear partial differential equations (PDEs). The theory of MFGs, therefore, enables the study of generative models through the theory of nonlinear PDEs. Through this perspective, we investigate the well-posedness and structure of normalizing flows, unravel the mathematical structure of score-based generative modeling, and derive a mean-field game formulation of the Wasserstein gradient flow. From an algorithmic perspective, the optimality conditions of MFGs also allow us to introduce HJB regularizers for enhanced training of a broader class of generative models. We present this framework as an MFG laboratory which serves as a platform for revealing new avenues of experimentation and invention of generative models. This laboratory will give rise to a multitude of well-posed generative modeling formulations, providing a consistent theoretical framework upon which numerical and algorithmic tools may be developed.", "output": "A mean-field games laboratory for generative modeling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents a comprehensive and practical guide for practitioners and end-users working with Large Language Models (LLMs) in their downstream natural language processing (NLP) tasks. We provide discussions and insights into the usage of LLMs from the perspectives of models, data, and downstream tasks. Firstly, we offer an introduction and brief summary of current GPT- and BERT-style LLMs. Then, we discuss the influence of pre-training data, training data, and test data.
Most importantly, we provide a detailed discussion about the use and non-use cases of large language models for various natural language processing tasks, such as knowledge-intensive tasks, traditional natural language understanding tasks, natural language generation tasks, emergent abilities, and considerations for specific tasks. We present various use cases and non-use cases to illustrate the practical applications and limitations of LLMs in real-world scenarios. We also try to understand the importance of data and the specific challenges associated with each NLP task. Furthermore, we explore the impact of spurious biases on LLMs and delve into other essential considerations, such as efficiency, cost, and latency, to ensure a comprehensive understanding of deploying LLMs in practice. This comprehensive guide aims to provide researchers and practitioners with valuable insights and best practices for working with LLMs, thereby enabling the successful implementation of these models in a wide range of NLP tasks. A curated list of practical guide resources of LLMs, regularly updated, can be found at \url{", "output": "Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Identifying phenotypes plays an important role in furthering our understanding of disease biology through practical applications within healthcare and the life sciences. The challenge of dealing with the complexities and noise within electronic health records (EHRs) has motivated applications of machine learning in phenotypic discovery. While recent research has focused on finding predictive subtypes for clinical decision support, here we instead focus on the noise that results in phenotypic misclassification, which can reduce a phenotype's ability to detect associations in genome-wide association studies (GWAS). We show that by combining anchor learning and transformer architectures into our proposed model, AnchorBERT, we are able to detect genomic associations only previously found in large consortium studies with 5$\times$ more cases. When reducing the number of controls available by 50%, we find our model is able to maintain 40% more significant genomic associations from the GWAS catalog compared to standard phenotype definitions. Keywords: Phenotyping, Machine Learning, Semi-Supervised, Genetic Association Studies, Biological Discovery.", "output": "Phenotyping with Positive Unlabelled Learning for Genome-Wide Association Studies."}]
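The records above share a single schema: a fixed `instruction` prompt, a paper abstract as `input`, and the paper's title as `output`. Below is a minimal sketch of how such records could be loaded and turned into prompts for a title-generation model; it assumes the array is saved as a JSON file, and the filename `title_gen.json` and the prompt layout are illustrative assumptions rather than part of the dataset.

```python
import json

# Load the instruction/input/output records (assumes the array above is
# saved as a JSON file; the filename is hypothetical).
with open("title_gen.json", encoding="utf-8") as f:
    records = json.load(f)

def to_prompt(record: dict) -> str:
    """Format one record as a plain-text prompt for a title-generation model."""
    return f"{record['instruction']}\n\nAbstract: {record['input']}\n\nTitle:"

# The expected completion for each prompt is the paper title in "output".
example = records[0]
print(to_prompt(example))
print("Expected title:", example["output"])
```

Keeping the raw `output` title separate from the formatted prompt leaves the same records usable both as supervised fine-tuning pairs and as few-shot evaluation examples.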