From ab649ceb0ca793c5389c6b5df355c1b1ff7c4513 Mon Sep 17 00:00:00 2001
From: wangrongsheng
Date: Sun, 2 Jul 2023 08:55:19 +0800
Subject: [PATCH] * update 2023-07-02 08:55:19

---
 data/2023-07-02.json | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 data/2023-07-02.json

diff --git a/data/2023-07-02.json b/data/2023-07-02.json
new file mode 100644
index 0000000..66004e5
--- /dev/null
+++ b/data/2023-07-02.json
@@ -0,0 +1 @@
+[{"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the widespread digitization of finance and the increasing popularity of cryptocurrencies, the sophistication of fraud schemes devised by cybercriminals is growing. Money laundering -- the movement of illicit funds to conceal their origins -- can cross bank and national boundaries, producing complex transaction patterns. The UN estimates 2-5% of global GDP, or $0.8 - $2.0 trillion dollars, are laundered globally each year. Unfortunately, real data to train machine learning models to detect laundering is generally not available, and previous synthetic data generators have had significant shortcomings. A realistic, standardized, publicly-available benchmark is needed for comparing models and for the advancement of the area. To this end, this paper contributes a synthetic financial transaction dataset generator and a set of synthetically generated AML (Anti-Money Laundering) datasets. We have calibrated this agent-based generator to match real transactions as closely as possible and made the datasets public. We describe the generator in detail and demonstrate how the datasets generated can help compare different Graph Neural Networks in terms of their AML abilities. In a key way, using synthetic data in these comparisons can be even better than using real data: the ground truth labels are complete, whilst many laundering transactions in real data are never detected.", "output": "Realistic Synthetic Financial Transactions for Anti-Money Laundering Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model-agnostic feature attributions can provide local insights into complex ML models. If the explanation is correct, a domain expert can validate and trust the model's decision. However, if it contradicts the expert's knowledge, related work only corrects irrelevant features to improve the model. To allow for unlimited interaction, in this paper we provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model. For a particular set of samples, we use the corrected feature attributions to generate extra local data, which is used to retrain the model to have the right explanation for the samples. Through simulated and real data experiments on a variety of models, we show how our proposed approach can significantly improve the model's performance only by augmenting its training dataset based on corrected explanations. Adding our interactive explanations to active learning settings increases the sample efficiency significantly and outperforms existing explanatory interactive strategies.
Additionally, we explore how a domain expert can provide feature attributions which are sufficiently correct to improve the model.", "output": "Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Social media streams contain a large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news. Twitter, especially, allows a fast spread of events happening in real time, and enables individuals and organizations to stay informed of the events happening now. Event detection from social media data poses different challenges from traditional text and is a research area that has attracted much attention in recent years. In this paper, we survey a wide range of event detection methods for the Twitter data stream, helping readers understand the recent development in this area. We present the datasets available to the public. Furthermore, we point out a few research opportunities.", "output": "Event Detection from Social Media Stream: Methods, Datasets and Opportunities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in the literature to learn better gradient estimates to help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time. Utilizing this, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term - retrospective loss - leading to faster critic convergence and, consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. We plan to open-source the code and all experiment data at: ", "output": "SARC: Soft Actor Retrospective Critic."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Numerically solving partial differential equations (PDEs) typically requires fine discretization to resolve necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving PDEs that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and have the capability to solve partial differential equations based on data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter that is parameterized by a multilayer perceptron. The Hyena operator enjoys sub-quadratic complexity and uses a state space model to parameterize a long convolution with a global receptive field.
This mechanism enhances the model's comprehension of the input's context and enables data-dependent weights for different PDE instances. To measure how effective the layers are in solving PDEs, we conduct experiments on Burgers' equation and the Navier-Stokes equation. Our findings indicate that the Hyena neural operator can serve as an efficient and accurate model for learning PDEs' solution operators. The data and code used can be found at:", "output": "HNO: Hyena Neural Operator for solving PDEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Terminology sources, such as controlled vocabularies, thesauri and classification systems, play a key role in digitizing cultural heritage. However, Information Retrieval (IR) systems that allow users to query and explore these lexical resources often lack an adequate representation of the semantics behind the user's search, which can be conveyed through multiple expression modalities (e.g., images, keywords or textual descriptions). This paper presents the implementation of a new search engine for one of the most widely used iconography classification systems, Iconclass. The novelty of this system is the use of a pre-trained vision-language model, namely CLIP, to retrieve and explore Iconclass concepts using visual or textual queries.", "output": "Multimodal Search on Iconclass using Vision-Language Pre-Trained Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video retrieval (VR) involves retrieving the ground truth video from the video database given a text caption or vice-versa. The two important components of compositionality, objects & attributes and actions, are joined using correct semantics to form a proper text query. These components (objects & attributes, actions and semantics) each play an important role to help distinguish among videos and retrieve the correct ground truth video. However, it is unclear what effect these components have on video retrieval performance. We therefore conduct a systematic study to evaluate the compositional and semantic understanding of video retrieval models on standard benchmarks such as MSRVTT, MSVD and DIDEMO. The study is performed on two categories of video retrieval models: (i) those which are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (e.g., Frozen-in-Time, Violet, MCQ, etc.) and (ii) those which adapt pre-trained image-text representations like CLIP for video retrieval (e.g., CLIP4Clip, XCLIP, CLIP2Video, etc.). Our experiments reveal that actions and semantics play a minor role compared to objects & attributes in video understanding. Moreover, video retrieval models that use pre-trained image-text representations (CLIP) have better semantic and compositional understanding as compared to models pre-trained on video-text data.", "output": "ICSVR: Investigating Compositional and Semantic Understanding in Video Retrieval Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cell line authentication plays a crucial role in the biomedical field, ensuring researchers work with accurately identified cells.
Supervised deep learning has made remarkable strides in cell line identification by studying cell morphological features through cell imaging. However, batch effects, a significant issue stemming from the different times at which data is generated, lead to substantial shifts in the underlying data distribution, thus complicating reliable differentiation between cell lines from distinct batch cultures. To address this challenge, we introduce CLANet, a pioneering framework for cross-batch cell line identification using brightfield images, specifically designed to tackle three distinct batch effects. We propose a cell cluster-level selection method to efficiently capture cell density variations, and a self-supervised learning strategy to manage image quality variations, thus producing reliable patch representations. Additionally, we adopt multiple instance learning (MIL) for effective aggregation of instance-level features for cell line identification. Our innovative time-series segment sampling module further enhances MIL's feature-learning capabilities, mitigating biases from varying incubation times across batches. We validate CLANet using data from 32 cell lines across 93 experimental batches from the AstraZeneca Global Cell Bank. Our results show that CLANet outperforms related approaches (e.g., domain adaptation, MIL), demonstrating its effectiveness in addressing batch effects in cell line identification.", "output": "CLANet: A Comprehensive Framework for Cross-Batch Cell Line Identification Using Brightfield Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The class imbalance problem in deep learning has been explored in several studies, but there has yet to be a systematic analysis of this phenomenon in object detection. Here, we present comprehensive analyses and experiments of the foreground-background (F-B) imbalance problem in object detection, which is very common and caused by small, infrequent objects of interest. We experimentally study the effects of different aspects of F-B imbalance (object size, number of objects, dataset size, object type) on detection performance. In addition, we also compare 9 leading methods for addressing this problem, including Faster-RCNN, SSD, OHEM, Libra-RCNN, Focal-Loss, GHM, PISA, YOLO-v3, and GFL, with a range of datasets from different imaging domains. We conclude that (1) the F-B imbalance can indeed cause a significant drop in detection performance, (2) the detection performance is more affected by F-B imbalance when fewer training data are available, (3) in most cases, decreasing object size leads to a larger performance drop than decreasing the number of objects, given the same change in the ratio of object pixels to non-object pixels, (4) among all selected methods, Libra-RCNN and PISA demonstrate the best performance in addressing the issue of F-B imbalance,
(5) when the training dataset size is large, the choice of method is not impactful, and (6) soft-sampling methods, including Focal-Loss, GHM, and GFL, perform fairly well on average but are relatively unstable.", "output": "A systematic study of the foreground-background imbalance problem in deep learning for object detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation makes them widely applicable in practical gradient based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high dimensional datasets. We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world datasets arising in multiple domains (including COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding low-bias FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.", "output": "Learning Fair Classifiers via Min-Max F-divergence Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-dimensional datasets pose a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique in dealing with dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms to more sophisticated relevance-redundancy trade-offs and to multivariate dependencies-based approaches in recent years. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature intercooperation, and summarizes the contributions of different approaches found in the literature.
Furthermore, current issues and challenges are introduced to identify promising future research and development.", "output": "Feature Selection: A perspective on inter-attribute cooperation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Neural Networks are powerful tools for understanding complex patterns and making decisions. However, their black-box nature impedes a complete understanding of their inner workings. While online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem, it is still ambiguous whether the visually explainable features align with the robustness of the model against adversarial examples. In this paper, we investigate saliency-trained models' vulnerability to adversarial examples. Models are trained using an online saliency-guided training method and evaluated against popular adversarial example algorithms. We quantify the robustness and conclude that despite the well-explained visualizations in the model's output, the salient models suffer from lower performance against adversarial attacks.", "output": "Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To mitigate the gap, model compression techniques such as structured pruning are being used to improve inference efficiency. However, most existing neural network inference runtimes lack adequate support for structured sparsity. In this paper, we propose an efficient sparse deep learning inference software stack for Transformer-based language models where the weights are pruned with constant block size. Our sparse software accelerator leverages Intel Deep Learning Boost to maximize the performance of sparse matrix - dense matrix multiplication (commonly abbreviated as SpMM) on CPUs. Our SpMM kernel outperforms the existing sparse libraries (oneMKL, TVM, and LIBXSMM) by an order of magnitude on a wide range of GEMM shapes under 5 representative sparsity ratios (70%, 75%, 80%, 85%, 90%). Moreover, our SpMM kernel shows up to 5x speedup over the dense GEMM kernel of oneDNN, a well-optimized dense library widely used in industry. We apply our sparse accelerator to widely-used Transformer-based language models including Bert-Mini, DistilBERT, Bert-Base, and BERT-Large. Our sparse inference software shows up to 1.5x speedup over Neural Magic's Deepsparse under the same configurations on Xeon on Amazon Web Services under proxy production latency constraints. We also compare our solution with two framework-based inference solutions, ONNX Runtime and PyTorch, and demonstrate up to 37x speedup over ONNX Runtime and 345x over PyTorch on Xeon under the latency constraints.
All the source code is publicly available on Github: ", "output": "An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is now an essential part of the image training process, as it effectively prevents overfitting and makes the model more robust against noisy datasets. Recent mixing augmentation strategies have advanced to generate the mixup mask that can enrich the saliency information, which is a supervisory signal. However, these methods incur a significant computational burden to optimize the mixup mask. From this motivation, we propose a novel saliency-aware mixup method, GuidedMixup, which aims to retain the salient regions in mixup images with low computational overhead. We develop an efficient pairing algorithm that seeks to minimize the conflict between the salient regions of paired images and achieve rich saliency in mixup images. Moreover, GuidedMixup controls the mixup ratio for each pixel to better preserve the salient region by interpolating two paired images smoothly. The experiments on several datasets demonstrate that GuidedMixup provides a good trade-off between augmentation overhead and generalization performance on classification datasets. In addition, our method shows good performance in experiments with corrupted or reduced datasets.", "output": "GuidedMixup: An Efficient Mixup Strategy Guided by Saliency Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine-learning models are known to be vulnerable to evasion attacks that perturb model inputs to induce misclassifications. In this work, we identify real-world scenarios where the true threat cannot be assessed accurately by existing attacks. Specifically, we find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes. To address the shortcomings of existing methods, we formally define a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios. We show empirically that group-based robustness allows us to distinguish between models' vulnerability against specific threat models in situations where traditional robustness metrics do not apply. Moreover, to measure group-based robustness efficiently and accurately, we 1) propose two loss functions and 2) identify three new attack strategies. We show empirically that with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes, and finding evasive samples using our new attack strategies saves time by up to 99% compared to brute-force search methods.
Finally, we propose a defense method that increases group-based robustness by up to 3.52×.", "output": "Group-based Robustness: A General Framework for Customized Robustness in the Real World."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Demand flexibility plays a vital role in maintaining grid balance, reducing peak demand, and saving customers' energy bills. Given their highly shiftable load and significant contribution to a building's energy consumption, Heating, Ventilation, and Air Conditioning (HVAC) systems can provide valuable demand flexibility to the power systems by adjusting their energy consumption in response to electricity price and power system needs. To exploit this flexibility in both operation time and power, it is imperative to accurately model and aggregate the load flexibility of a large population of HVAC systems as well as to design effective control algorithms. In this paper, we tackle the curse of dimensionality issue in modeling and control by utilizing the concept of laxity to quantify the emergency level of each HVAC operation request. We further propose a two-level approach to address energy optimization for a large population of HVAC systems. The lower level involves an aggregator to aggregate HVAC load laxity information and use the least-laxity-first (LLF) rule to allocate real-time power for individual HVAC systems based on the controller's total power. Due to the complex and uncertain nature of HVAC systems, we leverage a reinforcement learning (RL)-based controller to schedule the total power based on the aggregated laxity information and electricity price. We evaluate the temperature control and energy cost saving performance of a large-scale group of HVAC systems in both single-zone and multi-zone scenarios, under varying climate and electricity market conditions. The experiment results indicate that the proposed approach outperforms the centralized methods in the majority of test scenarios, and performs comparably to the model-based method in some scenarios.", "output": "Laxity-Aware Scalable Reinforcement Learning for HVAC Control."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The popularization of intelligent healthcare devices and big data analytics significantly boosts the development of smart healthcare networks (SHNs). To enhance the precision of diagnosis, different participants in SHNs share health data that contains sensitive information. Therefore, the data exchange process raises privacy concerns, especially when the integration of health data from multiple sources (linkage attack) results in further leakage. Linkage attack is a type of dominant attack in the privacy domain, which can leverage various data sources for private data mining. Furthermore, adversaries launch poisoning attacks to falsify the health data, which leads to misdiagnosis or even physical damage. To protect private health data, we propose a personalized differential privacy model based on the trust levels among users. The trust is evaluated by a defined community density, while the corresponding privacy protection level is mapped to controllable randomized noise constrained by differential privacy. To avoid linkage attacks in personalized differential privacy, we designed a noise correlation decoupling mechanism using a Markov stochastic process.
In addition, we build the community model on a blockchain, which can mitigate the risk of poisoning attacks during differentially private data transmission over SHNs. To verify the effectiveness and superiority of the proposed approach, we conduct extensive experiments on benchmark datasets.", "output": "Towards Blockchain-Assisted Privacy-Aware Data Sharing For Edge Intelligence: A Smart Healthcare Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark tool for assessing the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) correspond? We evaluate a variety of popular LLMs, including both commercial and open-source options, and discover that only GPT-4 achieves success (accuracy ≥ 60%) across all six elementary school grades, while other models falter at different grade levels. Furthermore, we assess the robustness of several top-performing LLMs by augmenting the original problems in the CMATH dataset with distracting information. Our findings reveal that GPT-4 is able to maintain robustness, while other models fail. We anticipate that our study will expose limitations in LLMs' arithmetic and reasoning capabilities, and promote their ongoing development and advancement.", "output": "CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While open-ended self-explanations have been shown to promote robust learning in multiple studies, they pose significant challenges to automated grading and feedback in technology-enhanced learning, due to the unconstrained nature of the students' input. Our work investigates whether recent advances in Large Language Models, and in particular ChatGPT, can address this issue. Using decimal exercises and student data from a prior study of the learning game Decimal Point, with more than 5,000 open-ended self-explanation responses, we investigate ChatGPT's capability in (1) solving the in-game exercises, (2) determining the correctness of students' answers, and (3) providing meaningful feedback to incorrect answers. Our results showed that ChatGPT can respond well to conceptual questions, but struggled with decimal place values and number line problems. In addition, it was able to accurately assess the correctness of 75% of the students' answers and generated generally high-quality feedback, similar to human instructors. We conclude with a discussion of ChatGPT's strengths and weaknesses and suggest several avenues for extending its use cases in digital teaching and learning.", "output": "Evaluating ChatGPT's Decimal Skills and Feedback Generation in a Digital Learning Game."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of Natural Language Generation models has led to the creation of powerful Artificial Intelligence-assisted writing tools.
These tools are capable of predicting users' needs and actively providing suggestions as they write. In this work, we conduct a comparative user study between such tools from an information retrieval lens: pull and push. Specifically, we investigate the user demand for AI-assisted writing, the impact of the two paradigms on quality, ownership of the writing product, and efficiency and enjoyment of the writing process. We also seek to understand the impact of bias in AI-assisted writing. Our findings show that users welcome seamless assistance of AI in their writing. Furthermore, AI helped users to diversify the ideas in their writing more quickly while keeping it clear and concise. Users also enjoyed the collaboration with AI-assisted writing tools and did not feel a lack of ownership. Finally, although participants did not experience bias in our experiments, they still expressed explicit and clear concerns that should be addressed in future AI-assisted writing tools.", "output": "The Future of AI-Assisted Writing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm. Although the existing pioneer study has achieved great success with the BART backbone, it overlooks the gap between the visual feature space and the decoder semantic space, the object-level metadata of the image, as well as the potential external knowledge. To solve these limitations, in this work, we propose a novel mulTi-source sEmantic grAph-based Multimodal sarcasm explanation scheme, named TEAM. In particular, TEAM extracts the object-level semantic meta-data instead of the traditional global visual features from the input image. Meanwhile, TEAM resorts to ConceptNet to obtain the external related knowledge concepts for the input text and the extracted object meta-data. Thereafter, TEAM introduces a multi-source semantic graph that comprehensively characterizes the multi-source (i.e., caption, object meta-data, external knowledge) semantic relations to facilitate the sarcasm reasoning. Extensive experiments on a publicly released dataset, MORE, verify the superiority of our model over cutting-edge methods.", "output": "Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce NaturalInversion, a novel model inversion-based method to synthesize images that agree well with the original data distribution without using real data. In NaturalInversion, we propose: (1) a Feature Transfer Pyramid which uses enhanced image priors of the original data by combining the multi-scale feature maps extracted from the pre-trained classifier, (2) a one-to-one approach generative model where only one batch of images is synthesized by one generator to bring non-linearity to optimization and to ease the overall optimizing process, (3) learnable Adaptive Channel Scaling parameters which are end-to-end trained to scale the output image channel to further utilize the original image prior.
With our NaturalInversion, we synthesize images from classifiers trained on CIFAR-10/100 and show that our images are more consistent with the original data distribution than prior works, by visualization and additional analysis. Furthermore, our synthesized images outperform prior works in various applications such as knowledge distillation and pruning, demonstrating the effectiveness of our proposed method.", "output": "NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Game level blending via machine learning, the process of combining features of game levels to create unique and novel game levels using Procedural Content Generation via Machine Learning (PCGML) techniques, has gained increasing popularity in recent years. However, many existing techniques rely on human-annotated level representations, which limits game level blending to a limited number of annotated games. Even with annotated games, researchers often need to author an additional shared representation to make blending possible. In this paper, we present a novel approach to game level blending that employs Clustering-based Tile Embeddings (CTE), a learned level representation technique that can serve as a level representation for unannotated games and a unified level representation across games without the need for human annotation. CTE represents game level tiles as a continuous vector representation, unifying their visual, contextual, and behavioral information. We apply this approach to two classic Nintendo games, Lode Runner and The Legend of Zelda. We run an evaluation comparing the CTE representation to a common, human-annotated representation in the blending task and find that CTE has comparable or better performance without the need for human annotation.", "output": "Game Level Blending using a Learned Level Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed RL system to efficiently generate and process a massive amount of data to train intelligent agents. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. While industrial systems from OpenAI and DeepMind have achieved successful large-scale RL training, their system architecture and implementation details remain undisclosed to the community. In this paper, we present a novel abstraction on the dataflows of RL training, which unifies practical RL training across diverse applications into a general framework and enables fine-grained optimizations. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLly Scalable RL (SRL). The system architecture of SRL separates major RL computation components and allows massively parallelized training. Moreover, SRL offers user-friendly and extensible interfaces for customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries on both a single machine and a medium-sized cluster. In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries.
We also conduct a direct benchmark comparison to OpenAI's industrial system, Rapid, in the challenging hide-and-seek environment. SRL reproduces the same solution as reported by OpenAI with up to 5x speedup in wall-clock time. Furthermore, we also examine the performance of SRL in a much harder variant of the hide-and-seek environment and achieve substantial learning speedup by scaling SRL to over 15k CPU cores and 32 A100 GPUs. Notably, SRL is the first in the academic community to perform RL experiments at such a large scale.", "output": "SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent studies have demonstrated the susceptibility of deep neural networks to backdoor attacks. Given a backdoored model, its prediction of a poisoned sample with a trigger will be dominated by the trigger information, though trigger information and benign information coexist. Inspired by the mechanism of the optical polarizer, which passes light waves with particular polarizations while filtering out light waves with other polarizations, we propose a novel backdoor defense method that inserts a learnable neural polarizer into the backdoored model as an intermediate layer, in order to purify the poisoned sample by filtering trigger information while maintaining benign information. The neural polarizer is instantiated as one lightweight linear transformation layer, which is learned by solving a well-designed bi-level optimization problem, based on a limited clean dataset. Compared to other fine-tuning-based defense methods which often adjust all parameters of the backdoored model, the proposed method only needs to learn one additional layer, such that it is more efficient and requires less clean data. Extensive experiments demonstrate the effectiveness and efficiency of our method in removing backdoors across various neural network architectures and datasets, especially in the case of very limited clean data.", "output": "Neural Polarizer: A Lightweight and Effective Backdoor Defense via Purifying Poisoned Features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure. Instead, INR represents objects as continuous functions. Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG. However, INR holds potential for various applications beyond image compression. This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks. Our methodology involves storing the whole dataset directly in INR format on a GPU, mitigating the significant data communication overhead between the CPU and GPU during training. Additionally, the decoding process from INR to RGB format is highly parallelized and executed on-the-fly. To further enhance compression, we propose iterative and dynamic pruning, as well as layer-wise quantization, building upon previous work.
We evaluate our framework on the image classification task, utilizing the ResNet-18 backbone network and three commonly used datasets with varying image sizes. Rapid-INR reduces memory consumption to only 5% of the original dataset size and achieves a maximum 6× speedup over the PyTorch training pipeline, as well as a maximum 1.2× speedup over the DALI training pipeline, with only a marginal decrease in accuracy. Importantly, Rapid-INR can be readily applied to other computer vision tasks and backbone networks with reasonable engineering efforts. Our implementation code is publicly available at", "output": "Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning is an approach to collaboratively training machine learning models for multiple parties that prohibit data sharing. One of the challenges in federated learning is non-IID data between clients, as a single model cannot fit the data distributions of all clients. Meta-learning, such as Per-FedAvg, is introduced to cope with the challenge. Meta-learning learns shared initial parameters for all clients. Each client employs gradient descent to quickly adapt the initialization to local data distributions to realize model personalization. However, due to the non-convex loss function and the randomness of sampling updates, meta-learning approaches have unstable goals in local adaptation for the same client. This fluctuation in different adaptation directions hinders convergence in meta-learning. To overcome this challenge, we use the historical local adapted model to restrict the direction of the inner loop and propose an elastic-constrained method. As a result, the current round inner loop keeps historical goals and adapts to better solutions. Experiments show our method boosts meta-learning convergence and improves personalization without additional computation or communication. Our method achieved SOTA on all metrics on three public datasets.", "output": "Elastically-Constrained Meta-Learner for Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural network quantum state (NNQS) has emerged as a promising candidate for quantum many-body problems, but its practical applications are often hindered by the high cost of sampling and local energy calculation. We develop a high-performance NNQS method for ab initio electronic structure calculations.
The major innovations include: (1) a transformer-based architecture as the quantum wave function ansatz; (2) a data-centric parallelization scheme for the variational Monte Carlo (VMC) algorithm which preserves data locality and adapts well to different computing architectures; (3) a parallel batch sampling strategy which reduces the sampling cost and achieves good load balance; (4) a parallel local energy evaluation scheme which is both memory and computationally efficient; (5) a study of real chemical systems demonstrating both the superior accuracy of our method compared to the state-of-the-art and the strong and weak scalability for large molecular systems with up to 120 spin orbitals.", "output": "NNQS-Transformer: an Efficient and Scalable Neural Network Quantum States Approach for Ab initio Quantum Chemistry."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). RETVQA is distinctively different from, and more challenging than, the traditionally-studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our relevance encoder for free-form fluent answer generation. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, expecting a mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset, namely RETVQA, and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.", "output": "Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present PaRTE, a collection of 1,126 pairs of Recognizing Textual Entailment (RTE) examples to evaluate whether models are robust to paraphrasing. We posit that if RTE models understand language, their predictions should be consistent across inputs that share the same meaning. We use the evaluation set to determine if RTE models' predictions change when examples are paraphrased.
In our experiments, contemporary models change their predictions on 8-16% of paraphrased examples, indicating that there is still room for improvement.", "output": "Evaluating Paraphrastic Robustness in Textual Entailment Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Multi-Scenario Learning (MSL) has been widely used in recommendation and retrieval systems in industry because it facilitates transfer learning across different scenarios, mitigating data sparsity and reducing maintenance costs. These efforts produce different MSL paradigms by searching for a more optimal network structure, such as Auxiliary Network, Expert Network, and Multi-Tower Network. It is intuitive that different scenarios could hold their specific characteristics, activating the user's intents quite differently. In other words, different kinds of auxiliary features would bear varying importance under different scenarios. With more discriminative feature representations refined in a scenario-aware manner, better ranking performance could be easily obtained without an expensive search for the optimal network structure. Unfortunately, this simple idea is mainly overlooked but much desired in real-world systems. Further analysis also validates the rationality of adaptive feature learning under a multi-scenario scheme. Moreover, our A/B test results on the Alibaba search advertising platform also demonstrate that Maria is superior in production environments.", "output": "Multi-Scenario Ranking with Adaptive Feature Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane. Prior methods have modeled human-ground interactions either implicitly or in a sparse manner, often resulting in unrealistic and incorrect motions when faced with noise and uncertainty. In contrast, our approach explicitly represents these interactions in a dense and continuous manner. To this end, we propose a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, named GraMMaR, which jointly learns the distribution of transitions in both pose and interaction between every joint and the ground plane at each time step of a motion sequence. It is trained to explicitly promote consistency between the motion and the distance change towards the ground. After training, we establish a joint optimization strategy that utilizes GraMMaR as a dual-prior, regularizing the optimization towards the space of plausible ground-aware motions. This leads to realistic and coherent motion reconstruction, irrespective of the assumed or learned ground plane.
Through extensive evaluation on the AMASS and AIST++ datasets, our model demonstrates good generalization and discriminating abilities in challenging cases including complex and ambiguous human-ground interactions. The code will be released.", "output": "GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.", "output": "Principles and Guidelines for Evaluating Social Robot Navigation Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel value approximation method, namely Eigensubspace Regularized Critic (ERC), for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). It reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. In addition, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods on 20. It also shows significant advantages in Q-value approximation and variance reduction.
Our code is available at ", "output": "Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts. Without such patterns, models generalize poorly and prefer responding safely. Many attempts have been made either in multi-turn settings from a one-to-many perspective, or from a many-to-many perspective but limited to single-turn settings. The major challenge in many-to-many augmentation of multi-turn dialogues is that discretely replacing each turn with a semantically similar one breaks fragile context coherence. In this paper, we propose the DialoGue Path Sampling (DialoGPS) method in continuous semantic space, the first many-to-many augmentation method for multi-turn dialogues. Specifically, we map a dialogue to our extended Brownian Bridge, a special Gaussian process. We sample latent variables to form coherent dialogue paths in the continuous space. A dialogue path corresponds to a new multi-turn dialogue and is used as augmented training data. We show the effect of DialoGPS with both automatic and human evaluation.", "output": "DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The understanding of complex human interactions and group activities has garnered attention in human-centric computer vision. However, the advancement of the related tasks is hindered by the difficulty of obtaining large-scale labeled real-world datasets. To mitigate the issue, we propose M3Act, a multi-view multi-group multi-person human atomic action and group activity data generator. Powered by the Unity engine, M3Act contains simulation-ready 3D scenes and human assets, configurable lighting and camera systems, highly parameterized modular group activities, and a large degree of domain randomization during the data generation process. Our data generator is capable of generating large-scale datasets of human activities with multiple viewpoints, modalities (RGB images, 2D poses, 3D motions), and high-quality annotations for individual persons and multi-person groups (2D bounding boxes, instance segmentation masks, individual actions and group activity categories). Using M3Act, we perform synthetic data pre-training for 2D skeleton-based group activity recognition and RGB-based multi-person pose tracking. The results indicate that learning from our synthetic datasets largely improves the model performances on real-world datasets, with the highest gains of 5.59% and 7.32% respectively in group and person recognition accuracy on CAD2, as well as an improvement of 6.63 in MOTP on HiEve. Pre-training with our synthetic data also leads to faster model convergence on downstream tasks (up to 6.8% faster). Moreover, M3Act opens new research problems for 3D group activity generation. We release M3Act3D, an 87.6-hour 3D motion dataset of human activities with larger group sizes and higher complexity of inter-person interactions than previous multi-person datasets.
We define multiple metrics and propose a competitive baseline for the novel task.", "output": "Learning from Synthetic Human Group Activities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous vehicles (AV) are expected to reshape future transportation systems, and decision-making is one of the critical modules toward high-level automated driving. To overcome those complicated scenarios that rule-based methods could not cope with well, data-driven decision-making approaches have attracted more and more attention. The datasets used in developing data-driven methods dramatically influence the performance of decision-making; hence it is necessary to have a comprehensive insight into the existing datasets. In terms of collection sources, driving data can be divided into vehicle, environment, and driver related data. This study compares the state-of-the-art datasets of these three categories and summarizes their features including sensors used, annotation, and driving scenarios. Based on the characteristics of the datasets, this survey also outlines the potential applications of datasets on various aspects of AV decision-making, assisting researchers to find appropriate ones to support their own research. The future trends of AV dataset development are summarized.", "output": "A Survey on Datasets for Decision-making of Autonomous Vehicle."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks can be significantly compressed by pruning, leading to sparse models requiring considerably less storage and floating-point operations while maintaining predictive performance. Model soups (Wortsman et al., 2022) improve generalization and out-of-distribution performance by averaging the parameters of multiple models into a single one without increased inference time. However, identifying models in the same loss basin to leverage both sparsity and parameter averaging is challenging, as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. In this work, we address these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varying hyperparameter configurations, such as batch ordering or weight decay, produces models that are suitable for averaging and share the same sparse connectivity by design. Averaging these models significantly enhances generalization performance compared to their individual components. Building on this idea, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model of the previous phase.
SMS maintains sparsity, exploits the benefits of sparse networks by being modular and fully parallelizable, and substantially improves IMP's performance. Additionally, we demonstrate that SMS can be adapted to enhance the performance of state-of-the-art pruning-during-training approaches.", "output": "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The objective of augmented reality (AR) is to add digital content to natural images and videos to create an interactive experience between the user and the environment. Scene analysis and object recognition play a crucial role in AR, as they must be performed quickly and accurately. In this study, a new approach is proposed that involves using oriented bounding boxes with a detection and recognition deep network to improve performance and processing time. The approach is evaluated using two datasets: a real image dataset (DOTA dataset) commonly used for computer vision tasks, and a synthetic dataset that simulates different environmental, lighting, and acquisition conditions. The focus of the evaluation is on small objects, which are difficult to detect and recognise. The results indicate that the proposed approach tends to produce better Average Precision and greater accuracy for small objects in most of the tested conditions.", "output": "Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent developments in Artificial Intelligence (AI) provide unprecedented automation opportunities in the Architecture, Engineering, and Construction (AEC) industry. However, despite the enthusiasm regarding the use of AI, 85% of current big data projects fail. One of the main reasons for AI project failures in the AEC industry is the disconnect between those who plan or decide to use AI and those who implement it. AEC practitioners often lack a clear understanding of the capabilities and limitations of AI, leading to a failure to distinguish between what AI should solve, what it can solve, and what it will solve, treating these categories as if they are interchangeable. This lack of understanding results in the disconnect between AI planning and implementation because the planning is based on a vision of what AI should solve without considering if it can or will solve it. To address this challenge, this work introduces the LeanAI method. The method has been developed using data from several ongoing longitudinal studies analyzing AI implementations in the AEC industry, which involved 50+ hours of interview data. The LeanAI method delineates what AI should solve, what it can solve, and what it will solve, forcing practitioners to clearly articulate these components early in the planning process itself by involving the relevant stakeholders. By utilizing the method, practitioners can effectively plan AI implementations, thus increasing the likelihood of success and ultimately speeding up the adoption of AI. 
A case example illustrates the usefulness of the method.", "output": "LeanAI: A method for AEC practitioners to effectively plan AI implementations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, and (2) due to catastrophic forgetting caused by the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspiration from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.", "output": "Improving Online Continual Learning Performance and Stability with Temporal Ensembles."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inverse protein folding is challenging due to its inherent one-to-many mapping characteristic, where numerous possible amino acid sequences can fold into a single, identical protein backbone. This task involves not only identifying viable sequences but also representing the sheer diversity of potential solutions. However, existing discriminative models, such as transformer-based auto-regressive models, struggle to encapsulate the diverse range of plausible solutions. In contrast, diffusion probabilistic models, as an emerging genre of generative approaches, offer the potential to generate a diverse set of sequence candidates for determined protein backbones. We propose a novel graph denoising diffusion model for inverse protein folding, where a given protein backbone guides the diffusion process on the corresponding amino acid residue types. The model infers the joint distribution of amino acids conditioned on the nodes' physiochemical properties and local environment. Moreover, we utilize amino acid replacement matrices for the diffusion forward process, encoding the biologically meaningful prior knowledge of amino acids from their spatial and sequential neighbors as well as themselves, which reduces the sampling space of the generative process. 
Our model achieves state-of-the-art performance over a set of popular baseline methods in sequence recovery and exhibits great potential in generating diverse protein sequences for a determined protein backbone structure.", "output": "Graph Denoising Diffusion for Inverse Protein Folding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial intelligence technology has been widely used in astronomy, and new artificial intelligence technologies and application scenarios are constantly emerging. There have been a large number of papers reviewing the application of artificial intelligence technology in astronomy. However, relevant articles seldom mention telescope intelligence separately, and it is difficult to understand the current development status and research hotspots of telescope intelligence from these papers. This paper combines the development history of artificial intelligence technology and the difficulties of critical technologies of telescopes, comprehensively introduces the development and research hotspots of telescope intelligence, then conducts statistical analysis on various research directions of telescope intelligence and defines the research directions' merits. All kinds of research directions are evaluated, and the research trend of each telescope's intelligence is pointed out. Finally, according to the advantages of artificial intelligence technology and the development trend of telescopes, future research hotspots of telescope intelligence are given.", "output": "Intelligence of Astronomical Optical Telescope: Present Status and Future Perspectives."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Red Palm Weevil (RPW) is a highly destructive insect causing economic losses and impacting palm tree farming worldwide. This paper proposes an innovative approach for sustainable palm tree farming by utilizing advanced technologies for the early detection and management of RPW. Our approach combines computer vision, deep learning (DL), the Internet of Things (IoT), and geospatial data to detect and classify RPW-infested palm trees effectively. The main phases include: (1) DL classification using sound data from IoT devices, (2) palm tree detection using YOLOv8 on UAV images, and (3) RPW mapping using geospatial data. Our custom DL model achieves 100% precision and recall in detecting and localizing infested palm trees. Integrating geospatial data enables the creation of a comprehensive RPW distribution map for efficient monitoring and targeted management strategies. This technology-driven approach benefits agricultural authorities, farmers, and researchers in managing RPW infestations and safeguarding palm tree plantations' productivity.", "output": "Sustainable Palm Tree Farming: Leveraging IoT and Multi-Modal Data for Early Detection and Mapping of Red Palm Weevil."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We provide partial implementations of von Neumann's universal constructor and universal copier, starting out with three types of simple building blocks using minimal assumptions. Using the same principles, we also construct Turing machines. 
Combining both, we arrive at a proposal for a self-replicating Turing machine. Our construction allows for mutations if desired, and we give a simple description language.", "output": "Towards a Self-Replicating Turing Machine."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Early diagnosis of mental disorders and intervention can facilitate the prevention of severe injuries and the improvement of treatment results. Using social media and pre-trained language models, this study explores how user-generated data can be used to predict mental disorder symptoms. Our study compares four different BERT models of Hugging Face with standard machine learning techniques used in automatic depression diagnosis in recent literature. The results show that the new models outperform the previous approach with an accuracy rate of up to 97%. Analyzing the results while complementing past findings, we find that even tiny amounts of data (like users' bio descriptions) have the potential to predict mental disorders. We conclude that social media data is an excellent source for mental health screening, and pre-trained models can effectively automate this critical task.", "output": "Harnessing the Power of Hugging Face Transformers for Predicting Mental Health Disorders in Social Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have showcased their remarkable capability to synthesize diverse and high-quality images, sparking interest in their application to real image editing. However, existing diffusion-based approaches for local image editing often suffer from undesired artifacts due to the pixel-level blending of the noised target images and diffusion latent variables, which lack the necessary semantics for maintaining image consistency. To address these issues, we propose PFB-Diff, a Progressive Feature Blending method for Diffusion-based image editing. Unlike previous methods, PFB-Diff seamlessly integrates text-guided generated content into the target image through multi-level feature blending. The rich semantics encoded in deep features and the progressive blending scheme from high to low levels ensure semantic coherence and high quality in edited images. Additionally, we introduce an attention masking mechanism in the cross-attention layers to confine the impact of specific words to desired regions, further improving the performance of background editing. PFB-Diff can effectively address various editing tasks, including object/background replacement and object attribute editing. Our method demonstrates its superior performance in terms of image fidelity, editing accuracy, efficiency, and faithfulness to the original image, without the need for fine-tuning or training.", "output": "PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) exhibit exceptional abilities for causal analysis between concepts in numerous societally impactful domains, including medicine, science, and law. 
Recent research on LLM performance in various causal discovery and inference tasks has given rise to a new ladder in the classical three-stage framework of causality. In this paper, we advance the current research on LLM-driven causal discovery by proposing a novel framework that combines knowledge-based LLM causal analysis with data-driven causal structure learning. To make the LLM more than a query tool and to leverage its power in discovering natural and new laws of causality, we integrate the valuable LLM expertise on existing causal mechanisms into statistical analysis of objective data to build a novel and practical baseline for causal structure learning. We introduce a universal set of prompts designed to extract causal graphs from given variables and assess the influence of LLM prior causality on recovering causal structures from data. We demonstrate the significant enhancement of LLM expertise on the quality of recovered causal structures from data, while also identifying critical challenges and issues, along with potential approaches to address them. As a pioneering study, this paper aims to emphasize the new frontier that LLMs are opening for classical causal discovery and inference, and to encourage the widespread adoption of LLM capabilities in data-driven causal analysis.", "output": "From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optimizing a machine learning pipeline for a task at hand requires careful configuration of various hyperparameters, typically supported by an AutoML system that optimizes the hyperparameters for the given training dataset. Yet, depending on the AutoML system's own second-order meta-configuration, the performance of the AutoML process can vary significantly. Current AutoML systems cannot automatically adapt their own configuration to a specific use case. Further, they cannot compile user-defined application constraints on the effectiveness and efficiency of the pipeline and its generation. In this paper, we propose Caml, which uses meta-learning to automatically adapt its own AutoML parameters, such as the search strategy, the validation strategy, and the search space, for a task at hand. The dynamic AutoML strategy of Caml takes user-defined constraints into account and obtains constraint-satisfying pipelines with high predictive performance.", "output": "AutoML in Heavily Constrained Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Irregularities in public health data streams (like COVID-19 cases) hamper data-driven decision-making for public health stakeholders. A real-time, computer-generated list of the most important, outlying data points from thousands of daily-updated public health data streams could assist an expert reviewer in identifying these irregularities. However, existing outlier detection frameworks perform poorly on this task because they do not account for the data volume or for the statistical properties of public health streams. Accordingly, we developed FlaSH (Flagging Streams in public Health), a practical outlier detection framework for public health data users that uses simple, scalable models to capture these statistical properties explicitly. 
In an experiment where human experts evaluate FlaSH and existing methods (including deep learning approaches), FlaSH scales to the data volume of this task, matches or exceeds these other methods in mean accuracy, and identifies the outlier points that users empirically rate as more helpful. Based on these results, FlaSH has been deployed on data streams used by public health stakeholders.", "output": "Computationally Assisted Quality Control for Public Health Data Streams."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional large-scale neuroscience models and machine learning utilize simplified models of individual neurons, relying on collective activity and properly adjusted connections to perform complex computations. However, each biological cortical neuron is inherently a sophisticated computational device, as corroborated in a recent study where it took a deep artificial neural network with millions of parameters to replicate the input-output relationship of a detailed biophysical model of a cortical pyramidal neuron. We question the necessity of so many parameters and introduce the Expressive Leaky Memory (ELM) neuron, a biologically inspired, computationally expressive, yet efficient model of a cortical neuron. Remarkably, our ELM neuron requires only 8K trainable parameters to match the aforementioned input-output relationship accurately. We find that an accurate model necessitates multiple memory-like hidden states and intricate nonlinear synaptic integration. To assess the computational ramifications of this design, we evaluate the ELM neuron on various tasks with demanding temporal structures, including a sequential version of the CIFAR-10 classification task, the challenging Pathfinder-X task, and a new dataset based on the Spiking Heidelberg Digits dataset. Our ELM neuron outperforms most transformer-based models on the Pathfinder-X task with 77% accuracy, demonstrates competitive performance on Sequential CIFAR-10, and superior performance compared to classic LSTM models on the variant of the Spiking Heidelberg Digits dataset. These findings indicate a potential for biologically motivated, computationally efficient neuronal models to enhance performance in challenging machine learning tasks.", "output": "The ELM Neuron: an Efficient and Expressive Cortical Neuron Model Can Solve Long-Horizon Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The autonomous driving community has witnessed rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. 
We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. To facilitate future research, we maintain an active repository that contains up-to-date links to relevant literature and open-source projects at", "output": "End-to-end Autonomous Driving: Challenges and Frontiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Single-image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models but suffer from lengthy optimization time, 3D-inconsistent results, and poor geometry. In this work, we propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. Given a single image, we first use a view-conditioned 2D diffusion model, Zero123, to generate multi-view images for the input view, and then aim to lift them up to 3D space. Since traditional reconstruction methods struggle with inconsistent multi-view predictions, we build our 3D reconstruction module upon an SDF-based generalizable neural surface reconstruction method and propose several critical training strategies to enable the reconstruction of 360-degree meshes. Without costly optimizations, our method reconstructs 3D shapes in significantly less time than existing methods. Moreover, our method favors better geometry, generates more 3D-consistent results, and adheres more closely to the input image. We evaluate our approach on both synthetic data and in-the-wild images and demonstrate its superiority in terms of both mesh quality and runtime. In addition, our approach can seamlessly support the text-to-3D task by integrating with off-the-shelf text-to-image diffusion models.", "output": "One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents the UMASS_BioNLP team's participation in the MEDIQA-Chat 2023 shared task for Task-A and Task-C. We focus especially on Task-C and propose a novel LLM cooperation system, named a doctor-patient loop, to generate high-quality conversation datasets. The experiment results demonstrate that our approaches yield reasonable performance as evaluated by automatic metrics such as ROUGE, medical concept recall, BLEU, and Self-BLEU. Furthermore, we conducted a comparative analysis between our proposed method and ChatGPT and GPT-4. This analysis also investigates the potential of utilizing cooperative LLMs to generate high-quality datasets.", "output": "UMASS_BioNLP at MEDIQA-Chat 2023: Can LLMs generate high-quality synthetic note-oriented doctor-patient conversations?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Feature alignment is the primary means of fusing multimodal data. 
We propose a feature alignment method that fully fuses multimodal information, which alternately shifts and expands feature information from different modalities to obtain a consistent representation in a feature space. The proposed method can robustly capture high-level interactions between features of different modalities, thus significantly improving the performance of multimodal learning. We also show that the proposed method outperforms other popular multimodal schemes on multiple tasks. Experimental evaluation on the ETT and MIT-BIH-Arrhythmia datasets shows that the proposed method achieves state-of-the-art performance.", "output": "Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Aspect-based sentiment analysis is a long-standing research interest in the field of opinion mining, and in recent years, researchers have gradually shifted their focus from simple ABSA subtasks to end-to-end multi-element ABSA tasks. However, the datasets currently used in the research are limited to individual elements of specific tasks, usually focusing on in-domain settings, ignoring implicit aspects and opinions, and with a small data scale. To address these issues, we propose a large-scale Multi-Element Multi-Domain dataset (MEMD) that covers the four elements across five domains, including nearly 20,000 review sentences and 30,000 quadruples annotated with explicit and implicit aspects and opinions for ABSA research. Meanwhile, we evaluate generative and non-generative baselines on multiple ABSA subtasks under the open-domain setting, and the results show that open-domain ABSA as well as mining implicit aspects and opinions remain ongoing challenges to be addressed. The datasets are publicly released at ", "output": "MEMD-ABSA: A Multi-Element Multi-Domain Dataset for Aspect-Based Sentiment Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic structural causal models (SCMs) are a powerful framework for reasoning in dynamic systems about direct effects, which measure how a change in one variable affects another variable while holding all other variables constant. The causal relations in a dynamic structural causal model can be qualitatively represented with a full-time causal graph. Assuming linearity and causal sufficiency and given the full-time causal graph, the direct causal effect is always identifiable and can be estimated from data by adjusting on any set of variables given by the so-called single-door criterion. However, in many applications such a graph is not available for various reasons, but nevertheless experts have access to an abstraction of the full-time causal graph which represents causal relations between time series while omitting temporal information. 
This paper presents a complete identifiability result which characterizes all cases for which the direct effect is graphically identifiable from summary causal graphs, and gives two sound finite adjustment sets that can be used to estimate the direct effect whenever it is identifiable.", "output": "Identifiability of direct effects from summary causal graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-order Graph Neural Networks (HO-GNNs) have been developed to infer consistent latent spaces in the heterophilic regime, where the label distribution is not correlated with the graph structure. However, most of the existing HO-GNNs are hop-based, i.e., they rely on the powers of the transition matrix. As a result, these architectures are not fully reactive to the classification loss, and the achieved structural filters have static supports. In other words, neither the filters' supports nor their coefficients can be learned with these networks. They are confined, instead, to learning combinations of filters. To address the above concerns, we propose Diffusion-Jump GNNs, a method relying on asymptotic diffusion distances that operates on jumps. A diffusion-pump generates pairwise distances whose projections determine both the support and the coefficients of each structural filter. These filters are called jumps because they explore a wide range of scales in order to find bonds between scattered nodes with the same label. Actually, the full process is controlled by the classification loss. Both the jumps and the diffusion distances react to classification errors (i.e., they are learnable). Homophiliation, i.e., the process of learning piecewise-smooth latent spaces in the heterophilic regime, is formulated as a Dirichlet problem: the known labels determine the border nodes, and the diffusion-pump ensures a minimal deviation of the semi-supervised grouping from a canonical unsupervised grouping. This triggers the update of both the diffusion distances and, consequently, the jumps in order to minimize the classification error. The Dirichlet formulation has several advantages. It leads to the definition of structural heterophily, a novel measure beyond edge heterophily. It also allows us to investigate links with (learnable) diffusion distances, absorbing random walks, and stochastic diffusion.", "output": "Diffusion-Jump GNNs: Homophiliation via Learnable Metric Filters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Approaching the era of ubiquitous computing, human motion sensing plays a crucial role in smart systems for decision making, user interaction, and personalized services. Extensive research has been conducted on human tracking, pose estimation, gesture recognition, and activity recognition, which are predominantly based on cameras in traditional methods. However, the intrusive nature of cameras limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose milliFlow, a novel deep learning method for scene flow estimation as complementary motion information for mmWave point clouds, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. 
Experimental results demonstrate the superior performance of our method with an average 3D endpoint error of 4.6 cm, significantly surpassing the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition, human parsing, and human body part tracking. To foster further research in this area, we provide our codebase and dataset for open access.", "output": "milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of determining crime types based on criminal behavior facts has become a very important and meaningful task in social science. However, the problem facing the field now is that the data samples themselves are unevenly distributed, due to the nature of the crime itself. At the same time, datasets in the judicial field are less publicly available, and it is not practical to produce large datasets for direct training. This article proposes a new training model to solve this problem through NLP processing methods. We first propose a Crime Fact Data Preprocessing Module (CFDPM), which can balance the defects of uneven dataset distribution by generating new samples. Then we use a large open-source dataset (CAIL-big) as our pretraining dataset and a small dataset collected by ourselves for fine-tuning, giving it good generalization ability to unfamiliar small datasets. At the same time, we use an improved BERT model with dynamic masking to improve the model. Experiments show that the proposed method achieves state-of-the-art results on the present dataset. At the same time, the effectiveness of the CFDPM module is proved by experiments. This article provides a valuable methodology contribution for classifying social science texts such as criminal behaviors. Extensive experiments on public benchmarks show that the proposed method achieves new state-of-the-art results.", "output": "Classifying Crime Types using Judgment Documents from Social Media."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The need for autonomous robot systems in both the service and the industrial domain is larger than ever. In the latter, the transition to small batches or even \"batch size 1\" in production created a need for robot control system architectures that can provide the required flexibility. Such architectures must not only have a sufficient knowledge integration framework; they must also support autonomous mission execution and allow for interchangeability and interoperability between different tasks and robot systems. We introduce SkiROS2, a skill-based robot control platform on top of ROS. SkiROS2 proposes a layered, hybrid control structure for automated task planning and reactive execution, supported by a knowledge base for reasoning about the world state and entities. The scheduling formulation builds on the extended behavior tree model that merges task-level planning and execution. This allows for a high degree of modularity and a fast reaction to changes in the environment. The skill formulation based on pre-, hold- and post-conditions allows organizing robot programs and composing diverse skills, ranging from perception to low-level control and the incorporation of external tools. 
We relate SkiROS2 to the field and outline three example use cases that cover task planning, reasoning, multisensory input, integration in a manufacturing execution system, and reinforcement learning.", "output": "SkiROS2: A skill-based Robot Control Platform for ROS."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compositionality is a critical aspect of scalable system design. Reinforcement learning (RL) has recently shown substantial success in task learning, but has only recently begun to truly leverage composition. In this paper, we focus on Boolean composition of learned tasks as opposed to functional or sequential composition. Existing Boolean composition for RL focuses on reaching a satisfying absorbing state in environments with discrete action spaces, but does not support composable safety (i.e., avoidance) constraints. We advance the state of the art in Boolean composition of learned tasks with three contributions: i) we introduce two distinct notions of safety in this framework; ii) we show how to enforce either safety semantics, prove correctness (under some assumptions), and analyze the trade-offs between the two safety notions; and iii) we extend Boolean composition from discrete action spaces to continuous action spaces. We demonstrate these techniques using modified versions of value iteration in a grid world, Deep Q-Network (DQN) in a grid world with image observations, and Twin Delayed DDPG (TD3) in a continuous-observation and continuous-action Bullet physics environment. We believe that these contributions advance the theory of safe reinforcement learning by allowing zero-shot composition of policies satisfying safety properties.", "output": "Safety-Aware Task Composition for Discrete and Continuous Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sparse knowledge graph (KG) scenarios pose a challenge for previous Knowledge Graph Completion (KGC) methods, that is, the completion performance decreases rapidly with the increase of graph sparsity. This problem is further exacerbated by the widespread existence of sparse KGs in practical applications. To alleviate this challenge, we present a novel framework, LR-GCN, that is able to automatically capture valuable long-range dependencies among entities to supplement insufficient structure features and distill logical reasoning knowledge for sparse KGC. The proposed approach comprises two main components: a GNN-based predictor and a reasoning path distiller. The reasoning path distiller explores high-order graph structures such as reasoning paths and encodes them as rich-semantic edges, explicitly compositing long-range dependencies into the predictor. This step also plays an essential role in densifying KGs, effectively alleviating the sparsity issue. Furthermore, the path distiller distills logical reasoning knowledge from these mined reasoning paths into the predictor. These two components are jointly optimized using a well-designed variational EM algorithm. 
Extensive experiments and analyses on four sparse benchmarks demonstrate the effectiveness of our proposed method.", "output": "Exploring & Exploiting High-Order Graph Structure for Sparse Knowledge Graph Completion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-M$^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. We showcase Safe-M$^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.", "output": "Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Documents hold spatial focus and valuable locality characteristics. For example, descriptions of listings in real estate or travel blogs contain information about specific local neighborhoods. This information is valuable to characterize how humans perceive their environment. However, the first step to making use of this information is to identify the spatial focus (e.g., a city) of a document. Traditional approaches for identifying the spatial focus of a document rely on detecting and disambiguating toponyms from the document. This approach requires a vocabulary set of location phrases and ad-hoc rules, which ignore important words related to location. Recent topic modeling approaches using large language models often consider a few topics, each with broad coverage. In contrast, the spatial focus of a document can be a country, a city, or even a neighborhood, which, together, is much larger than the number of topics considered in these approaches. Additionally, topic modeling methods are often applied to broad topics of news articles where context is easily distinguishable. To identify the geographic focus of a document effectively, we present a simple but effective Joint Embedding of multi-LocaLitY (JELLY), which jointly learns representations with separate encoders of document and location. JELLY significantly outperforms state-of-the-art methods for identifying the spatial focus of documents from a number of sources. 
We also demonstrate case studies on the arithmetic of the learned representations, including identifying cities with similar locality characteristics and zero-shot learning to identify document spatial focus.", "output": "JELLY: Joint Embedding of multi-LocaLitY for Identifying the Spatial Focus of Documents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Sentiment classification is a fundamental task in natural language processing, assigning one of the three classes, positive, negative, or neutral, to free texts. However, sentiment classification models are highly domain dependent; a classifier may perform classification with reasonable accuracy in one domain but poorly in another due to the semantic multiplicity of words. This article presents a new Persian/Arabic multi-domain sentiment analysis method using the cumulative weighted capsule networks approach. The weighted capsule ensemble consists of training separate capsule networks for each domain and a weighting measure called domain belonging degree (DBD). This criterion consists of TF and IDF, which calculate the dependency of each document on each domain separately; this value is multiplied by the possible output that each capsule creates. In the end, the sum of these multiplications is the final output and is used to determine the polarity, and the most dependent domain is considered the final output for each domain. The proposed method was evaluated using the Digikala dataset and obtained acceptable accuracy compared to the existing approaches. It achieved an accuracy of 0.89 on detecting the domain of belonging and 0.99 on detecting the polarity. Also, for the problem of dealing with unbalanced classes, a cost-sensitive function was used. This function was able to achieve a 0.0162 improvement in accuracy for sentiment classification. This approach on Amazon Arabic data can achieve 0.9695 accuracy in domain classification.", "output": "Presenting an approach based on weighted CapsuleNet networks for Arabic and Persian multi-domain sentiment analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The word creativity originally described a concept from human psychology, but in the realm of computational creativity (CC), it has become much more. The question of what creativity means when it is part of a computational system might be considered core to CC. Pinning down the meaning of creativity, and concepts like it, becomes salient when researchers port concepts from human psychology to computation, a widespread practice extending beyond CC into artificial intelligence (AI). Yet, the human processes shaping human-inspired computational systems have been little investigated. In this paper, we question which human literatures (social sciences, psychology, neuroscience) enter AI scholarship and how they are translated at the port of entry. This study is based on 22 in-depth, semi-structured interviews, primarily with human-inspired AI researchers, half of whom focus on creativity as a major research area. This paper focuses on findings most relevant to CC. We suggest that which human literature enters AI bears greater scrutiny because ideas may become disconnected from context in their home discipline. 
Accordingly, we recommend that CC researchers document the decisions and context of their practices, particularly those practices formalizing human concepts for machines. Publishing reflexive commentary on human elements in CC and AI would provide a useful record and permit greater dialogue with other disciplines.", "output": "Interdisciplinary Methods in Computational Creativity: How Human Variables Shape Human-Inspired AI Research."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the mainstream schemes for 2D human pose estimation (HPE) is learning keypoint heatmaps by a neural network. Existing methods typically improve the quality of heatmaps by customized architectures, such as high-resolution representation and vision Transformers. In this paper, we propose DiffusionPose, a new scheme that formulates 2D HPE as a keypoint heatmap generation problem from noised heatmaps. During training, the keypoints are diffused to a random distribution by adding noise, and the diffusion model learns to recover ground-truth heatmaps from noised heatmaps with respect to conditions constructed from image features. During inference, the diffusion model generates heatmaps from initialized heatmaps in a progressive denoising way. Moreover, we further explore improving the performance of DiffusionPose with conditions from human structural information. Extensive experiments show the prowess of our DiffusionPose, with improvements of 1.6, 1.2, and 1.2 mAP on the widely-used COCO, CrowdPose, and AI Challenge datasets, respectively.", "output": "Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Performance bugs are non-functional bugs that can even manifest in well-tested commercial products. Fixing these performance bugs is an important yet challenging problem. In this work, we address this challenge and present a new approach called Retrieval-Augmented Prompt Generation (RAPGen). Given a code snippet with a performance issue, RAPGen first retrieves a prompt instruction from a pre-constructed knowledge-base of previous performance bug fixes and then generates a prompt using the retrieved instruction. It then uses this prompt on a Large Language Model (such as Codex) in zero-shot to generate a fix. We compare our approach with various prompt variations and state-of-the-art methods on the task of performance bug fixing. Our evaluation shows that RAPGen can generate performance improvement suggestions equivalent to or better than a developer in ~60% of the cases, getting ~39% of them verbatim, in an expert-verified dataset of past performance changes made by C# developers.", "output": "RAPGen: An Approach for Fixing Code Inefficiencies in Zero-Shot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce RL4CO, an extensive reinforcement learning (RL) for combinatorial optimization (CO) benchmark. RL4CO employs state-of-the-art software libraries as well as best practices in implementation, such as modularity and configuration management, to be efficient and easily modifiable by researchers for adaptations of neural network architecture, environments, and algorithms. 
In contrast to the existing focus on specific tasks like the traveling salesman problem (TSP) for performance assessment, we underline the importance of scalability and generalization capabilities for diverse optimization tasks. We also systematically benchmark sample efficiency, zero-shot generalization, and adaptability to changes in data distributions of various models. Our experiments show that some recent state-of-the-art methods fall behind their predecessors when evaluated using these new metrics, suggesting the necessity for a more balanced view of the performance of neural CO solvers. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks, allowing researchers to compare with existing methods through a standardized interface that decouples the science from the software engineering. We make our library publicly available at", "output": "RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Using machine learning models to generate synthetic data has become common in many fields. Technology to generate synthetic transactions that can be used to detect fraud is also growing fast. Generally, this synthetic data contains only information about the transaction, such as the time, place, and amount of money. It does not usually contain the individual user's characteristics (age and gender are occasionally included). Using relatively complex synthetic demographic data may improve the complexity of transaction data features, thus improving the fraud detection performance. Benefiting from developments in machine learning, some deep learning models have the potential to perform better than other well-established synthetic data generation methods, such as microsimulation. In this study, we built a deep-learning Generative Adversarial Network (GAN), called DGGAN, which will be used for demographic data generation. Our model generates samples during model training, which we found important to overcome class imbalance issues. This study can help improve the cognition of synthetic data and further explore the application of synthetic data generation in card fraud detection.", "output": "Synthetic Demographic Data Generation for Card Fraud Detection Using GANs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generative AI and large language models hold great promise in enhancing computing education by powering next-generation educational technologies for introductory programming. Recent works have studied these models for different scenarios relevant to programming education; however, these works are limited for several reasons, as they typically consider already outdated models or only specific scenario(s). Consequently, there is a lack of a systematic study that benchmarks state-of-the-art models for a comprehensive set of programming education scenarios. In our work, we systematically evaluate two models, ChatGPT (based on GPT-3.5) and GPT-4, and compare their performance with human tutors for a variety of scenarios. We evaluate using five introductory Python programming problems and real-world buggy programs from an online platform, and assess performance using expert-based annotations. 
Our results show that GPT-4 drastically outperforms ChatGPT (based on GPT-3.5) and comes close to human tutors' performance for several scenarios. These results also highlight settings where GPT-4 still struggles, providing exciting future directions for developing techniques to improve the performance of these models.", "output": "Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Despite considerable progress in multi-task learning, most efforts focus on learning from multi-label data: a single image set with multiple task labels. Such multi-label datasets are rare, small, and expensive. We say heterogeneous to refer to image sets with different task labels, or to combinations of single-task datasets. Few have explored training on such heterogeneous datasets. General-purpose vision models are still dominated by single-task pretraining, and it remains unclear how to scale up multi-task models by leveraging mainstream vision datasets designed for different purposes. The challenges lie in managing large intrinsic differences among vision tasks, including data distribution, architectures, task-specific modules, dataset scales, and sampling strategies. To address these challenges, we propose to modify and scale up mixture-of-experts (MoE) vision transformers, so that they can simultaneously learn classification, detection, and segmentation on diverse mainstream vision datasets including ImageNet, COCO, and ADE20K. Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks. Due to its emergent modularity, this general-purpose model decomposes into high-performing components, efficiently adapting to downstream tasks. We can fine-tune it with fewer training parameters, fewer model parameters, and less computation. Additionally, its modularity allows for easy expansion in continual-learning-without-forgetting scenarios. Finally, these functions can be controlled and combined to meet various demands of downstream tasks.", "output": "An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Successful teaching requires an assumption of how the learner learns - how the learner uses experiences from the world to update their internal states. We investigate what expectations people have about a learner when they teach them in an online manner using rewards and punishment. We focus on a common reinforcement learning method, Q-learning, and examine what assumptions people have using a behavioral experiment. To do so, we first establish a normative standard by formulating the problem as a machine teaching optimization problem. To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states. What do people assume about a learner's learning and discount rates when they teach them an idealized exploration-exploitation task? 
In a behavioral experiment, we find that people can teach the task to Q-learners in a relatively efficient and effective manner when the learner uses a small value for its discounting rate and a large value for its learning rate. However, they are still suboptimal. We also find that providing people with real-time updates of how possible feedback would affect the Q-learner's internal states weakly helps them teach. Our results reveal how people teach using evaluative feedback and provide guidance for how engineers should design machine agents in a manner that is intuitive for people.", "output": "Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The framework for Simulation of Human and Artificial Emotion (SHArE) describes the architecture of emotion in terms of parameters transferable between psychology, neuroscience, and artificial intelligence. These parameters can be defined as abstract concepts or granularized down to the voltage levels of individual neurons. This model enables emotional trajectory design for humans, which may lead to novel therapeutic solutions for various mental health concerns. For artificial intelligence, this work provides a compact notation which can be applied to neural networks as a means to observe the emotions and motivations of machines.", "output": "Simulation of Human and Artificial Emotion (SHArE)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most Artificial Intelligence applications are based on supervised machine learning (ML), which is ultimately grounded in manually annotated data. The annotation process is often performed in terms of a majority vote, and this has often proved to be problematic, as highlighted by recent studies on the evaluation of ML models. In this article we describe and advocate for a different paradigm, which we call data perspectivism, which moves away from traditional gold-standard datasets towards the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of ML processes. Drawing on previous works which inspired our proposal, we describe the potential of our proposal not only for the more subjective tasks (e.g. those related to human language) but also for tasks commonly understood as objective (e.g. medical decision making), and present the main advantages of adopting a perspectivist stance in ML, as well as possible disadvantages, and various ways in which such a stance can be implemented in practice. Finally, we share a set of recommendations and outline a research agenda to advance the perspectivist stance in ML.", "output": "Toward a Perspectivist Turn in Ground Truthing for Predictive Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. 
Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.", "output": "\"That Is a Suspicious Reaction!\": Interpreting Logits Variation to Detect NLP Adversarial Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite its importance in both industrial and service robotics, mobile manipulation remains a significant challenge as it requires a seamless integration of end-effector trajectory generation with navigation skills as well as reasoning over long horizons. Existing methods struggle to control the large configuration space, and to navigate dynamic and unknown environments. In previous work, we proposed to decompose mobile manipulation tasks into a simplified motion generator for the end-effector in task space and a trained reinforcement learning agent for the mobile base to account for kinematic feasibility of the motion. In this work, we introduce Neural Navigation for Mobile Manipulation (N$^2$M$^2$) which extends this decomposition to complex obstacle environments and enables it to tackle a broad range of tasks in real-world settings. The resulting approach can perform unseen, long-horizon tasks in unexplored environments while instantly reacting to dynamic obstacles and environmental changes. At the same time, it provides a simple way to define new mobile manipulation tasks. We demonstrate the capabilities of our proposed approach in extensive simulation and real-world experiments on multiple kinematically diverse mobile manipulators. Code and videos are publicly available at ", "output": "N$^2$M$^2$: Learning Navigation for Arbitrary Mobile Manipulation Motions in Unseen and Dynamic Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs. They cannot well represent abundant long-tail entities in real-world KGs with limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones, and suffer from problems like expensive negative sampling and restrictive description demand. In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming at both enriching representations of long-tail entities and solving problems of prior description-based methods. We formulate description-based KE learning with a contrastive learning framework to improve efficiency in training and evaluation. 
Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks of link prediction and triple classification, especially for long-tail entities.", "output": "Language Models as Knowledge Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We offer a method for one-shot mask-guided image synthesis that allows controlling manipulations of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while preserving its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, MAGIC aggregates gradients over the input, driven by a guide binary mask that enforces a strong, spatial prior. MAGIC implements a series of manipulations with a single framework achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects and gives users firm control over the synthesis by requiring them to simply specify binary guide masks. Our study and findings are supported by various qualitative comparisons with the state-of-the-art on the same images sampled from ImageNet and quantitative analysis using machine perception along with a user survey of 100+ participants that endorse our synthesis quality. Project page at Code is available at", "output": "MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Collision avoidance is key for mobile robots and agents to operate safely in the real world. In this work we present SAFER, an efficient and effective collision avoidance system that is able to improve safety by correcting the control commands sent by an operator. It combines real-world reinforcement learning (RL), search-based online trajectory planning, and automatic emergency intervention, e.g. automatic emergency braking (AEB). The goal of the RL is to learn an effective corrective control action that is used in a focused search for collision-free trajectories, and to reduce the frequency of triggering automatic emergency braking. This novel setup enables the RL policy to learn safely and directly on mobile robots in a real-world indoor environment, minimizing actual crashes even during training. Our real-world experiments show that, when compared with several baselines, our approach enjoys a higher average speed, lower crash rate, less emergency intervention, smaller computation overhead, and smoother overall control.", "output": "SAFER: Safe Collision Avoidance using Focused and Efficient Trajectory Search with Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. 
The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.", "output": "Transformers over Directed Acyclic Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Small targets are often submerged in cluttered backgrounds of infrared images. Conventional detectors tend to generate false alarms, while CNN-based detectors lose small targets in deep layers. To this end, we propose iSmallNet, a multi-stream densely nested network with label decoupling for infrared small object detection. On the one hand, to fully exploit the shape information of small targets, we decouple the original labeled ground-truth (GT) map into an interior map and a boundary one. The GT map, in collaboration with the two additional maps, tackles the unbalanced distribution of small object boundaries. On the other hand, two key modules are delicately designed and incorporated into the proposed network to boost the overall performance. First, to maintain small targets in deep layers, we develop a multi-scale nested interaction module to explore a wide range of context information. Second, we develop an interior-boundary fusion module to integrate multi-granularity information. Experiments on NUAA-SIRST and NUDT-SIRST clearly show the superiority of iSmallNet over 11 state-of-the-art detectors.", "output": "iSmallNet: Densely Nested Network with Label Decoupling for Infrared Small Target Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Insects as pollinators play a crucial role in ecosystem management and world food production. However, insect populations are declining, calling for efficient methods of insect monitoring. Existing methods analyze video or time-lapse images of insects in nature, but the analysis is challenging since insects are small objects in complex and dynamic scenes of natural vegetation. In this work, we provide a dataset of primary honeybees visiting three different plant species during two months of the summer period. The dataset consists of 107,387 annotated time-lapse images from multiple cameras, including 9,423 annotated insects. We present a method pipeline for detecting insects in time-lapse RGB images. The pipeline consists of a two-step process. Firstly, the time-lapse RGB images are preprocessed to enhance insects in the images. This Motion-Informed-Enhancement technique uses motion and colors to enhance insects in images. 
Secondly, the enhanced images are subsequently fed into a Convolutional Neural Network (CNN) object detector. The method improves the deep learning object detectors You Only Look Once (YOLO) and Faster Region-based CNN (Faster R-CNN). Using Motion-Informed-Enhancement, the YOLO detector improves the average micro F1-score from 0.49 to 0.71, and the Faster R-CNN detector improves the average micro F1-score from 0.32 to 0.56 on the dataset. Our dataset and proposed method provide a step forward to automate the time-lapse camera monitoring of flying insects. The dataset is published on: ", "output": "Motion Informed Object Detection of Small Insects in Time-lapse Camera Recordings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This dissertation reports some first steps towards a compositional account of active inference and the Bayesian brain. Specifically, we use the tools of contemporary applied category theory to supply functorial semantics for approximate inference. To do so, we define on the `syntactic' side the new notion of Bayesian lens and show that Bayesian updating composes according to the compositional lens pattern. Using Bayesian lenses, and inspired by compositional game theory, we define fibrations of statistical games and classify various problems of statistical inference as corresponding sections: the chain rule of the relative entropy is formalized as a strict section, while maximum likelihood estimation and the free energy give lax sections. In the process, we introduce a new notion of `copy-composition'. On the `semantic' side, we present a new formalization of general open dynamical systems (particularly: deterministic, stochastic, and random; and discrete- and continuous-time) as certain coalgebras of polynomial functors, which we show collect into monoidal opindexed categories (or, alternatively, into algebras for multicategories of generalized polynomial functors). We use these opindexed categories to define monoidal bicategories of cilia: dynamical systems which control lenses, and which supply the target for our functorial semantics. Accordingly, we construct functors which explain the bidirectional compositional structure of predictive coding neural circuits under the free energy principle, thereby giving a formal mathematical underpinning to the bidirectionality observed in the cortex. Along the way, we explain how to compose rate-coded neural circuits using an algebra for a multicategory of linear circuit diagrams, showing subsequently that this is subsumed by lenses and polynomial functors.", "output": "Mathematical Foundations for a Compositional Account of the Bayesian Brain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Local search is an effective method for solving large-scale combinatorial optimization problems, and it has made remarkable progress in recent years through several subtle mechanisms. In this paper, we found two ways to improve the local search algorithms in solving Pseudo-Boolean Optimization (PBO): Firstly, some of those mechanisms such as unit propagation are merely used in solving MaxSAT before, which can be generalized to solve PBO as well; Secondly, the existing local search algorithms utilize the heuristic on variables, so-called score, to mainly guide the search. 
We attempt to gain more insights into the clause, as it plays the role of a middleman who builds a bridge between variables and the given formula. Hence, we first extended the combination of the unit propagation-based decimation algorithm to the PBO problem, giving a further generalized definition of unit clause for the PBO problem, and apply it to the existing solver LS-PBO for constructing an initial assignment; then, we introduced a new heuristic on clauses, dubbed care, to set a higher priority for the clauses that are less satisfied in current iterations. Experiments on benchmarks from the most recent PB Competition, as well as three real-world application benchmarks including minimum-width confidence band, wireless sensor network optimization, and seating arrangement problems show that our algorithm DeciLS-PBO has a promising performance compared to the state-of-the-art algorithms.", "output": "DeciLS-PBO: an Effective Local Search Method for Pseudo-Boolean Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose an auto-encoder architecture for multi-texture synthesis. The approach relies on both a compact encoder accounting for second order neural statistics and a generator incorporating adaptive periodic content. Images are embedded in a compact and geometrically consistent latent space, where the texture representation and its spatial organisation are disentangled. Texture synthesis and interpolation tasks can be performed directly from these latent codes. Our experiments demonstrate that our model outperforms state-of-the-art feed-forward methods in terms of visual quality and various texture-related metrics.", "output": "A geometrically aware auto-encoder for multi-texture synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. 
Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.", "output": "Temporal Robustness against Data Poisoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. While this fixes the issue for a novice agent, a good understanding of the task on the expert agent is required for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists, and the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate EF-OnTL's effectiveness against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall comparable performance when compared against advice-based baselines while not requiring any external input nor threshold tuning. EF-OnTL outperforms no-transfer with an improvement related to the complexity of the task addressed.", "output": "Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "LiDAR point clouds have become the most common data source in autonomous driving. However, due to the sparsity of point clouds, accurate and reliable detection cannot be achieved in specific scenarios. Because of their complementarity with point clouds, images are getting increasing attention. Although with some success, existing fusion methods either perform hard fusion or do not fuse in a direct manner. In this paper, we propose a generic 3D detection framework called MMFusion, using multi-modal features. The framework aims to achieve accurate fusion between LiDAR and images to improve 3D detection in complex scenes. Our framework consists of two separate streams: the LiDAR stream and the camera stream, which can be compatible with any single-modal feature extraction network. The Voxel Local Perception Module in the LiDAR stream enhances local feature representation, and then the Multi-modal Feature Fusion Module selectively combines feature output from different streams to achieve better fusion. Extensive experiments have shown that our framework not only outperforms existing benchmarks but also improves their detection, especially for detecting cyclists and pedestrians on KITTI benchmarks, with strong robustness and generalization capabilities. 
Hopefully, our work will stimulate more research into multi-modal fusion for autonomous driving tasks.", "output": "A Generalized Multi-Modal Fusion Detection Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, both empirically and theoretically, we show that several AI-text detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to evade paraphrasing attacks, are still vulnerable to recursive paraphrasing. We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases. For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier. Our result is general enough to capture specific scenarios such as particular writing styles, clever prompt design, or text paraphrasing. We also extend the impossibility result to include the case where pseudorandom number generators are used for AI-text generation instead of true randomness. We show that the same result holds with a negligible correction term for all polynomial-time computable detectors. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden LLM text signatures and add them to human-generated text to be detected as text generated by the LLMs, potentially causing reputational damage to their developers. We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.", "output": "Can AI-Generated Text be Reliably Detected?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach, language modeling has been widely studied for language understanding and generation in the past two decades, evolving from statistical language models to neural language models. Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks. Since researchers have found that model scaling can lead to performance improvement, they further study the scaling effect by increasing the model size to an even larger size. Interestingly, when the parameter scale exceeds a certain level, these enlarged language models not only achieve a significant performance improvement but also show some special abilities that are not present in small-scale language models. To discriminate the difference in parameter scale, the research community has coined the term large language models (LLM) for the PLMs of significant size. 
Recently, the research on LLMs has been largely advanced by both academia and industry, and a remarkable progress is the launch of ChatGPT, which has attracted widespread attention from society. The technical evolution of LLMs has been making an important impact on the entire AI community, which would revolutionize the way we develop and use AI algorithms. In this survey, we review the recent advances of LLMs by introducing the background, key findings, and mainstream techniques. In particular, we focus on four major aspects of LLMs, namely pre-training, adaptation tuning, utilization, and capacity evaluation. Besides, we also summarize the available resources for developing LLMs and discuss the remaining issues for future directions.", "output": "A Survey of Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of deep-learning models in many tasks, there have been concerns about such models learning shortcuts, and their lack of robustness to irrelevant confounders. When it comes to models directly trained on human faces, a sensitive confounder is that of human identities. Many face-related tasks should ideally be identity-independent, and perform uniformly across different individuals (i.e. be fair). One way to measure and enforce such robustness and performance uniformity is through enforcing it during training, assuming identity-related information is available at scale. However, due to privacy concerns and also the cost of collecting such information, this is often not the case, and most face datasets simply contain input images and their corresponding task-related labels. Thus, improving identity-related robustness without the need for such annotations is of great importance. Here, we explore using face-recognition embedding vectors, as proxies for identities, to enforce such robustness. We propose to use the structure in the face-recognition embedding space, to implicitly emphasize rare samples within each class. We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space. Our experiments suggest that such a simple sample weighting scheme, not only improves the training robustness, it often improves the overall performance as a result of such robustness. We also show that employing such constraints during training results in models that are significantly less sensitive to different levels of bias in the dataset.", "output": "Improving Identity-Robustness for Face Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Physicians considering clinical trials for their patients are met with the laborious process of checking many text-based eligibility criteria. Large Language Models (LLMs) have been shown to perform well for clinical information extraction and clinical reasoning, including medical tests, but not yet in real-world scenarios. This paper investigates the use of InstructGPT to assist physicians in determining eligibility for clinical trials based on a patient's summarised medical profile. Using a prompting strategy combining one-shot, selection-inference and chain-of-thought techniques, we investigate the performance of LLMs on 10 synthetically created patient profiles. 
Performance is evaluated at four levels: ability to identify screenable eligibility criteria from a trial given a medical profile; ability to classify for each individual criterion whether the patient qualifies; the overall classification whether a patient is eligible for a clinical trial and the percentage of criteria to be screened by the physician. We evaluated against 146 clinical trials and a total of 4,135 eligibility criteria. The LLM was able to correctly identify the screenability of 72% (2,994/4,135) of the criteria. Additionally, 72% (341/471) of the screenable criteria were evaluated correctly. The resulting trial-level classification as eligible or ineligible resulted in a recall of 0.5. By leveraging LLMs with a physician-in-the-loop, a recall of 1.0 and precision of 0.71 on the clinical trial level can be achieved while reducing the number of criteria to be checked by an estimated 90%. LLMs can be used to assist physicians with pre-screening of patients for clinical trials. By forcing instruction-tuned LLMs to produce chain-of-thought responses, the reasoning can be made transparent to physicians and the decision process becomes amenable to them, thereby making such a system feasible for use in real-world scenarios.", "output": "Improving Patient Pre-screening for Clinical Trials: Assisting Physicians with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hypernetworks, neural networks that predict the parameters of another neural network, are powerful models that have been successfully used in diverse applications from image generation to multi-task learning. Unfortunately, existing hypernetworks are often challenging to train. Training typically converges far more slowly than for non-hypernetwork models, and the rate of convergence can be very sensitive to hyperparameter choices. In this work, we identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks: a magnitude proportionality between the inputs and outputs of the hypernetwork. We demonstrate both analytically and empirically that this can lead to unstable optimization, thereby slowing down convergence, and sometimes even preventing any learning. We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP). We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios. We provide easy-to-use code that can turn existing networks into MIP-based hypernetworks.", "output": "Magnitude Invariant Parametrizations Improve Hypernetwork Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantitatively profiling a scholar's scientific impact is important to modern research society. Current practices with bibliometric indicators (e.g., h-index), lists, and networks perform well at scholar ranking, but do not provide structured context for scholar-centric, analytical tasks such as profile reasoning and understanding. 
This work presents GeneticFlow (GF), a suite of novel graph-based scholar profiles that fulfill three essential requirements: structured-context, scholar-centric, and evolution-rich. We propose a framework to compute GF over large-scale academic data sources with millions of scholars. The framework encompasses a new unsupervised advisor-advisee detection algorithm, a well-engineered citation type classifier using interpretable features, and a fine-tuned graph neural network (GNN) model. Evaluations are conducted on the real-world task of scientific award inference. Experiment outcomes show that the F1 score of the best GF profile significantly outperforms alternative methods of impact indicators and bibliometric networks in all the 6 computer science fields considered. Moreover, the core GF profiles, with 63.6%-66.5% nodes and 12.5%-29.9% edges of the full profile, still significantly outrun existing methods in 5 out of 6 fields studied. Visualization of GF profiling results also reveals human-explainable patterns for high-impact scholars.", "output": "Impact-Oriented Contextual Scholar Profiling using Self-Citation Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer-based language models, including ChatGPT, have demonstrated exceptional performance in various natural language generation tasks. However, there has been limited research evaluating ChatGPT's keyphrase generation ability, which involves identifying informative phrases that accurately reflect a document's content. This study seeks to address this gap by comparing ChatGPT's keyphrase generation performance with state-of-the-art models, while also testing its potential as a solution for two significant challenges in the field: domain adaptation and keyphrase generation from long documents. We conducted experiments on six publicly available datasets from scientific articles and news domains, analyzing performance on both short and long documents. Our results show that ChatGPT outperforms current state-of-the-art models in all tested datasets and environments, generating high-quality keyphrases that adapt well to diverse domains and document lengths.", "output": "ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this article, a benchmark for real-world bin packing problems is proposed. This dataset consists of 12 instances of varying levels of complexity regarding size (with the number of packages ranging from 38 to 53) and user-defined requirements. In fact, several real-world-oriented restrictions were taken into account to build these instances: i) item and bin dimensions, ii) weight restrictions, iii) affinities among package categories, iv) preferences for package ordering and v) load balancing. Besides the data, we also offer our own Python script for the dataset generation, coined Q4RealBPP-DataGen. The benchmark was initially proposed to evaluate the performance of quantum solvers. Therefore, the characteristics of this set of instances were designed according to the current limitations of quantum devices. Additionally, the dataset generator is included to allow the construction of general-purpose benchmarks. 
The data introduced in this article provides a baseline that will encourage quantum computing researchers to work on real-world bin packing problems.", "output": "Benchmark dataset and instance generator for Real-World Three-Dimensional Bin Packing Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "During the continuous evolution of one organism's ancestry, its genes accumulate extensive experiences and knowledge, enabling newborn descendants to rapidly adapt to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm Learngene to enable learning models to incorporate three key characteristics of genes. (i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact information piece, i.e., learngene. (iii) Inheriting: the condensed learngene is inherited to make it easier for descendant models to adapt to new environments. Since accumulating has been studied in well-established paradigms like large-scale pre-training and lifelong learning, we focus on condensing and inheriting, which induces three key issues and we provide the preliminary solutions to these issues in this paper: (i) Learngene Form: the learngene is set to a few integral layers that can preserve significance. (ii) Learngene Condensing: we identify which layers among the ancestry model have the most similarity as one pseudo descendant model. (iii) Learngene Inheriting: to construct distinct descendant models for the specific downstream tasks, we stack some randomly initialized layers to the learngene layers. Extensive experiments across various settings, including using different network architectures like Vision Transformer (ViT) and Convolutional Neural Networks (CNNs) on different datasets, are carried out to confirm four advantages of Learngene: it makes the descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.", "output": "Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperparameter (HP) optimization of deep learning (DL) is essential for high performance. As DL often requires several hours to days for its training, HP optimization (HPO) of DL is often prohibitively expensive. This boosted the emergence of tabular or surrogate benchmarks, which enable querying the (predictive) performance of DL with a specific HP configuration in a fraction of the time. However, since the actual runtime of a DL training is significantly different from its query response time, simulators of an asynchronous HPO, e.g. multi-fidelity optimization, must wait for the actual runtime at each iteration in a naïve implementation; otherwise, the evaluation order during simulation does not match with the real experiment. To ease this issue, we developed a Python wrapper and describe its usage. This wrapper forces each worker to wait so that we yield exactly the same evaluation order as in the real experiment with only $10^{-2}$ seconds of waiting instead of waiting several hours. 
Our implementation is available at", "output": "Python Wrapper for Simulating Multi-Fidelity Optimization on HPO Benchmarks without Any Wait."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning leverages knowledge from other domains and has been successful in many applications. Transfer learning methods rely on the overall similarity of the source and target domains. However, in some cases, it is impossible to provide an overall similar source domain, and only some source domains with similar local features can be provided. Can transfer learning be achieved? In this regard, we propose a multi-source adversarial transfer learning method based on local feature similarity to the source domain to handle transfer scenarios where the source and target domains have only local similarities. This method extracts transferable local features between a single source domain and the target domain through a sub-network. Specifically, the feature extractor of the sub-network is induced by the domain discriminator to learn transferable knowledge between the source domain and the target domain. The extracted features are then weighted by an attention module to suppress non-transferable local features while enhancing transferable local features. In order to ensure that the data from the target domain in different sub-networks in the same batch is exactly the same, we designed a multi-source domain-independent strategy to provide the possibility for later local feature fusion to complete the key features required. In order to verify the effectiveness of the method, we made the dataset \"Local Carvana Image Masking Dataset\". Applying the proposed method to the image segmentation task of the proposed dataset achieves better transfer performance than other multi-source transfer learning methods. It is shown that the designed transfer learning method is feasible for transfer scenarios where the source and target domains have only local similarities.", "output": "Multi-source adversarial transfer learning based on similar source domains with local features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The increasingly popular adoption of source code in many critical tasks motivates the development of data augmentation (DA) techniques to enhance training data and improve various capabilities (e.g., robustness and generalizability) of these models. Although a series of DA methods have been proposed and tailored for source code models, there lacks a comprehensive survey and examination to understand their effectiveness and implications. This paper fills this gap by conducting a comprehensive and integrative survey of data augmentation for source code, wherein we systematically compile and encapsulate existing literature to provide a comprehensive overview of the field. We start by constructing a taxonomy of DA approaches for source code models, followed by a discussion on prominent, methodologically illustrative approaches. Next, we highlight the general strategies and techniques to optimize the DA quality. Subsequently, we underscore techniques that find utility in widely-accepted source code scenarios and downstream tasks. Finally, we outline the prevailing challenges and potential opportunities for future research. 
In essence, this paper endeavors to demystify the corpus of existing literature on DA for source code models, and foster further exploration in this sphere. Complementing this, we present a continually updated GitHub repository that hosts a list of up-to-date papers on DA for source code models, accessible at ", "output": "Data Augmentation Approaches for Source Code Models: A Survey."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The recent development of generative and large language models (LLMs) poses new challenges for model evaluation that the research community and industry are grappling with. While the versatile capabilities of these models ignite excitement, they also inevitably make a leap toward homogenization: powering a wide range of applications with a single, often referred to as ``general-purpose'', model. In this position paper, we argue that model evaluation practices must take on a critical task to cope with the challenges and responsibilities brought by this homogenization: providing valid assessments for whether and how much human needs in downstream use cases can be satisfied by the given model (socio-technical gap). By drawing on lessons from the social sciences, human-computer interaction (HCI), and the interdisciplinary field of explainable AI (XAI), we urge the community to develop evaluation methods based on real-world socio-requirements and embrace diverse evaluation methods with an acknowledgment of trade-offs between realism to socio-requirements and pragmatic costs to conduct the evaluation. By mapping HCI and current NLG evaluation methods, we identify opportunities for evaluation methods for LLMs to narrow the socio-technical gap and pose open questions.", "output": "Rethinking Model Evaluation as Narrowing the Socio-Technical Gap."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work will enter the submission stage, so specific information will be temporarily hidden, and the title is also hidden.", "output": "A Work Based on GAN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A singular attribute of humankind is our ability to undertake novel, cooperative behavior, or teamwork. This requires that we can communicate goals, plans, and ideas between the brains of individuals to create shared intentionality. Using the information processing model of David Marr, I derive necessary characteristics of basic mechanisms to enable shared intentionality between prelinguistic computational agents and indicate how these could be implemented in present-day AI-based robots. More speculatively, I suggest the mechanisms derived by this thought experiment apply to humans and extend to provide explanations for human rationality and aspects of intentional and phenomenal consciousness that accord with observation. This yields what I call the Shared Intentionality First Theory (SIFT) for rationality and consciousness. The significance of shared intentionality has been recognized and advocated previously, but typically from a sociological or behavioral point of view. 
SIFT complements prior work by applying a computer science perspective to the underlying mechanisms.", "output": "On Computational Mechanisms for Shared Intentionality, and Speculation on Rationality and Consciousness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed to address these drawbacks such as extra regularization, adversarial weights perturbation, and training with more data over the last few years. However, the robust generalization improvement is yet far from satisfactory. In this paper, we approach this challenge with a brand new perspective -- refining historical optimization trajectories. We propose a new method named Weighted Optimization Trajectories (WOT) that leverages the optimization trajectories of adversarial training in time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under AA-$L_{\\infty}$ attack by 1.53% $\\sim$ 6.11% and meanwhile increases the clean accuracy by 0.55% $\\sim$ 5.47% across SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.", "output": "Enhancing Adversarial Training via Reweighting Optimization Trajectory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, badminton analytics has drawn attention due to the advancement of artificial intelligence and the efficiency of data collection. While there is a line of effective applications to improve and investigate player performance, there are only a few public badminton datasets that can be used for researchers outside the badminton domain. Existing badminton singles datasets focus on specific matchups; however, they cannot provide comprehensive studies on different players and various matchups. In this paper, we provide a badminton singles dataset, ShuttleSet22, which is collected from high-ranking matches in 2022. ShuttleSet22 consists of 30,172 strokes in 2,888 rallies in the training set, 1,400 strokes in 450 rallies in the validation set, and 2,040 strokes in 654 rallies in the testing set with detailed stroke-level metadata within a rally. To benchmark existing work with ShuttleSet22, we test the state-of-the-art stroke forecasting approach, ShuttleNet, with the corresponding stroke forecasting task, i.e., predict the future strokes based on the given strokes of each rally. We also hold a challenge, Track 2: Forecasting Future Turn-Based Strokes in Badminton Rallies, at CoachAI Badminton Challenge 2023 to encourage researchers to tackle this problem. 
The baseline codes and the dataset will be made available on", "output": "ShuttleSet22: Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter-tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and exhibit challenges for both scientists and practitioners.", "output": "An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vertical Federated Learning (VFL) attracts increasing attention because it empowers multiple parties to jointly train a privacy-preserving model over vertically partitioned data. Recent research has shown that applying zeroth-order optimization (ZOO) has many advantages in building a practical VFL algorithm. However, a vital problem with the ZOO-based VFL is its slow convergence rate, which limits its application in handling modern large models. To address this problem, we propose a cascaded hybrid optimization method in VFL. In this method, the downstream models (clients) are trained with ZOO to protect privacy and ensure that no internal information is shared. Meanwhile, the upstream model (server) is updated with first-order optimization (FOO) locally, which significantly improves the convergence rate, making it feasible to train the large models without compromising privacy and security. We theoretically prove that our VFL framework converges faster than the ZOO-based VFL, as the convergence of our framework is not limited by the size of the server model, making it effective for training large models with the major part on the server. Extensive experiments demonstrate that our method achieves faster convergence than the ZOO-based VFL framework, while maintaining an equivalent level of privacy protection. Moreover, we show that the convergence of our VFL is comparable to the unsafe FOO-based VFL baseline. Additionally, we demonstrate that our method makes the training of a large model feasible.", "output": "Secure and Fast Asynchronous Vertical Federated Learning via Cascaded Hybrid Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper describes our participation in the MentalRiskES task at IberLEF 2023. The task involved predicting the likelihood of an individual experiencing depression based on their social media activity. The dataset consisted of conversations from 175 Telegram users, each labeled according to their evidence of suffering from the disorder. 
We used a combination of traditional machine learning and deep learning techniques to solve four predictive subtasks: binary classification, simple regression, multiclass classification, and multi-output regression. We approached this by training a model to solve the multi-output regression case and then transforming the predictions to work for the other three subtasks. We compare the performance of two modeling approaches: fine-tuning a BERT-based model directly for the task or using its embeddings as inputs to a linear regressor, with the latter yielding better results. The code to reproduce our results can be found at:", "output": "A Framework for Identifying Depression on Social Media: MentalRiskES@IberLEF 2023."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper studies a category of visual question answering tasks, in which accessing external knowledge is necessary for answering the questions. This category is called outside-knowledge visual question answering (OK-VQA). A major step in developing OK-VQA systems is to retrieve relevant documents for the given multi-modal query. The current state-of-the-art asymmetric dense retrieval model for this task uses an architecture with a multi-modal query encoder and a uni-modal document encoder. Such an architecture requires a large amount of training data for effective performance. We propose an automatic data generation pipeline for pre-training passage retrieval models for OK-VQA tasks. The proposed approach leads to 26.9% Precision@5 improvements compared to the current state-of-the-art asymmetric architecture. Additionally, the proposed pre-training approach exhibits a good ability in zero-shot retrieval scenarios.", "output": "Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The task of recognising Handwritten Mathematical Expressions (HMER) is crucial in the fields of digital education and scholarly research. However, it is difficult to accurately determine the length and complex spatial relationships among symbols in handwritten mathematical expressions. In this study, we present a novel encoder-decoder architecture (DenseBAM-GI) for HMER, where the encoder has a Bottleneck Attention Module (BAM) to improve feature representation and the decoder has a Gated Input-GRU (GI-GRU) unit with an extra gate to make decoding long and complex expressions easier. The proposed model is an efficient and lightweight architecture with performance equivalent to state-of-the-art models in terms of Expression Recognition Rate (exprate). It also performs better in terms of top 1, 2, and 3 error accuracy across the CROHME 2014, 2016, and 2019 datasets. DenseBAM-GI achieves the best exprate among all models on the CROHME 2019 dataset. 
Importantly, these successes are accomplished with a drop in the complexity of the calculation and a reduction in the need for GPU memory.", "output": "DenseBAM-GI: Attention Augmented DeneseNet with momentum aided GRU for HMER."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks that require reasoning over one or multiple images to generate a text. However, the datasets used to train these models have not been released, and the collection process has not been fully specified. We introduce the OBELISC dataset, an open web-scale filtered dataset of interleaved image-text documents comprising 141 million web pages extracted from Common Crawl, 353 million associated images, and 115 billion text tokens. We describe the dataset creation process, present comprehensive filtering rules, and provide an analysis of the dataset's content. To show the viability of OBELISC, we train an 80 billion parameter vision and language model on the dataset and obtain competitive performance on various multimodal benchmarks. We release the code to reproduce the dataset along with the dataset itself.", "output": "OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Video retrieval (VR) involves retrieving the ground truth video from the video database given a text caption or vice-versa. The two important components of compositionality, objects & attributes and actions, are joined using correct semantics to form a proper text query. These components (objects & attributes, actions and semantics) each play an important role to help distinguish among videos and retrieve the correct ground truth video. However, it is unclear what effect these components have on video retrieval performance. We therefore conduct a systematic study to evaluate the compositional and semantic understanding of video retrieval models on standard benchmarks such as MSRVTT, MSVD and DIDEMO. The study is performed on two categories of video retrieval models: (i) which are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (e.g. Frozen-in-Time, Violet, MCQ etc.) (ii) which adapt pre-trained image-text representations like CLIP for video retrieval (e.g. CLIP4Clip, XCLIP, CLIP2Video etc.). Our experiments reveal that actions and semantics play a minor role compared to objects & attributes in video understanding. Moreover, video retrieval models that use pre-trained image-text representations (CLIP) have better semantic and compositional understanding as compared to models pre-trained on video-text data.", "output": "ICSVR: Investigating Compositional and Semantic Understanding in Video Retrieval Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Cell line authentication plays a crucial role in the biomedical field, ensuring researchers work with accurately identified cells. Supervised deep learning has made remarkable strides in cell line identification by studying cell morphological features through cell imaging. 
However, batch effects, a significant issue stemming from the different times at which data is generated, lead to substantial shifts in the underlying data distribution, thus complicating reliable differentiation between cell lines from distinct batch cultures. To address this challenge, we introduce CLANet, a pioneering framework for cross-batch cell line identification using brightfield images, specifically designed to tackle three distinct batch effects. We propose a cell cluster-level selection method to efficiently capture cell density variations, and a self-supervised learning strategy to manage image quality variations, thus producing reliable patch representations. Additionally, we adopt multiple instance learning (MIL) for effective aggregation of instance-level features for cell line identification. Our innovative time-series segment sampling module further enhances MIL's feature-learning capabilities, mitigating biases from varying incubation times across batches. We validate CLANet using data from 32 cell lines across 93 experimental batches from the AstraZeneca Global Cell Bank. Our results show that CLANet outperforms related approaches (e.g. domain adaptation, MIL), demonstrating its effectiveness in addressing batch effects in cell line identification.", "output": "CLANet: A Comprehensive Framework for Cross-Batch Cell Line Identification Using Brightfield Images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The class imbalance problem in deep learning has been explored in several studies, but there has yet to be a systematic analysis of this phenomenon in object detection. Here, we present comprehensive analyses and experiments of the foreground-background (F-B) imbalance problem in object detection, which is very common and caused by small, infrequent objects of interest. We experimentally study the effects of different aspects of F-B imbalance (object size, number of objects, dataset size, object type) on detection performance. In addition, we also compare 9 leading methods for addressing this problem, including Faster-RCNN, SSD, OHEM, Libra-RCNN, Focal-Loss, GHM, PISA, YOLO-v3, and GFL, with a range of datasets from different imaging domains. We conclude that (1) the F-B imbalance can indeed cause a significant drop in detection performance, (2) the detection performance is more affected by F-B imbalance when fewer training data are available, (3) in most cases, decreasing object size leads to a larger performance drop than decreasing the number of objects, given the same change in the ratio of object pixels to non-object pixels, (4) among all selected methods, Libra-RCNN and PISA demonstrate the best performance in addressing the issue of F-B imbalance, (5) when the training dataset size is large, the choice of method is not impactful, and (6) soft-sampling methods, including Focal-Loss, GHM, and GFL, perform fairly well on average but are relatively unstable.", "output": "A systematic study of the foreground-background imbalance problem in deep learning for object detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments.
Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world.", "output": "Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The lack of ability to adapt the motion compensation model to video content is an important limitation of current end-to-end learned video compression models. This paper advances the state-of-the-art by proposing an adaptive motion-compensation model for end-to-end rate-distortion optimized hierarchical bi-directional video compression. In particular, we propose two novelties: i) a multi-scale deformable alignment scheme at the feature level combined with multi-scale conditional coding, ii) motion-content adaptive inference. In addition, we employ a gain unit, which enables a single model to operate at multiple rate-distortion operating points. We also exploit the gain unit to control bit allocation among intra-coded vs. bi-directionally coded frames by fine-tuning corresponding models for truly flexible-rate learned video coding. Experimental results demonstrate state-of-the-art rate-distortion performance exceeding that of all prior art in learned video coding.", "output": "Multi-Scale Deformable Alignment and Content-Adaptive Inference for Flexible-Rate Bi-Directional Video Compression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models. Given an input video with annotated action periods, the LTA task aims to predict possible future actions. We hypothesize that an optimal solution should capture the interdependency between past and future actions, and be able to infer future actions based on the structure and dependency encoded in the past actions. Large language models have demonstrated remarkable commonsense-based reasoning ability. Inspired by that, Palm chains an image captioning model and a large language model.
It predicts future actions based on frame descriptions and action labels extracted from the input videos. Our method outperforms other participants in the EGO4D LTA challenge and achieves the best performance in terms of action prediction. Our code is available at ", "output": "Palm: Predicting Actions through Language Models @ Ego4D Long-Term Action Anticipation Challenge 2023."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper investigates the impact of LiDAR configuration shifts on the performance of 3D LiDAR point cloud semantic segmentation models, a topic not extensively studied before. We explore the effect of using different LiDAR channels when training and testing a 3D LiDAR point cloud semantic segmentation model, utilizing Cylinder3D for the experiments. A Cylinder3D model is trained and tested on simulated 3D LiDAR point cloud datasets created using the Mississippi State University Autonomous Vehicle Simulator (MAVS) and 32- and 64-channel 3D LiDAR point clouds of the RELLIS-3D dataset collected in a real-world off-road environment. Our experimental results demonstrate that sensor and spatial domain shifts significantly impact the performance of LiDAR-based semantic segmentation models. In the absence of spatial domain changes between training and testing, models trained and tested on the same sensor type generally exhibited better performance. Moreover, higher-resolution sensors showed improved performance compared to lower-resolution ones. However, results varied when spatial domain changes were present. In some cases, the advantage of a sensor's higher resolution led to better performance both with and without sensor domain shifts. In other instances, the higher resolution resulted in overfitting within a specific domain, causing a lack of generalization capability and decreased performance when tested on data with different sensor configurations.", "output": "Analysis of LiDAR Configurations on Off-road Semantic Segmentation Performance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated medical image segmentation inherently involves a certain degree of uncertainty. One key factor contributing to this uncertainty is the ambiguity that can arise in determining the boundaries of a target region of interest, primarily due to variations in image appearance. On top of this, even among experts in the field, different opinions can emerge regarding the precise definition of specific anatomical structures. This work specifically addresses the modeling of segmentation uncertainty, known as inter-rater uncertainty. Its primary objective is to explore and analyze the variability in segmentation outcomes that can occur when multiple experts in medical imaging interpret and annotate the same images. We introduce a novel Bayesian neural network-based architecture to estimate inter-rater uncertainty in medical image segmentation. Our approach has three key advancements. Firstly, we introduce a one-encoder-multi-decoder architecture specifically tailored for uncertainty estimation, enabling us to capture the rater-specific representation of each expert involved. Secondly, we propose Bayesian modeling for the new architecture, allowing efficient capture of the inter-rater distribution, particularly in scenarios with limited annotations.
Lastly, we enhance the rater-specific representation by integrating an attention module into each decoder. This module facilitates focused and refined segmentation results for each rater. We conduct extensive evaluations using synthetic and real-world datasets to validate our technical innovations rigorously. Our method surpasses existing baseline methods in five out of seven diverse tasks on the publicly available QUBIQ dataset, considering two evaluation metrics encompassing different uncertainty aspects. Our codes, models, and the new dataset are available through our GitHub repository: .", "output": "Inter-Rater Uncertainty Quantification in Medical Image Segmentation via Rater-Specific Bayesian Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep Neural Networks are powerful tools to understand complex patterns and make decisions. However, their black-box nature impedes a complete understanding of their inner workings. While online saliency-guided training methods try to highlight the prominent features in the model's output to alleviate this problem, it is still ambiguous whether the visually explainable features align with the robustness of the model against adversarial examples. In this paper, we investigate the saliency-trained model's vulnerability to adversarial example methods. Models are trained using an online saliency-guided training method and evaluated against popular adversarial example algorithms. We quantify the robustness and conclude that despite the well-explained visualizations in the model's output, the salient models suffer from lower performance against adversarial example attacks.", "output": "Does Saliency-Based Training bring Robustness for Deep Neural Networks in Image Classification?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic mapping from RGB-D sequences that combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping. When segmenting a new frame we perform latent feature re-projection from previous frames based on differentiable rendering. Fusing re-projected feature maps from previous frames with current-frame features greatly improves image segmentation quality, compared to a baseline that processes images independently. For 3D map processing, we propose a novel geometric quasi-planar over-segmentation method that groups 3D map elements likely to belong to the same semantic classes, relying on surface normals. We also describe a novel neural network design for lightweight semantic map post-processing. Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems and matches the performance of 3D convolutional networks on three real indoor datasets, while working in real-time. Moreover, it shows better cross-sensor generalization abilities compared to 3D CNNs, enabling training and inference with different depth sensors.
Code and data will be released on the project page: ", "output": "SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While natural language offers a convenient shared interface for humans and robots, enabling robots to interpret and follow language commands remains a longstanding challenge in manipulation. A crucial step to realizing a performant instruction-following robot is achieving semantic manipulation, where a robot interprets language at different specificities, from high-level instructions like \"Pick up the stuffed animal\" to more detailed inputs like \"Grab the left ear of the elephant.\" To tackle this, we propose Keypoints + Instructions to Execution (KITE), a two-step framework for semantic manipulation which attends to both scene semantics (distinguishing between different objects in a visual scene) and object semantics (precisely localizing different parts within an object instance). KITE first grounds an input instruction in a visual scene through 2D image keypoints, providing a highly accurate object-centric bias for downstream action inference. Provided an RGB-D scene observation, KITE then executes a learned keypoint-conditioned skill to carry out the instruction. The combined precision of keypoints and parameterized skills enables fine-grained manipulation with generalization to scene and object variations. Empirically, we demonstrate KITE in 3 real-world environments: long-horizon 6-DoF tabletop manipulation, semantic grasping, and a high-precision coffee-making task. In these settings, KITE achieves a 75%, 70%, and 71% overall success rate for instruction-following, respectively. KITE outperforms frameworks that opt for pre-trained visual language models over keypoint-based grounding, or omit skills in favor of end-to-end visuomotor control, all while being trained from fewer or comparable amounts of demonstrations. Supplementary material, datasets, code, and videos can be found on our website: ", "output": "KITE: Keypoint-Conditioned Policies for Semantic Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization. Our method leverages sparse camera pose reconstructions in a two-fold manner, video and scan independently, to estimate the camera pose of egocentric frames in 3D renders with high recall and precision. We extensively evaluate our method on the Visual Query (VQ) 3D object localization Ego4D benchmark. EgoCOL can estimate 62% and 59% more camera poses than the Ego4D baseline in the Ego4D Visual Queries 3D Localization challenge at CVPR 2023 in the val and test sets, respectively. Our code is publicly available at", "output": "EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is now an essential part of the image training process, as it effectively prevents overfitting and makes the model more robust against noisy datasets.
Recent mixing augmentation strategies have advanced to generate the mixup mask that can enrich the saliency information, which is a supervisory signal. However, these methods incur a significant computational burden to optimize the mixup mask. From this motivation, we propose a novel saliency-aware mixup method, GuidedMixup, which aims to retain the salient regions in mixup images with low computational overhead. We develop an efficient pairing algorithm that seeks to minimize the conflict of salient regions of paired images and achieve rich saliency in mixup images. Moreover, GuidedMixup controls the mixup ratio for each pixel to better preserve the salient region by interpolating two paired images smoothly. The experiments on several datasets demonstrate that GuidedMixup provides a good trade-off between augmentation overhead and generalization performance on classification datasets. In addition, our method shows good performance in experiments with corrupted or reduced datasets.", "output": "GuidedMixup: An Efficient Mixup Strategy Guided by Saliency Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine-learning models are known to be vulnerable to evasion attacks that perturb model inputs to induce misclassifications. In this work, we identify real-world scenarios where the true threat cannot be assessed accurately by existing attacks. Specifically, we find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes. To address the shortcomings of existing methods, we formally define a new metric, termed group-based robustness, that complements existing metrics and is better-suited for evaluating model performance in certain attack scenarios. We show empirically that group-based robustness allows us to distinguish between models' vulnerability against specific threat models in situations where traditional robustness metrics do not apply. Moreover, to measure group-based robustness efficiently and accurately, we 1) propose two loss functions and 2) identify three new attack strategies. We show empirically that with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes, and finding evasive samples using our new attack strategies saves time by up to 99% compared to brute-force search methods. Finally, we propose a defense method that increases group-based robustness by up to 3.52x.", "output": "Group-based Robustness: A General Framework for Customized Robustness in the Real World."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This study addresses the problem of 3D human mesh reconstruction from multi-view images. Recently, approaches that directly estimate the skinned multi-person linear model (SMPL)-based human mesh vertices based on volumetric heatmap representation from input images have shown good performance. We show that representation learning of vertex heatmaps using an autoencoder helps improve the performance of such approaches. Vertex heatmap autoencoder (VHA) learns the manifold of plausible human meshes in the form of latent codes using AMASS, which is a large-scale motion capture dataset.
Body code predictor (BCP) utilizes the learned body prior from VHA for human mesh reconstruction from multi-view images through latent code-based supervision and transfer of pretrained weights. According to experiments on the Human3.6M and LightStage datasets, the proposed method outperforms previous methods and achieves state-of-the-art human mesh reconstruction performance.", "output": "Representation learning of vertex heatmaps for 3D human mesh reconstruction from multi-view images."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Segmentation is an essential step for remote sensing image processing. This study aims to advance the application of the Segment Anything Model (SAM), an innovative image segmentation model by Meta AI, in the field of remote sensing image analysis. SAM is known for its exceptional generalization capabilities and zero-shot learning, making it a promising approach to processing aerial and orbital images from diverse geographical contexts. Our exploration involved testing SAM across multi-scale datasets using various input prompts, such as bounding boxes, individual points, and text descriptors. To enhance the model's performance, we implemented a novel automated technique that combines a text-prompt-derived general example with one-shot training. This adjustment resulted in an improvement in accuracy, underscoring SAM's potential for deployment in remote sensing imagery and reducing the need for manual annotation. Despite the limitations encountered with lower spatial resolution images, SAM exhibits promising adaptability to remote sensing data analysis. We recommend future research to enhance the model's proficiency through integration with supplementary fine-tuning techniques and other networks. Furthermore, we provide the open-source code of our modifications on online repositories, encouraging further and broader adaptations of SAM to the remote sensing domain.", "output": "The Segment Anything Model (SAM) for Remote Sensing Applications: From Zero to One Shot."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the development of effective deepfake detection models in recent years, several recent studies have demonstrated that biases in the training data utilized to develop deepfake detection models can lead to unfair performance for demographic groups of different races and/or genders. Such biases can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating the unfairness in deepfake detection, no methods have been developed to address the fairness issue of deepfake detection at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions to train fair deepfake detection models in ways that are agnostic or aware of demographic factors.
Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.", "output": "Improving Fairness in Deepfake Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal fusion integrates the complementary information present in multiple modalities and has gained much attention recently. Most existing fusion approaches either learn a fixed fusion strategy during training and inference, or are only capable of fusing the information to a certain extent. Such solutions may fail to fully capture the dynamics of interactions across modalities, especially when there are complex intra- and inter-modality correlations to be considered for informative multimodal fusion. In this paper, we propose a novel deep equilibrium (DEQ) method towards multimodal fusion via seeking a fixed point of the dynamic multimodal fusion process and modeling the feature correlations in an adaptive and recursive manner. This new way encodes the rich information within and across modalities thoroughly from low level to high level for efficacious downstream multimodal learning and is readily pluggable to various multimodal frameworks. Extensive experiments on BRCA, MM-IMDB, CMU-MOSI, SUN RGB-D, and VQA-v2 demonstrate the superiority of our DEQ fusion. More remarkably, DEQ fusion consistently achieves state-of-the-art performance on multiple multimodal benchmarks. The code will be released.", "output": "Deep Equilibrium Multimodal Fusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Magnetic Resonance Imaging (MRI) produces excellent soft tissue contrast, albeit an inherently slow imaging modality. Promising deep learning methods have recently been proposed to reconstruct accelerated MRI scans. However, existing methods still suffer from various limitations regarding image fidelity, contextual sensitivity, and reliance on fully-sampled acquisitions for model training. To comprehensively address these limitations, we propose a novel self-supervised deep reconstruction model, named Self-Supervised Diffusion Reconstruction (SSDiffRecon). SSDiffRecon expresses a conditional diffusion process as an unrolled architecture that interleaves cross-attention transformers for reverse diffusion steps with data-consistency blocks for physics-driven processing. Unlike recent diffusion methods for MRI reconstruction, a self-supervision strategy is adopted to train SSDiffRecon using only undersampled k-space data. Comprehensive experiments on public brain MR datasets demonstrate the superiority of SSDiffRecon against state-of-the-art supervised and self-supervised baselines in terms of reconstruction speed and quality. Implementation will be available at", "output": "Self-Supervised MRI Reconstruction with Unrolled Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional domain adaptation assumes the same vocabulary across source and target domains, which often struggles with limited transfer flexibility and efficiency while handling target domains with different vocabularies.
Inspired by recent vision-language models (VLMs) that enable open-vocabulary visual recognition by reasoning on both images and texts, we study open-vocabulary domain adaptation (OVDA), a new unsupervised domain adaptation framework that positions a pre-trained VLM as the source model and transfers it towards arbitrary unlabelled target domains. To this end, we design a Prompt Ensemble Self-training (PEST) technique that exploits the synergy between vision and language to mitigate the domain discrepancies in image and text distributions simultaneously. Specifically, PEST makes use of the complementary property of multiple prompts within and across vision and language modalities, which enables joint exploitation of vision and language information and effective learning of image-text correspondences in the unlabelled target domains. Additionally, PEST captures temporal information via a temporal prompt ensemble, which helps memorize previously learnt target information. Extensive experiments show that PEST outperforms the state-of-the-art consistently across 10 image recognition tasks.", "output": "Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep neural networks are being utilized heavily for autonomous driving, they need to be adapted to new, unseen environmental conditions for which they were not trained. We focus on a safety-critical application of lane detection, and propose a lightweight, fully unsupervised, real-time adaptation approach that only adapts the batch-normalization parameters of the model. We demonstrate that our technique can perform inference, followed by on-device adaptation, under a tight constraint of 30 FPS on Nvidia Jetson Orin. It shows similar accuracy (avg. of 92.19%) to a state-of-the-art semi-supervised adaptation algorithm that does not support real-time adaptation.", "output": "Real-Time Fully Unsupervised Domain Adaptation for Lane Detection in Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce NaturalInversion, a novel model inversion-based method to synthesize images that agree well with the original data distribution without using real data. In NaturalInversion, we propose: (1) a Feature Transfer Pyramid, which uses an enhanced image prior of the original data by combining the multi-scale feature maps extracted from the pre-trained classifier, (2) a one-to-one generative model, where only one batch of images is synthesized by one generator to bring non-linearity to the optimization and to ease the overall optimization process, and (3) learnable Adaptive Channel Scaling parameters, which are end-to-end trained to scale the output image channels to further utilize the original image prior. With our NaturalInversion, we synthesize images from classifiers trained on CIFAR-10/100 and show that our images are more consistent with the original data distribution than those of prior works, by visualization and additional analysis.
Furthermore, our synthesized images outperform prior works on various applications such as knowledge distillation and pruning, demonstrating the effectiveness of our proposed method.", "output": "NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Procedural Content Generation via Machine Learning (PCGML) faces a significant hurdle that sets it apart from other fields such as image or text generation: limited annotated data. Many existing methods for procedural level generation via machine learning require a secondary representation besides level images. However, the current methods for obtaining such representations are laborious and time-consuming, which contributes to this problem. In this work, we aim to address this problem by utilizing gameplay videos of two human-annotated games to develop a novel multi-tail framework that learns to perform simultaneous level translation and generation. The translation tail of our framework can convert gameplay video frames to an equivalent secondary representation, while its generation tail can produce novel level segments. Evaluation results and comparisons between our framework and baselines suggest that combining the level generation and translation tasks can lead to an overall improved performance regarding both tasks. This represents a possible solution to limited annotated level data, and we demonstrate the potential for future versions to generalize to unseen games.", "output": "Joint Level Generation and Translation Using Gameplay Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so-called video coding for machines (VCM). VCM differs from traditional video coding in that it is optimized for machine vision performance instead of human visual quality. In the feature compression track of MPEG-VCM, multi-scale features extracted from images are subject to compression. Recent feature compression works have demonstrated that the versatile video coding (VVC) standard-based approach can achieve a BD-rate reduction of up to 96% against the MPEG-VCM feature anchor. However, it is still sub-optimal as VVC was not designed for extracted features but for natural images. Moreover, the high encoding complexity of VVC makes it difficult to design a lightweight encoder without sacrificing performance. To address these challenges, we propose a novel multi-scale feature compression method that enables both end-to-end optimization on the extracted features and the design of lightweight encoders. The proposed model combines a learnable compressor with a multi-scale feature fusion network so that the redundancy in the multi-scale features is effectively removed. Instead of simply cascading the fusion network and the compression network, we integrate the fusion and encoding processes in an interleaved way. Our model first encodes a larger-scale feature to obtain a latent representation and then fuses the latent with a smaller-scale feature. This process is successively performed until the smallest-scale feature is fused, and then the encoded latent at the final stage is entropy-coded for transmission.
The results show that our model outperforms previous approaches by at least 52% BD-rate reduction and has 5x to 27x less encoding time for object detection. It is noteworthy that our model can attain near-lossless task performance with only 0.002-0.003% of the uncompressed feature data size.", "output": "End-to-End Learnable Multi-Scale Feature Compression for VCM."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing popularity and size of vision transformers (ViTs), there has been growing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing resources. Binarization can be used to help reduce the size of ViT models and their computational cost significantly, using popcount operations when the weights and the activations are in binary. However, ViTs suffer a larger performance drop than CNNs when convolutional neural network (CNN) binarization methods or other existing binarization methods are directly applied to binarize them, on datasets with a large number of classes such as ImageNet-1k. With extensive analysis, we find that binary vanilla ViTs such as DeiT miss out on many key architectural properties that CNNs have, which allow binary CNNs to have much higher representational capability than binary vanilla ViTs. Therefore, we propose BinaryViT, in which, inspired by the CNN architecture, we include operations from the CNN architecture in a pure ViT architecture to enrich the representational capability of a binary ViT without introducing convolutions. These include an average pooling layer instead of a token pooling layer, a block that contains multiple average pooling branches, an affine transformation right before the addition of each main residual connection, and a pyramid structure. Experimental results on the ImageNet-1k dataset show the effectiveness of these operations, which allow a binary pure ViT model to be competitive with previous state-of-the-art (SOTA) binary CNN models.", "output": "BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Egocentric action anticipation aims to predict the future actions the camera wearer will perform from the observation of the past. While predictions about the future should be available before the predicted events take place, most approaches do not pay attention to the computational time required to make such predictions. As a result, current evaluation schemes assume that predictions are available right after the input video is observed, i.e., presuming a negligible runtime, which may lead to overly optimistic evaluations. We propose a streaming egocentric action evaluation scheme which assumes that predictions are performed online and made available only after the model has processed the current input segment, which depends on its runtime. To evaluate all models considering the same prediction horizon, we hence propose that slower models should base their predictions on temporal segments sampled ahead of time.
Based on the observation that model runtime can affect performance in the considered streaming evaluation scenario, we further propose a lightweight action anticipation model based on feed-forward 3D CNNs, which is optimized using knowledge distillation techniques with a novel past-to-future distillation loss. Experiments on the three popular datasets EPIC-KITCHENS-55, EPIC-KITCHENS-100 and EGTEA Gaze+ show that (i) the proposed evaluation scheme induces a different ranking of state-of-the-art methods as compared to classic evaluations, (ii) lightweight approaches tend to outmatch more computationally expensive ones, and (iii) the proposed model based on feed-forward 3D CNNs and knowledge distillation outperforms current art in the streaming egocentric action anticipation scenario.", "output": "Streaming egocentric action anticipation: An evaluation scheme and approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure. Instead, INR represents objects as continuous functions. Previous research has demonstrated the effectiveness of using neural networks as INR for image compression, showcasing comparable performance to traditional methods such as JPEG. However, INR holds potential for various applications beyond image compression. This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks. Our methodology involves storing the whole dataset directly in INR format on a GPU, mitigating the significant data communication overhead between the CPU and GPU during training. Additionally, the decoding process from INR to RGB format is highly parallelized and executed on-the-fly. To further enhance compression, we propose iterative and dynamic pruning, as well as layer-wise quantization, building upon previous work. We evaluate our framework on the image classification task, utilizing the ResNet-18 backbone network and three commonly used datasets with varying image sizes. Rapid-INR reduces memory consumption to only 5% of the original dataset size and achieves a maximum 6x speedup over the PyTorch training pipeline, as well as a maximum 1.2x speedup over the DALI training pipeline, with only a marginal decrease in accuracy. Importantly, Rapid-INR can be readily applied to other computer vision tasks and backbone networks with reasonable engineering efforts. Our implementation code is publicly available at", "output": "Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamics models learned from visual observations have been shown to be effective in various robotic manipulation tasks. One of the key questions for learning such dynamics models is what scene representation to use. Prior works typically assume representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks.
In this work, we investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness. Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs) that allows continuous selection of the abstraction level. During test time, the agent can adaptively determine the optimal resolution at each model-predictive control (MPC) step. We evaluate our method in object pile manipulation, a task we commonly encounter in cooking, agriculture, manufacturing, and pharmaceutical applications. Through comprehensive evaluations both in simulation and the real world, we show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles made with various instances like coffee beans, almonds, corn, etc.", "output": "Dynamic-Resolution Model Learning for Object Pile Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning is an approach to collaboratively training machine learning models for multiple parties that prohibit data sharing. One of the challenges in federated learning is non-IID data between clients, as a single model cannot fit the data distribution for all clients. Meta-learning, such as Per-FedAvg, is introduced to cope with the challenge. Meta-learning learns shared initial parameters for all clients. Each client employs gradient descent to adapt the initialization to local data distributions quickly to realize model personalization. However, due to the non-convex loss function and the randomness of sampling updates, meta-learning approaches have unstable goals in local adaptation for the same client. This fluctuation in different adaptation directions hinders convergence in meta-learning. To overcome this challenge, we use the historical locally adapted model to restrict the direction of the inner loop and propose an elastic-constrained method. As a result, the current-round inner loop keeps historical goals and adapts to better solutions. Experiments show our method boosts meta-learning convergence and improves personalization without additional computation and communication. Our method achieved SOTA on all metrics on three public datasets.", "output": "Elastically-Constrained Meta-Learner for Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework using diffusion models for recognizing text in the wild. While existing studies have viewed the scene text recognition task as an image-to-text transformation, we rethink it as a text-to-text one conditioned on images in a diffusion model. We show for the first time that the diffusion model can be applied to text recognition.
Furthermore, experimental results on publicly available datasets show that the proposed method achieves competitive accuracy compared to state-of-the-art methods.", "output": "DiffusionSTR: Diffusion Model for Scene Text Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Long-tailed visual recognition has received increasing attention in recent years. Due to the extremely imbalanced data distribution in long-tailed learning, the learning process shows great uncertainties. For example, the predictions of different experts on the same image vary remarkably despite the same training settings. To alleviate the uncertainty, we propose Nested Collaborative Learning (NCL++), which tackles the long-tailed learning problem via collaborative learning. To be specific, the collaborative learning consists of two folds, namely inter-expert collaborative learning (InterCL) and intra-expert collaborative learning (IntraCL). InterCL learns multiple experts collaboratively and concurrently, aiming to transfer the knowledge among different experts. IntraCL is similar to InterCL, but it aims to conduct the collaborative learning on multiple augmented copies of the same image within a single expert. To achieve collaborative learning in long-tailed learning, balanced online distillation is proposed to force consistent predictions among different experts and augmented copies, which reduces the learning uncertainties. Moreover, in order to improve the meticulous distinguishing ability on the confusing categories, we further propose Hard Category Mining (HCM), which selects the negative categories with high predicted scores as the hard categories. Then, the collaborative learning is formulated in a nested way, in which the learning is conducted not just on all categories from a full perspective but also on some hard categories from a partial perspective. Extensive experiments demonstrate the superiority of our method, outperforming the state-of-the-art whether using a single model or an ensemble. The code will be publicly released.", "output": "NCL++: Nested Collaborative Learning for Long-Tailed Visual Recognition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). RETVQA is distinctly different from, and more challenging than, the traditionally-studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our relevance encoder for free-form fluent answer generation. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, expecting a mix of classification-oriented and open-ended generative answers.
Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset, namely RETVQA, and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.", "output": "Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Breast dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays an important role in the screening and prognosis assessment of high-risk breast cancer. The segmentation of cancerous regions is essential for the subsequent analysis of breast MRI. To alleviate the annotation effort required to train the segmentation networks, we propose a weakly-supervised strategy using extreme points as annotations for breast cancer segmentation. Without using any bells and whistles, our strategy focuses on fully exploiting the learning capability of the routine training procedure, i.e., the train - fine-tune - retrain process. The network first utilizes the pseudo-masks generated using the extreme points to train itself, by minimizing a contrastive loss, which encourages the network to learn more representative features for cancerous voxels. Then the trained network fine-tunes itself by using a similarity-aware propagation learning (SimPLe) strategy, which leverages feature similarity between unlabeled and positive voxels to propagate labels. Finally the network retrains itself by employing the pseudo-masks generated using the previously fine-tuned network. The proposed method is evaluated on our collected DCE-MRI dataset containing 206 patients with biopsy-proven breast cancers. Experimental results demonstrate our method effectively fine-tunes the network by using the SimPLe strategy, and achieves a mean Dice value of 81%.", "output": "SimPLe: Similarity-Aware Propagation Learning for Weakly-Supervised Breast Cancer Segmentation in DCE-MRI."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Arbitrary-oriented object detection is a relatively emerging but challenging task. Although remarkable progress has been made, there still remain many unsolved issues due to the large diversity of patterns in orientation, scale, aspect ratio, and visual appearance of objects in aerial images. Most of the existing methods adopt a coarse-grained fixed label assignment strategy and suffer from the inconsistency between the classification score and localization accuracy. First, to address the metric inconsistency between sample selection and regression loss calculation caused by the fixed IoU strategy, we introduce an affine transformation to evaluate the quality of samples and propose a distance-based label assignment strategy. The proposed metric-aligned selection (MAS) strategy can dynamically select samples according to the shape and rotation characteristics of objects. Second, to further address the inconsistency between classification and localization, we propose a critical feature sampling (CFS) module, which performs localization refinement on the sampling location for the classification task to extract critical features accurately.
Third, we present a scale-controlled smooth $L_1$ loss (SC-Loss) to adaptively select high-quality samples by changing the form of the regression loss function based on the statistics of proposals during training. Extensive experiments are conducted on four challenging rotated object detection datasets: DOTA, FAIR1M-1.0, HRSC2016, and UCAS-AOD. The results show the state-of-the-art accuracy of the proposed detector.", "output": "Metric-aligned Sample Selection and Critical Feature Sampling for Oriented Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The detection of leaf diseases in plants generally involves visual observation of patterns appearing on the leaf surface. However, there are many diseases that are distinguished based on very subtle changes in these visually observable patterns. This paper attempts to identify plant leaf diseases using image processing techniques. The focus of this study is on the detection of citrus leaf canker disease. Canker is a bacterial infection of leaves. Symptoms of citrus canker include brown spots on the leaves, often with a watery or oily appearance. The spots (called lesions in botany) are usually yellow, surrounded by a halo, and found on both the top and bottom of the leaf. This paper describes various methods that have been used to detect citrus leaf canker disease. The methods used are histogram comparison and k-means clustering. Using these methods, citrus canker development was detected based on histograms generated from leaf patterns. The results thus obtained can be used, after consultation with experts in the field of agriculture, to identify suitable treatments.", "output": "Unified View of Damage leaves Planimetry & Analysis Using Digital Images Processing Techniques."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Demystifying complex human-ground interactions is essential for accurate and realistic 3D human motion reconstruction from RGB videos, as it ensures consistency between the humans and the ground plane. Prior methods have modeled human-ground interactions either implicitly or in a sparse manner, often resulting in unrealistic and incorrect motions when faced with noise and uncertainty. In contrast, our approach explicitly represents these interactions in a dense and continuous manner. To this end, we propose a novel Ground-aware Motion Model for 3D Human Motion Reconstruction, named GraMMaR, which jointly learns the distribution of transitions in both pose and interaction between every joint and the ground plane at each time step of a motion sequence. It is trained to explicitly promote consistency between the motion and the distance change towards the ground. After training, we establish a joint optimization strategy that utilizes GraMMaR as a dual-prior, regularizing the optimization towards the space of plausible ground-aware motions. This leads to realistic and coherent motion reconstruction, irrespective of the assumed or learned ground plane.
Through extensive evaluation on the AMASS and AIST++ datasets, our model demonstrates good generalization and discriminating abilities in challenging cases, including complex and ambiguous human-ground interactions. The code will be released.", "output": "GraMMaR: Ground-aware Motion Model for 3D Human Motion Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Foundation models have exhibited remarkable success in various applications, such as disease diagnosis and text report generation. To date, a foundation model for endoscopic video analysis is still lacking. In this paper, we propose Endo-FM, a foundation model specifically developed using massive endoscopic video data. First, we build a video transformer, which captures both local and global long-range dependencies across spatial and temporal dimensions. Second, we pre-train our transformer model using global and local views in a self-supervised manner, aiming to make it robust to spatial-temporal variations and discriminative across different scenes. To develop the foundation model, we construct a large-scale endoscopy video dataset by combining 9 publicly available datasets and a privately collected dataset from Baoshan Branch of Renji Hospital in Shanghai, China. Our dataset overall consists of over 33K video clips with up to 5 million frames, encompassing various protocols, target organs, and disease types. Our pre-trained Endo-FM can be easily adopted for a given downstream task via fine-tuning by serving as the backbone. With experiments on 3 different types of downstream tasks, including classification, segmentation, and detection, our Endo-FM surpasses the current state-of-the-art self-supervised pre-training and adapter-based transfer learning methods by a significant margin, such as VCL (3.1% F1 for classification, 4.8% Dice for segmentation, and 5.5% F1 for detection) and ST-Adapter (5.9% F1 for classification, 9.6% Dice for segmentation, and 9.9% F1 for detection). Code, datasets, and models are released at ", "output": "Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperspectral images (HSI) captured from earth-observing satellites and aircraft are becoming increasingly important for applications in agriculture, environmental monitoring, mining, etc. Due to the limited available hyperspectral datasets, pixel-wise random sampling is the most commonly used training-test dataset partition approach, which has significant overlap between samples in the training and test datasets. Furthermore, our experimental observations indicate that regions with larger overlap often exhibit higher classification accuracy. Consequently, the pixel-wise random sampling approach poses a risk of data leakage. Thus, we propose a block-wise sampling method to minimize the potential for data leakage. Our experimental findings also confirm the presence of data leakage in models such as 2DCNN. Further, we propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, to address the challenges associated with hyperspectral image classifiers that consider HSI as long sequential three-dimensional images.
The model comprises two primary components: axial aggregation attention and multi-level spectral-spatial extraction. The axial aggregation attention mechanism effectively exploits the continuity and correlation among spectral bands at each pixel position in hyperspectral images, while aggregating spatial dimension features. This enables SaaFormer to maintain high precision even under block-wise sampling. The multi-level spectral-spatial extraction structure is designed to capture the sensitivity of different material components to specific spectral bands, allowing the model to focus on a broader range of spectral details. The results on six publicly available datasets demonstrate that our model exhibits comparable performance when using random sampling, while significantly outperforming other methods when employing block-wise sampling partition.", "output": "SaaFormer: Spectral-spatial Axial Aggregation Transformer for Hyperspectral Image Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The understanding of complex human interactions and group activities has garnered attention in human-centric computer vision. However, the advancement of the related tasks is hindered due to the difficulty of obtaining large-scale labeled real-world datasets. To mitigate the issue, we propose M3Act, a multi-view multi-group multi-person human atomic action and group activity data generator. Powered by the Unity engine, M3Act contains simulation-ready 3D scenes and human assets, configurable lighting and camera systems, highly parameterized modular group activities, and a large degree of domain randomization during the data generation process. Our data generator is capable of generating large-scale datasets of human activities with multiple viewpoints, modalities (RGB images, 2D poses, 3D motions), and high-quality annotations for individual persons and multi-person groups (2D bounding boxes, instance segmentation masks, individual actions and group activity categories). Using M3Act, we perform synthetic data pre-training for 2D skeleton-based group activity recognition and RGB-based multi-person pose tracking. The results indicate that learning from our synthetic datasets largely improves the model performances on real-world datasets, with the highest gains of 5.59% and 7.32% respectively in group and person recognition accuracy on CAD2, as well as an improvement of 6.63 in MOTP on HiEve. Pre-training with our synthetic data also leads to faster model convergence on downstream tasks (up to 6.8% faster). Moreover, M3Act opens new research problems for 3D group activity generation. We release M3Act3D, an 87.6-hour 3D motion dataset of human activities with larger group sizes and higher complexity of inter-person interactions than previous multi-person datasets. We define multiple metrics and propose a competitive baseline for the novel task.", "output": "Learning from Synthetic Human Group Activities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Decreased visibility, intensive noise, and biased color are common problems in low-light images. These visual disturbances further reduce the performance of high-level vision tasks, such as object detection and tracking.
To address this issue, some image enhancement methods have been proposed to increase the image contrast. However, most of them are implemented only in the spatial domain, which can be severely influenced by noise signals while enhancing. Hence, in this work, we propose a novel residual recurrent multi-wavelet convolutional neural network, R2-MWCNN, learned in the frequency domain, which can simultaneously increase the image contrast and effectively reduce noise signals. This end-to-end trainable network utilizes a multi-level discrete wavelet transform to divide input feature maps into distinct frequencies, resulting in a better denoising effect. A channel-wise loss function is proposed to correct the color distortion for more realistic results. Extensive experiments demonstrate that our proposed R2-MWCNN outperforms the state-of-the-art methods quantitatively and qualitatively.", "output": "Low-Light Enhancement in the Frequency Domain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The objective of augmented reality (AR) is to add digital content to natural images and videos to create an interactive experience between the user and the environment. Scene analysis and object recognition play a crucial role in AR, as they must be performed quickly and accurately. In this study, a new approach is proposed that involves using oriented bounding boxes with a detection and recognition deep network to improve performance and processing time. The approach is evaluated using two datasets: a real image dataset (DOTA dataset) commonly used for computer vision tasks, and a synthetic dataset that simulates different environmental, lighting, and acquisition conditions. The focus of the evaluation is on small objects, which are difficult to detect and recognise. The results indicate that the proposed approach tends to produce better Average Precision and greater accuracy for small objects in most of the tested conditions.", "output": "Evaluation of Environmental Conditions on Object Detection using Oriented Bounding Boxes for AR Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and carry semantic meaning. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundations for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG, in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks.
Notably, we show that seamlessly integrating CLIPAG in a \"plug-n-play\" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, a task that typically requires huge generators.", "output": "CLIPAG: Towards Generator-Free Text-to-Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Mainstream style transfer methods usually use pre-trained deep convolutional neural network (VGG) models as encoders, or use more complex model structures to achieve better style transfer effects. This leads to extremely slow processing speeds for practical tasks due to limited resources or higher-resolution image processing, such as 4K images, severely hindering the practical application value of style transfer models. We introduce a lightweight and fast style transfer model with controllable detail attention enhancement, named ICDaeLST. The model adopts a minimal, shallow, and small architecture, forming a very compact lightweight model for efficient forward inference. Although its structure is simple and has limited parameters, we achieve better overall color and texture structure matching by introducing a style discriminator, design an additional global semantic invariance loss to preserve the semantic and structural information of the content image from a high-level global perspective, and design a shallow detail attention enhancement module to preserve the detail information of the content image from a low-level detail perspective. We also achieve controllable intensity during inference for the first time (adjusting the degree of detail retention and texture structure transfer based on subjective judgment) to meet different users' subjective evaluation of stylization effects. Compared with the current best-performing and most lightweight models, our model achieves better style transfer quality and better content structure and detail retention, while having a smaller model size (17-250 times smaller) and faster speed (0.26-6.5 times faster), and achieves the fastest processing speed of 0.38s on 4K high-resolution images.", "output": "ICDaeLST: Intensity-Controllable Detail Attention-enhanced for Lightweight Fast Style Transfer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Red Palm Weevil (RPW) is a highly destructive insect causing economic losses and impacting palm tree farming worldwide. This paper proposes an innovative approach for sustainable palm tree farming by utilizing advanced technologies for the early detection and management of RPW. Our approach combines computer vision, deep learning (DL), the Internet of Things (IoT), and geospatial data to detect and classify RPW-infested palm trees effectively. The main phases include: (1) DL classification using sound data from IoT devices, (2) palm tree detection using YOLOv8 on UAV images, and (3) RPW mapping using geospatial data. Our custom DL model achieves 100% precision and recall in detecting and localizing infested palm trees. Integrating geospatial data enables the creation of a comprehensive RPW distribution map for efficient monitoring and targeted management strategies.
This technology-driven approach benefits agricultural authorities, farmers, and researchers in managing RPW infestations and safeguarding palm tree plantations' productivity.", "output": "Sustainable Palm Tree Farming: Leveraging IoT and Multi-Modal Data for Early Detection and Mapping of Red Palm Weevil."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains a notable issue. Lowering the supply voltage is an effective strategy for reducing energy consumption. However, aggressively scaling down the supply voltage can lead to accuracy degradation due to random bit flips in the static random access memory (SRAM) where model parameters are stored. To address this challenge, we introduce NeuralFuse, a novel add-on module that addresses the accuracy-energy tradeoff in low-voltage regimes by learning input transformations to generate error-resistant data representations. NeuralFuse protects DNN accuracy in both nominal and low-voltage scenarios. Moreover, NeuralFuse is easy to implement and can be readily applied to DNNs with limited access, such as non-configurable hardware or remote access to cloud-based APIs. Experimental results demonstrate that, at a 1% bit error rate, NeuralFuse can reduce SRAM memory access energy by up to 24% while improving accuracy by up to 57%. To the best of our knowledge, this is the first model-agnostic approach (i.e., no model retraining) to address low-voltage-induced bit errors. The source code is available at", "output": "NeuralFuse: Learning to Improve the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of two-stage few-shot classification methods, in the episodic meta-training stage the model suffers severe overfitting. We hypothesize that it is caused by over-discrimination, i.e., the model learns to over-rely on the superficial features that fit base class discrimination while suppressing novel class generalization. To penalize over-discrimination, we introduce knowledge distillation techniques to keep novel generalization knowledge from the teacher model during training. Specifically, we select the teacher model as the one with the best validation accuracy during meta-training and restrict the symmetric Kullback-Leibler (SKL) divergence between the output distribution of the linear classifier of the teacher model and that of the student model. This simple approach outperforms the standard meta-training process. We further propose the Nearest Neighbor Symmetric Kullback-Leibler (NNSKL) divergence for meta-training to push the limits of knowledge distillation techniques. NNSKL takes few-shot tasks as input and penalizes the output of the nearest neighbor classifier, which affects the relationships between query embeddings and support centers.
By combining SKL and NNSKL in meta-training, the model achieves even better performance and surpasses state-of-the-art results on several benchmarks.", "output": "Understanding the Overfitting of the Episodic Meta-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper proposes a multi-object tracking (MOT) algorithm for traffic monitoring using a drone equipped with optical and thermal cameras. Object detections on the images are obtained using a neural network for each type of camera. The cameras are modelled as direction-of-arrival (DOA) sensors. Each DOA detection follows a von Mises-Fisher distribution, whose mean direction is obtained by projecting a vehicle position on the ground to the camera. We then use the trajectory Poisson multi-Bernoulli mixture filter (TPMBM), which is a Bayesian MOT algorithm, to optimally estimate the set of vehicle trajectories. We have also developed a parameter estimation algorithm for the measurement model. We have tested the accuracy of the resulting TPMBM filter in synthetic and experimental data sets.", "output": "Trajectory Poisson multi-Bernoulli mixture filter for traffic monitoring using a drone."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Diffusion models have showcased their remarkable capability to synthesize diverse and high-quality images, sparking interest in their application for real image editing. However, existing diffusion-based approaches for local image editing often suffer from undesired artifacts due to the pixel-level blending of the noised target images and diffusion latent variables, which lack the necessary semantics for maintaining image consistency. To address these issues, we propose PFB-Diff, a Progressive Feature Blending method for Diffusion-based image editing. Unlike previous methods, PFB-Diff seamlessly integrates text-guided generated content into the target image through multi-level feature blending. The rich semantics encoded in deep features and the progressive blending scheme from high to low levels ensure semantic coherence and high quality in edited images. Additionally, we introduce an attention masking mechanism in the cross-attention layers to confine the impact of specific words to desired regions, further improving the performance of background editing. PFB-Diff can effectively address various editing tasks, including object/background replacement and object attribute editing. Our method demonstrates its superior performance in terms of image fidelity, editing accuracy, efficiency, and faithfulness to the original image, without the need for fine-tuning or training.", "output": "PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure-from-motion techniques assume that static scene parts are observed alongside the deforming parts in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies.
Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and proper quantitative evaluation methodologies. To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. In order to validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric which does not require ground truth data. Dataset and code: ", "output": "The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, deep learning has become a breakthrough technique in assisting medical image diagnosis. Supervised learning using convolutional neural networks (CNNs) provides state-of-the-art performance and has served as a benchmark for various medical image segmentation and classification tasks. However, supervised learning relies heavily on large-scale annotated data, which is expensive, time-consuming, and even impractical to acquire in medical imaging applications. Active Learning (AL) methods have been widely applied in natural image classification tasks to reduce annotation costs by selecting more valuable examples from the unlabeled data pool. However, their application to medical image segmentation tasks is limited, and there is currently no effective and universal AL-based method specifically designed for 3D medical image segmentation. To address this limitation, we propose an AL-based method that can be simultaneously applied to 2D medical image classification, segmentation, and 3D medical image segmentation tasks. We extensively validated our proposed active learning method on three publicly available and challenging medical image datasets: the Kvasir Dataset, the COVID-19 Infection Segmentation Dataset, and the BraTS2019 Dataset. The experimental results demonstrate that our PCDAL can achieve significantly improved performance with fewer annotations in 2D classification and segmentation and 3D segmentation tasks. The codes of this study are available at ", "output": "PCDAL: A Perturbation Consistency-Driven Active Learning Approach for Medical Image Segmentation and Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Pretraining with large-scale 3D volumes has the potential to improve the segmentation performance on a target medical image dataset where the training images and annotations are limited. Due to the high cost of acquiring pixel-level segmentation annotations for a large-scale pretraining dataset, pretraining with unannotated images is highly desirable.
In this work, we propose a novel self-supervised learning strategy named Volume Fusion (VF) for pretraining 3D segmentation models. It fuses several random patches from a foreground sub-volume into a background sub-volume based on a predefined set of discrete fusion coefficients, and forces the model to predict the fusion coefficient of each voxel, which is formulated as a self-supervised segmentation task without manual annotations. Additionally, we propose a novel network architecture based on parallel convolution and transformer blocks that is suitable to be transferred to different downstream segmentation tasks with various scales of organs and lesions. The proposed model was pretrained with 110k unannotated 3D CT volumes, and experiments with different downstream segmentation targets, including head and neck organs and thoracic/abdominal organs, showed that our pretrained model largely outperformed training from scratch and several state-of-the-art self-supervised training methods and segmentation models. The code and pretrained model are available at", "output": "MIS-FM: 3D Medical Image Segmentation using Foundation Models Pretrained on a Large-Scale Unannotated Dataset."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. To facilitate future research, we maintain an active repository that contains up-to-date links to relevant literature and open-source projects at", "output": "End-to-end Autonomous Driving: Challenges and Frontiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Single image 3D reconstruction is an important but challenging task that requires extensive knowledge of our natural world. Many existing methods solve this problem by optimizing a neural radiance field under the guidance of 2D diffusion models but suffer from lengthy optimization time, 3D-inconsistent results, and poor geometry. In this work, we propose a novel method that takes a single image of any object as input and generates a full 360-degree 3D textured mesh in a single feed-forward pass. Given a single image, we first use a view-conditioned 2D diffusion model, Zero123, to generate multi-view images for the input view, and then aim to lift them up to 3D space.
Since traditional reconstruction methods struggle with inconsistent multi-view predictions, we build our 3D reconstruction module upon an SDF-based generalizable neural surface reconstruction method and propose several critical training strategies to enable the reconstruction of 360-degree meshes. Without costly optimizations, our method reconstructs 3D shapes in significantly less time than existing methods. Moreover, our method favors better geometry, generates more 3D-consistent results, and adheres more closely to the input image. We evaluate our approach on both synthetic data and in-the-wild images and demonstrate its superiority in terms of both mesh quality and runtime. In addition, our approach can seamlessly support the text-to-3D task by integrating with off-the-shelf text-to-image diffusion models.", "output": "One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper introduces DreamDiffusion, a novel method for generating high-quality images directly from brain electroencephalogram (EEG) signals, without the need to translate thoughts into text. DreamDiffusion leverages pre-trained text-to-image models and employs temporal masked signal modeling to pre-train the EEG encoder for effective and robust EEG representations. Additionally, the method further leverages the CLIP image encoder to provide extra supervision to better align EEG, text, and image embeddings with limited EEG-image pairs. Overall, the proposed method overcomes the challenges of using EEG signals for image generation, such as noise, limited information, and individual differences, and achieves promising results. Quantitative and qualitative results demonstrate the effectiveness of the proposed method as a significant step towards portable and low-cost ``thoughts-to-image'', with potential applications in neuroscience and computer vision.", "output": "DreamDiffusion: Generating High-Quality Images from Brain EEG Signals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Invariance to spatial transformations such as translations and rotations is a desirable property and a basic design principle for classification neural networks. However, the commonly used convolutional neural networks (CNNs) are actually very sensitive to even small translations. There exists a vast body of work to achieve exact or approximate transformation invariance by designing transformation-invariant models or assessing the transformations. These works usually make changes to the standard CNNs and harm the performance on standard datasets. In this paper, rather than modifying the classifier, we propose a pre-classifier restorer to recover translated (or even rotated) inputs to the original ones, which will be fed into any classifier for the same dataset.
The restorer is based on a theoretical result which gives a necessary and sufficient condition for an affine operator to be translationally equivariant on a tensor space.", "output": "Restore Translation Using Equivariant Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We show, for the first time, that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real images. Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing. Achieving sufficient realism is non-trivial and we show how to do this for full bodies in motion. Specifically, our BEDLAM dataset contains monocular RGB videos with ground-truth 3D bodies in SMPL-X format. It includes a diversity of body shapes, motions, skin tones, hair, and clothing. The clothing is realistically simulated on the moving bodies using commercial clothing physics simulation. We render varying numbers of people in realistic scenes with varied lighting and camera motions. We then train various HPS regressors using BEDLAM and achieve state-of-the-art accuracy on real-image benchmarks despite training with synthetic data. We use BEDLAM to gain insights into what model design choices are important for accuracy. With good synthetic training data, we find that a basic method like HMR approaches the accuracy of the current SOTA method (CLIFF). BEDLAM is useful for a variety of tasks, and all images, ground truth bodies, 3D clothing, support code, and more are available for research purposes. Additionally, we provide detailed information about our synthetic data generation pipeline, enabling others to generate their own datasets. See the project page: ", "output": "BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Feature alignment is the primary means of fusing multimodal data. We propose a feature alignment method that fully fuses multimodal information, which alternately shifts and expands feature information from different modalities to obtain a consistent representation in a feature space. The proposed method can robustly capture high-level interactions between features of different modalities, thus significantly improving the performance of multimodal learning. We also show that the proposed method outperforms other popular multimodal schemes on multiple tasks. Experimental evaluation on the ETT and MIT-BIH-Arrhythmia datasets shows that the proposed method achieves state-of-the-art performance.", "output": "Alternative Telescopic Displacement: An Efficient Multimodal Alignment Method."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One central challenge in source-free unsupervised domain adaptation (UDA) is the lack of an effective approach to evaluate the prediction results of the adapted network model in the target domain. To address this challenge, we propose to explore a new method called cross-inferential networks (CIN).
Our main idea is that, when we adapt the network model to predict the sample labels from encoded features, we use these prediction results to construct new training samples with derived labels to learn a new examiner network that performs a different but compatible task in the target domain. Specifically, in this work, the base network model performs image classification while the examiner network is tasked to perform relative ordering of triplets of samples whose training labels are carefully constructed from the prediction results of the base network model. Two similarity measures, cross-network correlation matrix similarity and attention consistency, are then developed to provide important guidance for the UDA process. Our experimental results on benchmark datasets demonstrate that our proposed CIN approach can significantly improve the performance of source-free UDA.", "output": "Cross-Inferential Networks for Source-free Unsupervised Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Classifiers based on deep neural networks have recently been challenged by adversarial attacks, whose widely existing vulnerability has invoked research in defending them from potential threats. Given a vulnerable classifier, existing defense methods are mostly white-box and often require re-training the victim under modified loss functions/training regimes. While the model/data/training specifics of the victim are usually unavailable to the user, re-training is unappealing, if not impossible, for reasons such as limited computational resources. To this end, we propose a new black-box defense framework. It can turn any pre-trained classifier into a resilient one with little knowledge of the model specifics. This is achieved by new joint Bayesian treatments on the clean data, the adversarial examples, and the classifier, for maximizing their joint probability. It is further equipped with a new post-train strategy which keeps the victim intact. We name our framework Bayesian Boundary Correction (BBC). BBC is a general and flexible framework that can easily adapt to different data types. We instantiate BBC for image classification and skeleton-based human activity recognition, for both static and dynamic data. Exhaustive evaluation shows that BBC has superior robustness and can enhance robustness without severely hurting the clean accuracy, compared with existing defense methods.", "output": "Defending Black-box Classifiers by Bayesian Boundary Correction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial intelligence represents a new frontier in human medicine that could save more lives and reduce costs, thereby increasing accessibility. As a consequence, the rate of advancement of AI in cancer medical imaging, and more particularly tissue pathology, has exploded, opening it to ethical and technical questions that could impede its adoption into existing systems. In order to chart the path of AI in its application to cancer tissue imaging, we review current work and identify how it can improve cancer pathology diagnostics and research. In this review, we identify 5 core tasks that models are developed for, including regression, classification, segmentation, generation, and compression tasks.
We address the benefits and challenges that such methods face, and how they can be adapted for use in cancer prevention and treatment. The studies reviewed in this paper represent the beginning of this field, and future experiments will build on the foundations that we highlight.", "output": "The State of Applying Artificial Intelligence to Tissue Imaging for Cancer Research and Early Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal Named Entity Recognition (MNER) is a crucial task for information extraction from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images but are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task, producing trustworthy predictions. Our proposed algorithm models the distribution of each modality as a Normal-inverse Gamma distribution, and fuses them into a unified distribution with an evidential fusion mechanism, enabling hierarchical characterization of uncertainties and promotion of prediction accuracy and trustworthiness. Additionally, we explore the potential of pre-trained large foundation models in MNER and propose an efficient fusion approach that leverages their robust feature representations. Experiments on two datasets demonstrate that our proposed method outperforms the baselines and achieves new state-of-the-art performance.", "output": "Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing each weight in deep neural networks using a nonlinear function. It is a general, intuitive, cheap and easy-to-implement method, which can be combined with various other regularization techniques. Large weights in deep neural networks are a sign of a more complex network that is overfitted to the training data. Moreover, regularized networks tend to have a greater range of weights around zero with fewer weights centered at zero. We introduce a weight reparameterization function which is applied to each weight and implicitly reduces overfitting by restricting the magnitude of the weights while forcing them away from zero at the same time. This leads to more democratic decision-making in the network. Firstly, individual weights cannot have too much influence in the prediction process due to the restriction of their magnitude. Secondly, more weights are used in the prediction process, since they are forced away from zero during training. This promotes the extraction of more features from the input data and increases the level of weight redundancy, which makes the network less sensitive to statistical differences between training and test data. We extend our method to learn the hyperparameters of the introduced weight reparameterization function. This avoids hyperparameter search and gives the network the opportunity to align the weight reparameterization with the training progress.
We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.", "output": "Weight Compander: A Simple Weight Reparameterization for Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "State-of-the-art deep learning-based registration methods employ three different learning strategies: supervised learning, which requires costly manual annotations; unsupervised learning, which heavily relies on hand-crafted similarity metrics designed by domain experts; or learning from synthetic data, which introduces a domain shift. To overcome the limitations of these strategies, we propose a novel self-supervised learning paradigm for unsupervised registration, relying on self-training. Our idea is based on two key insights. Feature-based differentiable optimizers 1) perform reasonable registration even from random features and 2) stabilize the training of the preceding feature extraction network on noisy labels. Consequently, we propose cyclical self-training, where pseudo labels are initialized as the displacement fields inferred from random features and cyclically updated based on more and more expressive features from the learning feature extractor, yielding a self-reinforcement effect. We evaluate the method for abdomen and lung registration, consistently surpassing metric-based supervision and outperforming diverse state-of-the-art competitors. Source code is available at", "output": "Unsupervised 3D registration through optimization-guided cyclical self-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce spectral batch normalization (SBN), a novel effective method to improve generalization by normalizing feature maps in the frequency (spectral) domain. The activations of residual networks without batch normalization (BN) tend to explode exponentially in the depth of the network at initialization. This leads to extremely large feature map norms even though the parameters are relatively small. These explosive dynamics can be very detrimental to learning. BN makes weight decay regularization on the scaling factors $\\gamma, \\beta$ approximately equivalent to an additive penalty on the norm of the feature maps, which prevents extremely large feature map norms to a certain degree. However, we show experimentally that, despite the approximate additive penalty of BN, feature maps in deep neural networks (DNNs) tend to explode at the beginning of the network and that feature maps of DNNs contain large values during the whole training. This phenomenon also occurs in a weakened form in non-residual networks. SBN addresses large feature maps by normalizing them in the frequency domain. In our experiments, we empirically show that SBN prevents exploding feature maps at initialization and large feature map values during the training. Moreover, the normalization of feature maps in the frequency domain leads to more uniformly distributed frequency components. This discourages the DNNs from relying on single frequency components of feature maps.
These, together with other effects of SBN, have a regularizing effect on the training of residual and non-residual networks. We show experimentally that using SBN in addition to standard regularization methods improves the performance of DNNs by a relevant margin, e.g. ResNet50 on ImageNet by 0.71%.", "output": "Spectral Batch Normalization: Normalization in the Frequency Domain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multiple Object Tracking (MOT) is crucial to autonomous vehicle perception. End-to-end transformer-based algorithms, which detect and track objects simultaneously, show great potential for the MOT task. However, most existing methods focus on image-based tracking with a single object category. In this paper, we propose an end-to-end transformer-based MOT algorithm (MotionTrack) with multi-modality sensor inputs to track objects with multiple classes. Our objective is to establish a transformer baseline for MOT in an autonomous driving environment. The proposed algorithm consists of a transformer-based data association (DA) module and a transformer-based query enhancement module to achieve MOT and Multiple Object Detection (MOD) simultaneously. MotionTrack and its variations achieve better results (AMOTA score of 0.55) on the nuScenes dataset compared with other classical baseline models, such as AB3DMOT, CenterTrack, and the probabilistic 3D Kalman filter. In addition, we prove that a modified attention mechanism can be utilized for DA to accomplish MOT, and aggregate history features to enhance the MOD performance.", "output": "MotionTrack: End-to-End Transformer-based Multi-Object Tracking with LiDAR-Camera Fusion."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The privacy protection mechanism of federated learning (FL) offers an effective solution for cross-center medical collaboration and data sharing. In multi-site medical image segmentation, each medical site serves as a client of FL, and its data naturally forms a domain. FL supplies the possibility to improve the performance of the model on seen domains. However, there is a problem of domain generalization (DG) in the actual deployment, that is, the performance of the model trained by FL in unseen domains will decrease. Hence, MLA-BIN is proposed in this study to solve the DG problem of FL. Specifically, a model-level attention module (MLA) and a batch-instance style normalization (BIN) block are designed. The MLA represents the unseen domain as a linear combination of seen domain models. The attention mechanism is introduced for the weighting coefficient to obtain the optimal coefficient according to the similarity of inter-domain data features. MLA enables the global model to generalize to unseen domains. In the BIN block, batch normalization (BN) and instance normalization (IN) are combined in the shallow layers of the segmentation network to perform style normalization, solving the influence of inter-domain image style differences on DG.
The extensive experimental results on two medical image segmentation tasks demonstrate that the proposed MLA-BIN outperforms state-of-the-art methods.", "output": "MLA-BIN: Model-level Attention and Batch-instance Style Normalization for Domain Generalization of Federated Learning on Medical Image Segmentation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Approaching the era of ubiquitous computing, human motion sensing plays a crucial role in smart systems for decision making, user interaction, and personalized services. Extensive research has been conducted on human tracking, pose estimation, gesture recognition, and activity recognition, which are predominantly based on cameras in traditional methods. However, the intrusive nature of cameras limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose \\textit{milliFlow}, a novel deep learning method for scene flow estimation as complementary motion information for mmWave point cloud, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks. Experimental results demonstrate the superior performance of our method with an average 3D endpoint error of 4.6 cm, significantly surpassing the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition, human parsing, and human body part tracking. To foster further research in this area, we provide our codebase and dataset for open access.", "output": "milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The search and retrieval of digital histopathology slides is an important task that has yet to be solved. In this case study, we investigate the clinical readiness of three state-of-the-art histopathology slide search engines, Yottixel, SISH, and RetCCL, on three patients with solid tumors. We provide a qualitative assessment of each model's performance in providing retrieval results that are reliable and useful to pathologists. We found that all three image search engines fail to produce consistently reliable results and have difficulties in capturing granular and subtle features of malignancy, limiting their diagnostic accuracy. Based on our findings, we also propose a minimal set of requirements to further advance the development of accurate and reliable histopathology image search engines for successful clinical adoption.", "output": "Histopathology Slide Indexing and Search: Are We There Yet?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spiking neural networks (SNNs) have ultra-low energy consumption and high biological plausibility due to their binary and bio-driven nature compared with artificial neural networks (ANNs). While previous research has primarily focused on enhancing the performance of SNNs in classification tasks, the generative potential of SNNs remains relatively unexplored.
In our paper, we put forward Spiking Denoising Diffusion Probabilistic Models (SDDPM), a new class of SNN-based generative models that achieve high sample quality. To fully exploit the energy efficiency of SNNs, we propose a purely Spiking U-Net architecture, which achieves comparable performance to its ANN counterpart using only 4 time steps, resulting in significantly reduced energy consumption. Extensive experimental results reveal that our approach achieves state-of-the-art performance on the generative tasks and substantially outperforms other SNN-based generative models, achieving up to $12\\times$ and $6\\times$ improvement on the CIFAR-10 and the CelebA datasets, respectively. Moreover, we propose a threshold-guided strategy that can further improve the performance by 16.7% in a training-free manner. The SDDPM symbolizes a significant advancement in the field of SNN generation, injecting new perspectives and potential avenues of exploration.", "output": "Spiking Denoising Diffusion Probabilistic Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Documents hold spatial focus and valuable locality characteristics. For example, descriptions of listings in real estate or travel blogs contain information about specific local neighborhoods. This information is valuable to characterize how humans perceive their environment. However, the first step to making use of this information is to identify the spatial focus (e.g., a city) of a document. Traditional approaches for identifying the spatial focus of a document rely on detecting and disambiguating toponyms from the document. This approach requires a vocabulary set of location phrases and ad-hoc rules, which ignore important words related to location. Recent topic modeling approaches using large language models often consider a few topics, each with broad coverage. In contrast, the spatial focus of a document can be a country, a city, or even a neighborhood, which together form a space much larger than the number of topics considered in these approaches. Additionally, topic modeling methods are often applied to broad topics of news articles where context is easily distinguishable. To identify the geographic focus of a document effectively, we present a simple but effective Joint Embedding of multi-LocaLitY (JELLY), which jointly learns representations with separate encoders of document and location. JELLY significantly outperforms state-of-the-art methods for identifying spatial focus from documents from a number of sources. We also demonstrate case studies on the arithmetic of the learned representations, including identifying cities with similar locality characteristics and zero-shot learning to identify document spatial focus.", "output": "The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "One of the mainstream schemes for 2D human pose estimation (HPE) is learning keypoint heatmaps by a neural network. Existing methods typically improve the quality of heatmaps by customized architectures, such as high-resolution representation and vision Transformers. In this paper, we propose \\textbf{DiffusionPose}, a new scheme that formulates 2D HPE as a keypoint heatmap generation problem from noised heatmaps.
During training, the keypoints are diffused to a random distribution by adding noise, and the diffusion model learns to recover ground-truth heatmaps from noised heatmaps with respect to conditions constructed from image features. During inference, the diffusion model generates heatmaps from initialized heatmaps in a progressive denoising manner. Moreover, we further explore improving the performance of DiffusionPose with conditions from human structural information. Extensive experiments show the prowess of our DiffusionPose, with improvements of 1.6, 1.2, and 1.2 mAP on the widely used COCO, CrowdPose, and AI Challenge datasets, respectively.", "output": "Learning Structure-Guided Diffusion Model for 2D Human Pose Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The rapid advancements in computer vision have stimulated remarkable progress in face forgery techniques, capturing the dedicated attention of researchers committed to detecting forgeries and precisely localizing manipulated areas. Nonetheless, with limited fine-grained pixel-wise supervision labels, deepfake detection models perform unsatisfactorily on precise forgery detection and localization. To address this challenge, we introduce the well-trained vision segmentation foundation model, i.e., the Segment Anything Model (SAM), into face forgery detection and localization. Based on SAM, we propose the Detect Any Deepfakes (DADF) framework with the Multiscale Adapter, which can capture short- and long-range forgery contexts for efficient fine-tuning. Moreover, to better identify forged traces and augment the model's sensitivity towards forgery regions, a Reconstruction Guided Attention (RGA) module is proposed. The proposed framework seamlessly integrates end-to-end forgery localization and detection optimization. Extensive experiments on three benchmark datasets demonstrate the superiority of our approach for both forgery detection and localization. The codes will be released soon at", "output": "Detect Any Deepfakes: Segment Anything Meets Face Forgery Detection and Localization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual learning (CL) is an approach to address catastrophic forgetting, which refers to neural networks forgetting previously learned knowledge when trained on new tasks or data distributions. Research on adversarial robustness has decomposed features into robust and non-robust types and demonstrated that models trained on robust features significantly enhance adversarial robustness. However, no study has been conducted on the efficacy of robust features, through the lens of the CL model, in mitigating catastrophic forgetting in CL. In this paper, we introduce the CL robust dataset and train four baseline models on both the standard and CL robust datasets. Our results demonstrate that the CL models trained on the CL robust dataset experienced less catastrophic forgetting of the previously learned tasks than when trained on the standard dataset.
Our observations highlight the significance of the features provided to the underlying CL models, showing that CL robust features can alleviate catastrophic forgetting.", "output": "The Importance of Robust Features in Mitigating Catastrophic Forgetting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Historically, the rotorcraft community has experienced a higher fatal accident rate than other aviation segments, including commercial and general aviation. Recent advancements in artificial intelligence (AI) and the application of these technologies in different areas of our lives are both intriguing and encouraging. When developed appropriately for the aviation domain, AI techniques provide an opportunity to help design systems that can address rotorcraft safety challenges. Our recent work demonstrated that AI algorithms could use video data from onboard cameras and correctly identify different flight parameters from cockpit gauges, e.g., indicated airspeed. These AI-based techniques provide a potentially cost-effective solution, especially for small helicopter operators, to record the flight state information and perform post-flight analyses. We also showed that carefully designed and trained AI systems could accurately predict rotorcraft attitude (i.e., pitch and yaw) from outside scenes (images or video data). Ordinary off-the-shelf video cameras were installed inside the rotorcraft cockpit to record the outside scene, including the horizon. The AI algorithm could correctly identify rotorcraft attitude at an accuracy in the range of 80%. In this work, we combined five different onboard camera viewpoints to improve attitude prediction accuracy to 94%. The five onboard camera views included the pilot windshield, co-pilot windshield, pilot Electronic Flight Instrument System (EFIS) display, co-pilot EFIS display, and the attitude indicator gauge. Using video data from each camera view, we trained various convolutional neural networks (CNNs), which achieved prediction accuracy in the range of 79% to 90%. We subsequently ensembled the learned knowledge from all CNNs and achieved an ensemble accuracy of 93.3%.", "output": "Deep Ensemble for Rotorcraft Attitude Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Instruction tuning unlocks the superior capability of Large Language Models (LLMs) to interact with humans. Furthermore, recent instruction-following datasets include images as visual inputs, collecting responses for image-based instructions. However, visual instruction-tuned models cannot comprehend textual details within images well. This work enhances the current visual instruction tuning pipeline with text-rich images (e.g., movie posters, book covers, etc.). Specifically, we first use publicly available OCR tools to collect results on 422K text-rich images from the LAION dataset. Moreover, we prompt text-only GPT-4 with recognized texts and image captions to generate 16K conversations, each containing question-answer pairs for text-rich images. By combining our collected data with previous multi-modal instruction-following data, our model, LLaVAR, substantially improves the LLaVA model's capability on text-based VQA datasets (up to 20% accuracy improvement) while achieving an accuracy of 91.42% on ScienceQA.
The GPT-4-based instruction-following evaluation also demonstrates the improvement of our model on both natural images and text-rich images. Through qualitative analysis, LLaVAR shows promising interaction (e.g., reasoning, writing, and elaboration) skills with humans based on the latest real-world online content that combines text and images. We make our code/data/models publicly available at", "output": "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts. Directly learning a conditional generative model from images or texts to 3D shapes is prone to producing results inconsistent with the conditions, because 3D shapes have an additional dimension whose distribution significantly differs from that of 2D images and texts. To bridge the domain gap among the three modalities and facilitate multi-modal-conditioned 3D shape generation, we explore representing 3D shapes in a shape-image-text-aligned space. Our framework comprises two models: a Shape-Image-Text-Aligned Variational Auto-Encoder (SITA-VAE) and a conditional Aligned Shape Latent Diffusion Model (ASLDM). The former encodes 3D shapes into a shape latent space aligned to the image and text, and reconstructs the fine-grained 3D neural fields corresponding to given shape embeddings via a transformer-based decoder. The latter learns a probabilistic mapping function from the image or text space to the latent shape space. Our extensive experiments demonstrate that our proposed approach can generate higher-quality and more diverse 3D shapes that better semantically conform to the visual or textual conditional inputs, validating the effectiveness of the shape-image-text-aligned space for cross-modality 3D shape generation.", "output": "Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Masked image modelling (MIM) is a powerful self-supervised representation learning paradigm whose potential has not been widely demonstrated in medical image analysis. In this work, we show the capacity of MIM to capture rich semantic representations of Haematoxylin & Eosin (H&E)-stained images at the nuclear level. Inspired by Bidirectional Encoder representation from Image Transformers (BEiT), we split the images into smaller patches and generate corresponding discrete visual tokens. In addition to the regular grid-based patches typically used in visual Transformers, we introduce patches of individual cell nuclei. We propose positional encoding of the irregular distribution of these structures within an image. We pre-train the model in a self-supervised manner on H&E-stained whole-slide images of diffuse large B-cell lymphoma, where cell nuclei have been segmented. The pre-training objective is to recover the original discrete visual tokens of the masked image on the one hand, and to reconstruct the visual tokens of the masked object instances on the other. Coupling these two pre-training tasks allows us to build powerful, context-aware representations of nuclei.
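The BEiT-style pipeline in the entry above recovers discrete visual tokens at masked positions. The sketch below shows only that bookkeeping — choosing a random patch mask and scoring predictions at the masked positions with cross-entropy; the token vocabulary size, mask ratio, and predictor output are assumptions for illustration.

```python
import numpy as np

n_patches, vocab = 196, 8192   # assumed grid size and token-vocabulary size
tokens = np.random.randint(vocab, size=n_patches)  # ground-truth visual tokens

# Mask half of the patches at random (mask ratio is an assumption).
mask = np.zeros(n_patches, dtype=bool)
mask[np.random.choice(n_patches, size=n_patches // 2, replace=False)] = True

logits = np.random.randn(n_patches, vocab)  # stand-in for the Transformer output
z = logits - logits.max(axis=1, keepdims=True)          # stable log-softmax
log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))

# The pre-training loss is computed only where patches were masked.
loss = -log_probs[mask, tokens[mask]].mean()
```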
Our model generalizes well and can be fine-tuned on downstream classification tasks, improving cell classification accuracy on the PanNuke dataset by more than 5% compared to current instance segmentation methods.", "output": "Learning Nuclei Representations with Masked Image Modelling."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Portrait synthesis creates realistic digital avatars which enable users to interact with others in a compelling way. Recent advances in StyleGAN and its extensions have shown promising results in synthesizing photorealistic and accurate reconstructions of human faces. However, previous methods often focus on frontal face synthesis, and most are not able to handle large head rotations due to the training data distribution of StyleGAN. In this work, our goal is to take as input a monocular video of a face and create an editable dynamic portrait able to handle extreme head poses. The user can create novel viewpoints, edit the appearance, and animate the face. Our method utilizes pivotal tuning inversion (PTI) to learn a personalized video prior from a monocular video sequence. Then we can input pose and expression coefficients to MLPs and manipulate the latent vectors to synthesize different viewpoints and expressions of the subject. We also propose novel loss functions to further disentangle pose and expression in the latent space. Our algorithm shows much better performance over previous approaches on monocular video datasets, and it is also capable of running in real-time at 54 FPS on an RTX 3080.", "output": "PVP: Personalized Video Prior for Editable Dynamic Portraits using StyleGAN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Given sparse views of an object, estimating their camera poses is a long-standing and intractable problem. We harness the pre-trained diffusion model of novel views conditioned on viewpoints (Zero-1-to-3). We present ID-Pose, which inverts the denoising diffusion process to estimate the relative pose given two input images. ID-Pose adds noise to one image, and predicts the noise conditioned on the other image and a decision variable for the pose. The prediction error is used as the objective to find the optimal pose with the gradient descent method. ID-Pose can handle more than two images and estimate each of the poses with multiple image pairs from triangular relationships. ID-Pose requires no training and generalizes to real-world images. We conduct experiments using high-quality real-scanned 3D objects, where ID-Pose significantly outperforms state-of-the-art methods.", "output": "ID-Pose: Sparse-view Camera Pose Estimation by Inverting Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent advances in diffusion-based generative models have shown incredible promise for Image-to-Image translation and editing. Most recent work in this space relies on additional training or architecture-specific adjustments to the diffusion process. In this work, we show that much of this low-level control can be achieved without additional training or any access to features of the diffusion model.
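ID-Pose, described above, treats the diffusion model's noise-prediction error as an objective and descends it with respect to a pose variable. The sketch below reproduces only that optimization loop with PyTorch autograd around a toy differentiable stand-in for the view-conditioned denoiser; every name and value here is a placeholder, not the Zero-1-to-3 model.

```python
import torch

def toy_denoiser(noisy, cond_image, pose):
    """Differentiable stand-in for a view-conditioned diffusion denoiser."""
    true_pose = torch.tensor([0.3, -0.2, 0.1])  # pretend pose encoded in the views
    return noisy * (pose - true_pose).pow(2).sum()

noisy = torch.randn(3, 32, 32)              # image with noise added
cond = torch.randn(3, 32, 32)               # the other input image (condition)
added_noise = torch.zeros_like(noisy)       # the noise we pretend was added

pose = torch.zeros(3, requires_grad=True)   # decision variable for the pose
opt = torch.optim.Adam([pose], lr=0.05)

for step in range(200):
    opt.zero_grad()
    pred = toy_denoiser(noisy, cond, pose)
    loss = (pred - added_noise).pow(2).mean()  # prediction error as the objective
    loss.backward()
    opt.step()
# pose now approaches the toy target [0.3, -0.2, 0.1].
```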
Our method simply applies a filter to the input of each diffusion step, based on the output of the previous step, in an adaptive manner. Notably, this approach does not depend on any specific architecture or sampler and can be applied without access to internal features of the network, making it easy to combine with other techniques, samplers, and diffusion architectures. Furthermore, it has negligible performance cost and allows for more continuous adjustment of guidance strength than other approaches. We show FGD offers a fast and strong baseline that is competitive with recent architecture-dependent approaches. Furthermore, FGD can also be used as a simple add-on to enhance the structural guidance of other state-of-the-art I2I methods. Finally, our derivation of this method helps to understand the impact of self-attention, a key component of other recent architecture-specific I2I approaches, in a more architecture-independent way. Project page:", "output": "Filtered-Guided Diffusion: Fast Filter Guidance for Black-Box Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields. However, challenges persist in creating controllable models for personalized object generation. In this paper, we first identify the entanglement issues in existing personalized generative models, and then propose a straightforward and efficient data augmentation training strategy that guides the diffusion model to focus solely on object identity. By inserting plug-and-play adapter layers from a pre-trained controllable diffusion model, our model obtains the ability to control the location and size of each generated personalized object. During inference, we propose a regionally-guided sampling technique to maintain the quality and fidelity of the generated images. Our method achieves comparable or superior fidelity for personalized objects, yielding a robust, versatile, and controllable text-to-image diffusion model that is capable of generating realistic and personalized images. Our approach demonstrates significant potential for various applications, such as those in art, entertainment, and advertising design.", "output": "Generate Anything Anywhere in Any Scene."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Despite considerable progress in multi-task learning, most efforts focus on learning from multi-label data: a single image set with multiple task labels. Such multi-label data sets are rare, small, and expensive. We say heterogeneous to refer to image sets with different task labels, or to combinations of single-task datasets. Few have explored training on such heterogeneous datasets. General-purpose vision models are still dominated by single-task pretraining, and it remains unclear how to scale up multi-task models by leveraging mainstream vision datasets designed for different purposes. The challenges lie in managing large intrinsic differences among vision tasks, including data distribution, architectures, task-specific modules, dataset scales, and sampling strategies.
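The FGD entry above filters the input of each diffusion step based on the previous step's output, with no access to network internals. A hedged sketch of that pattern follows, using a Gaussian low-pass of a guide to steer each step's input; the blend weight, filter choice, and placeholder sampler are assumptions, not the paper's exact scheme.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sampler_step(x, t):
    """Placeholder for one black-box diffusion denoising step."""
    return 0.95 * x + 0.05 * np.random.randn(*x.shape)

def guided_filter(x, guide, strength=0.5):
    """Pull the low frequencies of x toward those of a guide image."""
    low_x = gaussian_filter(x, sigma=2.0)
    low_g = gaussian_filter(guide, sigma=2.0)
    return x + strength * (low_g - low_x)

structure = np.random.rand(64, 64)   # structure to preserve (e.g. the input image)
x = np.random.randn(64, 64)
prev = structure
for t in range(50, 0, -1):
    x = guided_filter(x, prev, strength=0.5)  # filter the step input
    x = sampler_step(x, t)                    # black-box step; no internals needed
    prev = x                                  # next step is guided by this output
```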
To address these challenges, we propose to modify and scale up mixture-of-experts (MoE) vision transformers, so that they can simultaneously learn classification, detection, and segmentation on diverse mainstream vision datasets including ImageNet, COCO, and ADE20K. Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks. Due to its emergent modularity, this general-purpose model decomposes into high-performing components, efficiently adapting to downstream tasks. We can fine-tune it with fewer training parameters, fewer model parameters, and less computation. Additionally, its modularity allows for easy expansion in continual-learning-without-forgetting scenarios. Finally, these functions can be controlled and combined to meet various demands of downstream tasks.", "output": "An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either the cluster structure or the similarity structure to guide feature selection, neglecting the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or local structure learning, lacking the capability of graph learning with both global and local structural awareness. In light of this, this paper presents a joint multi-view unsupervised feature selection and graph learning (JMVFG) approach. Particularly, we formulate multi-view feature selection with orthogonal decomposition, where each target matrix is decomposed into a view-specific basis matrix and a view-consistent cluster indicator. Cross-space locality preservation is incorporated to bridge the cluster structure learning in the projected space and the similarity learning (i.e., graph learning) in the original space. Further, a unified objective function is presented to enable the simultaneous learning of the cluster structure, the global and local similarity structures, and the multi-view consistency and inconsistency, upon which an alternating optimization algorithm is developed with theoretically proven convergence. Extensive experiments on a variety of real-world multi-view datasets demonstrate the superiority of our approach for both multi-view feature selection and graph learning tasks. The code is available at", "output": "Joint Multi-view Unsupervised Feature Selection and Graph Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a dense dynamic RGB-D SLAM pipeline based on a learning-based visual odometry, TartanVO. TartanVO, like other direct methods rather than feature-based ones, estimates camera pose through dense optical flow, which only applies to static scenes and disregards dynamic objects. Due to the color constancy assumption, optical flow is not able to differentiate between dynamic and static pixels. Therefore, to reconstruct a static map through such direct methods, our pipeline resolves dynamic/static segmentation by leveraging the optical flow output, and only fuses static points into the map.
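The SLAM entry above fuses only static points, which requires a dynamic/static segmentation derived from optical flow. One simple realization, assumed here for illustration rather than taken from the paper, compares the measured flow against the flow implied by the estimated ego-motion and thresholds the residual.

```python
import numpy as np

h, w = 48, 64
measured_flow = np.random.randn(h, w, 2) * 0.5  # dense optical flow (stand-in)
egomotion_flow = np.zeros((h, w, 2))            # flow implied by camera motion alone

residual = np.linalg.norm(measured_flow - egomotion_flow, axis=2)
static_mask = residual < 1.0                    # pixels consistent with ego-motion

# Only static points would be fused into the map; dynamic pixels are masked out
# (and, per the pipeline above, re-rendered away before the next VO pass).
points = np.random.rand(h, w, 3)
static_points = points[static_mask]
```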
Moreover, we re-render the input frames such that the dynamic pixels are removed, and iteratively pass them back into the visual odometry to refine the pose estimate.", "output": "Dynamic Dense RGB-D SLAM using Learning-based Visual Odometry."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Current multi-view 3D object detection methods often fail to detect objects in the overlap region properly, and the networks' understanding of the scene is often limited to that of a monocular detection network. Moreover, objects in the overlap region are often largely occluded or suffer from deformation due to camera distortion, causing a domain shift. To mitigate this issue, we propose using the following two main modules: (1) Stereo Disparity Estimation for Weak Depth Supervision and (2) Adversarial Overlap Region Discriminator. The former utilizes the traditional stereo disparity estimation method to obtain reliable disparity information from the overlap region. Given the disparity estimates as supervision, we propose regularizing the network to fully utilize the geometric potential of binocular images and improve the overall detection accuracy accordingly. Further, the latter module minimizes the representational gap between non-overlap and overlapping regions. We demonstrate the effectiveness of the proposed method with the nuScenes large-scale multi-view 3D object detection data. Our experiments show that our proposed method outperforms current state-of-the-art models, i.e., DETR3D and BEVDet.", "output": "ORA3D: Overlap Region Aware Multi-view 3D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Learned video compression has recently emerged as an essential research topic in developing advanced video compression technologies, where motion compensation is considered one of the most challenging issues. In this paper, we propose a learned video compression framework via a heterogeneous deformable compensation strategy (HDCVC) to tackle the problem of unstable compression performance caused by single-size deformable kernels in the downsampled feature domain. More specifically, instead of utilizing optical flow warping or single-size-kernel deformable alignment, the proposed algorithm extracts features from the two adjacent frames to estimate content-adaptive heterogeneous deformable (HetDeform) kernel offsets. Then we transform the reference features with the HetDeform convolution to accomplish motion compensation. Moreover, we design a Spatial-Neighborhood-Conditioned Divisive Normalization (SNCDN) to achieve more effective data Gaussianization combined with the Generalized Divisive Normalization. Furthermore, we propose a multi-frame enhanced reconstruction module for exploiting context and temporal information for final quality enhancement. Experimental results indicate that HDCVC achieves superior performance to the recent state-of-the-art learned video compression approaches.", "output": "Learned Video Compression via Heterogeneous Deformable Compensation Network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional \"content\" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining \"texture\" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality.", "output": "Lossy Image Compression with Conditional Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We offer a method for one-shot mask-guided image synthesis that allows controlling manipulations of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while maintaining its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, MAGIC aggregates gradients over the input, driven by a guide binary mask that enforces a strong spatial prior. MAGIC implements a series of manipulations with a single framework, achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects, and gives users firm control over the synthesis by requiring them simply to specify binary guide masks. Our study and findings are supported by various qualitative comparisons with the state-of-the-art on the same images sampled from ImageNet and quantitative analysis using machine perception, along with a user survey of 100+ participants that endorse our synthesis quality. Project page at Code is available at", "output": "MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will generally require closer inspection than a missed detection of a car in the distance.
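The conditional-diffusion codec in the compression entry above follows the transform-coding pattern: analysis transform, quantization, entropy coding, then conditional decoding. The sketch below shows only the quantization and ideal-rate bookkeeping of that pattern; the transform is a placeholder and the rate is an entropy estimate, not a real coder.

```python
import numpy as np

def analysis(x):
    """Placeholder analysis transform (encoder) mapping images to latents."""
    return 4.0 * x.mean(axis=(1, 2))

x = np.random.rand(16, 8, 8)   # a batch of tiny "images"
latent = analysis(x)           # map to the latent space
q = np.round(latent)           # scalar quantization prior to entropy coding

# Ideal codelength estimate from the empirical symbol distribution.
symbols, counts = np.unique(q, return_counts=True)
p = counts / counts.sum()
rate_bits = -(counts * np.log2(p)).sum()  # total bits under an ideal entropy coder

# A conditional diffusion decoder would then reconstruct x given q (the "content"
# variable), synthesizing the remaining "texture" variables at decoding time.
```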
The problem of predicting such potential failures at test time has largely been overlooked in the literature, and conventional approaches based on detection uncertainty fall short in that they are agnostic to such fine-grained characterisation of errors. In this work, we propose to reformulate the problem of finding \"hard\" images as a query-based hard image retrieval task, where queries are specific definitions of \"hardness\", and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of a detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground-truth. We show experimentally that it can be applied successfully to a wide variety of queries for which it can reliably identify hard images for a given detector without any labelled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors. The code for this project is available at", "output": "Query-based Hard-Image Retrieval for Object Detection at Test Time."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Small targets are often submerged in the cluttered backgrounds of infrared images. Conventional detectors tend to generate false alarms, while CNN-based detectors lose small targets in deep layers. To this end, we propose iSmallNet, a multi-stream densely nested network with label decoupling for infrared small object detection. On the one hand, to fully exploit the shape information of small targets, we decouple the original labeled ground-truth (GT) map into an interior map and a boundary one. The GT map, in collaboration with the two additional maps, tackles the unbalanced distribution of small object boundaries. On the other hand, two key modules are delicately designed and incorporated into the proposed network to boost the overall performance. First, to maintain small targets in deep layers, we develop a multi-scale nested interaction module to explore a wide range of context information. Second, we develop an interior-boundary fusion module to integrate multi-granularity information. Experiments on NUAA-SIRST and NUDT-SIRST clearly show the superiority of iSmallNet over 11 state-of-the-art detectors.", "output": "iSmallNet: Densely Nested Network with Label Decoupling for Infrared Small Target Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The forensic attribution of the handwriting in a digitized document to multiple scribes is a challenging problem of high dimensionality. Unique handwriting styles may be dissimilar in a blend of several factors including character size, stroke width, loops, ductus, slant angles, and cursive ligatures. Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks has provided moderate to high success. In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis.
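The hard-image retrieval entry above scores images by a Monte Carlo estimate under a simple stochastic model standing in for the ground truth. A hedged toy version follows: sample plausible "ground truths" around the detector's output and estimate the probability that a query-defined failure occurs. The stochastic model and the query here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

detector_scores = np.array([0.55, 0.80, 0.95])  # detections for one image (stand-in)

def sample_ground_truth(scores, rng):
    """Toy stochastic model: each detection is truly present w.p. its score."""
    return rng.random(scores.shape) < scores

def query_fails(truth, scores):
    """Example query: 'an object is present but scored below 0.6' (a near-miss)."""
    return bool(np.any(truth & (scores < 0.6)))

n = 10_000
fails = sum(query_fails(sample_ground_truth(detector_scores, rng), detector_scores)
            for _ in range(n))
hardness = fails / n  # Monte Carlo estimate of the query-specific failure probability
```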
This advance demonstrates the successful deployment of unsupervised methods for writer attribution of historical documents and forensic document analysis.", "output": "Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Several companies often safeguard their trained deep models (i.e., details of architecture, learnt weights, training details, etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary reasons or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination on perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields predictions similar to the original on the surrogate model. At test time, WNR combined with the trained regenerator network is prepended to the black box network, resulting in a high boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% on state-of-the-art Auto Attack compared to the baseline, even when the attacker uses a surrogate architecture (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) with the same model stealing strategy as the defender. The code is available at", "output": "Data-free Defense of Black Box Models Against Adversarial Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Insects as pollinators play a crucial role in ecosystem management and world food production. However, insect populations are declining, calling for efficient methods of insect monitoring. Existing methods analyze video or time-lapse images of insects in nature, but the analysis is challenging since insects are small objects in complex and dynamic scenes of natural vegetation. In this work, we provide a dataset of primary honeybees visiting three different plant species during two months of the summer period. The dataset consists of 107,387 annotated time-lapse images from multiple cameras, including 9,423 annotated insects. We present a method pipeline for detecting insects in time-lapse RGB images. The pipeline consists of a two-step process. Firstly, the time-lapse RGB images are preprocessed to enhance insects in the images. This Motion-Informed-Enhancement technique uses motion and colors to enhance insects in images. Secondly, the enhanced images are subsequently fed into a Convolutional Neural Network (CNN) object detector. The method improves the deep learning object detectors You Only Look Once (YOLO) and Faster Region-based CNN (Faster R-CNN).
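The WNR module in the defense entry above performs a discrete wavelet decomposition and keeps only a few important coefficients. A simplified sketch with PyWavelets follows, where "important" is reduced to largest-magnitude detail coefficients; the paper's actual WCSM selection rule differs and is learned.

```python
import numpy as np
import pywt

img = np.random.rand(64, 64)

coeffs = pywt.wavedec2(img, "haar", level=2)   # discrete wavelet decomposition
kept = [coeffs[0]]                             # always keep the approximation band
for detail_bands in coeffs[1:]:
    new_bands = []
    for band in detail_bands:
        thresh = np.quantile(np.abs(band), 0.90)  # keep top 10% by magnitude
        new_bands.append(np.where(np.abs(band) >= thresh, band, 0.0))
    kept.append(tuple(new_bands))

denoised = pywt.waverec2(kept, "haar")  # image with small (noise) coefficients zeroed
```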
Using Motion-Informed-Enhancement, the YOLO detector improves the average micro F1-score from 0.49 to 0.71, and the Faster R-CNN detector improves the average micro F1-score from 0.32 to 0.56 on the dataset. Our dataset and proposed method provide a step forward in automating time-lapse camera monitoring of flying insects. The dataset is published at:", "output": "Motion Informed Object Detection of Small Insects in Time-lapse Camera Recordings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose an auto-encoder architecture for multi-texture synthesis. The approach relies on both a compact encoder accounting for second-order neural statistics and a generator incorporating adaptive periodic content. Images are embedded in a compact and geometrically consistent latent space, where the texture representation and its spatial organisation are disentangled. Texture synthesis and interpolation tasks can be performed directly from these latent codes. Our experiments demonstrate that our model outperforms state-of-the-art feed-forward methods in terms of visual quality and various texture-related metrics.", "output": "A geometrically aware auto-encoder for multi-texture synthesis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling; however, naive exact computation of this model incurs quadratic time and memory complexities in sequence length, hindering the training of long-sequence models. Critical bottlenecks are due to the computation of partition functions in the denominator of the softmax function as well as the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and an efficient KDE solver can be further utilized to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds. Empirically, we verify that KDEformer outperforms other attention approximations in terms of accuracy, memory, and runtime on various pre-trained models. On BigGAN image generation, we achieve better generative scores than the exact computation with over 4x speedup. For ImageNet classification with T2T-ViT, KDEformer shows over 18x speedup while the accuracy drop is less than 0.5%.", "output": "KDEformer: Accelerating Transformers via Kernel Density Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The success of self-supervised learning (SSL) has mostly been attributed to the availability of unlabeled yet large-scale datasets. However, in a specialized domain such as medical imaging, which is very different from natural images, the assumption of data availability is unrealistic and impractical, as the data itself is scanty and found in small databases, collected for specific prognosis tasks.
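The insect pipeline above enhances small moving objects using motion and color before detection. A hedged frame-differencing sketch of that idea follows: blend a crude motion-magnitude map into the current frame to boost moving pixels. The blending weights are invented, not the paper's.

```python
import numpy as np

prev_frame = np.random.rand(120, 160, 3)   # consecutive time-lapse frames (stand-ins)
curr_frame = np.random.rand(120, 160, 3)

# Per-pixel motion magnitude from frame differencing (a crude motion cue).
motion = np.abs(curr_frame - prev_frame).max(axis=2, keepdims=True)

# Boost pixels where motion is high; weights 0.7/0.3 are illustrative.
enhanced = np.clip(0.7 * curr_frame + 0.3 * motion * curr_frame, 0.0, 1.0)
# 'enhanced' would then be fed to the CNN detector (YOLO / Faster R-CNN)
# instead of the raw frame.
```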
To this end, we seek to investigate the applicability of self-supervised learning algorithms on small-scale medical imaging datasets. In particular, we evaluate 4 state-of-the-art SSL methods on three publicly accessible small medical imaging datasets. Our investigation reveals that in-domain low-resource SSL pre-training can yield competitive performance to transfer learning from large-scale datasets (such as ImageNet). Furthermore, we extensively analyse our empirical findings to provide valuable insights that can motivate further research towards circumventing the need for pre-training on a large image corpus. To the best of our knowledge, this is the first attempt to holistically explore self-supervision on low-resource medical datasets.", "output": "Exploring Self-Supervised Representation Learning For Low-Resource Medical Image Analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "LiDAR point clouds have become the most common data source in autonomous driving. However, due to the sparsity of point clouds, accurate and reliable detection cannot be achieved in specific scenarios. Because of their complementarity with point clouds, images are getting increasing attention. Although with some success, existing fusion methods either perform hard fusion or do not fuse in a direct manner. In this paper, we propose a generic 3D detection framework called MMFusion, using multi-modal features. The framework aims to achieve accurate fusion between LiDAR and images to improve 3D detection in complex scenes. Our framework consists of two separate streams: the LiDAR stream and the camera stream, which can be compatible with any single-modal feature extraction network. The Voxel Local Perception Module in the LiDAR stream enhances local feature representation, and then the Multi-modal Feature Fusion Module selectively combines feature outputs from different streams to achieve better fusion. Extensive experiments have shown that our framework not only outperforms existing benchmarks but also improves their detection, especially for detecting cyclists and pedestrians on the KITTI benchmark, with strong robustness and generalization capabilities. Hopefully, our work will stimulate more research into multi-modal fusion for autonomous driving tasks.", "output": "A Generalized Multi-Modal Fusion Detection Framework."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Motion forecasting is a key module in an autonomous driving system. Due to the heterogeneous nature of multi-sourced input, multimodality in agent behavior, and the low latency required by onboard deployment, this task is notoriously challenging. To cope with these difficulties, this paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion prediction. We design a modality-agnostic strategy to concisely encode the complex input in a unified manner. We generate diverse proposals, fused with anchors bearing goal-oriented scene context, to induce multimodal prediction that covers a wide range of future trajectories. Our network architecture is highly uniform and succinct, leading to an efficient model amenable to real-world driving deployment.
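SSL pre-training of the kind investigated above is typically judged by linear evaluation on frozen features. A standard linear-probe sketch with scikit-learn follows; the feature matrix stands in for the output of a frozen SSL-pretrained encoder, and the label is a hypothetical binary prognosis target.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in for features from a frozen SSL-pretrained encoder.
features = np.random.randn(300, 128)
labels = np.random.randint(2, size=300)   # e.g. a binary prognosis label

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.25,
                                          random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # linear probe only
print("linear-probe accuracy:", probe.score(X_te, y_te))
```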
Experiments reveal that our agent-centric network compares favorably with state-of-the-art methods in prediction accuracy, while achieving scene-centric-level inference latency.", "output": "ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous driving requires a comprehensive understanding of the surrounding environment for reliable trajectory planning. Previous works rely on dense rasterized scene representations (e.g., agent occupancy and semantic maps) to perform planning, which is computationally intensive and misses instance-level structure information. In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. The proposed vectorized paradigm has two significant advantages. On one hand, VAD exploits the vectorized agent motion and map elements as explicit instance-level planning constraints, which effectively improves planning safety. On the other hand, VAD runs much faster than previous end-to-end planning methods by getting rid of the computation-intensive rasterized representation and hand-designed post-processing steps. VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset, outperforming the previous best method by a large margin. Our base model, VAD-Base, greatly reduces the average collision rate by 29.0% and runs 2.5x faster. Besides, a lightweight variant, VAD-Tiny, greatly improves the inference speed (up to 9.3x) while achieving comparable planning performance. We believe the excellent performance and high efficiency of VAD are critical for the real-world deployment of an autonomous driving system. Code and models will be released to facilitate future research.", "output": "VAD: Vectorized Scene Representation for Efficient Autonomous Driving."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The application of Computer Vision (CV) techniques massively stimulates microscopic traffic safety analysis from the perspective of traffic conflicts and near misses, which are usually measured using Surrogate Safety Measures (SSM). However, as video processing and traffic safety modeling are two separate research domains, and few studies have focused on systematically bridging the gap between them, it is necessary to provide transportation researchers and practitioners with corresponding guidance. With this aim in mind, this paper focuses on reviewing the applications of CV techniques in traffic safety modeling using SSM and suggesting the best way forward. The CV algorithms used for vehicle detection and tracking, from early approaches to the state-of-the-art models, are summarized at a high level. Then, the video pre-processing and post-processing techniques for vehicle trajectory extraction are introduced.
A detailed review of SSMs for vehicle trajectory data, along with their application to traffic safety analysis, is presented. Finally, practical issues in traffic video processing and SSM-based safety analysis are discussed, and available or potential solutions are provided. This review is expected to assist transportation researchers and engineers with the selection of suitable CV techniques for video processing and the usage of SSMs for various traffic safety research objectives.", "output": "Advances and Applications of Computer Vision Techniques in Vehicle Trajectory Generation and Surrogate Traffic Safety Indicators."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of deep-learning models in many tasks, there have been concerns about such models learning shortcuts and their lack of robustness to irrelevant confounders. When it comes to models directly trained on human faces, a sensitive confounder is that of human identities. Many face-related tasks should ideally be identity-independent and perform uniformly across different individuals (i.e., be fair). One way to measure and enforce such robustness and performance uniformity is through enforcing it during training, assuming identity-related information is available at scale. However, due to privacy concerns and also the cost of collecting such information, this is often not the case, and most face datasets simply contain input images and their corresponding task-related labels. Thus, improving identity-related robustness without the need for such annotations is of great importance. Here, we explore using face-recognition embedding vectors, as proxies for identities, to enforce such robustness. We propose to use the structure of the face-recognition embedding space to implicitly emphasize rare samples within each class. We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space. Our experiments suggest that such a simple sample weighting scheme not only improves the training robustness, it often improves the overall performance as a result of such robustness. We also show that employing such constraints during training results in models that are significantly less sensitive to different levels of bias in the dataset.", "output": "Improving Identity-Robustness for Face Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional Neural Networks (CNNs) are the predominant model used for a variety of medical image analysis tasks. At inference time, these models are computationally intensive, especially with volumetric data. In principle, it is possible to trade accuracy for computational efficiency by manipulating the rescaling factor in the downsample and upsample layers of CNN architectures. However, properly exploring the accuracy-efficiency trade-off is prohibitively expensive with existing models. To address this, we introduce Scale-Space HyperNetworks (SSHN), a method that learns a spectrum of CNNs with varying internal rescaling factors. A single SSHN characterizes an entire Pareto accuracy-efficiency curve of models that match, and occasionally surpass, the outcomes of training many separate networks with fixed rescaling factors.
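Among the SSMs reviewed above, time-to-collision (TTC) is the canonical example: the gap between two vehicles divided by their closing speed. A minimal computation from extracted trajectory samples follows, with invented numbers; thresholds such as 1.5 s are a common convention in the SSM literature, not a value from this review.

```python
import numpy as np

# Positions (m) of a leader and a follower along a lane, sampled at 10 Hz.
leader = np.array([50.0, 52.0, 54.0, 56.0])    # moving at 20 m/s
follower = np.array([20.0, 24.0, 28.0, 32.0])  # moving at 40 m/s
dt = 0.1

gap = leader - follower
closing_speed = -np.diff(gap) / dt   # positive when the follower closes in
ttc = gap[1:] / closing_speed        # seconds to collision if speeds stay constant
# A TTC below a threshold (commonly 1.5 s) flags a traffic conflict.
```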
We demonstrate the proposed approach in several medical image analysis applications, comparing SSHN against strategies with both fixed and dynamic rescaling factors. We find that SSHN consistently provides a better accuracy-efficiency trade-off at a fraction of the training cost. Trained SSHNs enable the user to quickly choose a rescaling factor that appropriately balances accuracy and computational efficiency for their particular needs at inference.", "output": "Scale-Space Hypernetworks for Efficient Biomedical Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Detecting fake images is becoming a major goal of computer vision. This need is becoming more and more pressing with the continuous improvement of synthesis methods based on Generative Adversarial Networks (GANs), and even more so with the appearance of powerful methods based on Diffusion Models (DMs). Towards this end, it is important to gain insight into which image features better discriminate fake images from real ones. In this paper we report on our systematic study of a large number of image generators of different families, aimed at discovering the most forensically relevant characteristics of real and generated images. Our experiments provide a number of interesting observations and shed light on some intriguing properties of synthetic images: (1) not only the GAN models but also the DM and VQ-GAN (Vector Quantized Generative Adversarial Networks) models give rise to visible artifacts in the Fourier domain and exhibit anomalous regular patterns in the autocorrelation; (2) when the dataset used to train the model lacks sufficient variety, its biases can be transferred to the generated images; (3) synthetic and real images exhibit significant differences in the mid-high frequency signal content, observable in their radial and angular spectral power distributions.", "output": "Intriguing properties of synthetic images: from generative adversarial networks to diffusion models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning techniques have been shown to effectively address several image analysis tasks in computer-aided diagnosis schemes for mammography. The training of an efficacious deep learning model requires large data with diverse styles and qualities. The diversity of data often comes from the use of various scanners from different vendors. However, in practice, it is impractical to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning scheme is developed to equip deep learning models with better style generalization capability. Specifically, a multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embeddings against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned for downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
The experimental results suggest that the proposed domain generalization method can effectively improve the performance of four mammographic image tasks on data from both seen and unseen domains, and outperforms many state-of-the-art (SOTA) generalization methods.", "output": "Domain Generalization for Mammographic Image Analysis with Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, 3D object detection has attracted significant attention and achieved continuous improvement in real road scenarios. Environmental information is collected from a single sensor or multi-sensor fusion to detect objects of interest. However, most current 3D object detection approaches focus on developing advanced network architectures to improve the detection precision of the object, rather than considering the dynamic driving scenes, where data collected from sensors equipped in the vehicle contain various perturbation features. As a result, existing work still cannot tackle the perturbation issue. In order to solve this problem, we propose a group equivariant bird's eye view network (GeqBevNet) based on group equivariant theory, which introduces the concept of group equivariance into the BEV fusion object detection network. The group equivariant network is embedded into the fused BEV feature map to facilitate BEV-level rotational equivariant feature extraction, thus leading to lower average orientation error. In order to demonstrate the effectiveness of GeqBevNet, the network is verified on the nuScenes validation dataset, on which mAOE can be decreased to 0.325. Experimental results demonstrate that GeqBevNet can extract more rotational equivariant features in 3D object detection of actual road scenes and improve the performance of object orientation prediction.", "output": "Group Equivariant BEV for 3D Object Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "During the continuous evolution of one organism's ancestry, its genes accumulate extensive experiences and knowledge, enabling newborn descendants to rapidly adapt to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm, Learngene, to enable learning models to incorporate three key characteristics of genes. (i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact information piece, i.e., the learngene. (iii) Inheriting: the condensed learngene is inherited to make it easier for descendant models to adapt to new environments. Since accumulating has been studied in well-established paradigms like large-scale pre-training and lifelong learning, we focus on condensing and inheriting, which induce three key issues to which we provide preliminary solutions in this paper: (i) Learngene Form: the learngene is set to a few integral layers that can preserve significance. (ii) Learngene Condensing: we identify which layers among the ancestry model have the most similarity to one pseudo descendant model. (iii) Learngene Inheriting: to construct distinct descendant models for specific downstream tasks, we stack some randomly initialized layers on the learngene layers.
Extensive experiments across various settings, including using different network architectures like Vision Transformer (ViT) and Convolutional Neural Networks (CNNs) on different datasets, are carried out to confirm four advantages of Learngene: it makes descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.", "output": "Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Automated brain tumor segmentation methods have become well-established and reached performance levels offering clear clinical utility. These methods typically rely on four input magnetic resonance imaging (MRI) modalities: T1-weighted images with and without contrast enhancement, T2-weighted images, and FLAIR images. However, some sequences are often missing in clinical practice due to time constraints or image artifacts, such as patient motion. Consequently, the ability to substitute missing modalities and gain segmentation performance is highly desirable and necessary for the broader adoption of these algorithms in the clinical routine. In this work, we present the establishment of the Brain MR Image Synthesis Benchmark (BraSyn) in conjunction with the Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023 conference. The primary objective of this challenge is to evaluate image synthesis methods that can realistically generate missing MRI modalities when multiple available images are provided. The ultimate aim is to facilitate automated brain tumor segmentation pipelines. The image dataset used in the benchmark is diverse and multi-modal, created through collaboration with various hospitals and research institutions.", "output": "The Brain Tumor Segmentation (BraTS) Challenge 2023: Brain MR Image Synthesis for Tumor Segmentation (BraSyn)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image segmentation is particularly critical as a prerequisite for relevant quantitative analysis in the treatment of clinical diseases. For example, in clinical cervical cancer radiotherapy, after acquiring subabdominal MRI images, fast and accurate image segmentation of organs and tumors in the MRI images can optimize the clinical radiotherapy process, whereas traditional approaches use manual annotation by specialist doctors, which is time-consuming and laborious. Therefore, automatic organ segmentation of subabdominal MRI images is a valuable research topic.", "output": "An image segmentation algorithm based on multi-scale feature pyramid network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The integration of Computer-Assisted Diagnosis (CAD) with Large Language Models (LLMs) holds great potential in clinical applications, specifically in the roles of digital family doctors and clinic assistants. However, current works in this field are plagued by limitations, specifically a restricted scope of applicable image domains and the provision of unreliable medical advice. This restricts their overall processing capabilities.
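Learngene's inheriting step, described above, stacks randomly initialized layers around condensed learngene layers taken from an ancestry model. A PyTorch sketch follows, under the simplifying assumption that the learngene is simply the last few layers of a toy MLP; the layer sizes and split point are invented.

```python
import torch.nn as nn

ancestry = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),  # suppose these last layers are the learngene
    nn.Linear(64, 16),
)

learngene = list(ancestry.children())[2:]  # condensed, integral layers (inherited)

descendant = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),  # randomly initialized new layers
    *learngene,                    # inherited learngene layers (weights shared here)
    nn.Linear(16, 10),             # task-specific head for the downstream task
)
```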
Furthermore, the mismatch in writing style between LLMs and radiologists undermines their practical usefulness. To tackle these challenges, we introduce ChatCAD+, which is designed to be universal and reliable. It is capable of handling medical images from diverse domains and leveraging up-to-date information from reputable medical websites to provide reliable medical advice. Additionally, it incorporates a template retrieval system that improves report generation performance via exemplar reports, enabling seamless integration into existing clinical workflows. The source code is available at", "output": "ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Autonomous vehicles rely on LiDAR sensors to perceive the environment. Adverse weather conditions like rain, snow, and fog negatively affect these sensors, reducing their reliability by introducing unwanted noise in the measurements. In this work, we tackle this problem by proposing a novel approach for detecting adverse weather effects in LiDAR data. We reformulate this problem as an outlier detection task and use an energy-based framework to detect outliers in point clouds. More specifically, our method learns to associate low energy scores with inlier points and high energy scores with outliers, allowing for robust detection of adverse weather effects. In extensive experiments, we show that our method performs better in adverse weather detection and has higher robustness to unseen weather effects than previous state-of-the-art methods. Furthermore, we show how our method can be used to perform simultaneous outlier detection and semantic segmentation. Finally, to help expand the research field of LiDAR perception in adverse weather, we release the SemanticSpray dataset, which contains labeled vehicle spray data in highway-like scenarios. The dataset is available at .", "output": "Energy-based Detection of Adverse Weather Effects in LiDAR Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Eye tracking research is important in computer vision because it can help us understand how humans interact with the visual world. Specifically for high-risk applications, such as in medical imaging, eye tracking can help us to comprehend how radiologists and other medical professionals search, analyze, and interpret images for diagnostic and clinical purposes. Hence, the application of eye tracking techniques in disease classification has become increasingly popular in recent years. Contemporary works usually transform gaze information collected by eye tracking devices into visual attention maps (VAMs) to supervise the learning process. However, this is a time-consuming preprocessing step, which prevents us from applying eye tracking to radiologists' daily work. To solve this problem, we propose a novel gaze-guided graph neural network (GNN), GazeGNN, to leverage raw eye-gaze data without it being converted into VAMs. In GazeGNN, to directly integrate eye gaze into image classification, we create a unified representation graph that models both images and gaze pattern information. With this benefit, we develop a real-time, real-world, end-to-end disease classification algorithm for the first time in the literature.
This achievement demonstrates the practicality and feasibility of integrating real-time eye tracking techniques into the daily work of radiologists. To the best of our knowledge, GazeGNN is the first work to adopt a GNN to integrate image and eye-gaze data. Our experiments on a public chest X-ray dataset show that our proposed method exhibits the best classification performance compared to existing methods. The code is available.", "output": "GazeGNN: A Gaze-Guided Graph Neural Network for Chest X-ray Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning leverages knowledge from other domains and has been successful in many applications. Transfer learning methods rely on the overall similarity of the source and target domains. However, in some cases, it is impossible to provide an overall similar source domain, and only source domains with similar local features can be provided. Can transfer learning be achieved? In this regard, we propose a multi-source adversarial transfer learning method based on local feature similarity to the source domain, to handle transfer scenarios where the source and target domains have only local similarities. This method extracts transferable local features between a single source domain and the target domain through a sub-network. Specifically, the feature extractor of the sub-network is induced by the domain discriminator to learn transferable knowledge between the source domain and the target domain. The extracted features are then weighted by an attention module to suppress non-transferable local features while enhancing transferable local features. In order to ensure that the data from the target domain in different sub-networks in the same batch is exactly the same, we designed a multi-source domain independent strategy to provide the possibility for later local feature fusion to complete the key features required. In order to verify the effectiveness of the method, we created the \"Local Carvana Image Masking Dataset\". Applying the proposed method to the image segmentation task of the proposed dataset achieves better transfer performance than other multi-source transfer learning methods. It is shown that the designed transfer learning method is feasible for transfer scenarios where the source and target domains have only local similarities.", "output": "Multi-source adversarial transfer learning based on similar source domains with local features."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present an approach to reconstruct humans and track them over time. At the core of our approach, we propose a fully \"transformerized\" version of a network for human mesh recovery. This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images. To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D. This enables us to deal with multiple people and maintain identities through occlusion events. Our complete approach, 4DHumans, achieves state-of-the-art results for tracking people from monocular video.
Furthermore, we demonstrate the effectiveness of HMR 2.0 on the downstream task of action recognition, achieving significant improvements over previous pose-based action recognition approaches. Our code and models are available on the project website:", "output": "Humans in 4D: Reconstructing and Tracking Humans with Transformers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This work will enter the submission stage, so specific information will be temporarily hidden, as will the title.", "output": "A Work Based on GAN."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "DDIM inversion has revealed the remarkable potential of real image editing within diffusion-based methods. However, the accuracy of DDIM reconstruction degrades as larger classifier-free guidance (CFG) scales are used for enhanced editing. Null-text inversion (NTI) optimizes null embeddings to align the reconstruction and inversion trajectories with larger CFG scales, enabling real image editing with cross-attention control. Negative-prompt inversion (NPI) further offers a training-free closed-form solution of NTI. However, it may introduce artifacts and is still constrained by DDIM reconstruction quality. To overcome these limitations, we propose proximal guidance and incorporate it into NPI with cross-attention control. We enhance NPI with a regularization term and reconstruction guidance, which reduces artifacts while capitalizing on its training-free nature. Additionally, we extend the concepts to incorporate mutual self-attention control, enabling geometry and layout alterations in the editing process. Our method provides an efficient and straightforward approach, effectively addressing real image editing tasks with minimal computational overhead.", "output": "Improving Tuning-Free Real Image Editing with Proximal Guidance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a mitigation measure a recipe to train foundation vision models with a differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\\epsilon=8$ on the LAION400M dataset. We evaluate the quality of representation learned by ViP using standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained and evaluated on ImageNet). Our result suggests that scaling to internet-scale data can be practical for private learning. 
Code is available at \\url{", "output": "ViP: A Differentially Private Foundation Model for Computer Vision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions. However, there is a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain. To bridge this gap, we introduce a comprehensive evaluation benchmark for assessing anomaly detection methods on medical images. This benchmark encompasses six reorganized datasets from five medical domains (i.e., brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology) and three key evaluation metrics, and includes a total of fourteen state-of-the-art AD algorithms. This standardized and well-curated medical benchmark with the well-structured codebase enables comprehensive comparisons among recently proposed anomaly detection methods. It will enable the community to conduct fair comparisons and advance the field of AD on medical imaging. More information on BMAD is available in our GitHub repository: ", "output": "BMAD: Benchmarks for Medical Anomaly Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The existing contrastive learning methods widely adopt one-hot instance discrimination as the pretext task for self-supervised learning, which inevitably neglects rich inter-instance similarities among natural images, leading to potential representation degeneration. In this paper, we propose a novel image mix method, PatchMix, for contrastive learning in Vision Transformer (ViT), to model inter-instance similarities among images. Following the nature of ViT, we randomly mix multiple images from a mini-batch at the patch level to construct mixed image patch sequences for ViT. Compared to the existing sample mix methods, our PatchMix can flexibly and efficiently mix more than two images and simulate more complicated similarity relations among natural images. In this manner, our contrastive framework can significantly reduce the gap between the contrastive objective and the ground truth in reality. Experimental results demonstrate that our proposed method significantly outperforms the previous state-of-the-art on both ImageNet-1K and CIFAR datasets, e.g., 3.0% linear accuracy improvement on ImageNet-1K and 8.7% kNN accuracy improvement on CIFAR100. Moreover, our method achieves the leading transfer performance on downstream tasks, object detection and instance segmentation, on the COCO dataset. The code is available at", "output": "Inter-Instance Similarity Modeling for Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural fields have achieved impressive advancements in view synthesis and scene reconstruction. However, editing these neural fields remains challenging due to the implicit encoding of geometry and texture information. 
In this paper, we propose DreamEditor, a novel framework that enables users to perform controlled editing of neural fields using text prompts. By representing scenes as mesh-based neural fields, DreamEditor allows localized editing within specific regions. DreamEditor utilizes the text encoder of a pretrained text-to-image diffusion model to automatically identify the regions to be edited based on the semantics of the text prompts. Subsequently, DreamEditor optimizes the editing region and aligns its geometry and texture with the text prompts through score distillation sampling [29]. Extensive experiments have demonstrated that DreamEditor can accurately edit neural fields of real-world scenes according to the given text prompts while ensuring consistency in irrelevant areas. DreamEditor generates highly realistic textures and geometry, significantly surpassing previous works in both quantitative and qualitative evaluations.", "output": "DreamEditor: Text-Driven 3D Scene Editing with Neural Fields."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) necessitate the fundamental reasoning capacity for intricate linguistic and multimodal comprehension. In this study, we explore distilling the reasoning ability of large language models (LLMs) into a more compact student model by generating a \\textit{chain of thought} (CoT) -- a sequence of intermediate reasoning steps. Specifically, we commence by exemplifying the elicitation of such reasoning ability from LLMs through CoT prompts covering multi-grain (noun, sentence, multimodality) and data-augmentation (style, entity, image) dimensions. Subsequently, we present a novel conditional prompt distillation method to assimilate the commonsense reasoning ability from LLMs, thereby enhancing the utility of the student model in addressing text-only inputs without the requisite addition of image and CoT knowledge. Extensive experiments reveal that our approach attains state-of-the-art accuracy and manifests a plethora of advantages concerning interpretability, data efficiency, and cross-domain generalization on MNER and MRE datasets.", "output": "Chain-of-Thought Prompt Distillation for Multimodal Named Entity and Multimodal Relation Extraction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "While deep neural networks have been proposed to assist dentists in designing the location of dental implants, most of them target simple cases where only one tooth is missing. As a result, existing methods do not work well when there are multiple missing teeth and easily generate false predictions when the teeth are sparsely distributed. In this paper, we integrate a weak supervision text, the target region, into the implant position regression network to address the above issues. We propose a text condition embedded implant position regression network (TCEIP) to embed the text condition into the encoder-decoder framework to improve the regression performance. A cross-modal interaction that consists of a cross-modal attention (CMA) module and a knowledge alignment module (KAM) is proposed to facilitate the interaction between features of images and texts. 
The CMA module performs a cross-attention between the image feature and the text condition, and the KAM mitigates the knowledge gap between the image feature and the image encoder of CLIP. Extensive experiments on a dental implant dataset through five-fold cross-validation demonstrate that the proposed TCEIP achieves superior performance compared to existing methods.", "output": "TCEIP: Text Condition Embedded Regression Network for Dental Implant Position Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Intrinsic image decomposition and inverse rendering are long-standing problems in computer vision. To evaluate albedo recovery, most algorithms report their quantitative performance with a mean Weighted Human Disagreement Rate (WHDR) metric on the IIW dataset. However, WHDR focuses only on relative albedo values and often fails to capture the overall quality of the albedo. In order to comprehensively evaluate albedo, we collect a new dataset, Measured Albedo in the Wild (MAW), and propose three new metrics that complement WHDR: intensity, chromaticity, and texture metrics. We show that existing algorithms often improve the WHDR metric but perform poorly on other metrics. We then finetune different algorithms on our MAW dataset to significantly improve the quality of the reconstructed albedo both quantitatively and qualitatively. Since the proposed intensity, chromaticity, and texture metrics and the WHDR are all complementary, we further introduce a relative performance measure that captures average performance. By analysing existing algorithms, we show that there is significant room for improvement. Our dataset and evaluation metrics will enable researchers to develop algorithms that improve albedo reconstruction. Code and data are available at: ", "output": "Measured Albedo in the Wild: Filling the Gap in Intrinsics Evaluation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large-scale vision-language (V-L) models have demonstrated remarkable generalization capabilities for downstream tasks through prompt tuning. However, their performance suffers significantly in the presence of class imbalance, a common issue in real-world scenarios. In this paper, we investigate the effects of class imbalance on the generalization performance of V-L models and extend the Neural Collapse phenomenon to these models, revealing the geometric reasons behind the impact of class imbalance on their generalization ability. To address this problem, we propose Neural Collapse based Prompt Tuning (NPT), a novel method that optimizes prompts so that both text and image features satisfy the same simplex ETF structure. NPT incorporates two regularization terms, geometric de-biasing and multi-modal isomorphism, to enhance the robustness of V-L models under class imbalance conditions while maintaining their generalization capabilities. 
Our comprehensive experiments show that NPT outperforms existing prompt learning techniques across 11 diverse image recognition datasets, achieving an absolute average gain of 2.63% for novel classes and 2.47% for the harmonic mean when facing imbalanced data.", "output": "Bridging the Gap: Neural Collapse Inspired Prompt Tuning for Generalization under Class Imbalance."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Ultrasound (US) imaging is a popular tool in clinical diagnosis, offering safety, repeatability, and real-time capabilities. Freehand 3D US is a technique that provides a deeper understanding of scanned regions without increasing complexity. However, estimating elevation displacement and accumulation error remains challenging, making it difficult to infer the relative position using images alone. The addition of external lightweight sensors has been proposed to enhance reconstruction performance without adding complexity, which has been shown to be beneficial. We propose a novel online self-consistency network (OSCNet) using multiple inertial measurement units (IMUs) to improve reconstruction performance. OSCNet utilizes a modal-level self-supervised strategy to fuse information from multiple IMUs and reduce differences between the reconstruction results obtained from each IMU's data. Additionally, a sequence-level self-consistency strategy is proposed to improve the hierarchical consistency of prediction results among the scanning sequence and its sub-sequences. Experiments on large-scale arm and carotid datasets with multiple scanning tactics demonstrate that our OSCNet outperforms previous methods, achieving state-of-the-art reconstruction performance.", "output": "Multi-IMU with Online Self-Consistency for Freehand 3D Ultrasound Reconstruction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we demonstrate both theoretically and numerically that neural networks can detect model-free static arbitrage opportunities whenever the market admits some. Due to the use of neural networks, our method can be applied to financial markets with a high number of traded securities and ensures almost immediate execution of the corresponding trading strategies. To demonstrate its tractability, effectiveness, and robustness, we provide examples using real financial data. From a technical point of view, we prove that a single neural network can approximately solve a class of convex semi-infinite programs, which is the key result needed to derive our theoretical results that neural networks can detect model-free static arbitrage strategies whenever the financial market admits such opportunities.", "output": "Neural networks can detect model-free static arbitrage strategies."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the widespread digitization of finance and the increasing popularity of cryptocurrencies, the sophistication of fraud schemes devised by cybercriminals is growing. Money laundering -- the movement of illicit funds to conceal their origins -- can cross bank and national boundaries, producing complex transaction patterns. 
The UN estimates that 2-5% of global GDP, or $0.8 - $2.0 trillion dollars, is laundered globally each year. Unfortunately, real data to train machine learning models to detect laundering is generally not available, and previous synthetic data generators have had significant shortcomings. A realistic, standardized, publicly-available benchmark is needed for comparing models and for the advancement of the area. To this end, this paper contributes a synthetic financial transaction dataset generator and a set of synthetically generated AML (Anti-Money Laundering) datasets. We have calibrated this agent-based generator to match real transactions as closely as possible and made the datasets public. We describe the generator in detail and demonstrate how the datasets generated can help compare different Graph Neural Networks in terms of their AML abilities. In a key way, using synthetic data in these comparisons can be even better than using real data: the ground truth labels are complete, whilst many laundering transactions in real data are never detected.", "output": "Realistic Synthetic Financial Transactions for Anti-Money Laundering Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recommendation systems, there are multiple business domains to meet the diverse interests and needs of users, and the click-through rate (CTR) of each domain can be quite different, which leads to the demand for CTR prediction modeling for different business domains. The industry solution is to use domain-specific models or transfer learning techniques for each domain. The disadvantage of the former is that the data from other domains is not utilized by a single domain model, while the latter leverages all the data from different domains, but the fine-tuned model of transfer learning may trap the model in a local optimum of the source domain, making it difficult to fit the target domain. Meanwhile, significant differences in data quantity and feature schemas between different domains, known as domain shift, may lead to negative transfer in the process of transferring. To overcome these challenges, we propose the Collaborative Cross-Domain Transfer Learning Framework (CCTL). CCTL evaluates the information gain of the source domain on the target domain using a symmetric companion network and adjusts the information transfer weight of each source domain sample using the information flow network. This approach enables full utilization of other domain data while avoiding negative transfer. Additionally, a representation enhancement network is used as an auxiliary task to preserve domain-specific features. In comprehensive experiments on both public and real-world industrial datasets, CCTL achieved SOTA scores on offline metrics. At the same time, the CCTL algorithm has been deployed in Meituan, bringing 4.37% CTR and 5.43% GMV lift, which is significant to the business.", "output": "A Collaborative Transfer Learning Framework for Cross-domain Recommendation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Accurate generation of realistic future scenarios of renewable energy generation is crucial for long-term planning and operation of electrical systems, especially considering the increasing focus on sustainable energy and the growing penetration of renewable generation in energy matrices. 
These predictions enable power system operators and energy planners to effectively manage the variability and intermittency associated with renewable generation, allowing for better grid stability, improved energy management, and enhanced decision-making processes. In this paper, we propose an innovative method for generating long-term hourly scenarios for wind and solar power generation, taking into consideration the correlation between these two energy sources. To achieve this, we combine the capabilities of a Variational Autoencoder (VAE) with the additional benefits of incorporating the Radial Basis Function (RBF) kernel in our artificial neural network architecture. By incorporating them, we aim to obtain a latent space with improved regularization properties. To evaluate the effectiveness of our proposed method, we conduct experiments in a representative study scenario, utilizing real-world wind and solar power generation data from the Brazil system. We compare the scenarios generated by our model with the observed data and with other sets of scenarios produced by a conventional VAE architecture. Our experimental results demonstrate that the proposed method can generate long-term hourly scenarios for wind and solar power generation that are highly correlated, accurately capturing the temporal and spatial characteristics of these energy sources. Taking advantage of the benefits of RBF in obtaining a well-regularized latent space, our approach offers improved accuracy and robustness in generating long-term hourly scenarios for renewable energy generation.", "output": "Long-Term Hourly Scenario Generation for Correlated Wind and Solar Power combining Variational Autoencoders with Radial Basis Function Kernels."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine learning (ML) and tensor-based methods have been of significant interest to the scientific community for the last few decades. In a previous work, we presented a novel tensor-based system identification framework to ease the computational burden of tensor-only architectures while still being able to achieve exceptionally good performance. However, the derived approach only allows the processing of real-valued problems and is therefore not directly applicable to a wide range of signal processing and communications problems, which often deal with complex-valued systems. In this work, we therefore derive two new architectures to allow the processing of complex-valued signals, and show that these extensions are able to surpass the trivial, complex-valued extension of the original architecture in terms of performance, while only requiring a slight overhead in computational resources to allow for complex-valued operations.", "output": "Complex-valued Adaptive System Identification via Low-Rank Tensor Decomposition."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Quantization is commonly used in Deep Neural Networks (DNNs) to reduce the storage and computational complexity by decreasing the arithmetical precision of activations and weights, a.k.a. tensors. Efficient hardware architectures employ linear quantization to enable the deployment of recent DNNs onto embedded systems and mobile devices. 
However, linear uniform quantization cannot usually reduce the numerical precision to less than 8 bits without sacrificing high performance in terms of model accuracy. The performance loss is due to the fact that tensors do not follow uniform distributions. In this paper, we show that a significant number of tensors fit an exponential distribution. Then, we propose DNA-TEQ to exponentially quantize DNN tensors with an adaptive scheme that achieves the best trade-off between numerical precision and accuracy loss. The experimental results show that DNA-TEQ provides a much lower quantization bit-width compared to previous proposals, resulting in an average compression ratio of 40% over the linear INT8 baseline, with negligible accuracy loss and without retraining the DNNs. Besides, DNA-TEQ leads the way in performing dot-product operations in the exponential domain, which saves 66% of energy consumption on average for a set of widely used DNNs.", "output": "DNA-TEQ: An Adaptive Exponential Quantization of Tensors for DNN Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Model-agnostic feature attributions can provide local insights in complex ML models. If the explanation is correct, a domain expert can validate and trust the model's decision. However, if it contradicts the expert's knowledge, related work only corrects irrelevant features to improve the model. To allow for unlimited interaction, in this paper we provide model-agnostic implementations for two popular explanation methods (Occlusion and Shapley values) to enforce entirely different attributions in the complex model. For a particular set of samples, we use the corrected feature attributions to generate extra local data, which is used to retrain the model to have the right explanation for the samples. Through simulated and real data experiments on a variety of models, we show how our proposed approach can significantly improve the model's performance only by augmenting its training dataset based on corrected explanations. Adding our interactive explanations to active learning settings increases the sample efficiency significantly and outperforms existing explanatory interactive strategies. Additionally, we explore how a domain expert can provide feature attributions which are sufficiently correct to improve the model.", "output": "Increasing Performance And Sample Efficiency With Model-agnostic Interactive Feature Attributions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Modern advancements in large-scale machine learning would be impossible without the paradigm of data-parallel distributed computing. Since distributed computing with large-scale models imparts excessive pressure on communication channels, significant recent research has been directed toward co-designing communication compression strategies and training algorithms with the goal of reducing communication costs. While pure data parallelism allows better data scaling, it suffers from poor model scaling properties. Indeed, compute nodes are severely limited by memory constraints, preventing further increases in model size. For this reason, the latest achievements in training giant neural network models also rely on some form of model parallelism. 
In this work, we take a closer theoretical look at Independent Subnetwork Training (IST), which is a recently proposed and highly effective technique for solving the aforementioned problems. We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication, and provide a precise analysis of its optimization performance on a quadratic model.", "output": "Towards a Better Theoretical Understanding of Independent Subnetwork Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "For min-max optimization and variational inequality problems (VIPs) encountered in diverse machine learning tasks, Stochastic Extragradient (SEG) and Stochastic Gradient Descent Ascent (SGDA) have emerged as preeminent algorithms. Constant step-size variants of SEG/SGDA have gained popularity, with appealing benefits such as easy tuning and rapid forgiveness of initial conditions, but their convergence behaviors are more complicated even in rudimentary bilinear models. Our work endeavors to elucidate and quantify the probabilistic structures intrinsic to these algorithms. By recasting the constant step-size SEG/SGDA as time-homogeneous Markov Chains, we establish a first-of-its-kind Law of Large Numbers and a Central Limit Theorem, demonstrating that the average iterate is asymptotically normal with a unique invariant distribution for an extensive range of monotone and non-monotone VIPs. Specializing to convex-concave min-max optimization, we characterize the relationship between the step-size and the induced bias with respect to the Von Neumann value. Finally, we establish that Richardson-Romberg extrapolation can improve the proximity of the average iterate to the global solution for VIPs. Our probabilistic analysis, underpinned by experiments corroborating our theoretical discoveries, harnesses techniques from optimization, Markov chains, and operator theory.", "output": "Stochastic Methods in Variational Inequalities: Ergodicity, Bias and Refinements."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in the literature to learn better gradient estimates to help achieve better convergence. Since gradient estimates depend upon the critic, we posit that improving the critic can provide a better gradient estimate for the actor at each time. Utilizing this, we propose Soft Actor Retrospective Critic (SARC), where we augment the SAC critic loss with another loss term - retrospective loss - leading to faster critic convergence and, consequently, better policy gradient estimates for the actor. An existing implementation of SAC can be easily adapted to SARC with minimal modifications. Through extensive experimentation and analysis, we show that SARC provides consistent improvement over SAC on benchmark environments. 
We plan to open-source the code and all experiment data at: ", "output": "SARC: Soft Actor Retrospective Critic."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two fundamental algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for \"client drift\" in its local updates. Various methods have been proposed in the literature to enhance the convergence of these two algorithms, but they either make impractical adjustments to the algorithmic structure, or rely on the assumption of bounded data heterogeneity. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD. When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without relying on the assumption of bounded data heterogeneity even when using a constant local learning rate. This is a novel result, since existing analyses for FedAvg require bounded data heterogeneity even with diminishing local learning rates. In the case of partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence rates. Our experimental results support all theoretical findings.", "output": "Momentum Benefits Non-IID Federated Learning Simply and Provably."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce the notion of an $\\varepsilon$-cover for a kernel range space. A kernel range space concerns a set of points $X \\subset \\mathbb{R}^d$ and the space of all queries by a fixed kernel (e.g., a Gaussian kernel $K(p,\\cdot) = \\exp(-\\|p-\\cdot\\|^2)$). For a point set $X$ of size $n$, a query returns a vector of values $R_p \\in \\mathbb{R}^n$, where the $i$th coordinate $(R_p)_i = K(p,x_i)$ for $x_i \\in X$. An $\\varepsilon$-cover is a subset of points $Q \\subset \\mathbb{R}^d$ so that for any $p \\in \\mathbb{R}^d$, $\\frac{1}{n} \\|R_p - R_q\\|_1 \\leq \\varepsilon$ for some $q \\in Q$. This is a smooth analog of Haussler's notion of $\\varepsilon$-covers for combinatorial range spaces (e.g., defined by subsets of points within a ball query), where the resulting vectors $R_p$ are in $\\{0,1\\}^n$ instead of $[0,1]^n$. The kernel versions of these range spaces show up in data analysis tasks where the coordinates may be uncertain or imprecise, and hence one wishes to add some flexibility in the notion of inside and outside of a query range. Our main result is that, unlike combinatorial range spaces, the size of kernel $\\varepsilon$-covers is independent of the input size $n$ and dimension $d$. We obtain a bound of $(1/\\varepsilon)^{\\tilde O(1/\\varepsilon^2)}$, where $\\tilde{O}(f(1/\\varepsilon))$ hides log factors in $(1/\\varepsilon)$ that can depend on the kernel. 
This implies that by relaxing the notion of boundaries in range queries, the curse of dimensionality eventually disappears, which may help explain the success of machine learning in very high dimensions. We also complement this result with a lower bound of almost $(1/\\varepsilon)^{\\Omega(1/\\varepsilon)}$, showing that the exponential dependence on $1/\\varepsilon$ is necessary.", "output": "For Kernel Range Spaces a Constant Number of Queries Are Sufficient."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Numerically solving partial differential equations (PDEs) typically requires fine discretization to resolve necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving PDEs that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and have the capability to solve partial differential equations based on data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter that is parameterized by a multilayer perceptron. The Hyena operator enjoys sub-quadratic complexity and uses a state space model to parameterize a long convolution with a global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weights for different PDE instances. To measure how effective the layers are in solving PDEs, we conduct experiments on Burgers' equation and the Navier-Stokes equation. Our findings indicate that the Hyena neural operator can serve as an efficient and accurate model for learning PDEs' solution operators. The data and code used can be found at:", "output": "HNO: Hyena Neural Operator for solving PDEs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Review-Based Recommender Systems (RBRS) have attracted increasing research interest due to their ability to alleviate well-known cold-start problems. RBRSs utilize reviews to construct user and item representations. However, we argue that such a reliance on reviews may instead expose systems to the risk of being shilled. To explore this possibility, in this paper we propose the first generation-based model for shilling attacks against RBRSs. Specifically, we learn a fake review generator through reinforcement learning, which maliciously promotes items by forcing prediction shifts after adding generated reviews to the system. By introducing auxiliary rewards to increase text fluency and diversity with the aid of pre-trained language models and aspect predictors, the generated reviews can be effective for shilling with high fidelity. Experimental results demonstrate that the proposed framework can successfully attack three different kinds of RBRSs on the Amazon corpus with three domains and on the Yelp corpus. Furthermore, human studies also show that the generated reviews are fluent and informative. 
Finally, equipped with Attack Review Generators (ARGs), RBRSs with adversarial training are much more robust to malicious reviews.", "output": "Shilling Black-box Review-based Recommender Systems through Fake Review Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Background: People's health depends on the use of a proper diet as an important factor. Today, with the increasing mechanization of people's lives, proper eating habits and behaviors are neglected. On the other hand, food recommendations in the field of health have also tried to deal with this issue. But with the introduction of the Western nutrition style and the advancement of Western chemical medicine, many issues have emerged in the field of disease treatment and nutrition. Recent advances in technology and the use of artificial intelligence methods in information systems have led to the creation of recommender systems in order to improve people's health. Methods: A hybrid recommender system including collaborative filtering, content-based, and knowledge-based models was used. Machine learning models such as Decision Tree, k-Nearest Neighbors (kNN), AdaBoost, and Bagging were investigated in the field of food recommender systems on 2519 students in the nutrition management system of a university. Student information, including profile information for basal metabolic rate, student reservation records, and selected diet type, is received online. Among the 15 features collected, and after consulting nutrition experts, the most effective features are selected through feature engineering. Using machine learning models based on energy indicators and food selection history by students, food from the university menu is recommended to students. Results: The AdaBoost model has the highest performance in terms of accuracy, with a rate of 73.70 percent. Conclusion: Considering the importance of diet in people's health, recommender systems are effective in obtaining useful information from a huge amount of data. Keywords: Recommender system, Food behavior and habits, Machine learning, Classification", "output": "A Food Recommender System in Academic Environments Based on Machine Learning Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent clinical research describes a subset of glioblastoma patients that exhibit rapid early progression (REP) prior to the start of radiation therapy. The current literature has thus far described this population using clinicopathologic features. To our knowledge, this study is the first to investigate the potential of conventional radiomics, sophisticated multi-resolution fractal texture features, and different molecular features (MGMT, IDH mutations) as a diagnostic and prognostic tool for the prediction of REP from non-REP cases using computational and statistical modeling methods. Radiation-planning T1 post-contrast (T1C) MRI sequences of 70 patients are analyzed. An ensemble method with 5-fold cross validation over 1000 iterations offers an AUC of 0.793 with a standard deviation of 0.082 for REP and non-REP classification. In addition, copula-based modeling under dependent censoring (where a subset of the patients may not be followed up until death) identifies significant features (p-value < 0.05) for survival probability and prognostic grouping of patient cases. 
The prediction of survival for the patient cohort produces a precision of 0.881 with a standard deviation of 0.056. The prognostic index (PI) calculated using the fused features suggests that 84.62% of REP cases fall under the bad prognostic group, suggesting the potential of fused features to predict a higher percentage of REP cases. The experimental results further show that multi-resolution fractal texture features perform better than conventional radiomics features for REP and survival outcomes.", "output": "Prediction of Rapid Early Progression and Survival Risk with Pre-Radiation MRI in WHO Grade 4 Glioma Patients."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Meeting online is becoming the new normal. Creating an immersive experience for online meetings is a necessity towards more diverse and seamless environments. Efficient photorealistic rendering of human 3D dynamics is the core of immersive meetings. Current popular applications achieve real-time conferencing but fall short in delivering photorealistic human dynamics, either due to limited 2D space or the use of avatars that lack realistic interactions between participants. Recent advances in neural rendering, such as the Neural Radiance Field (NeRF), offer the potential for greater realism in metaverse meetings. However, the slow rendering speed of NeRF poses challenges for real-time conferencing. We envision a pipeline for a future extended reality metaverse conferencing system that leverages monocular video acquisition and free-viewpoint synthesis to enhance data and hardware efficiency. Towards an immersive conferencing experience, we explore an accelerated NeRF-based free-viewpoint synthesis algorithm for rendering photorealistic human dynamics more efficiently. We show that our algorithm achieves comparable rendering quality while performing training and inference 44.5% and 213% faster than state-of-the-art methods, respectively. Our exploration provides a design basis for constructing metaverse conferencing systems that can handle complex application scenarios, including dynamic scene relighting with customized themes and multi-user conferencing that harmonizes real-world people into an extended world.", "output": "Envisioning a Next Generation Extended Reality Conferencing System with Efficient Photorealistic Human Rendering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "As machine learning (ML) based systems are adopted in domains such as law enforcement, criminal justice, finance, hiring, and admissions, ensuring the fairness of ML aided decision-making is becoming increasingly important. In this paper, we focus on the problem of fair classification, and introduce a novel min-max F-divergence regularization framework for learning fair classification models while preserving high accuracy. Our framework consists of two trainable networks, namely, a classifier network and a bias/fairness estimator network, where the fairness is measured using the statistical notion of F-divergence. We show that F-divergence measures possess convexity and differentiability properties, and their variational representation makes them widely applicable in practical gradient-based training methods. The proposed framework can be readily adapted to multiple sensitive attributes and for high-dimensional datasets. 
We study the F-divergence based training paradigm for two types of group fairness constraints, namely, demographic parity and equalized odds. We present a comprehensive set of experiments for several real-world datasets arising in multiple domains (including the COMPAS, Law Admissions, Adult Income, and CelebA datasets). To quantify the fairness-accuracy tradeoff, we introduce the notion of the fairness-accuracy receiver operating characteristic (FA-ROC) and a corresponding \\textit{low-bias} FA-ROC, which we argue is an appropriate measure to evaluate different classifiers. In comparison to several existing approaches for learning fair classifiers (including pre-processing, post-processing, and other regularization methods), we show that the proposed F-divergence based framework achieves state-of-the-art performance with respect to the trade-off between accuracy and fairness.", "output": "Learning Fair Classifiers via Min-Max F-divergence Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the recent development in machine learning, most learning systems are still \"black boxes\", whose performance cannot be understood or derived. With the rise of safety and privacy concerns among the public, designing an explainable learning system has become a new trend in machine learning. In general, many machine learning problems are formulated as minimizing (or maximizing) some loss function. Since real data are most likely generated from non-linear models, the loss function is non-convex in general. Unlike the convex optimization problem, gradient descent algorithms will be trapped in spurious local minima when solving non-convex optimization. Therefore, it is challenging to provide explainable algorithms when studying non-convex optimization problems. In this thesis, two popular non-convex problems are studied: (1) low-rank matrix completion and (2) neural network learning.", "output": "Non-Convex Optimizations for Machine Learning with Theoretical Guarantee: Robust Matrix Completion and Neural Network Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-dimensional datasets pose a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique for dealing with dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms to more sophisticated relevance-redundancy trade-offs and to multivariate dependency-based approaches in recent years. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature intercooperation, and summarizes the contributions of different approaches found in the literature. 
Furthermore, current issues and challenges are introduced to identify promising future research and development.", "output": "Feature Selection: A perspective on inter-attribute cooperation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The mean of an unknown variance-$\\sigma^2$ distribution $f$ can be estimated from $n$ samples with variance $\\frac{\\sigma^2}{n}$ and nearly corresponding subgaussian rate. When $f$ is known up to translation, this can be improved asymptotically to $\\frac{1}{n\\mathcal I}$, where $\\mathcal I$ is the Fisher information of the distribution. Such an improvement is not possible for general unknown $f$, but [Stone, 1975] showed that this asymptotic convergence $\\textit{is}$ possible if $f$ is $\\textit{symmetric}$ about its mean. Stone's bound is asymptotic, however: the $n$ required for convergence depends in an unspecified way on the distribution $f$ and failure probability $\\delta$. In this paper, we give finite-sample guarantees for symmetric mean estimation in terms of Fisher information. For every $f, n, \\delta$ with $n > \\log\\frac{1}{\\delta}$, we get convergence close to a subgaussian with variance $\\frac{1}{n \\mathcal I_r}$, where $\\mathcal I_r$ is the $r$-$\\textit{smoothed}$ Fisher information with smoothing radius $r$ that decays polynomially in $n$. Such a bound essentially matches the finite-sample guarantees in the known-$f$ setting.", "output": "Finite-Sample Symmetric Mean Estimation with Fisher Information Rate."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider a decision maker allocating one unit of renewable and divisible resource in each period on a number of arms. The arms have unknown and random rewards whose means are proportional to the allocated resource and whose variances are proportional to an order $b$ of the allocated resource. In particular, if the decision maker allocates resource $A_i$ to arm $i$ in a period, then the reward $Y_i$ is $Y_i(A_i)=A_i \\mu_i+A_i^b \\xi_{i}$, where $\\mu_i$ is the unknown mean and the noise $\\xi_{i}$ is independent and sub-Gaussian. When the order $b$ ranges from 0 to 1, the framework smoothly bridges the standard stochastic multi-armed bandit and online learning with full feedback. We design two algorithms that attain the optimal gap-dependent and gap-independent regret bounds for $b \\in [0,1]$, and demonstrate a phase transition at $b=1/2$. The theoretical results hinge on a novel concentration inequality we have developed that bounds a linear combination of sub-Gaussian random variables whose weights are fractional, adapted to the filtration, and monotonic.", "output": "Allocating Divisible Resources on Arms with Unknown and Random Rewards."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A dynamical system produces a dependent multivariate sequence called a dynamical time series, developed with an evolution function. As variables in the dynamical time series at the current time-point usually depend on the whole set of variables at the previous time-point, existing studies forecast the variables at the future time-point by estimating the evolution function. However, some variables in the dynamical time series are missing in some practical situations. 
In this study, we propose an autoregressive with slack time series (ARS) model. The ARS model involves the simultaneous estimation of the evolution function and the underlying missing variables as a slack time series, with the aid of the time-invariance and linearity of the dynamical system. This study empirically demonstrates the effectiveness of the proposed ARS model.", "output": "Forecasting of the development of a partially-observed dynamical time series with the aid of time-invariance and linearity."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To mitigate the gap, model compression techniques such as structured pruning are being used to improve inference efficiency. However, most existing neural network inference runtimes lack adequate support for structured sparsity. In this paper, we propose an efficient sparse deep learning inference software stack for Transformer-based language models where the weights are pruned with a constant block size. Our sparse software accelerator leverages Intel Deep Learning Boost to maximize the performance of sparse matrix - dense matrix multiplication (commonly abbreviated as SpMM) on CPUs. Our SpMM kernel outperforms the existing sparse libraries (oneMKL, TVM, and LIBXSMM) by an order of magnitude on a wide range of GEMM shapes under 5 representative sparsity ratios (70%, 75%, 80%, 85%, 90%). Moreover, our SpMM kernel shows up to 5x speedup over the dense GEMM kernel of oneDNN, a well-optimized dense library widely used in industry. We apply our sparse accelerator on widely-used Transformer-based language models including Bert-Mini, DistilBERT, Bert-Base, and BERT-Large. Our sparse inference software shows up to 1.5x speedup over Neural Magic's Deepsparse under the same configurations on Xeon on Amazon Web Services under proxy production latency constraints. We also compare our solution with two framework-based inference solutions, ONNX Runtime and PyTorch, and demonstrate up to 37x speedup over ONNX Runtime and 345x over PyTorch on Xeon under the latency constraints. All the source code is publicly available on Github: ", "output": "An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data augmentation is now an essential part of the image training process, as it effectively prevents overfitting and makes the model more robust against noisy datasets. Recent mixing augmentation strategies have advanced to generate the mixup mask that can enrich the saliency information, which is a supervisory signal. However, these methods incur a significant computational burden to optimize the mixup mask. From this motivation, we propose a novel saliency-aware mixup method, GuidedMixup, which aims to retain the salient regions in mixup images with low computational overhead. We develop an efficient pairing algorithm that seeks to minimize the conflict between the salient regions of paired images and achieve rich saliency in mixup images. 
Moreover, GuidedMixup controls the mixup ratio for each pixel to better preserve the salient region by interpolating two paired images smoothly. Experiments on several datasets demonstrate that GuidedMixup provides a good trade-off between augmentation overhead and generalization performance on classification datasets. In addition, our method shows good performance in experiments with corrupted or reduced datasets.", "output": "GuidedMixup: An Efficient Mixup Strategy Guided by Saliency Maps."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Machine-learning models are known to be vulnerable to evasion attacks that perturb model inputs to induce misclassifications. In this work, we identify real-world scenarios where the true threat cannot be assessed accurately by existing attacks. Specifically, we find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes. To address the shortcomings of existing methods, we formally define a new metric, termed group-based robustness, that complements existing metrics and is better suited for evaluating model performance in certain attack scenarios. We show empirically that group-based robustness allows us to distinguish between models' vulnerability against specific threat models in situations where traditional robustness metrics do not apply. Moreover, to measure group-based robustness efficiently and accurately, we 1) propose two loss functions and 2) identify three new attack strategies. We show empirically that, with comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes, and finding evasive samples using our new attack strategies saves time by up to 99% compared to brute-force search methods. Finally, we propose a defense method that increases group-based robustness by up to 3.52$\\times$.", "output": "Group-based Robustness: A General Framework for Customized Robustness in the Real World."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remains opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold's curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. 
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before.", "output": "Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "An emerging application of Raman spectroscopy is monitoring the state of chemical reactors during biologic drug production. Raman shift intensities scale linearly with the concentrations of chemical species and thus can be used to analytically determine real-time concentrations using non-destructive light irradiation in a label-free manner. Chemometric algorithms are used to interpret Raman spectra produced from complex mixtures of bioreactor contents as a reaction evolves. Finding the optimal algorithm for a specific bioreactor environment is challenging due to the lack of freely available Raman mixture datasets. The RaMix Python package addresses this challenge by enabling the generation of synthetic Raman mixture datasets with controllable noise levels to assess the utility of different chemometric algorithm types for real-time monitoring applications. To demonstrate the capabilities of this package and compare the performance of different chemometric algorithms, 48 datasets of simulated spectra were generated using the RaMix Python package. The four tested algorithms include partial least squares regression (PLS), a simple neural network, a simple convolutional neural network (simple CNN), and a 1D convolutional neural network with a ResNet architecture (ResNet). The performance of the PLS and simple CNN models was found to be comparable, with the PLS algorithm slightly outperforming the other models on 83% of the datasets. The simple CNN model outperforms the other models on large, high-noise datasets, demonstrating the superior capability of convolutional neural networks compared to PLS in analyzing noisy spectra. These results demonstrate the promise of CNNs to automatically extract concentration information from unprocessed, noisy spectra, allowing for better process control of industrial drug production. Code for this project is available at github.com/DexterAntonio/RaMix.", "output": "Assessing the Performance of 1D-Convolution Neural Networks to Predict Concentration of Mixture Components from Raman Spectra."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce the first large-scale dataset, MNISQ, for both the Quantum and the Classical Machine Learning community during the Noisy Intermediate-Scale Quantum era. MNISQ consists of 4,950,000 data points organized in 9 subdatasets. Building our dataset from the quantum encoding of classical information (e.g., the MNIST dataset), we deliver a dataset in a dual form: in quantum form, as circuits, and in classical form, as quantum circuit descriptions (quantum programming language, QASM). In fact, Machine Learning research related to quantum computers also undertakes a dual challenge: enhancing machine learning by exploiting the power of quantum computers, while also leveraging state-of-the-art classical machine learning methodologies to help the advancement of quantum computing. Therefore, we perform circuit classification on our dataset, tackling the task with both quantum and classical models. 
In the quantum endeavor, we test our circuit dataset with Quantum Kernel methods, and we show excellent results up to $97\\%$ accuracy. In the classical world, the underlying quantum mechanical structures within the quantum circuit data are not trivial. Nevertheless, we test our dataset on three classical models: the Structured State Space sequence model (S4), Transformer and LSTM. In particular, the S4 model applied on the tokenized QASM sequences reaches an impressive $77\\%$ accuracy. These findings illustrate that quantum circuit-related datasets are likely to be quantum advantageous, but also that state-of-the-art machine learning methodologies can competently classify and recognize quantum circuits. We finally entrust the quantum and classical machine learning community with the fundamental challenge to build more quantum-classical datasets like ours and to build future benchmarks from our experiments. The dataset is accessible on GitHub and its circuits are easily run in qulacs or qiskit.", "output": "MNISQ: A Large-Scale Quantum Circuit Dataset for Machine Learning on/for Quantum Computers in the NISQ era."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the development of effective deepfake detection models in recent years, several recent studies have demonstrated that biases in the training data utilized to develop deepfake detection models can lead to unfair performance for demographic groups of different races and/or genders. This can result in these groups being unfairly targeted or excluded from detection, allowing misclassified deepfakes to manipulate public opinion and erode trust in the model. While these studies have focused on identifying and evaluating the unfairness in deepfake detection, no methods have been developed to address the fairness issue of deepfake detection at the algorithm level. In this work, we make the first attempt to improve deepfake detection fairness by proposing novel loss functions to train fair deepfake detection models in ways that are agnostic or aware of demographic factors. Extensive experiments on four deepfake datasets and five deepfake detectors demonstrate the effectiveness and flexibility of our approach in improving deepfake detection fairness.", "output": "Improving Fairness in Deepfake Detection."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present the Chinese Elementary School Math Word Problems (CMATH) dataset, comprising 1.7k elementary school-level math word problems with detailed annotations, sourced from actual Chinese workbooks and exams. This dataset aims to provide a benchmark tool for assessing the following question: to what grade level of elementary school math do the abilities of popular large language models (LLMs) correspond? We evaluate a variety of popular LLMs, including both commercial and open-source options, and discover that only GPT-4 achieves success (accuracy $\\geq$ 60%) across all six elementary school grades, while other models falter at different grade levels. Furthermore, we assess the robustness of several top-performing LLMs by augmenting the original problems in the CMATH dataset with distracting information. Our findings reveal that GPT-4 is able to maintain robustness, while other models fail. 
We anticipate that our study will expose limitations in LLMs' arithmetic and reasoning capabilities, and promote their ongoing development and advancement.", "output": "CMATH: Can Your Language Model Pass Chinese Elementary School Math Test?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We consider the problem of approximating a $d \\times d$ covariance matrix $M$ with a rank-$k$ matrix under $(\\varepsilon,\\delta)$-differential privacy. We present and analyze a complex variant of the Gaussian mechanism and show that the Frobenius norm of the difference between the matrix output by this mechanism and the best rank-$k$ approximation to $M$ is bounded by roughly $\\tilde{O}(\\sqrt{kd})$, whenever there is an appropriately large gap between the $k$-th and the $(k+1)$-th eigenvalues of $M$. This improves on previous work that requires that the gap between every pair of top-$k$ eigenvalues of $M$ is at least $\\sqrt{d}$ for a similar bound. Our analysis leverages the fact that the eigenvalues of complex matrix Brownian motion repel more than in the real case, and uses Dyson's stochastic differential equations governing the evolution of its eigenvalues to show that the eigenvalues of the matrix $M$ perturbed by complex Gaussian noise have large gaps with high probability. Our results contribute to the analysis of low-rank approximations under average-case perturbations and to an understanding of eigenvalue gaps for random matrices, which may be of independent interest.", "output": "Private Covariance Approximation and Eigenvalue-Gap Bounds for Complex Gaussian Perturbations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce NaturalInversion, a novel model inversion-based method to synthesize images that agree well with the original data distribution without using real data. In NaturalInversion, we propose: (1) a Feature Transfer Pyramid, which uses enhanced image priors of the original data by combining the multi-scale feature maps extracted from the pre-trained classifier, (2) a one-to-one approach generative model, where only one batch of images is synthesized by one generator to bring non-linearity to the optimization and to ease the overall optimizing process, (3) learnable Adaptive Channel Scaling parameters, which are trained end-to-end to scale the output image channels to further utilize the original image prior. With our NaturalInversion, we synthesize images from classifiers trained on CIFAR-10/100 and show that our images are more consistent with the original data distribution than prior works by visualization and additional analysis. Furthermore, our synthesized images outperform prior works on various applications such as knowledge distillation and pruning, demonstrating the effectiveness of our proposed method.", "output": "NaturalInversion: Data-Free Image Synthesis Improving Real-World Consistency."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Procedural Content Generation via Machine Learning (PCGML) faces a significant hurdle that sets it apart from other fields, such as image or text generation: limited annotated data. 
Many existing methods for procedural level generation via machine learning require a secondary representation besides level images. However, the current methods for obtaining such representations are laborious and time-consuming, which contributes to this problem. In this work, we aim to address this problem by utilizing gameplay videos of two human-annotated games to develop a novel multi-tail framework that learns to perform simultaneous level translation and generation. The translation tail of our framework can convert gameplay video frames to an equivalent secondary representation, while its generation tail can produce novel level segments. Evaluation results and comparisons between our framework and baselines suggest that combining the level generation and translation tasks can lead to overall improved performance on both tasks. This represents a possible solution to limited annotated level data, and we demonstrate the potential for future versions to generalize to unseen games.", "output": "Joint Level Generation and Translation Using Gameplay Videos."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Game level blending via machine learning, the process of combining features of game levels to create unique and novel game levels using Procedural Content Generation via Machine Learning (PCGML) techniques, has gained increasing popularity in recent years. However, many existing techniques rely on human-annotated level representations, which restricts game level blending to a small number of annotated games. Even with annotated games, researchers often need to author an additional shared representation to make blending possible. In this paper, we present a novel approach to game level blending that employs Clustering-based Tile Embeddings (CTE), a learned level representation technique that can serve as a level representation for unannotated games and a unified level representation across games without the need for human annotation. CTE represents game level tiles as a continuous vector representation, unifying their visual, contextual, and behavioral information. We apply this approach to two classic Nintendo games, Lode Runner and The Legend of Zelda. We run an evaluation comparing the CTE representation to a common, human-annotated representation in the blending task and find that CTE has comparable or better performance without the need for human annotation.", "output": "Game Level Blending using a Learned Level Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the increasing popularity and size of vision transformers (ViTs), there has been growing interest in making them more efficient and less computationally costly for deployment on edge devices with limited computing resources. Binarization can be used to help reduce the size of ViT models and their computational cost significantly, using popcount operations when the weights and the activations are in binary. However, ViTs suffer a larger performance drop when directly applying convolutional neural network (CNN) binarization methods or existing binarization methods to binarize ViTs, compared to CNNs, on datasets with a large number of classes such as ImageNet-1k. 
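(Illustrative aside: the popcount arithmetic mentioned here is what makes binary networks cheap. Below is a minimal sketch of an XNOR-popcount dot product; the bit-packing convention is a hypothetical choice for illustration, not taken from the paper.)

```python
def binary_dot(weights_bits: int, acts_bits: int, n: int) -> int:
    """Dot product of two {-1, +1} vectors packed as n-bit integers
    (bit=1 encodes +1, bit=0 encodes -1): agreements minus disagreements."""
    agree = ~(weights_bits ^ acts_bits) & ((1 << n) - 1)  # XNOR, masked to n bits
    popcnt = bin(agree).count("1")                        # number of agreeing positions
    return 2 * popcnt - n

# Example: w = [+1, -1, +1, +1] -> 0b1011, a = [+1, +1, -1, +1] -> 0b1101
assert binary_dot(0b1011, 0b1101, 4) == 0  # two agreements, two disagreements
```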
With extensive analysis, we find that binary vanilla ViTs such as DeiT miss out on key architectural properties that CNNs have, which allow binary CNNs to have much higher representational capability than binary vanilla ViTs. Therefore, we propose BinaryViT, in which, inspired by the CNN architecture, we incorporate operations from the CNN architecture into a pure ViT architecture to enrich the representational capability of a binary ViT without introducing convolutions. These include an average pooling layer instead of a token pooling layer, a block that contains multiple average pooling branches, an affine transformation right before the addition of each main residual connection, and a pyramid structure. Experimental results on the ImageNet-1k dataset show the effectiveness of these operations, which allow a binary pure ViT model to be competitive with previous state-of-the-art (SOTA) binary CNN models.", "output": "BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The ever-growing complexity of reinforcement learning (RL) tasks demands a distributed RL system to efficiently generate and process a massive amount of data to train intelligent agents. However, existing open-source libraries suffer from various limitations, which impede their practical use in challenging scenarios where large-scale training is necessary. While industrial systems from OpenAI and DeepMind have achieved successful large-scale RL training, their system architecture and implementation details remain undisclosed to the community. In this paper, we present a novel abstraction of the dataflows of RL training, which unifies practical RL training across diverse applications into a general framework and enables fine-grained optimizations. Following this abstraction, we develop a scalable, efficient, and extensible distributed RL system called ReaLly Scalable RL (SRL). The system architecture of SRL separates major RL computation components and allows massively parallelized training. Moreover, SRL offers user-friendly and extensible interfaces for customized algorithms. Our evaluation shows that SRL outperforms existing academic libraries on both a single machine and a medium-sized cluster. In a large-scale cluster, the novel architecture of SRL leads to up to 3.7x speedup compared to the design choices adopted by the existing libraries. We also conduct a direct benchmark comparison to OpenAI's industrial system, Rapid, in the challenging hide-and-seek environment. SRL reproduces the same solution as reported by OpenAI with up to 5x speedup in wall-clock time. Furthermore, we also examine the performance of SRL in a much harder variant of the hide-and-seek environment and achieve substantial learning speedup by scaling SRL to over 15k CPU cores and 32 A100 GPUs. Notably, SRL is the first in the academic community to perform RL experiments at such a large scale.", "output": "SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Implicit Neural Representation (INR) is an innovative approach for representing complex shapes or objects without explicitly defining their geometry or surface structure. Instead, INR represents objects as continuous functions. 
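(Illustrative aside: "objects as continuous functions" can be made concrete with a toy coordinate network that memorizes an image as a map from (x, y) to RGB. This is a hypothetical NumPy sketch of the general INR idea, not code from any specific INR paper; the architecture and training loop are assumptions.)

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 16
img = rng.random((H, W, 3))                      # stand-in "image" to memorize

# Coordinates in [-1, 1]^2; the INR is a function f(x, y) -> (r, g, b).
ys, xs = np.mgrid[0:H, 0:W]
coords = np.stack([xs, ys], -1).reshape(-1, 2) / [W - 1, H - 1] * 2 - 1
target = img.reshape(-1, 3)

# One hidden layer, trained by plain gradient descent on squared error.
W1, b1 = rng.normal(0, 1, (2, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.1, (64, 3)), np.zeros(3)
for _ in range(2000):
    h = np.tanh(coords @ W1 + b1)                # hidden features per pixel
    err = h @ W2 + b2 - target                   # (H*W, 3) residuals
    gW2, gb2 = h.T @ err, err.sum(0)
    gh = (err @ W2.T) * (1 - h ** 2)             # backprop through tanh
    gW1, gb1 = coords.T @ gh, gh.sum(0)
    for p, g in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2)):
        p -= 1e-3 * g / len(coords)
print("reconstruction MSE:", float((err ** 2).mean()))
```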
Previous research has demonstrated the effectiveness of using neural networks as INRs for image compression, showcasing performance comparable to traditional methods such as JPEG. However, INR holds potential for various applications beyond image compression. This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks. Our methodology involves storing the whole dataset directly in INR format on a GPU, mitigating the significant data communication overhead between the CPU and GPU during training. Additionally, the decoding process from INR to RGB format is highly parallelized and executed on-the-fly. To further enhance compression, we propose iterative and dynamic pruning, as well as layer-wise quantization, building upon previous work. We evaluate our framework on the image classification task, utilizing the ResNet-18 backbone network and three commonly used datasets with varying image sizes. Rapid-INR reduces memory consumption to only 5% of the original dataset size and achieves a maximum 6$\\times$ speedup over the PyTorch training pipeline, as well as a maximum 1.2x speedup over the DALI training pipeline, with only a marginal decrease in accuracy. Importantly, Rapid-INR can be readily applied to other computer vision tasks and backbone networks with reasonable engineering efforts. Our implementation code is publicly available at", "output": "Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamics models learned from visual observations have been shown to be effective in various robotic manipulation tasks. One of the key questions for learning such dynamics models is what scene representation to use. Prior works typically assume a representation at a fixed dimension or resolution, which may be inefficient for simple tasks and ineffective for more complicated tasks. In this work, we investigate how to learn dynamic and adaptive representations at different levels of abstraction to achieve the optimal trade-off between efficiency and effectiveness. Specifically, we construct dynamic-resolution particle representations of the environment and learn a unified dynamics model using graph neural networks (GNNs) that allows continuous selection of the abstraction level. During test time, the agent can adaptively determine the optimal resolution at each model-predictive control (MPC) step. We evaluate our method on object pile manipulation, a task we commonly encounter in cooking, agriculture, manufacturing, and pharmaceutical applications. Through comprehensive evaluations both in simulation and in the real world, we show that our method achieves significantly better performance than state-of-the-art fixed-resolution baselines at the gathering, sorting, and redistribution of granular object piles made of various instances like coffee beans, almonds, corn, etc.", "output": "Dynamic-Resolution Model Learning for Object Pile Manipulation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Federated learning is an approach to collaboratively training machine learning models for multiple parties that prohibit data sharing. 
One of the challenges in federated learning is non-IID data between clients, as a single model cannot fit the data distributions of all clients. Meta-learning, such as Per-FedAvg, is introduced to cope with the challenge. Meta-learning learns shared initial parameters for all clients. Each client employs gradient descent to quickly adapt the initialization to local data distributions to realize model personalization. However, due to the non-convex loss function and the randomness of sampling updates, meta-learning approaches have unstable goals in local adaptation for the same client. This fluctuation in different adaptation directions hinders convergence in meta-learning. To overcome this challenge, we use the historical local adapted model to restrict the direction of the inner loop and propose an elastic-constrained method. As a result, the current-round inner loop keeps historical goals and adapts to better solutions. Experiments show our method boosts meta-learning convergence and improves personalization without additional computation and communication. Our method achieved SOTA on all metrics on three public datasets.", "output": "Elastically-Constrained Meta-Learner for Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as context. For such a setting, a model must first retrieve relevant images from the pool and answer the question from these retrieved images. We refer to this problem as retrieval-based visual question answering (or RETVQA in short). RETVQA is distinctly different from, and more challenging than, the traditionally studied Visual Question Answering (VQA), where a given question has to be answered with a single relevant image in context. Towards solving the RETVQA task, we propose a unified Multi Image BART (MI-BART) that takes a question and retrieved images using our relevance encoder for free-form fluent answer generation. Further, we introduce the largest dataset in this space, namely RETVQA, which has the following salient features: multi-image and retrieval requirement for VQA, metadata-independent questions over a pool of heterogeneous images, expecting a mix of classification-oriented and open-ended generative answers. Our proposed framework achieves an accuracy of 76.5% and a fluency of 79.3% on the proposed dataset, namely RETVQA, and also outperforms state-of-the-art methods by 4.9% and 11.8% on the image segment of the publicly available WebQA dataset on the accuracy and fluency metrics, respectively.", "output": "Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Several recent studies have reported negative results when using heteroskedastic neural regression models to model real-world data. In particular, for overparameterized models, the mean and variance networks are powerful enough to either fit every single data point (while shrinking the predicted variances to zero), or to learn a constant prediction with an output variance exactly matching every predicted residual (i.e., explaining the targets as pure noise). This paper studies these difficulties from the perspective of statistical physics. 
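(For reference, both failure modes just described can be read off the standard heteroskedastic Gaussian negative log-likelihood, written here in assumed notation.)

```latex
% Heteroskedastic regression trains mean and variance networks
% \mu_\theta and \sigma^2_\theta by minimizing the Gaussian NLL:
\[
  \mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}
  \left[ \frac{\left(y_i - \mu_\theta(x_i)\right)^2}{2\,\sigma^2_\theta(x_i)}
  + \frac{1}{2}\log \sigma^2_\theta(x_i) \right].
\]
% If \mu_\theta interpolates the data, the first term vanishes and the loss
% is driven to -\infty as \sigma^2_\theta \to 0; if \mu_\theta is constant,
% \sigma^2_\theta can absorb every residual, explaining the targets as noise.
```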
We show that the observed instabilities are not specific to any neural network architecture but are already present in a field theory of an overparameterized conditional Gaussian likelihood model. Under light assumptions, we derive a nonparametric free energy that can be solved numerically. The resulting solutions show excellent qualitative agreement with empirical model fits on real-world data and, in particular, prove the existence of phase transitions, i.e., abrupt, qualitative differences in the behaviors of the regressors upon varying the regularization strengths on the two networks. Our work thus provides a theoretical explanation for the necessity of carefully regularizing heteroskedastic regression models. Moreover, the insights from our theory suggest a scheme for optimizing this regularization which is quadratically more efficient than the naive approach.", "output": "Understanding Pathologies of Deep Heteroskedastic Regression."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications. A recent line of work shows that the use of randomization in adversarial training is the key to finding optimal strategies against adversarial example attacks. However, in a fully randomized setting where both the defender and the attacker can use randomized strategies, there is no efficient algorithm for finding such an optimal strategy. To fill the gap, we propose the first algorithm of its kind, called FRAT, which models the problem with a new infinite-dimensional continuous-time flow on probability distribution spaces. FRAT maintains a lightweight mixture of models for the defender, with the flexibility to efficiently update mixing weights and model parameters at each iteration. Furthermore, FRAT utilizes lightweight sampling subroutines to construct a random strategy for the attacker. We prove that the continuous-time limit of FRAT converges to a mixed Nash equilibrium in a zero-sum game formed by a defender and an attacker. Experimental results also demonstrate the efficiency of FRAT on the CIFAR-10 and CIFAR-100 datasets.", "output": "Towards Optimal Randomized Strategies in Adversarial Example Game."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. 
Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.", "output": "Principles and Guidelines for Evaluating Social Robot Navigation Algorithms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We propose a novel value approximation method, namely Eigensubspace Regularized Critic (ERC), for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). It reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error towards the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. In addition, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. Among 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods on 20. It also shows significant advantages in Q-value approximation and variance reduction. Our code is available at ", "output": "Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition, focused on identifying African bird species in recorded soundscapes. Our approach utilizes existing off-the-shelf models, BirdNET and MixIT, to address representation and labeling challenges in the competition. We explore the embedding space learned by BirdNET and propose a process to derive an annotated dataset for supervised learning. Our experiments involve various models and feature engineering approaches to maximize performance on the competition leaderboard. The results demonstrate the effectiveness of our approach in classifying bird species and highlight the potential of transfer learning and semi-supervised dataset annotation in similar tasks.", "output": "Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recently, Ye et al. (Mathematical Programming 2023) designed an algorithm for solving a specific class of bilevel programs with an emphasis on applications related to hyperparameter selection, utilizing the difference-of-convex algorithm based on the value function approach reformulation. 
The proposed algorithm is particularly powerful when the lower-level problem is fully convex, such as a support vector machine model or a least absolute shrinkage and selection operator model. In this paper, to suit more applications related to machine learning and statistics, we substantially weaken the underlying assumption from lower-level full convexity to weak convexity. Accordingly, we propose a new reformulation using the Moreau envelope of the lower-level problem and demonstrate that this reformulation is a difference-of-weakly-convex program. Subsequently, we develop a sequentially convergent algorithm for solving this difference-of-weakly-convex program. To evaluate the effectiveness of our approach, we conduct numerical experiments on the bilevel hyperparameter selection problem for elastic net, sparse group lasso, and RBF kernel support vector machine models.", "output": "Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Today's performance analysis frameworks for deep learning accelerators suffer from two significant limitations. First, although modern convolutional neural networks (CNNs) consist of many types of layers other than convolution, especially during training, these frameworks largely focus on convolution layers only. Second, these frameworks are generally targeted towards inference, and lack support for training operations. This work proposes a novel performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms. The modeling effort of SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training on a highly parameterizable hardware substrate. SimDIT is integrated with a backend silicon implementation flow and provides detailed end-to-end performance statistics (i.e., data access cost, cycle counts, energy, and power) for executing CNN inference and training workloads. SimDIT-enabled performance analysis reveals that on a 64x64 processing array, non-convolution operations constitute 59.5% of total runtime for the ResNet-50 training workload. In addition, by optimally distributing available off-chip DRAM bandwidth and on-chip SRAM resources, SimDIT achieves an 18x performance improvement over a generic static resource allocation for ResNet-50 inference.", "output": "Performance Analysis of DNN Inference/Training with Convolution and non-Convolution Operations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The understanding of complex human interactions and group activities has garnered attention in human-centric computer vision. However, the advancement of the related tasks is hindered by the difficulty of obtaining large-scale labeled real-world datasets. To mitigate the issue, we propose M3Act, a multi-view multi-group multi-person human atomic action and group activity data generator. Powered by the Unity engine, M3Act contains simulation-ready 3D scenes and human assets, configurable lighting and camera systems, highly parameterized modular group activities, and a large degree of domain randomization during the data generation process. 
Our data generator is capable of generating large-scale datasets of human activities with multiple viewpoints, modalities (RGB images, 2D poses, 3D motions), and high-quality annotations for individual persons and multi-person groups (2D bounding boxes, instance segmentation masks, individual actions and group activity categories). Using M3Act, we perform synthetic data pre-training for 2D skeleton-based group activity recognition and RGB-based multi-person pose tracking. The results indicate that learning from our synthetic datasets largely improves model performance on real-world datasets, with the highest gains of 5.59% and 7.32% respectively in group and person recognition accuracy on CAD2, as well as an improvement of 6.63 in MOTP on HiEve. Pre-training with our synthetic data also leads to faster model convergence on downstream tasks (up to 6.8% faster). Moreover, M3Act opens new research problems for 3D group activity generation. We release M3Act3D, an 87.6-hour 3D motion dataset of human activities with larger group sizes and higher complexity of inter-person interactions than previous multi-person datasets. We define multiple metrics and propose a competitive baseline for the novel task.", "output": "Learning from Synthetic Human Group Activities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Molecular properties are usually observed with a limited number of samples, and researchers have considered property prediction as a few-shot problem. One important fact that has been ignored by prior works is that each molecule can be recorded with several different properties simultaneously. To effectively utilize many-to-many correlations of molecules and properties, we propose a Graph Sampling-based Meta-learning (GS-Meta) framework for few-shot molecular property prediction. First, we construct a Molecule-Property relation Graph (MPG): molecules and properties are nodes, while property labels decide edges. Then, to utilize the topological information of the MPG, we reformulate an episode in meta-learning as a subgraph of the MPG, containing a target property node, molecule nodes, and auxiliary property nodes. Third, as episodes in the form of subgraphs are no longer independent of each other, we propose to schedule the subgraph sampling process with a contrastive loss function, which considers the consistency and discrimination of subgraphs. Extensive experiments on 5 commonly-used benchmarks show GS-Meta consistently outperforms state-of-the-art methods by 5.71%-6.93% in ROC-AUC and verify the effectiveness of each proposed module. Our code is available at ", "output": "Graph Sampling-based Meta-Learning for Molecular Property Prediction."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks can be significantly compressed by pruning, leading to sparse models requiring considerably less storage and floating-point operations while maintaining predictive performance. Model soups (Wortsman et al., 2022) improve generalization and out-of-distribution performance by averaging the parameters of multiple models into a single one without increased inference time. 
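(Illustrative aside: the parameter averaging just described is a one-liner. A hypothetical sketch over NumPy parameter dicts follows; note that sparsity is preserved only when the averaged models share one zero mask, which is exactly the condition the sparse-soup setting arranges.)

```python
import numpy as np

def average_soup(param_dicts):
    """Uniform 'model soup': elementwise average of parameter dicts that
    share identical shapes (and, for sparse soups, identical zero masks)."""
    keys = param_dicts[0].keys()
    return {k: np.mean([pd[k] for pd in param_dicts], axis=0) for k in keys}

# Two hypothetical sparse weight matrices sharing one connectivity mask:
mask = np.array([[1, 0], [0, 1]], dtype=float)
a = {"w": mask * np.array([[0.9, 0.0], [0.0, -1.1]])}
b = {"w": mask * np.array([[1.1, 0.0], [0.0, -0.9]])}
soup = average_soup([a, b])
assert (soup["w"][mask == 0] == 0).all()   # sparsity pattern is preserved
```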
However, identifying models in the same loss basin to leverage both sparsity and parameter averaging is challenging, as averaging arbitrary sparse models reduces the overall sparsity due to differing sparse connectivities. In this work, we address these challenges by demonstrating that exploring a single retraining phase of Iterative Magnitude Pruning (IMP) with varying hyperparameter configurations, such as batch ordering or weight decay, produces models that are suitable for averaging and share the same sparse connectivity by design. Averaging these models significantly enhances generalization performance compared to their individual components. Building on this idea, we introduce Sparse Model Soups (SMS), a novel method for merging sparse models by initiating each prune-retrain cycle with the averaged model of the previous phase. SMS maintains sparsity, exploits the benefits of sparse networks by being modular and fully parallelizable, and substantially improves IMP's performance. Additionally, we demonstrate that SMS can be adapted to enhance the performance of state-of-the-art pruning-during-training approaches.", "output": "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "To make reinforcement learning more sample efficient, we need better credit assignment methods that measure an action's influence on future rewards. Building upon Hindsight Credit Assignment (HCA), we introduce Counterfactual Contribution Analysis (COCOA), a new family of model-based credit assignment algorithms. Our algorithms achieve precise credit assignment by measuring the contribution of actions upon obtaining subsequent rewards, by quantifying a counterfactual query: \"Would the agent still have reached this reward if it had taken another action?\". We show that measuring contributions w.r.t. rewarding states, as is done in HCA, results in spurious estimates of contributions, causing HCA to degrade towards the high-variance REINFORCE estimator in many relevant environments. Instead, we measure contributions w.r.t. rewards or learned representations of the rewarding objects, resulting in gradient estimates with lower variance. We run experiments on a suite of problems specifically designed to evaluate long-term credit assignment capabilities. By using dynamic programming, we measure ground-truth policy gradients and show that the improved performance of our new model-based credit assignment methods is due to lower bias and variance compared to HCA and common baselines. Our results demonstrate how modeling action contributions towards rewarding outcomes can be leveraged for credit assignment, opening a new path towards sample-efficient reinforcement learning.", "output": "Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and carry semantic meaning. While this phenomenon has gained significant research attention, it has been studied solely in the context of unimodal vision-only architectures. 
In this work, we extend the study of PAG to Vision-Language architectures, which form the foundation for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a \"plug-n-play\" manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, a task that typically requires huge generators.", "output": "CLIPAG: Towards Generator-Free Text-to-Image Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained on non-stationary streams of data and in an online fashion, their performance is reduced (1) by the online setup, which limits the availability of data, (2) due to catastrophic forgetting because of the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; Lange et al., 2023) showed that replay methods used in continual learning suffer from the stability gap, encountered when evaluating the model continually (rather than only on task boundaries). In this article, we study the effect of model ensembling as a way to improve performance and stability in online continual learning. We notice that naively ensembling models coming from a variety of training tasks increases the performance in online continual learning considerably. Starting from this observation, and drawing inspiration from semi-supervised learning ensembling methods, we use a lightweight temporal ensemble that computes the exponential moving average of the weights (EMA) at test time, and show that it can drastically increase the performance and stability when used in combination with several methods from the literature.", "output": "Improving Online Continual Learning Performance and Stability with Temporal Ensembles."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inpatient length of stay (LoS) is an important managerial metric which, if known in advance, can be used to efficiently plan admissions, allocate resources and improve care. Using historical patient data and machine learning techniques, LoS prediction models can be developed. Ethically, these models cannot be used for patient discharge in lieu of unit heads, but are of utmost necessity for hospital management systems in charge of effective hospital planning. Therefore, the design of the prediction system should be adapted to work in a true hospital setting. In this study, we predict early hospital LoS at the granular level of admission units by applying domain adaptation to leverage information learned from a potential source domain. Time-varying data from 110,079 and 60,492 patient stays to 8 and 9 intensive care units were respectively extracted from eICU-CRD and MIMIC-IV. 
These were fed into a Long Short-Term Memory and a fully connected network to train a source domain model, the weights of which were transferred either partially or fully to initiate training in target domains. Shapley Additive exPlanations (SHAP) algorithms were used to study the effect of weight transfer on model explainability. Compared to the benchmark, the proposed weight transfer model showed statistically significant gains in prediction accuracy (between 1% and 5%) as well as computation time (up to 2 hrs) for some target domains. The proposed method thus provides an adapted clinical decision support system for hospital management that can ease processes of data access via ethical committees, computation infrastructures and time.", "output": "Length of Stay prediction for Hospital Management using Domain Adaptation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Over recent years, denoising diffusion generative models have come to be considered state-of-the-art methods for synthetic data generation, especially in the case of generating images. These approaches have also proved successful in other applications such as tabular and graph data generation. However, due to computational complexity, to date, the application of these techniques to graph data has been restricted to small graphs, such as those used in molecular modeling. In this paper, we propose SaGess, a discrete denoising diffusion approach which is able to generate large real-world networks by augmenting a diffusion model (DiGress) with a generalized divide-and-conquer framework. The algorithm is capable of generating larger graphs by sampling a covering of subgraphs of the initial graph in order to train DiGress. SaGess then constructs a synthetic graph using the subgraphs that have been generated by DiGress. We evaluate the quality of the synthetic data sets against several competitor methods by comparing graph statistics between the original and synthetic samples, as well as evaluating the utility of the synthetic data set produced by using it to train a task-driven model, namely link prediction. In our experiments, SaGess outperforms most of the one-shot state-of-the-art graph generating methods by a significant factor, both on the graph metrics and on the link prediction task.", "output": "SaGess: Sampling Graph Denoising Diffusion Model for Scalable Graph Generation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a probability distribution, combined with an efficient sampling algorithm, for the weights and biases of fully-connected neural networks. In a supervised learning context, no iterative optimization or gradient computations of internal network parameters are needed to obtain a trained network. The sampling is based on the idea of random feature models. However, instead of a data-agnostic distribution, e.g., a normal distribution, we use both the input and the output training data of the supervised learning problem to sample both shallow and deep networks. We prove that the sampled networks we construct are universal approximators. We also show that our sampling scheme is invariant to rigid body transformations and scaling of the input data. This implies many popular pre-processing techniques are no longer required. 
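(Illustrative aside: one plausible way to make "data-driven weight sampling" concrete is to draw each hidden unit from a pair of training inputs, pointing the weight from one point to the other. This sketch is an assumption for illustration; the paper's exact construction may differ.)

```python
import numpy as np

def sample_dense_layer(X, width, rng):
    """Sample weights/biases for one layer from pairs of training inputs.
    Each unit's weight points from one data point x1 to another x2, and the
    bias centers the unit between them (a hypothetical data-driven scheme)."""
    n = len(X)
    i, j = rng.integers(0, n, width), rng.integers(0, n, width)
    j = np.where(i == j, (j + 1) % n, j)             # avoid x1 == x2
    diff = X[j] - X[i]                                # (width, d)
    norm2 = (diff ** 2).sum(1, keepdims=True) + 1e-12
    W = diff / norm2                                  # rows are unit weights
    b = -(W * X[i]).sum(1)                            # activation is 0 at x1
    return W, b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
W, b = sample_dense_layer(X, width=8, rng=rng)
H = np.tanh(X @ W.T + b)                              # sampled hidden features
```

With the hidden features fixed, only the final linear readout needs fitting (e.g., by least squares), which is what removes iterative training of internal parameters.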
For Barron functions, we show that the $L^2$-approximation error of sampled shallow networks decreases with the square root of the number of neurons. In numerical experiments, we demonstrate that sampled networks achieve accuracy comparable to iteratively trained ones, but can be constructed orders of magnitude faster. Our test cases involve a classification benchmark from OpenML, sampling of neural operators to represent maps in function spaces, and transfer learning using well-known architectures.", "output": "Sampling weights of deep neural networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Kernel ridge regression, KRR, is a non-linear generalization of linear ridge regression. Here, we introduce an equivalent formulation of the objective function of KRR, opening up both for using penalties other than the ridge penalty and for studying kernel ridge regression from the perspective of gradient descent. Using a continuous-time perspective, we derive a closed-form solution, kernel gradient flow, KGF, with regularization through early stopping, which allows us to theoretically bound the differences between KGF and KRR. We generalize KRR by replacing the ridge penalty with the $\\ell_1$ and $\\ell_\\infty$ penalties and utilize the fact that, analogously to the similarities between KGF and KRR, the solutions obtained when using these penalties are very similar to those obtained from forward stagewise regression (also known as coordinate descent) and sign gradient descent in combination with early stopping. Thus the need for computationally heavy proximal gradient descent algorithms can be alleviated. We show theoretically and empirically how these penalties, and the corresponding gradient-based optimization algorithms, produce signal-driven and robust regression solutions, respectively. We also investigate kernel gradient descent where the kernel is allowed to change during training, and theoretically address the effects this has on generalization. Based on our findings, we propose an update scheme for the bandwidth of translation-invariant kernels, where we let the bandwidth decrease to zero during training, thus circumventing the need for hyper-parameter selection. We demonstrate on real and synthetic data how decreasing the bandwidth during training outperforms using a constant bandwidth, selected by cross-validation and marginal likelihood maximization. We also show that, using a decreasing bandwidth, we are able to achieve both zero training error and double descent behavior.", "output": "Solving Kernel Ridge Regression with Gradient-Based Optimization Methods."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The development of very large-scale integration (VLSI) technology has posed new challenges for electronic design automation (EDA) techniques in chip floorplanning. During this process, macro placement is an important subproblem, which tries to determine the positions of all macros with the aim of minimizing half-perimeter wirelength (HPWL) and avoiding overlapping. Previous methods include packing-based, analytical and reinforcement learning methods. In this paper, we propose a new black-box optimization (BBO) framework (called WireMask-BBO) for macro placement, by using a wire-mask-guided greedy procedure for objective evaluation. 
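(Illustrative aside: the HPWL objective being minimized here has a simple standard definition: for each net, the half-perimeter of the bounding box of its pins. A minimal sketch, with a hypothetical positions dict:)

```python
def hpwl(nets, positions):
    """Half-perimeter wirelength: for each net (a list of cell ids), the
    half-perimeter of the bounding box of its cells' (x, y) positions."""
    total = 0.0
    for net in nets:
        xs = [positions[c][0] for c in net]
        ys = [positions[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Example: one 3-pin net spanning a 2 x 1 bounding box -> HPWL of 3.
print(hpwl([["a", "b", "c"]], {"a": (0, 0), "b": (2, 1), "c": (1, 0)}))
```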
Equipped with different BBO algorithms, WireMask-BBO empirically achieves significant improvements over previous methods, i.e., achieves significantly shorter HPWL by using much less time. Furthermore, it can fine-tune existing placements by treating them as initial solutions, which can bring up to 50% improvement in HPWL. WireMask-BBO has the potential to significantly improve the quality and efficiency of chip floorplanning, which makes it appealing to researchers and practitioners in EDA and will also promote the application of BBO.", "output": "Macro Placement by Wire-Mask-Guided Black-Box Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the context of RNNs that were trained to recognize regular languages. This enables us to draw on perfect ground-truth automata in our evaluation, against which we can compare the RNN's accuracy and the distribution of the hidden-state vectors. We start by examining the (piecewise linear) separability of an RNN's hidden-state vectors into semantically different classes. We continue the analysis by computing clusters over the hidden-state vector space with multiple state-of-the-art unsupervised clustering approaches. We formally analyze the accuracy of computed clustering functions and the validity of the clustering hypothesis by determining whether clusters group semantically similar vectors to the same state in the ground-truth model. Our evaluation supports the validity of the clustering hypothesis in the majority of examined cases. We observed that the hidden-state vectors of well-trained RNNs are separable, and that the unsupervised clustering techniques succeed in finding clusters of similar state vectors.", "output": "On the Relationship Between RNN Hidden State Vectors and Semantic Ground Truth."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present ArrayBot, a distributed manipulation system consisting of a $16 \\times 16$ array of vertically sliding pillars integrated with tactile sensors, which can simultaneously support, perceive, and manipulate tabletop objects. Towards generalizable distributed manipulation, we leverage reinforcement learning (RL) algorithms for the automatic discovery of control policies. In the face of the massively redundant actions, we propose to reshape the action space by considering the spatially local action patch and the low-frequency actions in the frequency domain. With this reshaped action space, we train RL agents that can relocate diverse objects through tactile observations only. Surprisingly, we find that the discovered policy can not only generalize to unseen object shapes in the simulator but also transfer to the physical robot without any domain randomization. 
Leveraging the deployed policy, we present abundant real-world manipulation tasks, illustrating the vast potential of RL on ArrayBot for distributed manipulation.", "output": "ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains a notable issue. Lowering the supply voltage is an effective strategy for reducing energy consumption. However, aggressively scaling down the supply voltage can lead to accuracy degradation due to random bit flips in static random access memory (SRAM) where model parameters are stored. To address this challenge, we introduce NeuralFuse, a novel add-on module that addresses the accuracy-energy tradeoff in low-voltage regimes by learning input transformations to generate error-resistant data representations. NeuralFuse protects DNN accuracy in both nominal and low-voltage scenarios. Moreover, NeuralFuse is easy to implement and can be readily applied to DNNs with limited access, such as non-configurable hardware or remote access to cloud-based APIs. Experimental results demonstrate that, at a 1% bit error rate, NeuralFuse can reduce SRAM memory access energy by up to 24% while improving accuracy by up to 57%. To the best of our knowledge, this is the first model-agnostic approach (i.e., no model retraining) to address low-voltage-induced bit errors. The source code is available at", "output": "NeuralFuse: Learning to Improve the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of two-stage few-shot classification methods, the model suffers from severe overfitting in the episodic meta-training stage. We hypothesize that it is caused by over-discrimination, i.e., the model learns to over-rely on the superficial features that fit base class discrimination while suppressing novel class generalization. To penalize over-discrimination, we introduce knowledge distillation techniques to keep novel generalization knowledge from the teacher model during training. Specifically, we select the teacher model as the one with the best validation accuracy during meta-training and restrict the symmetric Kullback-Leibler (SKL) divergence between the output distribution of the linear classifier of the teacher model and that of the student model. This simple approach outperforms the standard meta-training process. We further propose the Nearest Neighbor Symmetric Kullback-Leibler (NNSKL) divergence for meta-training to push the limits of knowledge distillation techniques. NNSKL takes few-shot tasks as input and penalizes the output of the nearest neighbor classifier, which affects the relationships between query embeddings and support centers. 
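(Illustrative aside: the SKL penalty described here is straightforward to compute from two classifiers' logits. A hypothetical NumPy sketch, with temperature as an assumed knob not mentioned in the abstract:)

```python
import numpy as np

def skl_divergence(p_logits, q_logits, tau=1.0):
    """Symmetric KL between the softmax outputs of two classifiers,
    SKL(p, q) = KL(p || q) + KL(q || p), averaged over the batch."""
    def softmax(z):
        z = z / tau
        z = z - z.max(-1, keepdims=True)        # numerical stability
        e = np.exp(z)
        return e / e.sum(-1, keepdims=True)
    p, q = softmax(p_logits), softmax(q_logits)
    kl_pq = (p * (np.log(p) - np.log(q))).sum(-1)
    kl_qp = (q * (np.log(q) - np.log(p))).sum(-1)
    return float((kl_pq + kl_qp).mean())

rng = np.random.default_rng(0)
teacher, student = rng.normal(size=(32, 5)), rng.normal(size=(32, 5))
print(skl_divergence(teacher, student))          # penalty term to add to the loss
```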
By combining SKL and NNSKL in meta-training, the model achieves even better performance and surpasses state-of-the-art results on several benchmarks.", "output": "Understanding the Overfitting of the Episodic Meta-training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Purpose: The development of machine learning models for surgical workflow and instrument recognition from temporal data represents a challenging task due to the complex nature of surgical workflows. In particular, the imbalanced distribution of data is one of the major challenges in the domain of surgical workflow recognition. In order to obtain meaningful results, careful partitioning of data into training, validation, and test sets, as well as the selection of suitable evaluation metrics, are crucial. Methods: In this work, we present an openly available web-based application that enables interactive exploration of dataset partitions. The proposed visual framework facilitates the assessment of dataset splits for surgical workflow recognition, especially with regard to identifying sub-optimal dataset splits. Currently, it supports visualization of surgical phase and instrument annotations. Results: In order to validate the dedicated interactive visualizations, we use a dataset split of the Cholec80 dataset. This dataset split was specifically selected to reflect a case of strong data imbalance. Using our software, we were able to identify phases, phase transitions, and combinations of surgical instruments that were not represented in one of the sets. Conclusion: In order to obtain meaningful results in highly unbalanced class distributions, special care should be taken with respect to the selection of an appropriate split. Interactive data visualization represents a promising approach for the assessment of machine learning datasets. The source code is available at", "output": "Surgical Phase and Instrument Recognition: How to identify appropriate Dataset Splits."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Policy-Space Response Oracles (PSRO) is an influential algorithm framework for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games. Many previous studies have been trying to promote policy diversity in PSRO. A major weakness in existing diversity metrics is that a more diverse (according to their diversity metrics) population does not necessarily mean (as we proved in the paper) a better approximation to an NE. To alleviate this problem, we propose a new diversity metric, the improvement of which guarantees a better approximation to an NE. Meanwhile, we develop a practical and well-justified method to optimize our diversity metric using only state-action samples. By incorporating our diversity regularization into the best response solving in PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We present the convergence property of PSD-PSRO.
Empirically, extensive experiments on various games demonstrate that PSD-PSRO is more effective in producing significantly less exploitable policies than state-of-the-art PSRO variants.", "output": "Policy Space Diversity for Non-Transitive Games."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Feature transformation aims to reconstruct an effective representation space by mathematically refining the existing features. It serves as a pivotal approach to combat the curse of dimensionality, enhance model generalization, mitigate data sparsity, and extend the applicability of classical models. Existing research predominantly focuses on domain knowledge-based feature engineering or learning latent representations. However, these methods, while insightful, lack full automation and fail to yield a traceable and optimal representation space. An indispensable question arises: Can we concurrently address these limitations when reconstructing a feature space for a machine-learning task? Our initial work took a pioneering step towards this challenge by introducing a novel self-optimizing framework. This framework leverages the power of three cascading reinforced agents to automatically select candidate features and operations for generating improved feature transformation combinations. Despite the impressive strides made, there was room for enhancing its effectiveness and generalization capability. In this extended journal version, we advance our initial work from two distinct yet interconnected perspectives: 1) We propose a refinement of the original framework, which integrates a graph-based state representation method to capture the feature interactions more effectively and develop different Q-learning strategies to alleviate Q-value overestimation further. 2) We utilize a new optimization technique (actor-critic) to train the entire self-optimizing framework in order to accelerate the model convergence and improve the feature transformation performance. Finally, to validate the improved effectiveness and generalization capability of our framework, we perform extensive experiments and conduct comprehensive analyses.", "output": "Traceable Group-Wise Self-Optimizing Feature Transformation Learning: A Dual Optimization Perspective."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy can create artifacts leading to poor imputation in the presence of multimodal or complex distributions. To tackle this problem, we introduce the $k$NN$\times$KDE algorithm: a data imputation method combining nearest neighbor estimation ($k$NN) and density estimation with Gaussian kernels (KDE). We compare our method with previous data imputation methods using artificial and real-world data with different data missing scenarios and various data missing rates, and show that our method can cope with complex original data structure, yields lower data imputation errors, and provides probabilistic estimates with higher likelihood than current methods.
We release the code in open-source for the community: ", "output": "Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Optimizing a machine learning pipeline for a task at hand requires careful configuration of various hyperparameters, typically supported by an AutoML system that optimizes the hyperparameters for the given training dataset. Yet, depending on the AutoML system's own second-order meta-configuration, the performance of the AutoML process can vary significantly. Current AutoML systems cannot automatically adapt their own configuration to a specific use case. Further, they cannot compile user-defined application constraints on the effectiveness and efficiency of the pipeline and its generation. In this paper, we propose Caml, which uses meta-learning to automatically adapt its own AutoML parameters, such as the search strategy, the validation strategy, and the search space, for a task at hand. The dynamic AutoML strategy of Caml takes user-defined constraints into account and obtains constraint-satisfying pipelines with high predictive performance.", "output": "AutoML in Heavily Constrained Applications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce ordered transfer hyperparameter optimisation (OTHPO), a version of transfer learning for hyperparameter optimisation (HPO) where the tasks follow a sequential order. Unlike for state-of-the-art transfer HPO, the assumption is that each task is most correlated to those immediately before it. This matches many deployed settings, where hyperparameters are retuned as more data is collected; for instance tuning a sequence of movie recommendation systems as more movies and ratings are added. We propose a formal definition, outline the differences to related problems and propose a basic OTHPO method that outperforms state-of-the-art transfer HPO. We empirically show the importance of taking order into account using ten benchmarks. The benchmarks are in the setting of gradually accumulating data, and span XGBoost, random forest, approximate k-nearest neighbor, elastic net, support vector machines and a separate real-world motivated optimisation problem. We open source the benchmarks to foster future research on ordered transfer HPO.", "output": "Obeying the Order: Introducing Ordered Transfer Hyperparameter Optimisation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Estimating camera motion in deformable scenes poses a complex and open research challenge. Most existing non-rigid structure from motion techniques assume that static scene parts are also observed besides the deforming scene parts, in order to establish an anchoring reference. However, this assumption does not hold true in certain relevant application cases such as endoscopies. Deformable odometry and SLAM pipelines, which tackle the most challenging scenario of exploratory trajectories, suffer from a lack of robustness and proper quantitative evaluation methodologies.
To tackle this issue with a common benchmark, we introduce the Drunkard's Dataset, a challenging collection of synthetic data targeting visual navigation and reconstruction in deformable environments. This dataset is the first large set of exploratory camera trajectories with ground truth inside 3D scenes where every surface exhibits non-rigid deformations over time. Simulations in realistic 3D buildings let us obtain a vast amount of data and ground truth labels, including camera poses, RGB images and depth, optical flow and normal maps at high resolution and quality. We further present a novel deformable odometry method, dubbed the Drunkard's Odometry, which decomposes optical flow estimates into rigid-body camera motion and non-rigid scene deformations. In order to validate our data, our work contains an evaluation of several baselines as well as a novel tracking error metric which does not require ground truth data. Dataset and code: ", "output": "The Drunkard's Odometry: Estimating Camera Motion in Deforming Scenes."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Experimental results have shown that curriculum learning, i.e., presenting simpler examples before more complex ones, can improve the efficiency of learning. Some recent theoretical results also showed that changing the sampling distribution can help neural networks learn parities, with formal results only for large learning rates and one-step arguments. Here we show a separation result in the number of training steps with standard (bounded) learning rates on a common sample distribution: if the data distribution is a mixture of sparse and dense inputs, there exists a regime in which a 2-layer ReLU neural network trained by a curriculum noisy-GD (or SGD) algorithm that uses sparse examples first, can learn parities of sufficiently large degree, while any fully connected neural network of possibly larger width or depth trained by noisy-GD on the unordered samples cannot learn without additional steps. We also provide experimental results supporting the qualitative separation beyond the specific regime of the theoretical results.", "output": "Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Traditional large-scale neuroscience models and machine learning utilize simplified models of individual neurons, relying on collective activity and properly adjusted connections to perform complex computations. However, each biological cortical neuron is inherently a sophisticated computational device, as corroborated in a recent study where it took a deep artificial neural network with millions of parameters to replicate the input-output relationship of a detailed biophysical model of a cortical pyramidal neuron. We question the necessity for this many parameters and introduce the Expressive Leaky Memory (ELM) neuron, a biologically inspired, computationally expressive, yet efficient model of a cortical neuron. Remarkably, our ELM neuron requires only 8K trainable parameters to match the aforementioned input-output relationship accurately. We find that an accurate model necessitates multiple memory-like hidden states and intricate nonlinear synaptic integration.
To assess the computational ramifications of this design, we evaluate the ELM neuron on various tasks with demanding temporal structures, including a sequential version of the CIFAR-10 classification task, the challenging Pathfinder-X task, and a new dataset based on the Spiking Heidelberg Digits dataset. Our ELM neuron outperforms most transformer-based models on the Pathfinder-X task with 77% accuracy, demonstrates competitive performance on Sequential CIFAR-10, and superior performance compared to classic LSTM models on the variant of the Spiking Heidelberg Digits dataset. These findings indicate a potential for biologically motivated, computationally efficient neuronal models to enhance performance in challenging machine learning tasks.", "output": "The ELM Neuron: an Efficient and Expressive Cortical Neuron Model Can Solve Long-Horizon Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce a novel approach to boost the efficiency of the importance nested sampling (INS) technique for Bayesian posterior and evidence estimation using deep learning. Unlike rejection-based sampling methods such as vanilla nested sampling (NS) or Markov chain Monte Carlo (MCMC) algorithms, importance sampling techniques can use all likelihood evaluations for posterior and evidence estimation. However, for efficient importance sampling, one needs proposal distributions that closely mimic the posterior distributions. We show how to combine INS with deep learning via neural network regression to accomplish this task. We also introduce NAUTILUS, a reference open-source Python implementation of this technique for Bayesian posterior and evidence estimation. We compare NAUTILUS against popular NS and MCMC packages, including EMCEE, DYNESTY, ULTRANEST and POCOMC, on a variety of challenging synthetic problems and real-world applications in exoplanet detection, galaxy SED fitting and cosmology. In all applications, the sampling efficiency of NAUTILUS is substantially higher than that of all other samplers, often by more than an order of magnitude. Simultaneously, NAUTILUS delivers highly accurate results and needs fewer likelihood evaluations than all other samplers tested. We also show that NAUTILUS has good scaling with the dimensionality of the likelihood and is easily parallelizable to many CPUs.", "output": "NAUTILUS: boosting Bayesian importance nested sampling with deep learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Distributed deep learning (DDL) is a promising research area, which aims to increase the efficiency of training deep learning tasks with large datasets and models. As the computation capability of DDL nodes continues to increase, the network connection between nodes is becoming a major bottleneck. Various methods of gradient compression and improved model synchronization have been proposed to address this bottleneck in Parameter-Server-based DDL. However, these two types of methods can result in accuracy loss due to discarded gradients and offer only limited improvement in model synchronization throughput, respectively.
To address these challenges, we propose a new model synchronization method named Overlapped Synchronization Parallel (OSP), which achieves efficient communication with a 2-stage synchronization approach and uses Local-Gradient-based Parameter correction (LGP) to avoid accuracy loss caused by stale parameters. The prototype of OSP has been implemented using PyTorch and evaluated on commonly used deep learning models and datasets with a 9-node testbed. Evaluation results show that OSP can achieve up to 50% improvement in throughput without accuracy loss compared to popular synchronization models.", "output": "OSP: Boosting Distributed Model Training with 2-stage Synchronization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The autonomous driving community has witnessed a rapid growth in approaches that embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle motion plans, instead of concentrating on individual tasks such as detection and motion prediction. End-to-end systems, in comparison to modular pipelines, benefit from joint feature optimization for perception and planning. This field has flourished due to the availability of large-scale datasets, closed-loop evaluation, and the increasing need for autonomous driving algorithms to perform effectively in challenging scenarios. In this survey, we provide a comprehensive analysis of more than 250 papers, covering the motivation, roadmap, methodology, challenges, and future trends in end-to-end autonomous driving. We delve into several critical challenges, including multi-modality, interpretability, causal confusion, robustness, and world models, amongst others. Additionally, we discuss current advancements in foundation models and visual pre-training, as well as how to incorporate these techniques within the end-to-end driving framework. To facilitate future research, we maintain an active repository that contains up-to-date links to relevant literature and open-source projects at", "output": "End-to-end Autonomous Driving: Challenges and Frontiers."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Invariance to spatial transformations such as translations and rotations is a desirable property and a basic design principle for classification neural networks. However, the commonly used convolutional neural networks (CNNs) are actually very sensitive to even small translations. A vast body of work exists on achieving exact or approximate transformation invariance by designing transformation-invariant models or assessing the transformations. These works usually make changes to the standard CNNs and harm the performance on standard datasets. In this paper, rather than modifying the classifier, we propose a pre-classifier restorer to recover translated (or even rotated) inputs to the original ones, which will then be fed into any classifier for the same dataset.
The restorer is based on a theoretical result which gives a sufficient and necessary condition for an affine operator to be translation equivariant on a tensor space.", "output": "Restore Translation Using Equivariant Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Computing homotopy groups of spheres has long been a fundamental objective in algebraic topology. Various theoretical and algorithmic approaches have been developed to tackle this problem. In this paper we take a step towards the goal of comprehending the group-theoretic structure of the generators of these homotopy groups by leveraging the power of machine learning. Specifically, in the simplicial group setting of Wu's formula, we reformulate the problem of generating simplicial cycles as a problem of sampling from the intersection of algorithmic datasets related to Dyck languages. We present and evaluate language modelling approaches that employ multi-label information for input sequences, along with the necessary group-theoretic toolkit and non-neural baselines.", "output": "Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "High-order Graph Neural Networks (HO-GNNs) have been developed to infer consistent latent spaces in the heterophilic regime, where the label distribution is not correlated with the graph structure. However, most of the existing HO-GNNs are hop-based, i.e., they rely on the powers of the transition matrix. As a result, these architectures are not fully reactive to the classification loss and the achieved structural filters have static supports. In other words, neither the filters' supports nor their coefficients can be learned with these networks. They are confined, instead, to learn combinations of filters. To address the above concerns, we propose Diffusion-jump GNNs, a method relying on asymptotic diffusion distances that operates on jumps. A diffusion-pump generates pairwise distances whose projections determine both the support and coefficients of each structural filter. These filters are called jumps because they explore a wide range of scales in order to find bonds between scattered nodes with the same label. Actually, the full process is controlled by the classification loss. Both the jumps and the diffusion distances react to classification errors (i.e. they are learnable). Homophiliation, i.e., the process of learning piecewise smooth latent spaces in the heterophilic regime, is formulated as a Dirichlet problem: the known labels determine the border nodes and the diffusion-pump ensures a minimal deviation of the semi-supervised grouping from a canonical unsupervised grouping. This triggers the update of both the diffusion distances and, consequently, the jumps in order to minimize the classification error. The Dirichlet formulation has several advantages. It leads to the definition of structural heterophily, a novel measure beyond edge heterophily.
It also allows us to investigate links with (learnable) diffusion distances, absorbing random walks and stochastic diffusion.", "output": "Diffusion-Jump GNNs: Homophiliation via Learnable Metric Filters."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Coverage path planning is the problem of finding the shortest path that covers the entire free space of a given confined area, with applications ranging from robotic lawn mowing and vacuum cleaning, to demining and search-and-rescue tasks. While offline methods can find provably complete, and in some cases optimal, paths for known environments, their value is limited in online scenarios where the environment is not known beforehand, especially in the presence of non-static obstacles. We propose an end-to-end reinforcement learning-based approach in continuous state and action space for the online coverage path planning problem that can handle unknown environments. We construct the observation space from both global maps and local sensory inputs, allowing the agent to plan a long-term path, and simultaneously act on short-term obstacle detections. To account for large-scale environments, we propose to use a multi-scale map input representation. Furthermore, we propose a novel total variation reward term for eliminating thin strips of uncovered space in the learned path. To validate the effectiveness of our approach, we perform extensive experiments in simulation with a distance sensor, surpassing the performance of a recent reinforcement learning-based approach.", "output": "End-to-end Reinforcement Learning for Online Coverage Path Planning in Unknown Environments."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel effective method to improve generalization by reparameterizing each weight in deep neural networks using a nonlinear function. It is a general, intuitive, cheap, and easy-to-implement method, which can be combined with various other regularization techniques. Large weights in deep neural networks are a sign of a more complex network that is overfitted to the training data. Moreover, regularized networks tend to have a greater range of weights around zero with fewer weights centered at zero. We introduce a weight reparameterization function which is applied to each weight and implicitly reduces overfitting by restricting the magnitude of the weights while forcing them away from zero at the same time. This leads to more democratic decision-making in the network. Firstly, individual weights cannot have too much influence in the prediction process due to the restriction of their magnitude. Secondly, more weights are used in the prediction process, since they are forced away from zero during the training. This promotes the extraction of more features from the input data and increases the level of weight redundancy, which makes the network less sensitive to statistical differences between training and test data. We extend our method to learn the hyperparameters of the introduced weight reparameterization function.
This avoids hyperparameter search and gives the network the opportunity to align the weight reparameterization with the training progress. We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.", "output": "Weight Compander: A Simple Weight Reparameterization for Regularization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Regularization is a set of techniques that are used to improve the generalization ability of deep neural networks. In this paper, we introduce spectral batch normalization (SBN), a novel effective method to improve generalization by normalizing feature maps in the frequency (spectral) domain. The activations of residual networks without batch normalization (BN) tend to explode exponentially in the depth of the network at initialization. This leads to extremely large feature map norms even though the parameters are relatively small. These explosive dynamics can be very detrimental to learning. BN makes weight decay regularization on the scaling factors $\gamma, \beta$ approximately equivalent to an additive penalty on the norm of the feature maps, which prevents extremely large feature map norms to a certain degree. However, we show experimentally that, despite the approximate additive penalty of BN, feature maps in deep neural networks (DNNs) tend to explode at the beginning of the network and that feature maps of DNNs contain large values during the whole training. This phenomenon also occurs in a weakened form in non-residual networks. SBN addresses large feature maps by normalizing them in the frequency domain. In our experiments, we empirically show that SBN prevents exploding feature maps at initialization and large feature map values during the training. Moreover, the normalization of feature maps in the frequency domain leads to more uniformly distributed frequency components. This discourages the DNNs from relying on single frequency components of feature maps. These, together with other effects of SBN, have a regularizing effect on the training of residual and non-residual networks. We show experimentally that using SBN in addition to standard regularization methods improves the performance of DNNs by a relevant margin, e.g. ResNet50 on ImageNet by 0.71%.", "output": "Spectral Batch Normalization: Normalization in the Frequency Domain."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Approaching the era of ubiquitous computing, human motion sensing plays a crucial role in smart systems for decision making, user interaction, and personalized services. Extensive research has been conducted on human tracking, pose estimation, gesture recognition, and activity recognition, which are predominantly based on cameras in traditional methods. However, the intrusive nature of cameras limits their use in smart home applications. To address this, mmWave radars have gained popularity due to their privacy-friendly features. In this work, we propose \textit{milliFlow}, a novel deep learning method for scene flow estimation as complementary motion information for the mmWave point cloud, serving as an intermediate level of features and directly benefiting downstream human motion sensing tasks.
Experimental results demonstrate the superior performance of our method with an average 3D endpoint error of 4.6cm, significantly surpassing the competing approaches. Furthermore, by incorporating scene flow information, we achieve remarkable improvements in human activity recognition, human parsing, and human body part tracking. To foster further research in this area, we provide our codebase and dataset for open access.", "output": "milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Compositionality is a critical aspect of scalable system design. Reinforcement learning (RL) has recently shown substantial success in task learning, but has only recently begun to truly leverage composition. In this paper, we focus on Boolean composition of learned tasks as opposed to functional or sequential composition. Existing Boolean composition for RL focuses on reaching a satisfying absorbing state in environments with discrete action spaces, but does not support composable safety (i.e., avoidance) constraints. We advance the state of the art in Boolean composition of learned tasks with three contributions: i) introduce two distinct notions of safety in this framework; ii) show how to enforce either safety semantics, prove correctness (under some assumptions), and analyze the trade-offs between the two safety notions; and iii) extend Boolean composition from discrete action spaces to continuous action spaces. We demonstrate these techniques using modified versions of value iteration in a grid world, Deep Q-Network (DQN) in a grid world with image observations, and Twin Delayed DDPG (TD3) in a continuous-observation and continuous-action Bullet physics environment. We believe that these contributions advance the theory of safe reinforcement learning by allowing zero-shot composition of policies satisfying safety properties.", "output": "Safety-Aware Task Composition for Discrete and Continuous Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Evolutionary differential equation discovery proved to be a tool to obtain equations with fewer a priori assumptions than conventional approaches, such as sparse symbolic regression over the complete library of possible terms. The equation discovery field contains two independent directions. The first one is purely mathematical and concerns differentiation, the object of optimization and its relation to the functional spaces and others. The second one is dedicated purely to the optimizational problem statement. Both topics are worth investigating to improve the algorithm's ability to handle experimental data in a more artificial-intelligence way, without significant pre-processing and a priori knowledge of their nature. In the paper, we consider the prevalence of either single-objective optimization, which considers only the discrepancy between selected terms in the equation, or multi-objective optimization, which additionally takes into account the complexity of the obtained equation.
The proposed comparison approach is shown on classical model examples -- the Burgers equation, the wave equation, and the Korteweg-de Vries equation.", "output": "Comparison of Single- and Multi- Objective Optimization Quality for Evolutionary Equation Discovery."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where there exist global constraints on the distribution of agents (e.g., requiring capacity constraints or minimum coverage requirements to be met). We propose Safe-$\text{M}^3$-UCRL, the first model-based algorithm that attains safe policies even in the case of unknown transition dynamics. As a key ingredient, it uses epistemic uncertainty in the transition model within a log-barrier approach to ensure pessimistic constraint satisfaction with high probability. We showcase Safe-$\text{M}^3$-UCRL on the vehicle repositioning problem faced by many shared mobility operators and evaluate its performance through simulations built on Shenzhen taxi trajectory data. Our algorithm effectively meets the demand in critical areas while ensuring service accessibility in regions with low demand.", "output": "Safe Model-Based Multi-Agent Mean-Field Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In recent years, channel state information (CSI) at sub-6 GHz has been widely exploited for Wi-Fi sensing, particularly for activity and gesture recognition. In this work, we instead explore mmWave (60 GHz) Wi-Fi signals for gesture recognition/pose estimation. Our focus is on the mmWave Wi-Fi signals so that they can be used not only for high data rate communication but also for improved sensing, e.g., for extended reality (XR) applications. For this reason, we extract spatial beam signal-to-noise ratios (SNRs) from the periodic beam training employed by IEEE 802.11ad devices. We consider a set of 10 gestures/poses motivated by XR applications. We conduct experiments in two environments and with three people. As a comparison, we also collect CSI from IEEE 802.11ac devices. To extract features from the CSI and the beam SNR, we leverage a deep neural network (DNN). The DNN classifier achieves promising results on the beam SNR task with state-of-the-art 96.7% accuracy in a single environment, even with a limited dataset. We also investigate the robustness of the beam SNR against CSI across different environments. Our experiments reveal that features from the CSI generalize without additional re-training, while those from beam SNRs do not. Therefore, re-training is required in the latter case.", "output": "Gesture Recognition with mmWave Wi-Fi Access Points: Lessons Learned."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Temporal Point Processes (TPPs) serve as the standard mathematical framework for modeling asynchronous event sequences in continuous time.
However, classical TPP models are often constrained by strong assumptions, limiting their ability to capture complex real-world event dynamics. To overcome this limitation, researchers have proposed Neural TPPs, which leverage neural network parametrizations to offer more flexible and efficient modeling. While recent studies demonstrate the effectiveness of Neural TPPs, they often lack a unified setup, relying on different baselines, datasets, and experimental configurations. This makes it challenging to identify the key factors driving improvements in predictive accuracy, hindering research progress. To bridge this gap, we present a comprehensive large-scale experimental study that systematically evaluates the predictive accuracy of state-of-the-art neural TPP models. Our study encompasses multiple real-world and synthetic event sequence datasets, following a carefully designed unified setup. We thoroughly investigate the influence of major architectural components such as event encoding, history encoder, and decoder parametrization on both time and mark prediction tasks. Additionally, we delve into the less explored area of probabilistic calibration for neural TPP models. By analyzing our results, we draw insightful conclusions regarding the significance of history size and the impact of architectural components on predictive accuracy. Furthermore, we shed light on the miscalibration of mark distributions in neural TPP models. Our study aims to provide valuable insights into the performance and characteristics of neural TPP models, contributing to a better understanding of their strengths and limitations.", "output": "On the Predictive Accuracy of Neural Temporal Point Process Models for Continuous-time Event Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Large Language Models (LLMs) have been successfully used in many natural-language tasks and applications including text generation and AI chatbots. They also are a promising new technology for concept-oriented deep learning (CODL). However, the prerequisite is that LLMs understand concepts and ensure conceptual consistency. We discuss these in this paper, as well as major uses of LLMs for CODL including concept extraction from text, concept graph extraction from text, and concept learning. Human knowledge consists of both symbolic (conceptual) knowledge and embodied (sensory) knowledge. Text-only LLMs, however, can represent only symbolic (conceptual) knowledge. Multimodal LLMs, on the other hand, are capable of representing the full range (conceptual and sensory) of human knowledge. We discuss conceptual understanding in visual-language LLMs, the most important multimodal LLMs, and major uses of them for CODL including concept extraction from image, concept graph extraction from image, and concept learning. While uses of LLMs for CODL are valuable standalone, they are particularly valuable as part of LLM applications such as AI chatbots.", "output": "Concept-Oriented Deep Learning with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph neural networks (GNNs) have been widely applied in multi-variate time-series forecasting (MTSF) tasks because of their capability in capturing the correlations among different time-series.
These graph-based learning approaches improve the forecasting performance by discovering and understanding the underlying graph structures, which represent the data correlation. When the explicit prior graph structures are not available, most existing works cannot guarantee the sparsity of the generated graphs, which makes the overall model computationally expensive and less interpretable. In this work, we propose a decoupled training method, which includes a graph generating module and a GNN forecasting module. First, we use Graphical Lasso (or GraphLASSO) to directly exploit the sparsity pattern from data to build graph structures in both static and time-varying cases. Second, we fit these graph structures and the input data into a Graph Convolutional Recurrent Network (GCRN) to train a forecasting model. The experimental results on three real-world datasets show that our novel approach has competitive performance against existing state-of-the-art forecasting algorithms while providing sparse, meaningful and explainable graph structures and reducing training time by approximately 40%. Our PyTorch implementation is publicly available at ", "output": "Sparsity exploitation via discovering graphical models in multi-variate time-series forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Continual learning (CL) is an approach to address catastrophic forgetting, which refers to forgetting previously learned knowledge by neural networks when trained on new tasks or data distributions. Research on adversarial robustness has decomposed features into robust and non-robust types and demonstrated that models trained on robust features significantly enhance adversarial robustness. However, no study has been conducted on the efficacy of robust features from the lens of the CL model in mitigating catastrophic forgetting in CL. In this paper, we introduce the CL robust dataset and train four baseline models on both the standard and CL robust datasets. Our results demonstrate that the CL models trained on the CL robust dataset experienced less catastrophic forgetting of the previously learned tasks than when trained on the standard dataset. Our observations highlight the significance of the features provided to the underlying CL models, showing that CL robust features can alleviate catastrophic forgetting.", "output": "The Importance of Robust Features in Mitigating Catastrophic Forgetting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We introduce RL4CO, an extensive reinforcement learning (RL) for combinatorial optimization (CO) benchmark. RL4CO employs state-of-the-art software libraries as well as best practices in implementation, such as modularity and configuration management, to be efficient and easily modifiable by researchers for adaptations of neural network architecture, environments, and algorithms. Contrary to the existing focus on specific tasks like the traveling salesman problem (TSP) for performance assessment, we underline the importance of scalability and generalization capabilities for diverse optimization tasks. We also systematically benchmark sample efficiency, zero-shot generalization, and adaptability to changes in data distributions of various models.
Our experiments show that some recent state-of-the-art methods fall behind their predecessors when evaluated using these new metrics, suggesting the necessity for a more balanced view of the performance of neural CO solvers. We hope RL4CO will encourage the exploration of novel solutions to complex real-world tasks, allowing comparison with existing methods through a standardized interface that decouples the science from the software engineering. We make our library publicly available at", "output": "RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Robot motor skills can be learned through deep reinforcement learning (DRL) by neural networks as state-action mappings. While the selection of state observations is crucial, there has been a lack of quantitative analysis to date. Here, we present a systematic saliency analysis that quantitatively evaluates the relative importance of different feedback states for motor skills learned through DRL. Our approach can identify the most essential feedback states for locomotion skills, including balance recovery, trotting, bounding, pacing and galloping. By using only key states including joint positions, gravity vector, base linear and angular velocities, we demonstrate that a simulated quadruped robot can achieve robust performance in various test scenarios across these distinct skills. The benchmarks using task performance metrics show that locomotion skills learned with key states can achieve comparable performance to those with all states, and the task performance or learning success rate will drop significantly if key states are missing. This work provides quantitative insights into the relationship between state observations and specific types of motor skills, serving as a guideline for robot motor learning. The proposed method is applicable to differentiable state-action mapping, such as neural network based control policies, enabling the learning of a wide range of motor skills with minimal sensing dependencies.", "output": "Identifying Important Sensory Feedback for Learning Locomotion Skills."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Recent work has observed an intriguing ''Neural Collapse'' phenomenon in well-trained neural networks, where the last-layer representations of training samples with the same label collapse into each other. This appears to suggest that the last-layer representations are completely determined by the labels, and do not depend on the intrinsic structure of the input distribution. We provide evidence that this is not a complete description, and that the apparent collapse hides important fine-grained structure in the representations. Specifically, even when representations apparently collapse, the small amount of remaining variation can still faithfully and accurately capture the intrinsic structure of the input distribution. As an example, if we train on CIFAR-10 using only 5 coarse-grained labels (by combining two classes into one super-class) until convergence, we can reconstruct the original 10-class labels from the learned representations via unsupervised clustering. The reconstructed labels achieve $93\%$ accuracy on the CIFAR-10 test set, nearly matching the normal CIFAR-10 accuracy for the same architecture.
We also provide an initial theoretical result showing the fine-grained representation structure in a simplified synthetic setting. Our results show concretely how the structure of the input data can play a significant role in determining the fine-grained structure of neural representations, going beyond what Neural Collapse predicts.", "output": "Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There has been an explosion in interest in machine learning (ML) in recent years due to its applications to science and engineering. However, as ML techniques have advanced, tools for explaining and visualizing novel ML algorithms have lagged behind. Animation has been shown to be a powerful tool for making engaging visualizations of systems that dynamically change over time, which makes it well suited to the task of communicating ML algorithms. However, the current approach to animating ML algorithms is to handcraft applications that highlight specific algorithms or use complex generalized animation software. We developed ManimML, an open-source Python library for easily generating animations of ML algorithms directly from code. We sought to leverage ML practitioners' preexisting knowledge of programming rather than requiring them to learn complex animation software. ManimML has a familiar syntax for specifying neural networks that mimics popular deep learning frameworks like Pytorch. A user can take a preexisting neural network architecture and easily write a specification for an animation in ManimML, which will then automatically compose animations for different components of the system into a final animation of the entire neural network. ManimML is open source and available at ", "output": "ManimML: Communicating Machine Learning Architectures with Animation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Using machine learning models to generate synthetic data has become common in many fields. Technology to generate synthetic transactions that can be used to detect fraud is also growing fast. Generally, this synthetic data contains only information about the transaction, such as the time, place, and amount of money. It does not usually contain the individual user's characteristics (age and gender are occasionally included). Using relatively complex synthetic demographic data may improve the complexity of transaction data features, thus improving the fraud detection performance. Benefiting from developments of machine learning, some deep learning models have the potential to perform better than other well-established synthetic data generation methods, such as microsimulation. In this study, we built a deep-learning Generative Adversarial Network (GAN), called DGGAN, which will be used for demographic data generation. Our model generates samples during model training, which we found important to overcome class imbalance issues.
This study can help improve the understanding of synthetic data and further explore the application of synthetic data generation in card fraud detection.", "output": "Synthetic Demographic Data Generation for Card Fraud Detection Using GANs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This research paper focuses on the implementation of Radial Basis Function (RBF) Support Vector Machines (SVM) for classifying asteroid orbits. Asteroids are important astronomical objects, and their orbits play a crucial role in understanding the dynamics of the solar system. The International Astronomical Union maintains data archives that provide a playground to experiment with various machine-learning techniques. In this study, we explore the application of the RBF SVM algorithm to classify asteroids. The results show that the RBF SVM algorithm provides good efficiency and accuracy on the dataset. We also analyze the impact of various parameters on the performance of the RBF SVM algorithm and present the optimal parameter settings. Our study highlights the importance of using machine learning techniques for classifying asteroid orbits and the effectiveness of the RBF SVM algorithm in this regard.", "output": "Orbit Classification of asteroids using implementation of radial Basis Function on Support Vector Machines."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In the problem of aggregation, the aim is to combine a given class of base predictors to achieve predictions nearly as accurate as the best one. In this flexible framework, no assumption is made on the structure of the class or the nature of the target. Aggregation has been studied in both sequential and statistical contexts. Despite some important differences between the two problems, the classical results in both cases feature the same global complexity measure. In this paper, we revisit and tighten classical results in the theory of aggregation in the statistical setting by replacing the global complexity with a smaller, local one. Some of our proofs build on the PAC-Bayes localization technique introduced by Catoni. Among other results, we prove localized versions of the classical bound for the exponential weights estimator due to Leung and Barron and deviation-optimal bounds for the Q-aggregation estimator. These bounds improve over the results of Dai, Rigollet and Zhang for fixed design regression and the results of Lecué and Rigollet for random design regression.", "output": "Local Risk Bounds for Statistical Aggregation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Despite considerable progress in multi-task learning, most efforts focus on learning from multi-label data: a single image set with multiple task labels. Such multi-label data sets are rare, small, and expensive. We say heterogeneous to refer to image sets with different task labels, or to combinations of single-task datasets. Few have explored training on such heterogeneous datasets.
General-purpose vision models are still dominated by single-task pretraining, and it remains unclear how to scale up multi-task models by leveraging mainstream vision datasets designed for different purposes. The challenges lie in managing large intrinsic differences among vision tasks, including data distribution, architectures, task-specific modules, dataset scales, and sampling strategies. To address these challenges, we propose to modify and scale up mixture-of-experts (MoE) vision transformers, so that they can simultaneously learn classification, detection, and segmentation on diverse mainstream vision datasets including ImageNet, COCO, and ADE20K. Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks. Due to its emergent modularity, this general-purpose model decomposes into high-performing components, efficiently adapting to downstream tasks. We can fine-tune it with fewer training parameters, fewer model parameters, and less computation. Additionally, its modularity allows for easy expansion in continual-learning-without-forgetting scenarios. Finally, these functions can be controlled and combined to meet various demands of downstream tasks.", "output": "An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The scattering transform is a multilayered wavelet-based deep learning architecture that acts as a model of convolutional neural networks. Recently, several works have introduced generalizations of the scattering transform for non-Euclidean settings such as graphs. Our work builds upon these constructions by introducing windowed and non-windowed geometric scattering transforms for graphs based upon a very general class of asymmetric wavelets. We show that these asymmetric graph scattering transforms have many of the same theoretical guarantees as their symmetric counterparts. As a result, the proposed construction unifies and extends known theoretical results for many of the existing graph scattering architectures. In doing so, this work helps bridge the gap between geometric scattering and other graph neural networks by introducing a large family of networks with provable stability and invariance guarantees. These results lay the groundwork for future deep learning architectures for graph-structured data that have learned filters and also provably have desirable theoretical properties.", "output": "Understanding Graph Neural Networks with Generalized Geometric Scattering Transforms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Successful teaching requires an assumption of how the learner learns -- how the learner uses experiences from the world to update their internal states. We investigate what expectations people have about a learner when they teach them in an online manner using rewards and punishment. We focus on a common reinforcement learning method, Q-learning, and examine what assumptions people have using a behavioral experiment. To do so, we first establish a normative standard, by formulating the problem as a machine teaching optimization problem.
To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states. What do people assume about a learner's learning and discount rates when they teach them an idealized exploration-exploitation task? In a behavioral experiment, we find that people can teach the task to Q-learners in a relatively efficient and effective manner when the learner uses a small value for its discounting rate and a large value for its learning rate. However, they are still suboptimal. We also find that providing people with real-time updates of how possible feedback would affect the Q-learner's internal states weakly helps them teach. Our results reveal how people teach using evaluative feedback and provide guidance for how engineers should design machine agents in a manner that is intuitive for people.", "output": "Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The framework for Simulation of Human and Artificial Emotion (SHArE) describes the architecture of emotion in terms of parameters transferable between psychology, neuroscience, and artificial intelligence. These parameters can be defined as abstract concepts or granularized down to the voltage levels of individual neurons. This model enables emotional trajectory design for humans which may lead to novel therapeutic solutions for various mental health concerns. For artificial intelligence, this work provides a compact notation which can be applied to neural networks as a means to observe the emotions and motivations of machines.", "output": "Simulation of Human and Artificial Emotion (SHArE)."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Minimax problems arise in a wide range of important applications including robust adversarial learning and Generative Adversarial Network (GAN) training. Recently, algorithms for minimax problems in the Federated Learning (FL) paradigm have received considerable interest. Existing federated algorithms for general minimax problems require the full aggregation (i.e., aggregation of local model information from all clients) in each training round. Thus, they are inapplicable to an important setting of FL known as the cross-device setting, which involves numerous unreliable mobile/IoT devices. In this paper, we develop the first practical algorithm named CDMA for general minimax problems in the cross-device FL setting. CDMA is based on a Start-Immediately-With-Enough-Responses mechanism, in which the server first signals a subset of clients to perform local computation and then starts to aggregate the local results reported by clients once it receives responses from enough clients in each round. With this mechanism, CDMA is resilient to low client availability. In addition, CDMA is incorporated with a lightweight global correction in the local update steps of clients, which mitigates the impact of slow network connections. 
We establish theoretical guarantees of CDMA under different choices of hyperparameters and conduct experiments on AUC maximization, robust adversarial network training, and GAN training tasks. Theoretical and experimental results demonstrate the efficiency of CDMA.", "output": "CDMA: A Practical Cross-Device Federated Learning Algorithm for General Minimax Problems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Most Artificial Intelligence applications are based on supervised machine learning (ML), which ultimately grounds on manually annotated data. The annotation process is often performed in terms of a majority vote and this has been proved to be often problematic, as highlighted by recent studies on the evaluation of ML models. In this article we describe and advocate for a different paradigm, which we call data perspectivism, which moves away from traditional gold standard datasets, towards the adoption of methods that integrate the opinions and perspectives of the human subjects involved in the knowledge representation step of ML processes. Drawing on previous works which inspired our proposal, we describe the potential of our proposal for not only the more subjective tasks (e.g. those related to human language) but also to tasks commonly understood as objective (e.g. medical decision making), and present the main advantages of adopting a perspectivist stance in ML, as well as possible disadvantages, and various ways in which such a stance can be implemented in practice. Finally, we share a set of recommendations and outline a research agenda to advance the perspectivist stance in ML.", "output": "Toward a Perspectivist Turn in Ground Truthing for Predictive Computing."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Wasserstein gradient flows on probability measures have found a host of applications in various optimization problems. They typically arise as the continuum limit of exchangeable particle systems evolving by some mean-field interaction involving a gradient-type potential. However, in many problems, such as in multi-layer neural networks, the so-called particles are edge weights on large graphs whose nodes are exchangeable. Such large graphs are known to converge to continuum limits called graphons as their size grows to infinity. We show that the Euclidean gradient flow of a suitable function of the edge-weights converges to a novel continuum limit given by a curve on the space of graphons that can be appropriately described as a gradient flow or, more technically, a curve of maximal slope. 
Several natural functions on graphons, such as homomorphism functions and the scalar entropy, are covered by our set-up, and the examples have been worked out in detail.", "output": "Gradient flows on graphons: existence, convergence, continuity equations."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, we classify scientific articles in the domain of natural language processing (NLP) and machine learning (ML), as core subfields of artificial intelligence (AI), into whether (i) they extend the current state-of-the-art by the introduction of novel techniques which beat existing models or whether (ii) they mainly criticize the existing state-of-the-art, i.e. that it is deficient with respect to some property (e.g. wrong evaluation, wrong datasets, misleading task specification). We refer to contributions under (i) as having a 'positive stance' and contributions under (ii) as having a 'negative stance' (to related work). We annotate over 1.5k papers from NLP and ML to train a SciBERT-based model to automatically predict the stance of a paper based on its title and abstract. We then analyse large-scale trends on over 41k papers from the last approximately 35 years in NLP and ML, finding that papers have become substantially more positive over time, but negative papers also got more negative and we observe considerably more negative papers in recent years. Negative papers are also more influential in terms of citations they receive.", "output": "Did AI get more negative recently?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. Graph Laplacians are used to find low-dimensional structures in high-dimensional data. Here too, ER-based analysis has advantages over eigenvector-based methods. Unfortunately, Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by scaling resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of effective resistances. We also show that by adding a \"ground\" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points.", "output": "Structure from Voltage."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in computer vision has been carried out to develop reliable defense strategies. However, the same issue remains less explored in natural language processing. Our work presents a model-agnostic detector of adversarial text examples. The approach identifies patterns in the logits of the target classifier when perturbing the input text. 
The proposed detector improves the current state-of-the-art performance in recognizing adversarial inputs and exhibits strong generalization capabilities across different NLP models, datasets, and word-level attacks.", "output": "\"That Is a Suspicious Reaction!\": Interpreting Logits Variation to Detect NLP Adversarial Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite significant progress, previous multi-view unsupervised feature selection methods mostly suffer from two limitations. First, they generally utilize either cluster structure or similarity structure to guide the feature selection, which neglects the possibility of a joint formulation with mutual benefits. Second, they often learn the similarity structure by either global structure learning or local structure learning, which lacks the capability of graph learning with both global and local structural awareness. In light of this, this paper presents a joint multi-view unsupervised feature selection and graph learning (JMVFG) approach. Particularly, we formulate the multi-view feature selection with orthogonal decomposition, where each target matrix is decomposed into a view-specific basis matrix and a view-consistent cluster indicator. The cross-space locality preservation is incorporated to bridge the cluster structure learning in the projected space and the similarity learning (i.e., graph learning) in the original space. Further, a unified objective function is presented to enable the simultaneous learning of the cluster structure, the global and local similarity structures, and the multi-view consistency and inconsistency, upon which an alternating optimization algorithm is developed with theoretically proved convergence. Extensive experiments on a variety of real-world multi-view datasets demonstrate the superiority of our approach for both the multi-view feature selection and graph learning tasks. The code is available at ", "output": "Joint Multi-view Unsupervised Feature Selection and Graph Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade the performance of models. These systems often use a single prediction model in a central server and process images sent from various environments, such as cameras distributed in cities or cars. Such single models face images corrupted in heterogeneous ways at test time. Thus, they need to instantly adapt to the multiple corruptions during testing rather than being re-trained at a high cost. Test-time adaptation (TTA), which aims to adapt models without accessing the training dataset, is one of the settings that can address this problem. Existing TTA methods indeed work well on a single corruption. However, the adaptation ability is limited when multiple types of corruption occur, which is more realistic. We hypothesize this is because the distribution shift is more complicated, and the adaptation becomes more difficult in the case of multiple corruptions. In fact, we experimentally found that a larger distribution gap remains after TTA. To address the distribution gap during testing, we propose a novel TTA method named Covariance-Aware Feature alignment (CAFe). 
We empirically show that CAFe outperforms prior TTA methods on image corruptions, including multiple types of corruptions.", "output": "Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation to Multiple Image Corruptions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Context modeling and recognition represent complex tasks that allow mobile and ubiquitous computing applications to adapt to the user's situation. Current solutions mainly focus on limited context information generally processed on centralized architectures, potentially exposing users' personal data to privacy leakage, and missing personalization features. For these reasons, on-device context modeling and recognition represent the current research trend in this area. Among the different information characterizing the user's context in mobile environments, social interactions and visited locations remarkably contribute to the characterization of daily life scenarios. In this paper we propose a novel, unsupervised and lightweight approach to model the user's social context and her locations based on ego networks directly on the user's mobile device. Relying on this model, the system is able to extract high-level and semantic-rich context features from smartphone-embedded sensor data. Specifically, for the social context it exploits data related to both physical and cyber social interactions among users and their devices. As far as location context is concerned, we assume that it is more relevant to model the familiarity degree of a specific location for the user's context than the raw location data, both in terms of GPS coordinates and proximity devices. By using 5 real-world datasets, we assess the structure of the social and location ego networks, provide a semantic evaluation of the proposed models, and a complexity evaluation in terms of mobile computing performance. Finally, we demonstrate the relevance of the extracted features by showing the performance of 3 machine learning algorithms to recognize daily-life situations, obtaining an improvement of 3% in AUROC, 9% in Precision, and 5% in terms of Recall with respect to using only features related to the physical context.", "output": "On-device modeling of user's social context and familiar places from smartphone-embedded sensor data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Graph Neural Network (GNN) with its ability to integrate graph information has been widely used for data analyses. However, the expressive power of GNN has only been studied for graph-level tasks but not for node-level tasks, such as node classification, where one tries to interpolate missing nodal labels from the observed ones. In this paper, we study the expressive power of GNN for the said classification task, which is in essence a function interpolation problem. Explicitly, we derive the number of weights and layers needed for a GNN to interpolate a band-limited function in $\\mathbb{R}^d$. 
Our result shows that the number of weights needed to $\\epsilon$-approximate a bandlimited function using the GNN architecture is much smaller than the best known one using a fully connected neural network (NN) - in particular, one only needs $O((\\log \\epsilon^{-1})^{d})$ weights using a GNN trained by $O((\\log \\epsilon^{-1})^{d})$ samples to $\\epsilon$-approximate a discretized bandlimited signal in $\\mathbb{R}^d$. The result is obtained by drawing a connection between the GNN structure and the classical sampling theorems, making our work the first attempt in this direction.", "output": "Superiority of GNN over NN in generalizing bandlimited functions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Knowledge embeddings (KE) represent a knowledge graph (KG) by embedding entities and relations into continuous vector spaces. Existing methods are mainly structure-based or description-based. Structure-based methods learn representations that preserve the inherent structure of KGs. They cannot well represent abundant long-tail entities in real-world KGs with limited structural information. Description-based methods leverage textual information and language models. Prior approaches in this direction barely outperform structure-based ones, and suffer from problems like expensive negative sampling and restrictive description demand. In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming at both enriching representations of long-tail entities and solving problems of prior description-based methods. We formulate description-based KE learning with a contrastive learning framework to improve efficiency in training and evaluation. Experimental results show that LMKE achieves state-of-the-art performance on KE benchmarks of link prediction and triple classification, especially for long-tail entities.", "output": "Language Models as Knowledge Embeddings."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing a DRL algorithm from scratch or using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains including software testing. However, to the best of our knowledge, there is no study that empirically evaluates the effectiveness and performance of implemented algorithms in DRL frameworks. Moreover, some guidelines are lacking in the literature that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the applications of carefully selected DRL algorithms on two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. 
For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game to detect bugs. Results show that some of the selected DRL frameworks such as Tensorforce outperform recent approaches in the literature. To prioritize test cases, we run experiments on a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between implemented algorithms in some cases is considerable, motivating further investigation.", "output": "A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models. The approach relies on the transform coding paradigm, where an image is mapped into a latent space for entropy coding and, from there, mapped back to the data space for reconstruction. In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model. Our approach thus introduces an additional \"content\" latent variable on which the reverse diffusion process is conditioned and uses this variable to store information about the image. The remaining \"texture\" variables characterizing the diffusion process are synthesized at decoding time. We show that the model's performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving multiple datasets and image quality assessment metrics show that our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics. Furthermore, training the diffusion with X-parameterization enables high-quality reconstructions in only a handful of decoding steps, greatly affecting the model's practicality.", "output": "Lossy Image Compression with Conditional Diffusion Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We offer a method for one-shot mask-guided image synthesis that allows controlling manipulations of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while preserving its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, MAGIC aggregates gradients over the input, driven by a guide binary mask that enforces a strong, spatial prior. MAGIC implements a series of manipulations with a single framework achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects and gives users firm control over the synthesis by requiring them to simply specify binary guide masks. Our study and findings are supported by various qualitative comparisons with the state-of-the-art on the same images sampled from ImageNet and quantitative analysis using machine perception along with a user survey of 100+ participants that endorse our synthesis quality. 
Project page at Code is available at", "output": "MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "There is a longstanding interest in capturing the error behaviour of object detectors by finding images where their performance is likely to be unsatisfactory. In real-world applications such as autonomous driving, it is also crucial to characterise potential failures beyond simple requirements of detection performance. For example, a missed detection of a pedestrian close to an ego vehicle will generally require closer inspection than a missed detection of a car in the distance. The problem of predicting such potential failures at test time has largely been overlooked in the literature and conventional approaches based on detection uncertainty fall short in that they are agnostic to such fine-grained characterisation of errors. In this work, we propose to reformulate the problem of finding \"hard\" images as a query-based hard image retrieval task, where queries are specific definitions of \"hardness\", and offer a simple and intuitive method that can solve this task for a large family of queries. Our method is entirely post-hoc, does not require ground-truth annotations, is independent of the choice of a detector, and relies on an efficient Monte Carlo estimation that uses a simple stochastic model in place of the ground-truth. We show experimentally that it can be applied successfully to a wide variety of queries for which it can reliably identify hard images for a given detector without any labelled data. We provide results on ranking and classification tasks using the widely used RetinaNet, Faster-RCNN, Mask-RCNN, and Cascade Mask-RCNN object detectors. The code for this project is available at ", "output": "Query-based Hard-Image Retrieval for Object Detection at Test Time."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model \\emph{can} be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. 
These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.", "output": "Log-linear Guardedness and its Implications."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Generalization in Reinforcement Learning (RL) aims to learn an agent during training that generalizes to the target environment. This paper studies RL generalization from a theoretical aspect: how much can we expect pre-training over training environments to be helpful? When the interaction with the target environment is not allowed, we certify that the best we can obtain is a near-optimal policy in an average sense, and we design an algorithm that achieves this goal. Furthermore, when the agent is allowed to interact with the target environment, we give a surprising result showing that asymptotically, the improvement from pre-training is at most a constant factor. On the other hand, in the non-asymptotic regime, we design an efficient algorithm and prove a distribution-based regret bound in the target environment that is independent of the state-action space.", "output": "On the Power of Pre-training for Generalization in RL: Provable Benefits and Hardness."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.", "output": "Transformers over Directed Acyclic Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Several companies often safeguard their trained deep models (i.e., details of architecture, learnt weights, training details etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data due to proprietary reasons or sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. 
To minimize adversarial contamination on perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields predictions on the surrogate model similar to the original ones. At test time, WNR combined with the trained regenerator network is prepended to the black box network, resulting in a high boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% on state-of-the-art Auto Attack compared to the baseline, even when the attacker uses a surrogate architecture (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) with the same model stealing strategy as the defender. The code is available at", "output": "Data-free Defense of Black Box Models Against Adversarial Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Explainable artificial intelligence (XAI) provides explanations for non-interpretable machine learning (ML) models. While many technical approaches exist, there is a lack of validation of these techniques on real-world datasets. In this work, we present a use-case of XAI: an ML model which is trained to estimate electrification rates based on mobile phone data in Senegal. The data originate from the Data for Development challenge by Orange in 2014/15. We apply two model-agnostic, local explanation techniques and find that while the model can be verified, it is biased with respect to the population density. We conclude our paper by pointing to the two main challenges we encountered during our work: data processing and model design that might be restricted by currently available XAI methods, and the importance of domain knowledge to interpret explanations.", "output": "Explainability in Practice: Estimating Electrification Rates from Mobile Phone Data in Senegal."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A general, {\\em rectangular} kernel matrix may be defined as $K_{ij} = \\kappa(x_i,y_j)$ where $\\kappa(x,y)$ is a kernel function and where $X=\\{x_i\\}_{i=1}^m$ and $Y=\\{y_i\\}_{i=1}^n$ are two sets of points. In this paper, we seek a low-rank approximation to a kernel matrix where the sets of points $X$ and $Y$ are large and are arbitrarily distributed, such as away from each other, ``intermingled'', identical, etc. Such rectangular kernel matrices may arise, for example, in Gaussian process regression where $X$ corresponds to the training data and $Y$ corresponds to the test data. In this case, the points are often high-dimensional. Since the point sets are large, we must exploit the fact that the matrix arises from a kernel function, and avoid forming the matrix, and thus ruling out most algebraic techniques. In particular, we seek methods that can scale linearly or nearly linearly with respect to the size of the data for a fixed approximation rank. The main idea in this paper is to {\\em geometrically} select appropriate subsets of points to construct a low-rank approximation. 
An analysis in this paper guides how this selection should be performed.", "output": "Data-Driven Linear Complexity Low-Rank Approximation of General Kernel Matrices: A Geometric Approach."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap \"preemptible\" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200Mb/s network.", "output": "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning techniques have become one of the main propellers for solving engineering problems effectively and efficiently. For instance, Predictive Maintenance methods have been used to improve predictions of when maintenance is needed on different machines and operative contexts. However, deep learning methods are not without limitations, as these models are normally trained on a fixed distribution that only reflects the current state of the problem. Due to internal or external factors, the state of the problem can change, and the performance decreases due to the lack of generalization and adaptation. Contrary to this stationary training set, real-world applications change their environments constantly, creating the need to constantly adapt the model to evolving scenarios. To aid in this endeavor, Continual Learning methods propose ways to constantly adapt prediction models and incorporate new knowledge after deployment. Despite the advantages of these techniques, there are still challenges to applying them to real-world problems. In this work, we present a brief introduction to predictive maintenance, non-stationary environments, and continual learning, together with an extensive review of the current state of applying continual learning in real-world applications and specifically in predictive maintenance. We then discuss the current challenges of both predictive maintenance and continual learning, proposing future directions at the intersection of both areas. 
Finally, we propose a novel way to create benchmarks that favor the application of continual learning methods in more realistic environments, giving specific examples of predictive maintenance.", "output": "Continual Learning for Predictive Maintenance: Overview and Challenges."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transformers were originally proposed as a sequence-to-sequence model for text but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs. However, transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains, including source code and logic circuits. In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian - a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 relatively by 14.7%.", "output": "Transformers Meet Directed Graphs."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling; however, naïve exact computation of this model incurs quadratic time and memory complexities in sequence length, hindering the training of long-sequence models. Critical bottlenecks are due to the computation of partition functions in the denominator of the softmax function as well as the multiplication of the softmax matrix with the matrix of values. Our key observation is that the former can be reduced to a variant of the kernel density estimation (KDE) problem, and an efficient KDE solver can be further utilized to accelerate the latter via subsampling-based fast matrix products. Our proposed KDEformer can approximate the attention in sub-quadratic time with provable spectral norm bounds, while all prior results merely provide entry-wise error bounds. Empirically, we verify that KDEformer outperforms other attention approximations in terms of accuracy, memory, and runtime on various pre-trained models. On BigGAN image generation, we achieve better generative scores than the exact computation with over $4\\times$ speedup. For ImageNet classification with T2T-ViT, KDEformer shows over $18\\times$ speedup while the accuracy drop is less than $0.5\\%$.", "output": "KDEformer: Accelerating Transformers via Kernel Density Estimation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data. Existing threat models of data poisoning center around a single metric, the number of poisoned samples. 
In consequence, if attackers can poison more samples than expected with affordable overhead, as in many practical scenarios, they may be able to render existing defenses ineffective in a short time. To address this issue, we leverage timestamps denoting the birth dates of data, which are often available but neglected in the past. Benefiting from these timestamps, we propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how long an attack started in advance and how long an attack lasted. Using these metrics, we define the notions of temporal robustness against data poisoning, providing a meaningful sense of protection even with unbounded amounts of poisoned samples. We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.", "output": "Temporal Robustness against Data Poisoning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Our goal is to produce methods for observational causal inference that are auditable, easy to troubleshoot, accurate for treatment effect estimation, and scalable to high-dimensional data. We describe a general framework called Model-to-Match that achieves these goals by (i) learning a distance metric via outcome modeling, (ii) creating matched groups using the distance metric, and (iii) using the matched groups to estimate treatment effects. Model-to-Match uses variable importance measurements to construct a distance metric, making it a flexible framework that can be adapted to various applications. Concentrating on the scalability of the problem in the number of potential confounders, we operationalize the Model-to-Match framework with LASSO. We derive performance guarantees for settings where LASSO outcome modeling consistently identifies all confounders (importantly without requiring the linear model to be correctly specified). We also provide experimental results demonstrating the method's auditability, accuracy, and scalability as well as extensions to more general nonparametric outcome modeling.", "output": "Variable Importance Matching for Causal Inference."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We show that convex-concave Lipschitz stochastic saddle point problems (also known as stochastic minimax optimization) can be solved under the constraint of $(\\epsilon,\\delta)$-differential privacy with \\emph{strong (primal-dual) gap} rate of $\\tilde O\\big(\\frac{1}{\\sqrt{n}} + \\frac{\\sqrt{d}}{n\\epsilon}\\big)$, where $n$ is the dataset size and $d$ is the dimension of the problem. This rate is nearly optimal, based on existing lower bounds in differentially private stochastic optimization. Specifically, we prove a tight upper bound on the strong gap via novel implementation and analysis of the recursive regularization technique repurposed for saddle point problems. 
We show that this rate can be attained with $O\\big(\\min\\big\\{\\frac{n^2\\epsilon^{1.5}}{\\sqrt{d}}, n^{3/2}\\big\\}\\big)$ gradient complexity, and $\\tilde{O}(n)$ gradient complexity if the loss function is smooth. As a byproduct of our method, we develop a general algorithm that, given black-box access to a subroutine satisfying a certain $\\alpha$ primal-dual accuracy guarantee with respect to the empirical objective, gives a solution to the stochastic saddle point problem with a strong gap of $\\tilde{O}(\\alpha+\\frac{1}{\\sqrt{n}})$. We show that this $\\alpha$-accuracy condition is satisfied by standard algorithms for the empirical saddle point problem such as the proximal point method and the stochastic gradient descent ascent algorithm. Further, we show that even for simple problems it is possible for an algorithm to have zero weak gap and suffer from $\\Omega(1)$ strong gap. We also show that there exists a fundamental tradeoff between stability and accuracy. Specifically, we show that any $\\Delta$-stable algorithm has empirical gap $\\Omega\\big(\\frac{1}{\\Delta n}\\big)$, and that this bound is tight. This result also holds more specifically for empirical risk minimization problems and may be of independent interest.", "output": "Differentially Private Algorithms for the Stochastic Saddle Point Problem with Optimal Rates for the Strong Gap."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Forecasting the water level of the Han River is essential to control traffic and avoid natural disasters. The stream flow of the Han River is affected by various and intricately connected factors. Thus, a simple forecasting machine frequently fails to capture its serial pattern. On the other hand, a complex predictive model loses the interpretability of the model output. This work proposes a neural network model with a novel transformer exploiting a causal relationship based on prior knowledge. The transformer consists of a spatiotemporal attention weight that describes the spatial and temporal causation with multilayer networks with masking. Our model has two distinct advantages over existing spatiotemporal forecasting models. First, the model allows heterogeneous predictors for each site such that a flexible regression is applicable to the causal network. Next, the model is adapted to partially identified causal structures. As a result, we have relaxed the constraints of the applicable causal network through our model. In real data analysis, we use the Han River dataset from 2016 to 2021, compare the proposed model with deep learning models, and confirm that our model provides an interpretable and consistent model with prior knowledge, such as a seasonality arising from the tidal force. Furthermore, in prediction performance, our model is better than or competitive with the state-of-the-art models.", "output": "Interpretable Water Level Forecaster with Spatiotemporal Causal Attention Mechanisms."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Transfer learning in Reinforcement Learning (RL) has been widely studied to overcome training issues of Deep-RL, i.e., exploration cost, data availability and convergence time, by introducing a way to enhance the training phase with external knowledge. Generally, knowledge is transferred from expert agents to novices. 
While this fixes the issue for a novice agent, a good understanding of the task by the expert agent is required for such transfer to be effective. As an alternative, in this paper we propose Expert-Free Online Transfer Learning (EF-OnTL), an algorithm that enables expert-free real-time dynamic transfer learning in multi-agent systems. No dedicated expert exists, and the transfer source agent and the knowledge to be transferred are dynamically selected at each transfer step based on agents' performance and uncertainty. To improve uncertainty estimation, we also propose State Action Reward Next-State Random Network Distillation (sars-RND), an extension of RND that estimates uncertainty from RL agent-environment interaction. We demonstrate EF-OnTL's effectiveness against a no-transfer scenario and advice-based baselines, with and without expert agents, in three benchmark tasks: Cart-Pole, a grid-based Multi-Team Predator-Prey (mt-pp) and Half Field Offense (HFO). Our results show that EF-OnTL achieves overall comparable performance when compared against advice-based baselines while not requiring any external input or threshold tuning. EF-OnTL outperforms no-transfer with an improvement related to the complexity of the task addressed.", "output": "Expert-Free Online Transfer Learning in Multi-Agent Reinforcement Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "In this paper, both empirically and theoretically, we show that several AI-text detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of a large language model (LLM), can break a whole range of detectors, including ones using watermarking schemes as well as neural network-based detectors and zero-shot classifiers. Our experiments demonstrate that retrieval-based detectors, designed to evade paraphrasing attacks, are still vulnerable to recursive paraphrasing. We then provide a theoretical impossibility result indicating that as language models become more sophisticated and better at emulating human text, the performance of even the best-possible detector decreases. For a sufficiently advanced language model seeking to imitate human text, even the best-possible detector may only perform marginally better than a random classifier. Our result is general enough to capture specific scenarios such as particular writing styles, clever prompt design, or text paraphrasing. We also extend the impossibility result to include the case where pseudorandom number generators are used for AI-text generation instead of true randomness. We show that the same result holds with a negligible correction term for all polynomial-time computable detectors. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden LLM text signatures and add them to human-generated text to be detected as text generated by the LLMs, potentially causing reputational damage to their developers. 
We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.", "output": "Can AI-Generated Text be Reliably Detected?."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Miscalibration, the mismatch between predicted probability and the true correctness likelihood, has been frequently identified in modern deep neural networks. Recent work in the field aims to address this problem by training calibrated models directly by optimizing a proxy of the calibration error alongside the conventional objective. Recently, Meta-Calibration (MC) showed the effectiveness of using meta-learning for learning better calibrated models. In this work, we extend MC with two main components: (1) gamma network (gamma-net), a meta network to learn a sample-wise gamma in a continuous space for the focal loss used to optimize the backbone network; (2) smooth expected calibration error (SECE), a Gaussian-kernel based unbiased and differentiable ECE which aims at smoothly optimizing gamma-net. The proposed method regularizes the neural network towards better calibration while retaining predictive performance. Our experiments show that (a) learning sample-wise gamma in a continuous space can effectively perform calibration; (b) SECE smoothly optimizes gamma-net towards better robustness to binning schemes; (c) the combination of gamma-net and SECE achieves the best calibration performance across various calibration metrics and retains very competitive predictive performance as compared to multiple recently proposed methods on three datasets.", "output": "Meta-Calibration Regularized Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Serial crystallography at X-ray free electron laser (XFEL) and synchrotron facilities has experienced tremendous progress in recent times, enabling novel scientific investigations into macromolecular structures and molecular processes. However, these experiments generate a significant amount of data, posing computational challenges in data reduction and real-time feedback. A Bragg peak finding algorithm is used to identify useful images and also provide real-time feedback about hit-rate and resolution. Shot-to-shot intensity fluctuations and strong background scattering from the buffer solution, injection nozzle and other shielding materials make this a time-consuming optimization problem. Here, we present PeakNet, an autonomous Bragg peak finder that utilizes deep neural networks. The development of this system 1) eliminates the need for manual algorithm parameter tuning, 2) reduces false-positive peaks by adjusting to shot-to-shot variations in strong background scattering in real time, 3) eliminates the laborious task of manually creating bad pixel masks and the need to store these masks per event since these can be regenerated on demand. PeakNet also exhibits exceptional runtime efficiency, processing a 1920-by-1920 pixel image in around 90 ms on an NVIDIA 1080 Ti GPU, with the potential for further enhancements through parallelized analysis or GPU stream processing. 
PeakNet is well-suited for expert-level real-time serial crystallography data analysis at high data rates.", "output": "PeakNet: An Autonomous Bragg Peak Finder with Deep Neural Networks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the success of deep-learning models in many tasks, there have been concerns about such models learning shortcuts, and their lack of robustness to irrelevant confounders. When it comes to models directly trained on human faces, a sensitive confounder is that of human identities. Many face-related tasks should ideally be identity-independent, and perform uniformly across different individuals (i.e. be fair). One way to measure and enforce such robustness and performance uniformity is through enforcing it during training, assuming identity-related information is available at scale. However, due to privacy concerns and also the cost of collecting such information, this is often not the case, and most face datasets simply contain input images and their corresponding task-related labels. Thus, improving identity-related robustness without the need for such annotations is of great importance. Here, we explore using face-recognition embedding vectors, as proxies for identities, to enforce such robustness. We propose to use the structure in the face-recognition embedding space, to implicitly emphasize rare samples within each class. We do so by weighting samples according to their conditional inverse density (CID) in the proxy embedding space. Our experiments suggest that such a simple sample weighting scheme not only improves the training robustness, it often improves the overall performance as a result of such robustness. We also show that employing such constraints during training results in models that are significantly less sensitive to different levels of bias in the dataset.", "output": "Improving Identity-Robustness for Face Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Convolutional Neural Networks (CNNs) are the predominant model used for a variety of medical image analysis tasks. At inference time, these models are computationally intensive, especially with volumetric data. In principle, it is possible to trade accuracy for computational efficiency by manipulating the rescaling factor in the downsample and upsample layers of CNN architectures. However, properly exploring the accuracy-efficiency trade-off is prohibitively expensive with existing models. To address this, we introduce Scale-Space HyperNetworks (SSHN), a method that learns a spectrum of CNNs with varying internal rescaling factors. A single SSHN characterizes an entire Pareto accuracy-efficiency curve of models that match, and occasionally surpass, the outcomes of training many separate networks with fixed rescaling factors. We demonstrate the proposed approach in several medical image analysis applications, comparing SSHN against strategies with both fixed and dynamic rescaling factors. We find that SSHN consistently provides a better accuracy-efficiency trade-off at a fraction of the training cost. 
Trained SSHNs enable the user to quickly choose a rescaling factor that appropriately balances accuracy and computational efficiency for their particular needs at inference.", "output": "Scale-Space Hypernetworks for Efficient Biomedical Imaging."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Physicians considering clinical trials for their patients are met with the laborious process of checking many text-based eligibility criteria. Large Language Models (LLMs) have been shown to perform well for clinical information extraction and clinical reasoning, including medical tests, but not yet in real-world scenarios. This paper investigates the use of InstructGPT to assist physicians in determining eligibility for clinical trials based on a patient's summarised medical profile. Using a prompting strategy combining one-shot, selection-inference and chain-of-thought techniques, we investigate the performance of LLMs on 10 synthetically created patient profiles. Performance is evaluated at four levels: ability to identify screenable eligibility criteria from a trial given a medical profile; ability to classify for each individual criterion whether the patient qualifies; the overall classification whether a patient is eligible for a clinical trial; and the percentage of criteria to be screened by the physician. We evaluated against 146 clinical trials and a total of 4,135 eligibility criteria. The LLM was able to correctly identify the screenability of 72% (2,994/4,135) of the criteria. Additionally, 72% (341/471) of the screenable criteria were evaluated correctly. The resulting trial-level classification as eligible or ineligible resulted in a recall of 0.5. By leveraging LLMs with a physician-in-the-loop, a recall of 1.0 and precision of 0.71 at the clinical trial level can be achieved while reducing the number of criteria to be checked by an estimated 90%. LLMs can be used to assist physicians with pre-screening of patients for clinical trials. By forcing instruction-tuned LLMs to produce chain-of-thought responses, the reasoning can be made transparent to physicians and the decision process becomes amenable to them, thereby making such a system feasible for use in real-world scenarios.", "output": "Improving Patient Pre-screening for Clinical Trials: Assisting Physicians with Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hypernetworks, neural networks that predict the parameters of another neural network, are powerful models that have been successfully used in diverse applications from image generation to multi-task learning. Unfortunately, existing hypernetworks are often challenging to train. Training typically converges far more slowly than for non-hypernetwork models, and the rate of convergence can be very sensitive to hyperparameter choices. In this work, we identify a fundamental and previously unidentified problem that contributes to the challenge of training hypernetworks: a magnitude proportionality between the inputs and outputs of the hypernetwork. We demonstrate both analytically and empirically that this can lead to unstable optimization, thereby slowing down convergence, and sometimes even preventing any learning. We present a simple solution to this problem using a revised hypernetwork formulation that we call Magnitude Invariant Parametrizations (MIP). 
We demonstrate the proposed solution on several hypernetwork tasks, where it consistently stabilizes training and achieves faster convergence. Furthermore, we perform a comprehensive ablation study including choices of activation function, normalization strategies, input dimensionality, and hypernetwork architecture; and find that MIP improves training in all scenarios. We provide easy-to-use code that can turn existing networks into MIP-based hypernetworks.", "output": "Magnitude Invariant Parametrizations Improve Hypernetwork Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep learning techniques have been shown to effectively address several image analysis tasks in the computer-aided diagnosis scheme for mammography. The training of an efficacious deep learning model requires a large amount of data with diverse styles and qualities. The diversity of data often comes from the use of scanners from various vendors. However, in practice, it is impractical to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning method is developed to equip the deep learning models with better style generalization capability. Specifically, a multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embedding against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned to the downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets. The experimental results suggest that the proposed domain generalization method can effectively improve the performance of four mammographic image tasks on data from both seen and unseen domains, and outperform many state-of-the-art (SOTA) generalization methods.", "output": "Domain Generalization for Mammographic Image Analysis with Contrastive Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "During the continuous evolution of one organism's ancestry, its genes accumulate extensive experiences and knowledge, enabling newborn descendants to rapidly adapt to their specific environments. Motivated by this observation, we propose a novel machine learning paradigm, Learngene, to enable learning models to incorporate three key characteristics of genes. (i) Accumulating: the knowledge is accumulated during the continuous learning of an ancestry model. (ii) Condensing: the extensive accumulated knowledge is condensed into a much more compact information piece, i.e., the learngene. (iii) Inheriting: the condensed learngene is inherited to make it easier for descendant models to adapt to new environments. Since accumulating has been studied in well-established paradigms like large-scale pre-training and lifelong learning, we focus on condensing and inheriting, which induce three key issues, and we provide preliminary solutions to these issues in this paper: (i) Learngene Form: the learngene is set to a few integral layers that can preserve significance. (ii) Learngene Condensing: we identify which layers of the ancestry model are most similar to a pseudo descendant model. 
(iii) Learngene Inheriting: to construct distinct descendant models for the specific downstream tasks, we stack some randomly initialized layers on top of the learngene layers. Extensive experiments across various settings, including using different network architectures like Vision Transformer (ViT) and Convolutional Neural Networks (CNNs) on different datasets, are carried out to confirm four advantages of Learngene: it makes the descendant models 1) converge more quickly, 2) exhibit less sensitivity to hyperparameters, 3) perform better, and 4) require fewer training samples to converge.", "output": "Learngene: Inheriting Condensed Knowledge from the Ancestry Model to Descendant Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "A growing body of research on probabilistic programs and causal models has highlighted the need to reason compositionally about model classes that extend directed graphical models. Both probabilistic programs and causal models define a joint probability density over a set of random variables, and exhibit sparse structure that can be used to reason about causation and conditional independence. This work builds on recent work on Markov categories of probabilistic mappings to define a category whose morphisms combine a joint density, factorized over each sample space, with a deterministic mapping from samples to return values. This is a step towards closing the gap between recent category-theoretic descriptions of probability measures, and the operational definitions of factorized densities that are commonly employed in probabilistic programming and causal inference.", "output": "String Diagrams with Factorized Densities."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "We study the problem of learning mixtures of Gaussians with censored data. Statistical learning with censored data is a classical problem with numerous practical applications; however, finite-sample guarantees for even simple latent variable models such as Gaussian mixtures are missing. Formally, we are given censored data from a mixture of univariate Gaussians $$\sum_{i=1}^k w_i \mathcal{N}(\mu_i, \sigma^2),$$ i.e. the sample is observed only if it lies inside a set $S$. The goal is to learn the weights $w_i$ and the means $\mu_i$. We propose an algorithm that takes only $\frac{1}{\varepsilon^{O(k)}}$ samples to estimate the weights $w_i$ and the means $\mu_i$ within $\varepsilon$ error.", "output": "Learning Mixtures of Gaussians with Censored Data."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Though the robustness of networks to random attacks has been widely studied, intentional destruction by an intelligent agent is not tractable with previous methods. Here we devise a single-player game on a lattice that mimics the logic of an attacker attempting to destroy a network. The objective of the game is to disable all nodes in the fewest number of steps. We develop a reinforcement learning approach using deep Q-learning that is capable of learning to play this game successfully, and in so doing, to optimally attack a network. Because the learning algorithm is universal, we train agents on different definitions of robustness and compare the learned strategies. 
We find that superficially similar definitions of robustness induce different strategies in the trained agent, implying that optimally attacking or defending a network is sensitive to the particular objective. Our method provides a new approach to understanding network robustness, with potential applications to other discrete processes in disordered systems.", "output": "Mastering Percolation-like Games with Deep Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Solving Partial Differential Equations (PDEs) is the core of many fields of science and engineering. While classical approaches are often prohibitively slow, machine learning models often fail to incorporate complete system information. Over the past few years, transformers have had a significant impact on the field of Artificial Intelligence and have seen increased usage in PDE applications. However, despite their success, transformers currently lack integration with physics and reasoning. This study aims to address this issue by introducing PITT: Physics Informed Token Transformer. The purpose of PITT is to incorporate the knowledge of physics by embedding partial differential equations (PDEs) into the learning process. PITT uses an equation tokenization method to learn an analytically-driven numerical update operator. By tokenizing PDEs and embedding partial derivatives, the transformer models become aware of the underlying knowledge behind physical processes. To demonstrate this, PITT is tested on challenging 1D and 2D PDE neural operator prediction tasks. The results show that PITT outperforms popular neural operator models and has the ability to extract physically relevant information from governing equations.", "output": "Physics Informed Token Transformer."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Dynamic learning systems subject to selective labeling exhibit censoring, i.e. persistent negative predictions assigned to one or more subgroups of points. In applications like consumer finance, this results in groups of applicants that are persistently denied and thus never enter into the training data. In this work, we formalize censoring, demonstrate how it can arise, and highlight difficulties in detection. We consider safeguards against censoring - recourse and randomized exploration - both of which ensure we collect labels for points that would otherwise go unobserved. The resulting techniques allow examples from censored groups to enter into the training data and correct the model. Our results highlight the otherwise unmeasured harms of censoring and demonstrate the effectiveness of mitigation strategies across a range of data-generating processes.", "output": "Algorithmic Censoring in Dynamic Learning Systems."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning - a mechanism for backdooring models. A small number of poisoned samples seen during training can severely undermine a model's integrity during inference. Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack. 
We argue that this approach overlooks a crucial trade-off: attackers can increase robustness at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning). In practice, attacks should remain both undetectable and robust. Detectable but robust attacks draw human attention and rigorous model evaluation or cause the model to be re-trained or discarded. In contrast, attacks that are undetectable but lack robustness can be repaired with minimal impact on model accuracy. Our research points to intrinsic flaws in current attack evaluation methods and raises the bar for all data poisoning attackers, who must delicately balance this trade-off to remain robust and undetectable. To demonstrate the existence of more potent defenders, we propose defenses designed to (i) detect or (ii) repair poisoned models using a limited number of trusted image-label pairs. Our results show that an attacker who needs to be robust and undetectable is substantially less threatening. Our defenses mitigate all tested attacks with a maximum accuracy decline of 2% using only 1% of clean data on CIFAR-10 and 2.5% on ImageNet. We demonstrate the scalability of our defenses by evaluating large vision-language models, such as CLIP. Attackers who can manipulate the model's parameters pose an elevated risk, as they can achieve higher robustness at low detectability compared to data poisoning attackers.", "output": "Pick your Poison: Undetectability versus Robustness in Data Poisoning Attacks."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Medical image segmentation is particularly critical as a prerequisite for relevant quantitative analysis in the treatment of clinical diseases. For example, in clinical cervical cancer radiotherapy, after acquiring subabdominal MRI images, a fast and accurate image segmentation of organs and tumors in MRI images can optimize the clinical radiotherapy process, whereas traditional approaches use manual annotation by specialist doctors, which is time-consuming and laborious; therefore, automatic organ segmentation of subabdominal MRI images is a valuable research topic.", "output": "An image segmentation algorithm based on multi-scale feature pyramid network."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Tensor network (TN) representation is a powerful technique for data analysis and machine learning. It practically involves a challenging TN structure search (TN-SS) problem, which aims to search for the optimal structure to achieve a compact representation. Existing TN-SS methods mainly adopt a bi-level optimization method that leads to excessive computational costs due to repeated structure evaluations. To address this issue, we propose an efficient integrated (single-level) method named SVD-inspired TN decomposition (SVDinsTN), eliminating the need for repeated tedious structure evaluation. By inserting a diagonal factor for each edge of the fully-connected TN, we calculate TN cores and diagonal factors simultaneously, with factor sparsity revealing the most compact TN structure. 
Experimental results on real-world data demonstrate that SVDinsTN achieves approximately $10\sim{}10^3$ times acceleration in runtime compared to the existing TN-SS methods while maintaining a comparable level of representation ability.", "output": "SVDinsTN: An Integrated Method for Tensor Network Representation with Efficient Structure Search."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Principal Component Analysis (PCA) is a pivotal technique widely utilized in the realms of machine learning and data analysis. It aims to reduce the dimensionality of a dataset while minimizing the loss of information. In recent years, there have been endeavors to utilize homomorphic encryption in privacy-preserving PCA algorithms for the secure cloud computing scenario. These approaches commonly employ a PCA routine known as PowerMethod, which takes the covariance matrix as input and generates an approximate eigenvector corresponding to the primary component of the dataset. However, their performance is constrained by the absence of an efficient homomorphic covariance matrix computation circuit and an accurate homomorphic vector normalization strategy in the PowerMethod algorithm. In this study, we propose a novel approach to privacy-preserving PCA that addresses these limitations, resulting in superior efficiency, accuracy, and scalability compared to previous approaches.", "output": "Improved Privacy-Preserving PCA Using Space-optimized Homomorphic Matrix Multiplication."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Hyperparameter (HP) optimization of deep learning (DL) is essential for high performance. As DL often requires several hours to days for its training, HP optimization (HPO) of DL is often prohibitively expensive. This boosted the emergence of tabular or surrogate benchmarks, which enable querying the (predictive) performance of DL with a specific HP configuration in a fraction. However, since the actual runtime of a DL training is significantly different from its query response time, simulators of an asynchronous HPO, e.g. multi-fidelity optimization, must wait for the actual runtime at each iteration in a naïve implementation; otherwise, the evaluation order during simulation does not match with the real experiment. To ease this issue, we developed a Python wrapper and describe its usage. This wrapper forces each worker to wait so that we yield exactly the same evaluation order as in the real experiment with only $10^{-2}$ seconds of waiting instead of waiting several hours. Our implementation is available at", "output": "Python Wrapper for Simulating Multi-Fidelity Optimization on HPO Benchmarks without Any Wait."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Inference and simulation in the context of high-dimensional dynamical systems remain computationally challenging problems. Some form of dimensionality reduction is required to make the problem tractable in general. In this paper, we propose a novel approximate Gaussian filtering and smoothing method which propagates low-rank approximations of the covariance matrices. 
This is accomplished by projecting the Lyapunov equations associated with the prediction step to a manifold of low-rank matrices, which are then solved by a recently developed, numerically stable, dynamical low-rank integrator. Meanwhile, the update steps are made tractable by noting that the covariance update only transforms the column space of the covariance matrix, which is low-rank by construction. The algorithm differentiates itself from existing ensemble-based approaches in that the low-rank approximations of the covariance matrices are deterministic, rather than stochastic. Crucially, this enables the method to reproduce the exact Kalman filter as the low-rank dimension approaches the true dimensionality of the problem. Our method reduces computational complexity from cubic (for the Kalman filter) to quadratic in the state-space size in the worst case, and can achieve linear complexity if the state-space model satisfies certain criteria. Through a set of experiments in classical data-assimilation and spatio-temporal regression, we show that the proposed method consistently outperforms the ensemble-based methods in terms of error in the mean and covariance with respect to the exact Kalman filter. This comes at no additional cost in terms of asymptotic computational complexity.", "output": "The Rank-Reduced Kalman Filter: Approximate Dynamical-Low-Rank Filtering In High Dimensions."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Artificial intelligence (AI) has seen a tremendous surge in capabilities thanks to the use of foundation models trained on internet-scale data. On the flip side, the uncurated nature of internet-scale data also poses significant privacy and legal risks, as they often contain personal information or copyrighted material that should not be trained on without permission. In this work, we propose as a mitigation measure a recipe to train foundation vision models with a differential privacy (DP) guarantee. We identify masked autoencoders as a suitable learning algorithm that aligns well with DP-SGD, and train ViP -- a Vision transformer with differential Privacy -- under a strict privacy budget of $\epsilon=8$ on the LAION400M dataset. We evaluate the quality of representation learned by ViP using standard downstream vision tasks; in particular, ViP achieves a (non-private) linear probing accuracy of $55.7\%$ on ImageNet, comparable to that of end-to-end trained AlexNet (trained and evaluated on ImageNet). Our result suggests that scaling to internet-scale data can be practical for private learning. Code is available at", "output": "ViP: A Differentially Private Foundation Model for Computer Vision."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Energy time-series analysis describes the process of analyzing past energy observations and possibly external factors so as to predict the future. Different tasks are involved in the general field of energy time-series analysis and forecasting, with electric load demand forecasting, personalized energy consumption forecasting, as well as renewable energy generation forecasting being among the most common ones. Following the exceptional performance of Deep Learning (DL) in a broad area of vision tasks, DL models have successfully been utilized in time-series forecasting tasks. 
This paper aims to provide insight into various DL methods geared towards improving the performance in energy time-series forecasting tasks, with special emphasis on the Greek energy market, and equip the reader with the necessary knowledge to apply these methods in practice.", "output": "Deep Learning for Energy Time-Series Analysis and Forecasting."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Communication overhead is one of the major challenges in Federated Learning (FL). A few classical schemes assume the server can extract auxiliary information about the training data of the participants from the local models to construct a central dummy dataset. The server uses the dummy dataset to finetune the aggregated global model to achieve the target test accuracy in fewer communication rounds. In this paper, we summarize the above solutions into a data-based communication-efficient FL framework. The key of the proposed framework is to design an efficient extraction module (EM) which ensures that the dummy dataset has a positive effect on finetuning the aggregated global model. Different from the existing methods that use a generator to design the EM, our proposed method, FedINIBoost, borrows the idea of gradient matching to construct the EM. Specifically, FedINIBoost builds a proxy dataset of the real dataset in two steps for each participant at each communication round. Then the server aggregates all the proxy datasets to form a central dummy dataset, which is used to finetune the aggregated global model. Extensive experiments verify the superiority of our method compared with the existing classical methods FedAVG, FedProx, Moon and FedFTG. Moreover, FedINIBoost plays a significant role in finetuning the performance of the aggregated global model at the initial stage of FL.", "output": "An Efficient Virtual Data Generation Method for Reducing Communication in Federated Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative. For example, a company deploying a client-facing chatbot must ensure that the model will not respond to client requests with profanity. Current evaluations approach this problem using small, domain-specific datasets with human-curated labels. These evaluation sets are often sampled from a narrow and simplified distribution, and data sources can unknowingly be leaked into the training set, which can lead to misleading evaluations. To bypass these drawbacks, we propose a framework for self-supervised evaluation of LLMs by analyzing their sensitivity or invariance to transformations on the input text. Self-supervised evaluation can directly monitor LLM behavior on datasets collected in the wild or streamed during live model deployment. We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence, in addition to sensitivity to grammatical structure and tokenization errors. When comparisons to similar human-labeled benchmarks are available, we find strong correlations between self-supervised and human-supervised evaluations. The self-supervised paradigm complements current evaluation strategies that rely on labeled data.", "output": "Bring Your Own Data! 
Self-Supervised Evaluation for Large Language Models."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization. A number of approaches have been proposed over the last few years to address these drawbacks, such as extra regularization, adversarial weight perturbation, and training with more data. However, the robust generalization improvement is still far from satisfactory. In this paper, we approach this challenge with a brand new perspective -- refining historical optimization trajectories. We propose a new method named Weighted Optimization Trajectories (WOT) that leverages the optimization trajectories of adversarial training in time. We have conducted extensive experiments to demonstrate the effectiveness of WOT under various state-of-the-art adversarial attacks. Our results show that WOT integrates seamlessly with the existing adversarial training methods and consistently overcomes the robust overfitting issue, resulting in better adversarial robustness. For example, WOT boosts the robust accuracy of AT-PGD under the AA-$L_{\infty}$ attack by 1.53% $\sim$ 6.11% and meanwhile increases the clean accuracy by 0.55% $\sim$ 5.47% across the SVHN, CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.", "output": "Enhancing Adversarial Training via Reweighting Optimization Trajectory."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The widely used stochastic gradient methods for minimizing nonconvex composite objective functions require the Lipschitz smoothness of the differentiable part. But the requirement does not hold true for problem classes including quadratic inverse problems and training neural networks. To address this issue, we investigate a family of stochastic Bregman proximal gradient (SBPG) methods, which only require smooth adaptivity of the differentiable part. SBPG replaces the upper quadratic approximation used in SGD with the Bregman proximity measure, resulting in a better approximation model that captures the non-Lipschitz gradients of the nonconvex objective. We formulate the vanilla SBPG and establish its convergence properties under the nonconvex setting without finite-sum structure. Experimental results on quadratic inverse problems testify to the robustness of SBPG. Moreover, we propose a momentum-based version of SBPG (MSBPG) and prove it has improved convergence properties. We apply MSBPG to the training of deep neural networks with a polynomial kernel function, which ensures the smooth adaptivity of the loss function. Experimental results on representative benchmarks demonstrate the effectiveness and robustness of MSBPG in training neural networks. 
Since the additional computation cost of MSBPG compared with SGD is negligible in large-scale optimization, MSBPG can potentially be employed as a universal open-source optimizer in the future.", "output": "Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Spectral clustering is one of the most popular unsupervised machine learning methods. Constructing the similarity matrix is crucial to this type of method. In most existing works, the similarity matrix is computed once for all or is updated alternately. However, the former makes it difficult to reflect comprehensive relationships among data points, and the latter is time-consuming and is even infeasible for large-scale problems. In this work, we propose a restarted clustering framework with self-guiding and block diagonal representation. An advantage of this strategy is that some useful clustering information obtained from previous cycles can be preserved as much as possible. To the best of our knowledge, this is the first work that applies a restarting strategy to spectral clustering. The key difference is that we reclassify the samples in each cycle of our method, while they are classified only once in existing methods. To further reduce the overhead, we introduce a block diagonal representation with Nyström approximation for constructing the similarity matrix. Theoretical results are established to show the rationality of inexact computations in spectral clustering. Comprehensive experiments are performed on some benchmark databases, which show the superiority of our proposed algorithms over many state-of-the-art algorithms for large-scale problems. Specifically, our framework offers a potential boost for clustering algorithms and works well even using an initial guess chosen randomly.", "output": "A Restarted Large-Scale Spectral Clustering with Self-Guiding and Block Diagonal Representation."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and exhibit challenges for both scientists and practitioners.", "output": "An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Vertical Federated Learning (VFL) attracts increasing attention because it empowers multiple parties to jointly train a privacy-preserving model over vertically partitioned data. Recent research has shown that applying zeroth-order optimization (ZOO) has many advantages in building a practical VFL algorithm. 
However, a vital problem with the ZOO-based VFL is its slow convergence rate, which limits its application in handling modern large models. To address this problem, we propose a cascaded hybrid optimization method in VFL. In this method, the downstream models (clients) are trained with ZOO to protect privacy and ensure that no internal information is shared. Meanwhile, the upstream model (server) is updated with first-order optimization (FOO) locally, which significantly improves the convergence rate, making it feasible to train the large models without compromising privacy and security. We theoretically prove that our VFL framework converges faster than the ZOO-based VFL, as the convergence of our framework is not limited by the size of the server model, making it effective for training large models with the major part on the server. Extensive experiments demonstrate that our method achieves faster convergence than the ZOO-based VFL framework, while maintaining an equivalent level of privacy protection. Moreover, we show that the convergence of our VFL is comparable to the unsafe FOO-based VFL baseline. Additionally, we demonstrate that our method makes the training of a large model feasible.", "output": "Secure and Fast Asynchronous Vertical Federated Learning via Cascaded Hybrid Optimization."}, {"instruction": "If you are an expert in writing papers, please generate a good paper title for this paper based on other authors' descriptions of their abstracts.", "input": "Distributed Machine Learning (DML) systems are utilized to enhance the speed of model training in data centers (DCs) and edge nodes. The Parameter Server (PS) communication architecture is commonly employed, but it faces severe long-tail latency caused by many-to-one \"incast\" traffic patterns, negatively impacting training throughput. To address this challenge, we design the Loss-tolerant Transmission Protocol (LTP), which permits partial loss of gradients during synchronization to avoid unneeded retransmission and contributes to faster synchronization per iteration. LTP implements loss-tolerant transmission through out-of-order transmission and out-of-order Acknowledges (ACKs). LTP employs Early Close to adjust the loss-tolerant threshold based on network conditions and Bubble Filling for data correction to maintain training accuracy. LTP is implemented in C++ and integrated into PyTorch. Evaluations on a testbed of 8 worker nodes and one PS node demonstrate that LTP can significantly improve DML training task throughput by up to 30x compared to traditional TCP congestion controls, with no sacrifice in final accuracy.", "output": "Boosting Distributed Machine Learning Training Through Loss-tolerant Transmission Protocol."}] \ No newline at end of file