@misc{paulus2024advprompter,
title={AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs},
author={Anselm Paulus* and Arman Zharmagambetov* and Chuan Guo and Brandon Amos$^{\dagger}$ and Yuandong Tian$^{\dagger}$},
year={2024},
url={https://arxiv.org/abs/2404.16873},
codeurl={https://github.com/facebookresearch/advprompter},
_venue={arXiv},
selected={true},
abstract={
While recently Large Language Models (LLMs) have achieved remarkable successes, they are vulnerable to certain jailbreaking attacks that lead to generation of inappropriate or harmful content. Manual red-teaming requires finding adversarial prompts that cause such jailbreaking, e.g. by appending a suffix to a given instruction, which is inefficient and time-consuming.
On the other hand, automatic adversarial prompt generation often leads to semantically meaningless attacks that can easily be detected by perplexity-based filters, may require gradient information from the target LLM, or do not scale well due to time-consuming discrete optimization processes over the token space. In this paper, we present a novel method that uses another LLM, called the AdvPrompter, to generate human-readable adversarial prompts in seconds, approximately 800 times faster than existing optimization-based approaches.
We train the AdvPrompter using a novel algorithm that does not require access to the gradients of the target LLM. This process alternates between two steps: (1) generating high-quality target adversarial suffixes by optimizing the AdvPrompter predictions, and (2) low-rank fine-tuning of the AdvPrompter with the generated adversarial suffixes. The trained AdvPrompter generates suffixes that veil the input instruction without changing its meaning, such that the target LLM is lured to give a harmful response. Experimental results on popular open source target LLMs show state-of-the-art results on the AdvBench dataset that also transfer to closed-source black-box LLM APIs. Further, we demonstrate that by fine-tuning on a synthetic dataset generated by AdvPrompter, LLMs can be made more robust against jailbreaking attacks while maintaining performance, i.e. high MMLU scores.
}
}
@misc{pooladian2024neural,
title = {Neural Optimal Transport with Lagrangian Costs},
author = {Aram-Alexandre Pooladian and Carles Domingo-Enrich and Ricky T. Q. Chen and Brandon Amos},
year = {2024},
_venue={UAI},
selected={true},
url={https://arxiv.org/abs/2406.00288},
codeurl={https://github.com/facebookresearch/lagrangian-ot},
abstract={
We investigate the optimal transport problem between probability measures when the underlying cost function is understood to satisfy a least action principle, also known as a Lagrangian cost. These generalizations are useful when connecting observations from a physical system, where the transport dynamics are influenced by the geometry of the system, such as obstacles (e.g., incorporating barrier functions in the Lagrangian), and allow practitioners to incorporate a priori knowledge of the underlying system such as non-Euclidean geometries (e.g., paths must be circular). Our contributions are of computational interest, where we demonstrate the ability to efficiently compute geodesics and amortize spline-based paths, which has not been done before, even in low dimensional problems. Unlike prior work, we also output the resulting Lagrangian optimal transport map without requiring an ODE solver. We demonstrate the effectiveness of our formulation on low-dimensional examples taken from prior work.
}
}
@misc{sambharya2024learning,
title={Learning to Warm-Start Fixed-Point Optimization Algorithms},
author={Rajiv Sambharya and Georgina Hall and Brandon Amos and Bartolomeo Stellato},
year={2024},
url={https://arxiv.org/abs/2309.07835},
codeurl={https://github.com/stellatogrp/l2ws},
_venue={JMLR},
abstract={
We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground truth solution. In this way, the neural network predicts warm starts with the end-to-end goal of minimizing the downstream loss. An important feature of our architecture is its flexibility, in that it can predict a warm start for fixed-point algorithms run for any number of steps, without being limited to the number of steps it has been trained on. We provide PAC-Bayes generalization bounds on unseen data for common classes of fixed-point operators: contractive, linearly convergent, and averaged. Applying this framework to well-known applications in control, statistics, and signal processing, we observe a significant reduction in the number of iterations and solution time required to solve these problems, through learned warm starts.
}
}
@misc{silvestri2024score,
title={Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning},
author={Mattia Silvestri and Senne Berden and Jayanta Mandi and Ali İrfan Mahmutoğulları and Brandon Amos and Tias Guns and Michele Lombardi},
year={2024},
url={https://arxiv.org/abs/2307.05213},
_venue={arXiv},
abstract={
Many real-world optimization problems contain parameters that are unknown before deployment time, either due to stochasticity or to lack of information (e.g., demand or travel times in delivery problems). A common strategy in such cases is to estimate said parameters via machine learning (ML) models trained to minimize the prediction error, which however is not necessarily aligned with the downstream task-level error. The decision-focused learning (DFL) paradigm overcomes this limitation by training to directly minimize a task loss, e.g. regret. Since the latter has non-informative gradients for combinatorial problems, state-of-the-art DFL methods introduce surrogates and approximations that enable training. But these methods exploit specific assumptions about the problem structures (e.g., convex or linear problems, unknown parameters only in the objective function). We propose an alternative method that makes no such assumptions: it combines stochastic smoothing with score function gradient estimation, which works on any task loss. This opens up the use of DFL methods to nonlinear objectives, uncertain parameters in the problem constraints, and even two-stage stochastic optimization. Experiments show that it typically requires more epochs, but that it is on par with specialized methods and performs especially well for the difficult case of problems with uncertainty in the constraints, in terms of solution quality, scalability, or both.
}
}
@inproceedings{atanackovic2024meta,
title={Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold},
author={Lazar Atanackovic and Xi Zhang and Brandon Amos and Mathieu Blanchette and Leo J Lee and Yoshua Bengio and Alexander Tong and Kirill Neklyudov},
booktitle={ICML 2024 Workshop on Geometry-grounded Representation Learning and Generative Modeling},
year={2024},
url={https://arxiv.org/abs/2408.14608},
_venue={ICML GRaM Workshop},
abstract={
Numerous biological and physical processes can be modeled as systems of interacting entities evolving continuously over time, e.g. the dynamics of communicating cells or physical particles. Learning the dynamics of such systems is essential for predicting the temporal evolution of populations across novel samples and unseen environments. Flow-based models allow for learning these dynamics at the population level - they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We argue that multiple processes in natural sciences have to be represented as vector fields on the Wasserstein manifold of probability densities. That is, the change of the population at any moment in time depends on the population itself due to the interactions between samples. In particular, this is crucial for personalized medicine where the development of diseases and their respective treatment response depends on the microenvironment of cells specific to each patient. We propose Meta Flow Matching (MFM), a practical approach to integrating along these vector fields on the Wasserstein manifold by amortizing the flow model over the initial populations. Namely, we embed the population of samples using a Graph Neural Network (GNN) and use these embeddings to train a Flow Matching model. This gives MFM the ability to generalize over the initial distributions unlike previously proposed methods. We demonstrate the ability of MFM to improve prediction of individual treatment responses on a large scale multi-patient single-cell drug screen dataset.
}
}
@inproceedings{lotfi2024unlocking,
title={Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models},
author={Sanae Lotfi and Yilun Kuang and Marc Anton Finzi and Brandon Amos and Micah Goldblum and Andrew Gordon Wilson},
_venue={ICML TF2M Workshop},
_note={Best paper},
year={2024},
url={https://openreview.net/forum?id=cQWsTeTSkZ},
abstract={
Large language models (LLMs) with billions of parameters excel at
predicting the next token in a sequence. Recent work
computes non-vacuous compression-based
generalization bounds for LLMs, but these bounds are
vacuous for large models at the billion-parameter
scale. Moreover, these bounds are obtained through
restrictive compression techniques, bounding
compressed models that generate low-quality
text. Additionally, the tightness of these existing
bounds depends on the number of IID documents in a
training set rather than the much larger number of
non-IID constituent tokens, leaving untapped
potential for tighter bounds. In this work, we
instead use properties of martingales to derive
generalization bounds that benefit from the vast
number of tokens in LLM training sets. Since a
dataset contains far more tokens than documents, our
generalization bounds not only tolerate but actually
benefit from far less restrictive compression
schemes. With Monarch matrices, Kronecker
factorizations, and post-training quantization, we
achieve non-vacuous generalization bounds for LLMs
as large as LLaMA2-70B. Unlike previous approaches,
our work achieves the first non-vacuous bounds for
models that are deployed in practice and generate
high-quality text.
}
}
@misc{amos2023tutorial,
title={Tutorial on amortized optimization},
author={Brandon Amos},
year={2023},
url={https://arxiv.org/abs/2202.00665},
_venue={Foundations and Trends in Machine Learning},
codeurl={https://github.com/facebookresearch/amortized-optimization-tutorial},
selected={true},
abstract={
Optimization is a ubiquitous modeling tool and is often deployed
in settings which repeatedly solve similar instances
of the same problem. Amortized optimization methods
use learning to predict the solutions to problems in
these settings, exploiting the shared structure
between similar problem instances. These methods
have been crucial in variational inference and
reinforcement learning and are capable of solving
optimization problems many orders of magnitude
faster than traditional optimization methods
that do not use amortization. This tutorial presents
an introduction to the amortized optimization
foundations behind these advancements and overviews
their applications in variational inference, sparse
coding, gradient-based meta-learning, control,
reinforcement learning, convex optimization, optimal
transport, and deep equilibrium networks.
}
}
@misc{amos2023amortizing,
title={On amortizing convex conjugates for optimal transport},
author={Brandon Amos},
year={2023},
url={https://arxiv.org/abs/2210.12153},
codeurl={https://github.com/facebookresearch/w2ot},
_venue={ICLR},
selected={true},
abstract={
This paper focuses on computing the convex conjugate operation that
arises when solving Euclidean Wasserstein-2 optimal
transport problems. This conjugation, which is also
referred to as the Legendre-Fenchel conjugate or
c-transform, is considered difficult to compute and
in practice, Wasserstein-2 methods are limited by
not being able to exactly conjugate the dual
potentials in continuous space. I show that
combining amortized approximations to the conjugate
with a solver for fine-tuning is computationally
easy. This combination significantly improves the
quality of transport maps learned for the
Wasserstein-2 benchmark by Korotin et al. (2021) and
is able to model many 2-dimensional couplings and
flows considered in the literature.
}
}
@misc{sambharya2023l2a,
title={End-to-End Learning to Warm-Start for Real-Time Quadratic Optimization},
author={Rajiv Sambharya and Georgina Hall and Brandon Amos and Bartolomeo Stellato},
year={2023},
url={https://arxiv.org/abs/2212.08260},
codeurl={https://github.com/stellatogrp/l2ws},
_venue={L4DC},
abstract={
First-order methods are widely used to solve convex quadratic programs
(QPs) in real-time applications because of their low
per-iteration cost. However, they can suffer from
slow convergence to accurate solutions. In this
paper, we present a framework which learns an
effective warm-start for a popular first-order
method in real-time applications, Douglas-Rachford
(DR) splitting, across a family of parametric
QPs. This framework consists of two modules: a
feedforward neural network block, which takes as
input the parameters of the QP and outputs a
warm-start, and a block which performs a fixed
number of iterations of DR splitting from this
warm-start and outputs a candidate solution. A key
feature of our framework is its ability to do
end-to-end learning as we differentiate through the
DR iterations. To illustrate the effectiveness of
our method, we provide generalization bounds (based
on Rademacher complexity) that improve with the
number of training problems and number of iterations
simultaneously. We further apply our method to three
real-time applications and observe that, by learning
good warm-starts, we are able to significantly
reduce the number of iterations required to obtain
high-quality solutions.
}
}
@misc{amos2023meta,
title={Meta Optimal Transport},
author={Brandon Amos and Samuel Cohen and Giulia Luise and Ievgen Redko},
year={2023},
url={https://arxiv.org/abs/2206.05262},
codeurl={https://github.com/facebookresearch/meta-ot},
_venue={ICML},
selected={true},
abstract={
We study the use of amortized optimization to predict optimal
transport (OT) maps from the input measures, which
we call Meta OT. This helps repeatedly solve similar
OT problems between different measures by leveraging
the knowledge and information present from past
problems to rapidly predict and solve new
problems. Otherwise, standard methods ignore the
knowledge of the past solutions and suboptimally
re-solve each problem from scratch. Meta OT models
surpass the standard convergence rates of
log-Sinkhorn solvers in the discrete setting and
convex potentials in the continuous setting. We
improve the computational time of standard OT
solvers by multiple orders of magnitude in discrete
and continuous transport settings between images,
spherical data, and color palettes.
}
}
@misc{pooladian2023multisample,
title={Multisample Flow Matching: Straightening Flows with Minibatch Couplings},
author={Aram-Alexandre Pooladian and Heli Ben-Hamu and Carles Domingo-Enrich and Brandon Amos and Yaron Lipman and Ricky T. Q. Chen},
year={2023},
_venue={ICML},
url={https://arxiv.org/abs/2304.14772},
abstract={
Simulation-free methods for training continuous-time generative models
construct probability paths that go between noise
distributions and individual data samples. Recent
works, such as Flow Matching, derived paths that are
optimal for each data sample. However, these
algorithms rely on independent data and noise
samples, and do not exploit underlying structure in
the data distribution for constructing probability
paths. We propose Multisample Flow Matching, a more
general framework that uses non-trivial couplings
between data and noise samples while satisfying the
correct marginal constraints. At very small overhead
costs, this generalization allows us to (i) reduce
gradient variance during training, (ii) obtain
straighter flows for the learned vector field, which
allows us to generate high-quality samples using
fewer function evaluations, and (iii) obtain
transport maps with lower cost in high dimensions,
which has applications beyond generative
modeling. Importantly, we do so in a completely
simulation-free manner with a simple minimization
objective. We show that our proposed methods improve
sample consistency on downsampled ImageNet data
sets, and lead to better low-cost sample generation.
}
}
@misc{zheng2023semi,
title = {Semi-Supervised Offline Reinforcement Learning with Action-Free Trajectories},
author = {Zheng, Qinqing and Henaff, Mikael and Amos, Brandon and Grover, Aditya},
year = {2023},
url = {https://arxiv.org/abs/2210.06518},
_venue={ICML},
abstract={
Natural agents can effectively learn from multiple data sources that
differ in size, quality, and types of
measurements. We study this heterogeneity in the
context of offline reinforcement learning (RL) by
introducing a new, practically motivated
semi-supervised setting. Here, an agent has access
to two sets of trajectories: labelled trajectories
containing state, action, reward triplets at every
timestep, along with unlabelled trajectories that
contain only state and reward information. For this
setting, we develop a simple meta-algorithmic
pipeline that learns an inverse-dynamics model on
the labelled data to obtain proxy-labels for the
unlabelled data, followed by the use of any offline
RL algorithm on the true and proxy-labelled
trajectories. Empirically, we find this simple
pipeline to be highly successful -- on several D4RL
benchmarks, certain offline RL
algorithms can match the performance of variants
trained on a fully labelled dataset even when we
label only 10\% of trajectories from the low return
regime. Finally, we perform a large-scale controlled
empirical study investigating the interplay of
data-centric properties of the labelled and
unlabelled datasets, with algorithmic design choices
(e.g., inverse dynamics, offline RL algorithm) to
identify general trends and best practices for
training RL agents on semi-supervised offline
datasets.
}
}
@misc{bansal2023taskmet,
title = {TaskMet: Task-Driven Metric Learning for Model Learning},
author = {Dishank Bansal and Ricky T. Q. Chen and Mustafa Mukadam and Brandon Amos},
year = {2023},
_venue={NeurIPS},
selected={true},
url={https://arxiv.org/abs/2312.05250},
abstract={
Deep learning models are often used with some downstream
task. Models solely trained to achieve accurate
predictions may struggle to perform well on
the desired downstream tasks. We propose using the
task's loss to learn a metric which parameterizes a
loss to train the model. This approach does not alter
the optimal prediction model itself, but rather
changes the model learning to emphasize the
information important for the downstream task. This
enables us to achieve the best of both worlds: a
prediction model trained in the original prediction
space while also being valuable for the desired
downstream task. We validate our approach through
experiments conducted in two main settings: 1)
decision-focused model learning scenarios involving
portfolio optimization and budget allocation, and 2)
reinforcement learning in noisy environments with
distracting states.
}
}
@misc{zharmagambetov2023landscape,
title = {Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information},
author = {Arman Zharmagambetov and Brandon Amos and Aaron Ferber and Taoan Huang and Bistra Dilkina and Yuandong Tian},
year = {2023},
url={https://arxiv.org/abs/2307.08964},
_venue={NeurIPS},
abstract={
Recent works in learning-integrated optimization have shown promise in
settings where the optimization problem is only
partially observed or where general-purpose
optimizers perform poorly without expert tuning. By
learning an optimizer g to tackle these challenging
problems with f as the objective, the optimization
process can be substantially accelerated by
leveraging past experience. Training the optimizer
can be done with supervision from known optimal
solutions (not always available) or implicitly by
optimizing the compound function f ∘ g, but the
implicit approach is slow and challenging due to
frequent calls to the optimizer and sparse
gradients, particularly for combinatorial
solvers. To address these challenges, we propose
using a smooth and learnable Landscape Surrogate
M instead of composing f with g. This surrogate can be computed
faster than g, provides dense and smooth gradients
during training, can generalize to unseen
optimization problems, and is efficiently learned
via alternating optimization. We test our approach
on both synthetic problems and real-world problems,
achieving comparable or superior objective values
compared to state-of-the-art baselines while
reducing the number of calls to g. Notably, our
approach outperforms existing methods for
computationally expensive high-dimensional problems.
}
}
@misc{retchin2023koopman,
title = {Koopman Constrained Policy Optimization: A Koopman operator theoretic method for differentiable optimal control in robotics},
author = {Matthew Retchin and Brandon Amos and Steven Brunton and Shuran Song},
year = {2023},
_venue={ICML Differentiable Almost Everything Workshop},
url={https://differentiable.xyz/papers/paper_45.pdf},
abstract={
We introduce Koopman Constrained Policy Optimization (KCPO),
combining implicitly differentiable model predictive
control with a deep Koopman autoencoder for robot
learning in unknown and nonlinear dynamical
systems. KCPO is a new policy optimization algorithm
that trains neural policies end-to-end with hard box
constraints on controls. Guaranteed satisfaction of
hard constraints helps ensure the performance and
safety of robots. We perform imitation learning with
KCPO to recover expert policies on the Simple
Pendulum, Cartpole Swing-Up, Reacher, and
Differential Drive environments, outperforming
baseline methods in generalizing to
out-of-distribution constraints in most environments
after training.
}
}
@misc{domingoenrich2023stochastic,
title={Stochastic Optimal Control Matching},
author={Carles Domingo-Enrich and Jiequn Han and Brandon Amos and Joan Bruna and Ricky T. Q. Chen},
year={2023},
url={https://arxiv.org/abs/2312.02027},
_venue={arXiv},
abstract={Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models. That is, the control is learned via a least squares problem by trying to fit a matching vector field. The training loss, which is closely connected to the cross-entropy loss, is optimized with respect to both the control function and a family of reparameterization matrices which appear in the matching vector field. The optimization with respect to the reparameterization matrices aims at minimizing the variance of the matching vector field. Experimentally, our algorithm achieves lower error than all the existing IDO techniques for stochastic optimal control for four different control settings. The key idea underlying SOCM is the path-wise reparameterization trick, a novel technique that is of independent interest, e.g., for generative modeling.}
}
@misc{fickinger2021crossdomain,
title={Cross-Domain Imitation Learning via Optimal Transport},
author={Arnaud Fickinger and Samuel Cohen and Stuart Russell and Brandon Amos},
year={2022},
url={https://arxiv.org/abs/2110.03684},
codeurl={https://github.com/facebookresearch/gwil},
_venue={ICLR},
selected={true},
abstract={
Cross-domain imitation learning studies how to leverage expert
demonstrations of one agent to train an imitation
agent with a different embodiment or
morphology. Comparing trajectories and stationary
distributions between the expert and imitation
agents is challenging because they live on different
systems that may not even have the same
dimensionality. We propose Gromov-Wasserstein
Imitation Learning (GWIL), a method for cross-domain
imitation that uses the Gromov-Wasserstein distance
to align and compare states between the different
spaces of the agents. Our theory formally
characterizes the scenarios where GWIL preserves
optimality, revealing its possibilities and
limitations. We demonstrate the effectiveness of
GWIL in non-trivial continuous control domains
ranging from simple rigid transformation of the
expert domain to arbitrary transformation of the
state-action space.
}
}
@misc{benhamu2022matching,
title = {Matching Normalizing Flows and Probability Paths on Manifolds},
author = {Ben-Hamu*, Heli and Cohen*, Samuel and Bose, Joey and Amos, Brandon and Grover, Aditya and Nickel, Maximilian and Chen, Ricky T. Q. and Lipman, Yaron},
year = {2022},
url = {https://arxiv.org/abs/2207.04711},
_venue={ICML},
selected={false},
abstract={
Continuous Normalizing Flows (CNFs) are a class of generative models
that transform a prior distribution to a model
distribution by solving an ordinary differential
equation (ODE). We propose to train CNFs on
manifolds by minimizing probability path divergence
(PPD), a novel family of divergences between the
probability density path generated by the CNF and a
target probability density path. PPD is formulated
using a logarithmic mass conservation formula which
is a linear first order partial differential
equation relating the log target probabilities and
the CNF's defining vector field. PPD has several key
benefits over existing methods: it sidesteps the
need to solve an ODE per iteration, readily applies
to manifold data, scales to high dimensions, and is
compatible with a large family of target paths
interpolating pure noise and data in finite
time. Theoretically, PPD is shown to bound classical
probability divergences. Empirically, we show that
CNFs learned by minimizing PPD achieve
state-of-the-art results in likelihoods and sample
quality on existing low-dimensional manifold
benchmarks, and is the first example of a generative
model to scale to moderately high dimensional
manifolds.
}
}
@article{chen2022semi,
title={Semi-Discrete Normalizing Flows through Differentiable Tessellation},
author={Ricky T. Q. Chen and Brandon Amos and Maximilian Nickel},
journal={arXiv preprint arXiv:2203.06832},
year={2022},
url={https://arxiv.org/abs/2203.06832},
_venue={NeurIPS},
abstract={
Mapping between discrete and continuous distributions is a difficult
task and many have had to resort to approximate or
heuristic approaches. We propose a
tessellation-based approach that directly learns
quantization boundaries on a continuous space,
complete with exact likelihood evaluations. This is
done through constructing normalizing flows on
convex polytopes parameterized through a
differentiable Voronoi tessellation. Using a simple
homeomorphism with an efficient log determinant
Jacobian, we can then cheaply parameterize
distributions on convex polytopes.
We explore this approach in two application settings, mapping from
discrete to continuous and vice versa. Firstly, a
Voronoi dequantization allows automatically learning
quantization boundaries in a multidimensional
space. The location of boundaries and distances
between regions can encode useful structural
relations between the quantized discrete
values. Secondly, a Voronoi mixture model has
constant computation cost for likelihood evaluation
regardless of the number of mixture
components. Empirically, we show improvements over
existing methods across a range of structured data
modalities, and find that we can achieve a
significant gain from just adding Voronoi mixtures
to a baseline model.
}
}
@misc{pineda2022theseus,
url = {https://arxiv.org/abs/2207.09442},
author = {Pineda, Luis and Fan, Taosha and Monge, Maurizio and Venkataraman, Shobha and Sodhi, Paloma and Chen, Ricky and Ortiz, Joseph and DeTone, Daniel and Wang, Austin and Anderson, Stuart and Dong, Jing and Amos, Brandon and Mukadam, Mustafa},
title = {Theseus: A Library for Differentiable Nonlinear Optimization},
_venue = {NeurIPS},
codeurl={https://github.com/facebookresearch/theseus},
year = 2022,
selected={true},
abstract={
We present Theseus, an efficient application-agnostic open source
library for differentiable nonlinear least squares
(DNLS) optimization built on PyTorch, providing a
common framework for end-to-end structured learning
in robotics and vision. Existing DNLS
implementations are application specific and do not
always incorporate many ingredients important for
efficiency. Theseus is application-agnostic, as we
illustrate with several example applications that
are built using the same underlying differentiable
components, such as second-order optimizers,
standard cost functions, and Lie groups. For
efficiency, Theseus incorporates support for sparse
solvers, automatic vectorization, batching, GPU
acceleration, and gradient computation with implicit
differentiation and direct loss minimization. We do
extensive performance evaluation in a set of
applications, demonstrating significant efficiency
gains and better scalability when these features are
incorporated.
}
}
@misc{vinitsky2022nocturne,
title = {Nocturne: a driving benchmark for multi-agent learning},
author = {Vinitsky, Eugene and Lichtlé, Nathan and Yang, Xiaomeng and Amos, Brandon and Foerster, Jakob},
year = {2022},
url = {https://arxiv.org/abs/2206.09889},
_venue={NeurIPS Datasets and Benchmarks Track},
codeurl={https://github.com/facebookresearch/nocturne},
abstract={
We introduce Nocturne, a new 2D driving simulator for
investigating multi-agent coordination under partial
observability. The focus of Nocturne is to enable
research into inference and theory of mind in
real-world multi-agent settings without the
computational overhead of computer vision and
feature extraction from images. Agents in this
simulator only observe an obstructed view of the
scene, mimicking human visual sensing
constraints. Unlike existing benchmarks that are
bottlenecked by rendering human-like observations
directly using a camera input, Nocturne uses
efficient intersection methods to compute a
vectorized set of visible features in a C++
back-end, allowing the simulator to run at 2000+
steps-per-second. Using open-source trajectory and
map data, we construct a simulator to load and
replay arbitrary trajectories and scenes from
real-world driving data. Using this environment, we
benchmark reinforcement-learning and
imitation-learning agents and demonstrate that the
agents are quite far from human-level coordination
ability and deviate significantly from the expert
trajectories.
}
}
@misc{amos2021modelbased,
title={On the model-based stochastic value gradient for continuous reinforcement learning},
author={Brandon Amos and Samuel Stanton and Denis Yarats and Andrew Gordon Wilson},
year={2021},
_venue={L4DC},
_note={Oral},
url={https://arxiv.org/abs/2008.12775},
codeurl={https://github.com/facebookresearch/svg},
slidesurl={http://bamos.github.io/data/slides/2021.svg.pdf},
_talkurl={https://youtu.be/ABS40GW7Ekk?t=5393},
selected={true},
abstract={
Model-based reinforcement learning approaches add explicit domain
knowledge to agents in hopes of improving the
sample-efficiency in comparison to model-free
agents. However, in practice model-based methods are
unable to achieve the same asymptotic performance on
challenging continuous control tasks due to the
complexity of learning and controlling an explicit
world model. In this paper we investigate the
stochastic value gradient (SVG), which is a
well-known family of methods for controlling
continuous systems which includes model-based
approaches that distill a model-based value
expansion into a model-free policy. We consider a
variant of the model-based SVG that scales to larger
systems and uses 1) an entropy regularization to
help with exploration, 2) a learned deterministic
world model to improve the short-horizon value
estimate, and 3) a learned model-free value estimate
after the model's rollout. This SVG variation
captures the model-free soft actor-critic method as
an instance when the model rollout horizon is zero,
and otherwise uses short-horizon model rollouts to
improve the value estimate for the policy update. We
surpass the asymptotic performance of other
model-based methods on the proprioceptive MuJoCo
locomotion tasks from the OpenAI gym, including a
humanoid. We notably achieve these results with a
simple deterministic world model without requiring
an ensemble.
}
}
@inproceedings{cohen2021riemannian,
title={{Riemannian Convex Potential Maps}},
author={Cohen*, Samuel and Amos*, Brandon and Lipman, Yaron},
booktitle={ICML},
_venue={ICML},
year={2021},
url={https://arxiv.org/abs/2106.10272},
codeurl={https://github.com/facebookresearch/rcpm},
slidesurl={http://bamos.github.io/data/slides/2021.rcpm.pdf},
selected={true},
abstract={
Modeling distributions on Riemannian manifolds is a crucial
component in understanding non-Euclidean data that
arises, e.g., in physics and geology. The budding
approaches in this space are limited by
representational and computational tradeoffs. We
propose and study a class of flows that uses convex
potentials from Riemannian optimal transport. These
are universal and can model distributions on any
compact Riemannian manifold without requiring domain
knowledge of the manifold to be integrated into the
architecture. We demonstrate that these flows can
model standard distributions on spheres and tori,
on synthetic and geological data.
}
}
@inproceedings{paulus2021comboptnet,
title={CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints},
author={Paulus, Anselm and Rol{\'\i}nek, Michal and Musil, V{\'\i}t and Amos, Brandon and Martius, Georg},
booktitle={ICML},
_venue={ICML},
year={2021},
url={https://arxiv.org/abs/2105.02343},
codeurl={https://github.com/martius-lab/CombOptNet},
abstract={
Bridging logical and algorithmic reasoning with modern machine
learning techniques is a fundamental challenge with
potentially transformative impact. On the
algorithmic side, many NP-hard problems can be
expressed as integer programs, in which the
constraints play the role of their "combinatorial
specification". In this work, we aim to integrate
integer programming solvers into neural network
architectures as layers capable of learning both the
cost terms and the constraints. The resulting
end-to-end trainable architectures jointly extract
features from raw data and solve a suitable
(learned) combinatorial problem with
state-of-the-art integer programming solvers. We
demonstrate the potential of such layers with an
extensive performance analysis on synthetic data and
with a demonstration on a competitive computer
vision keypoint matching benchmark.
}
}
@inproceedings{fickinger2021scalable,
year={2021},
booktitle={NeurIPS},
_venue={NeurIPS},
url={https://arxiv.org/abs/2109.15316},
title={{Scalable Online Planning via Reinforcement Learning Fine-Tuning}},
author={Arnaud Fickinger and Hengyuan Hu and Brandon Amos and Stuart Russell and Noam Brown},
abstract={
Lookahead search has been a critical component of recent AI successes,
such as in the games of chess, go, and
poker. However, the search methods used in these
games, and in many other settings, are
tabular. Tabular search methods do not scale well
with the size of the search space, and this problem
is exacerbated by stochasticity and partial
observability. In this work we replace tabular
search with online model-based fine-tuning of a
policy neural network via reinforcement learning,
and show that this approach outperforms
state-of-the-art search algorithms in benchmark
settings. In particular, we use our search algorithm
to achieve a new state-of-the-art result in
self-play Hanabi, and show the generality of our
algorithm by also showing that it outperforms
tabular search in the Atari game Ms. Pacman.
}
}
@inproceedings{cohen2020aligning,
title={Aligning Time Series on Incomparable Spaces},
author={Samuel Cohen and Giulia Luise and Alexander Terenin and Brandon Amos and Marc Peter Deisenroth},
booktitle={AISTATS},
year={2021},
_venue={AISTATS},
url={https://arxiv.org/abs/2006.12648},
codeurl={https://github.com/samcohen16/Aligning-Time-Series},
slidesurl={http://bamos.github.io/data/slides/2021.gdtw.pdf},
abstract={
Dynamic time warping (DTW) is a useful method for aligning, comparing
and combining time series, but it requires them to
live in comparable spaces. In this work, we consider
a setting in which time series live on different
spaces without a sensible ground metric, causing DTW
to become ill-defined. To alleviate this, we propose
Gromov dynamic time warping (GDTW), a distance
between time series on potentially incomparable
spaces that avoids the comparability requirement by
instead considering intra-relational geometry. We
derive a Frank-Wolfe algorithm for computing it and
demonstrate its effectiveness at aligning, combining
and comparing time series living on incomparable
spaces. We further propose a smoothed version of
GDTW as a differentiable loss and assess its
properties in a variety of settings, including
barycentric averaging, generative modeling and
imitation learning.
},
}
@inproceedings{chen2021learning,
title={Learning Neural Event Functions for Ordinary Differential Equations},
author={Ricky T. Q. Chen and Brandon Amos and Maximilian Nickel},
booktitle={ICLR},
_venue={ICLR},
year={2021},
url={https://arxiv.org/abs/2011.03902},
codeurl={https://github.com/rtqichen/torchdiffeq},
abstract={
The existing Neural ODE formulation relies on an explicit
knowledge of the termination time. We extend Neural
ODEs to implicitly defined termination criteria
modeled by neural event functions, which can be
chained together and differentiated through. Neural
Event ODEs are capable of modeling discrete
(instantaneous) changes in a continuous-time system,
without prior knowledge of when these changes should
occur or how many such changes should exist. We test
our approach in modeling hybrid discrete- and
continuous- systems such as switching dynamical
systems and collision in multi-body systems, and we
propose simulation-based training of point processes
with applications in discrete control.
}
}
@inproceedings{chen2021neural,
title={Neural Spatio-Temporal Point Processes},
author={Ricky T. Q. Chen and Brandon Amos and Maximilian Nickel},
booktitle={ICLR},
_venue={ICLR},
year={2021},
url={https://arxiv.org/abs/2011.04583},
codeurl={https://github.com/facebookresearch/neural_stpp},
abstract={
We propose a new class of parameterizations for spatio-temporal
point processes which leverage Neural ODEs as a
computational method and enable flexible,
high-fidelity models of discrete events that are
localized in continuous time and space. Central to
our approach is a combination of recurrent
continuous-time neural networks with two novel
neural architectures, i.e., Jump and Attentive
Continuous-time Normalizing Flows. This approach
allows us to learn complex distributions for both
the spatial and temporal domain and to condition
non-trivially on the observed event history. We
validate our models on data sets from a wide variety
of contexts such as seismology, epidemiology, urban
mobility, and neuroscience.
}
}
@inproceedings{yarats2021improving,
title={{Improving Sample Efficiency in Model-Free Reinforcement Learning from Images}},
author={Yarats, Denis and Zhang, Amy and Kostrikov, Ilya and Amos, Brandon and Pineau, Joelle and Fergus, Rob},
journal={arXiv preprint arXiv:1910.01741},
booktitle={AAAI},
_venue={AAAI},
year=2021,
url={https://arxiv.org/abs/1910.01741},
codeurl={https://sites.google.com/view/sac-ae},
abstract={
Training an agent to solve control tasks directly from
high-dimensional images with model-free
reinforcement learning (RL) has proven
difficult. The agent needs to learn a latent
representation together with a control policy to
perform the task. Fitting a high-capacity encoder
using a scarce reward signal is not only sample
inefficient, but also prone to suboptimal
convergence. Two ways to improve sample efficiency
are to extract relevant features for the task and
use off-policy algorithms. We dissect various
approaches of learning good latent features, and
conclude that the image reconstruction loss is the
essential ingredient that enables efficient and
stable representation learning in image-based
RL. Following these findings, we devise an
off-policy actor-critic algorithm with an auxiliary
decoder that trains end-to-end and matches
state-of-the-art performance across both model-free
and model-based algorithms on many challenging
control tasks. We release our code to encourage
future research on image-based RL.
}
}
@article{venkataraman2021neural,
title={Neural Fixed-Point Acceleration for Convex Optimization},
author={Shobha Venkataraman* and Brandon Amos*},
year={2021},
url={https://arxiv.org/abs/2107.10254},
_venue={ICML AutoML Workshop},
codeurl={https://github.com/facebookresearch/neural-scs},
abstract={
Fixed-point iterations are at the heart of numerical computing and
are often a computational bottleneck in real-time
applications that typically need a fast solution of
moderate accuracy. We present neural fixed-point
acceleration which combines ideas from meta-learning
and classical acceleration methods to automatically
learn to accelerate fixed-point problems that are
drawn from a distribution. We apply our framework to
SCS, the state-of-the-art solver for convex cone
programming, and design models and loss functions to
overcome the challenges of learning over unrolled
optimization and acceleration instabilities. Our
work brings neural acceleration into any
optimization problem expressible with CVXPY.
}
}
@misc{cohen2021sliced,
title={Sliced Multi-Marginal Optimal Transport},
author={Samuel Cohen and Alexander Terenin and Yannik Pitcan and Brandon Amos and Marc Peter Deisenroth and K S Sesh Kumar},
year={2021},
url={https://arxiv.org/abs/2102.07115},
_venue={NeurIPS OTML Workshop},
abstract={
Multi-marginal optimal transport enables one to compare multiple
probability measures, which increasingly finds
application in multi-task learning problems. One
practical limitation of multi-marginal transport is
computational scalability in the number of measures,
samples and dimensionality. In this work, we propose
a multi-marginal optimal transport paradigm based on
random one-dimensional projections, whose
(generalized) distance we term the sliced
multi-marginal Wasserstein distance. To construct
this distance, we introduce a characterization of
the one-dimensional multi-marginal Kantorovich
problem and use it to highlight a number of
properties of the sliced multi-marginal Wasserstein
distance. In particular, we show that (i) the sliced
multi-marginal Wasserstein distance is a
(generalized) metric that induces the same topology
as the standard Wasserstein distance, (ii) it admits
a dimension-free sample complexity, (iii) it is
tightly connected with the problem of barycentric
averaging under the sliced-Wasserstein metric. We
conclude by illustrating the sliced multi-marginal
Wasserstein on multi-task density estimation and
multi-dynamics reinforcement learning problems.
}
}
@misc{richterpowell2021input,
title={Input Convex Gradient Networks},
author={Jack Richter-Powell and Jonathan Lorraine and Brandon Amos},
year={2021},
_venue={NeurIPS OTML Workshop},
url={https://arxiv.org/abs/2111.12187},
abstract={
The gradients of convex functions are expressive models of non-trivial
vector fields. For example, Brenier's theorem yields
that the optimal transport map between any two
measures on Euclidean space under the squared
distance is realized as a convex gradient, which is
a key insight used in recent generative flow
models. In this paper, we study how to model convex
gradients by integrating a Jacobian-vector product
parameterized by a neural network, which we call the
Input Convex Gradient Network (ICGN). We
theoretically study ICGNs and compare them to taking
the gradient of an Input-Convex Neural Network
(ICNN), empirically demonstrating that a single
layer ICGN can fit a toy example better than a
single layer ICNN. Lastly, we explore extensions to
deeper networks and connections to constructions
from Riemannian geometry.
}
}
@inproceedings{cohen2021imitation,
title={Imitation Learning from Pixel Observations for Continuous Control},
author={Cohen, Samuel and Amos, Brandon and Deisenroth, Marc Peter and Henaff, Mikael and Vinitsky, Eugene and Yarats, Denis},
_venue={NeurIPS DeepRL Workshop},
year={2021},
url={https://openreview.net/pdf?id=Xe5MFhFvYGX},
abstract={
We study imitation learning from visual observations only for
controlling dynamical systems with continuous states
and actions. This setting is attractive due to the
large amount of video data available from which
agents could learn. However, it is challenging
due to i) not observing the actions and ii) the
high-dimensional visual space. In this setting, we
explore recipes for imitation learning based on
adversarial learning and optimal transport. These
recipes enable us to scale these methods to attain
expert-level performance on visual continuous
control tasks in the DeepMind control suite. We
investigate the tradeoffs of these approaches and
present a comprehensive evaluation of the key design
choices. To encourage reproducible research in this
area, we provide an easy-to-use implementation for
benchmarking visual imitation learning, including
our methods and expert demonstrations.
}
}
@article{pineda2021mbrl,
title={MBRL-Lib: A Modular Library for Model-based Reinforcement Learning},
author={Pineda, Luis and Amos, Brandon and Zhang, Amy and Lambert, Nathan and Calandra, Roberto},
year={2021},
url={https://arxiv.org/abs/2104.10159},
_venue={arXiv},
codeurl={https://github.com/facebookresearch/mbrl-lib},
abstract={
Model-based reinforcement learning is a compelling framework for
data-efficient learning of agents that interact with
the world. This family of algorithms has many
subcomponents that need to be carefully selected and
tuned. As a result the entry-bar for researchers to
approach the field and to deploy it in real-world
tasks can be daunting. In this paper, we present
MBRL-Lib -- a machine learning library for
model-based reinforcement learning in continuous
state-action spaces based on PyTorch. MBRL-Lib is
designed as a platform for both researchers, to
easily develop, debug and compare new algorithms,
and non-expert users, to lower the entry-bar of
deploying state-of-the-art algorithms.
}
}
@inproceedings{amos2020differentiable,
title={{The Differentiable Cross-Entropy Method}},
author={Amos, Brandon and Yarats, Denis},
booktitle={ICML},
_venue={ICML},
year={2020},
url={https://arxiv.org/abs/1909.12830},
codeurl={https://github.com/facebookresearch/dcem},
slidesurl={http://bamos.github.io/data/slides/2020.dcem.pdf},
selected={true},
abstract={
We study the Cross-Entropy Method (CEM) for the non-convex
optimization of a continuous and parameterized
objective function and introduce a differentiable
variant (DCEM) that enables us to differentiate the
output of CEM with respect to the objective
function's parameters. In the machine learning
setting this brings CEM inside of the end-to-end
learning pipeline where this has otherwise been
impossible. We show applications in a synthetic
energy-based structured prediction task and in
non-convex continuous control. In the control
setting we show on the simulated cheetah and walker
tasks that we can embed their optimal action
sequences with DCEM and then use policy optimization
to fine-tune components of the controller as a step
towards combining model-based and model-free RL.
}
}
@inproceedings{lambert2020objective,
title={Objective Mismatch in Model-based Reinforcement Learning},
author={Lambert, Nathan and Amos, Brandon and Yadan, Omry and Calandra, Roberto},
year={2020},
booktitle={L4DC},
_venue={L4DC},
url={https://arxiv.org/abs/2002.04523},
abstract={