[Performance] Use TensorDict._new_unsafe in step #2905

Merged: 2 commits merged on Apr 22, 2025

vmoens (Collaborator) commented Apr 17, 2025

[ghstack-poisoned]
pytorch-bot bot commented Apr 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2905

Note: Links to docs will display an error until the docs builds have been completed.

❌ 11 New Failures, 1 Pending, 3 Unrelated Failures

As of commit 44f4725 with merge base 382430d:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Apr 17, 2025
[ghstack-poisoned]
Result of GPU Benchmark Tests

Name Max Mean Ops
test_simple 0.9255s 0.8182s 1.2223 Ops/s
test_transformed 1.6057s 1.4669s 0.6817 Ops/s
test_serial 2.4878s 2.3185s 0.4313 Ops/s
test_parallel 2.0834s 1.8694s 0.5349 Ops/s
test_step_mdp_speed[True-True-True-True-True] 0.1559ms 43.7343μs 22.8653 KOps/s
test_step_mdp_speed[True-True-True-True-False] 65.6340μs 25.7044μs 38.9039 KOps/s
test_step_mdp_speed[True-True-True-False-True] 0.1006ms 25.5226μs 39.1810 KOps/s
test_step_mdp_speed[True-True-True-False-False] 40.6520μs 14.1073μs 70.8855 KOps/s
test_step_mdp_speed[True-True-False-True-True] 88.1450μs 48.8645μs 20.4648 KOps/s
test_step_mdp_speed[True-True-False-True-False] 0.1432ms 28.6107μs 34.9519 KOps/s
test_step_mdp_speed[True-True-False-False-True] 63.3930μs 28.3816μs 35.2342 KOps/s
test_step_mdp_speed[True-True-False-False-False] 94.8150μs 17.1701μs 58.2408 KOps/s
test_step_mdp_speed[True-False-True-True-True] 0.1248ms 50.1209μs 19.9517 KOps/s
test_step_mdp_speed[True-False-True-True-False] 59.7930μs 30.8372μs 32.4284 KOps/s
test_step_mdp_speed[True-False-True-False-True] 54.4130μs 27.8116μs 35.9562 KOps/s
test_step_mdp_speed[True-False-True-False-False] 41.0420μs 16.8406μs 59.3802 KOps/s
test_step_mdp_speed[True-False-False-True-True] 1.9716ms 53.3379μs 18.7484 KOps/s
test_step_mdp_speed[True-False-False-True-False] 0.2254ms 34.2277μs 29.2161 KOps/s
test_step_mdp_speed[True-False-False-False-True] 63.5730μs 31.1755μs 32.0765 KOps/s
test_step_mdp_speed[True-False-False-False-False] 50.3330μs 19.5395μs 51.1785 KOps/s
test_step_mdp_speed[False-True-True-True-True] 83.2040μs 51.0085μs 19.6046 KOps/s
test_step_mdp_speed[False-True-True-True-False] 0.1340ms 31.3372μs 31.9109 KOps/s
test_step_mdp_speed[False-True-True-False-True] 0.1957ms 31.9437μs 31.3051 KOps/s
test_step_mdp_speed[False-True-True-False-False] 52.4420μs 18.4611μs 54.1680 KOps/s
test_step_mdp_speed[False-True-False-True-True] 0.2558ms 52.9741μs 18.8771 KOps/s
test_step_mdp_speed[False-True-False-True-False] 0.1510ms 33.0846μs 30.2256 KOps/s
test_step_mdp_speed[False-True-False-False-True] 0.1076ms 34.4395μs 29.0364 KOps/s
test_step_mdp_speed[False-True-False-False-False] 42.7320μs 21.0095μs 47.5976 KOps/s
test_step_mdp_speed[False-False-True-True-True] 85.2140μs 54.5614μs 18.3280 KOps/s
test_step_mdp_speed[False-False-True-True-False] 74.2630μs 36.3297μs 27.5257 KOps/s
test_step_mdp_speed[False-False-True-False-True] 9.7741ms 35.0026μs 28.5693 KOps/s
test_step_mdp_speed[False-False-True-False-False] 54.3130μs 21.3019μs 46.9441 KOps/s
test_step_mdp_speed[False-False-False-True-True] 83.4840μs 57.6071μs 17.3590 KOps/s
test_step_mdp_speed[False-False-False-True-False] 71.4230μs 39.0835μs 25.5863 KOps/s
test_step_mdp_speed[False-False-False-False-True] 66.6930μs 36.4297μs 27.4501 KOps/s
test_step_mdp_speed[False-False-False-False-False] 50.5530μs 23.8060μs 42.0062 KOps/s
test_values[generalized_advantage_estimate-True-True] 25.6841ms 25.1864ms 39.7040 Ops/s
test_values[vec_generalized_advantage_estimate-True-True] 0.1267s 3.4387ms 290.8045 Ops/s
test_values[td0_return_estimate-False-False] 0.1092ms 83.1562μs 12.0256 KOps/s
test_values[td1_return_estimate-False-False] 57.7055ms 56.5351ms 17.6881 Ops/s
test_values[vec_td1_return_estimate-False-False] 1.4345ms 1.0959ms 912.5285 Ops/s
test_values[td_lambda_return_estimate-True-False] 92.0584ms 89.5885ms 11.1621 Ops/s
test_values[vec_td_lambda_return_estimate-True-False] 1.3327ms 1.0899ms 917.5461 Ops/s
test_gae_speed[generalized_advantage_estimate-False-1-512] 25.7893ms 25.0297ms 39.9525 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0792ms 0.7670ms 1.3038 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7788ms 0.6838ms 1.4625 KOps/s
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5505ms 1.4945ms 669.1128 Ops/s
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8594ms 0.7015ms 1.4256 KOps/s
test_dqn_speed[False-None] 3.0933ms 1.5681ms 637.7303 Ops/s
test_dqn_speed[False-backward] 2.6378ms 2.1925ms 456.1049 Ops/s
test_dqn_speed[True-None] 0.7135ms 0.5747ms 1.7400 KOps/s
test_dqn_speed[True-backward] 1.2826ms 1.1622ms 860.4182 Ops/s
test_dqn_speed[reduce-overhead-None] 0.7377ms 0.5767ms 1.7341 KOps/s
test_dqn_speed[reduce-overhead-backward] 1.0589ms 0.9912ms 1.0089 KOps/s
test_ddpg_speed[False-None] 3.6712ms 2.9486ms 339.1470 Ops/s
test_ddpg_speed[False-backward] 4.7181ms 4.2184ms 237.0560 Ops/s
test_ddpg_speed[True-None] 1.4967ms 1.3640ms 733.1128 Ops/s
test_ddpg_speed[True-backward] 2.5923ms 2.4929ms 401.1438 Ops/s
test_ddpg_speed[reduce-overhead-None] 1.5174ms 1.3755ms 727.0307 Ops/s
test_ddpg_speed[reduce-overhead-backward] 2.0834ms 1.9372ms 516.2153 Ops/s
test_sac_speed[False-None] 9.2222ms 8.1447ms 122.7791 Ops/s
test_sac_speed[False-backward] 11.8292ms 11.1905ms 89.3613 Ops/s
test_sac_speed[True-None] 2.0435ms 1.8875ms 529.8019 Ops/s
test_sac_speed[True-backward] 3.8248ms 3.6314ms 275.3748 Ops/s
test_sac_speed[reduce-overhead-None] 20.5990ms 11.8645ms 84.2847 Ops/s
test_sac_speed[reduce-overhead-backward] 1.8028ms 1.6598ms 602.4728 Ops/s
test_redq_speed[False-None] 10.3498ms 7.7382ms 129.2288 Ops/s
test_redq_speed[False-backward] 13.5178ms 11.5620ms 86.4902 Ops/s
test_redq_speed[True-None] 2.5412ms 2.3531ms 424.9628 Ops/s
test_redq_speed[True-backward] 4.3913ms 4.2158ms 237.2037 Ops/s
test_redq_speed[reduce-overhead-None] 2.6320ms 2.3862ms 419.0830 Ops/s
test_redq_speed[reduce-overhead-backward] 4.1359ms 4.0748ms 245.4082 Ops/s
test_redq_deprec_speed[False-None] 9.8494ms 9.1745ms 108.9972 Ops/s
test_redq_deprec_speed[False-backward] 12.9940ms 12.2008ms 81.9621 Ops/s
test_redq_deprec_speed[True-None] 2.8413ms 2.6806ms 373.0491 Ops/s
test_redq_deprec_speed[True-backward] 4.9122ms 4.4366ms 225.3984 Ops/s
test_redq_deprec_speed[reduce-overhead-None] 2.9456ms 2.6902ms 371.7253 Ops/s
test_redq_deprec_speed[reduce-overhead-backward] 4.6042ms 4.4339ms 225.5344 Ops/s
test_td3_speed[False-None] 8.3610ms 8.0934ms 123.5573 Ops/s
test_td3_speed[False-backward] 11.1211ms 10.4117ms 96.0458 Ops/s
test_td3_speed[True-None] 1.7403ms 1.6970ms 589.2687 Ops/s
test_td3_speed[True-backward] 3.4207ms 3.2797ms 304.9070 Ops/s
test_td3_speed[reduce-overhead-None] 60.9283ms 26.8484ms 37.2462 Ops/s
test_td3_speed[reduce-overhead-backward] 1.4103ms 1.3583ms 736.1934 Ops/s
test_cql_speed[False-None] 17.8331ms 16.9628ms 58.9525 Ops/s
test_cql_speed[False-backward] 23.0739ms 22.2220ms 45.0005 Ops/s
test_cql_speed[True-None] 3.5134ms 3.3300ms 300.2965 Ops/s
test_cql_speed[True-backward] 6.1211ms 5.6221ms 177.8709 Ops/s
test_cql_speed[reduce-overhead-None] 20.6717ms 13.1092ms 76.2823 Ops/s
test_cql_speed[reduce-overhead-backward] 1.9341ms 1.8103ms 552.4016 Ops/s
test_a2c_speed[False-None] 5.5280ms 3.2912ms 303.8448 Ops/s
test_a2c_speed[False-backward] 8.9581ms 6.2421ms 160.2015 Ops/s
test_a2c_speed[True-None] 1.5041ms 1.3618ms 734.3431 Ops/s
test_a2c_speed[True-backward] 3.1983ms 2.9768ms 335.9296 Ops/s
test_a2c_speed[reduce-overhead-None] 15.4882ms 8.7660ms 114.0765 Ops/s
test_a2c_speed[reduce-overhead-backward] 1.5901ms 1.4601ms 684.8724 Ops/s
test_ppo_speed[False-None] 6.5544ms 3.8162ms 262.0399 Ops/s
test_ppo_speed[False-backward] 9.9282ms 6.9555ms 143.7707 Ops/s
test_ppo_speed[True-None] 1.6486ms 1.4465ms 691.3045 Ops/s
test_ppo_speed[True-backward] 3.2379ms 3.1115ms 321.3881 Ops/s
test_ppo_speed[reduce-overhead-None] 1.1306ms 0.9486ms 1.0542 KOps/s
test_ppo_speed[reduce-overhead-backward] 1.5476ms 1.4416ms 693.6572 Ops/s
test_reinforce_speed[False-None] 3.3187ms 2.3381ms 427.6934 Ops/s
test_reinforce_speed[False-backward] 4.0416ms 3.3498ms 298.5211 Ops/s
test_reinforce_speed[True-None] 1.7237ms 1.3426ms 744.8434 Ops/s
test_reinforce_speed[True-backward] 3.2289ms 2.9853ms 334.9709 Ops/s
test_reinforce_speed[reduce-overhead-None] 18.7026ms 10.0872ms 99.1354 Ops/s
test_reinforce_speed[reduce-overhead-backward] 1.6011ms 1.5275ms 654.6667 Ops/s
test_iql_speed[False-None] 10.4553ms 9.4777ms 105.5110 Ops/s
test_iql_speed[False-backward] 13.9174ms 13.2900ms 75.2447 Ops/s
test_iql_speed[True-None] 2.7526ms 2.3363ms 428.0195 Ops/s
test_iql_speed[True-backward] 5.3112ms 4.8829ms 204.7973 Ops/s
test_iql_speed[reduce-overhead-None] 19.2494ms 10.9752ms 91.1149 Ops/s
test_iql_speed[reduce-overhead-backward] 2.0511ms 1.9684ms 508.0361 Ops/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.7605ms 6.2559ms 159.8488 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.7305ms 0.3338ms 2.9958 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6813ms 0.3138ms 3.1866 KOps/s
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.5357ms 5.9970ms 166.7513 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1268ms 0.3302ms 3.0286 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7083ms 0.3121ms 3.2044 KOps/s
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.7877ms 1.4330ms 697.8608 Ops/s
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.7877ms 1.3194ms 757.9183 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5165ms 6.2182ms 160.8186 Ops/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.3430ms 0.4727ms 2.1155 KOps/s
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8975ms 0.4466ms 2.2392 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3468ms 6.0588ms 165.0486 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.2250ms 0.3505ms 2.8533 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6871ms 0.3427ms 2.9176 KOps/s
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 9.5850ms 6.0042ms 166.5500 Ops/s
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7642ms 0.3221ms 3.1044 KOps/s
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7885ms 0.3044ms 3.2848 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6357ms 6.1808ms 161.7910 Ops/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9518ms 0.4645ms 2.1529 KOps/s
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8525ms 0.4490ms 2.2272 KOps/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.0441ms 5.5090ms 181.5224 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.1208ms 1.6911ms 591.3333 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 9.1806ms 1.1743ms 851.5435 Ops/s
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 8.0531ms 5.5780ms 179.2770 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 9.2703ms 1.8125ms 551.7368 Ops/s
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.1254ms 1.0427ms 959.0799 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5477s 16.6376ms 60.1048 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 9.9406ms 2.0135ms 496.6512 Ops/s
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 10.2579ms 1.3731ms 728.2742 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 51.3609ms 50.2769ms 19.8899 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 21.5938ms 17.3455ms 57.6519 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 52.3598ms 50.4753ms 19.8117 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.3725ms 17.4895ms 57.1773 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 52.8300ms 50.8906ms 19.6500 Ops/s
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.0758ms 18.8637ms 53.0119 Ops/s
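
For readers comparing rows: the Ops column appears to be the reciprocal of the Mean column (for example, a mean of 0.8182s corresponds to roughly 1.22 Ops/s). The sketch below is an illustrative helper, not part of this PR or of the benchmark tooling; the names `parse_time` and `ops_per_second` are hypothetical, and the reciprocal relationship is an assumption inferred from the reported figures.

```python
# Illustrative helper (not part of the PR): relate the "Mean" and "Ops" columns.
# Assumption: Ops is the reciprocal of the mean time per call.

# Seconds per unit. Both the Greek mu and the micro sign are accepted.
_UNIT_SECONDS = {"ms": 1e-3, "\u03bcs": 1e-6, "\u00b5s": 1e-6, "s": 1.0}


def parse_time(text: str) -> float:
    """Convert a table cell such as '0.1559ms' or '0.8182s' to seconds."""
    # Dict order puts the two-character suffixes before the bare 's',
    # so '0.1559ms' is not mistaken for a value in seconds.
    for suffix, scale in _UNIT_SECONDS.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * scale
    raise ValueError(f"unrecognised time unit in {text!r}")


def ops_per_second(mean_time: str) -> float:
    """Return the throughput implied by a mean time per call."""
    return 1.0 / parse_time(mean_time)


if __name__ == "__main__":
    # Mean values copied from the test_simple and first test_step_mdp_speed rows.
    for mean in ("0.8182s", "43.7343\u03bcs"):
        print(mean, "->", round(ops_per_second(mean), 4), "ops/s")
```

Small differences in the last digit against the table are expected, since the table's means are already rounded.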

vmoens merged commit 44f4725 into gh/vmoens/126/base Apr 22, 2025
55 of 70 checks passed
vmoens pushed a commit that referenced this pull request Apr 22, 2025
ghstack-source-id: 8a117fb
Pull Request resolved: #2905
vmoens deleted the gh/vmoens/126/head branch April 22, 2025 15:56