This repo provides supplementary code and benchmark data for the paper "Elastica: A compliant mechanics environment for soft robotic control," published in IEEE Robotics and Automation Letters.
Elastica is a simulation environment for assemblies of one-dimensional soft, slender structures based on Cosserat rod theory. More information about Elastica is available on the project website. You can install the Python version of Elastica via `pip install pyelastica`.

In this repo, Elastica is interfaced with Stable Baselines to investigate how RL can dynamically control a compliant robotic arm. You can install Stable Baselines via `pip install stable-baselines[mpi]` (note: Stable Baselines only works with TensorFlow <= v1.15). A minimal install sketch is shown below.
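For reference, a setup along these lines should work (pinning TensorFlow to 1.15 is one way to satisfy the version constraint; exact package versions depend on your Python environment):

```bash
# Install the Elastica simulation package
pip install pyelastica

# Install Stable Baselines with MPI support (required for, e.g., TRPO and DDPG)
pip install "stable-baselines[mpi]"

# Stable Baselines only works with TensorFlow <= 1.15
pip install "tensorflow==1.15"
```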
Five model-free RL algorithms from the Stable Baselines implementations are used: two on-policy algorithms, Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), and three off-policy algorithms, Soft Actor-Critic (SAC), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3). Four different cases are considered, with detailed explanations given in the paper.
If you discover any bugs, please open an issue and let us know. We plan to actively maintain and develop these benchmark cases.
We have provided the visualization scripts we use for these cases in the `visualization/` folder. Data from hyperparameter tuning is available in the `supplementary_data/` folder.
## Case 1: Tracking a randomly moving target

In this case, the arm continuously tracks a randomly moving target in 3D space. Actuation is allowed only in the normal and binormal directions, with 6 control points in each direction.
- To replicate training using different RL algorithms, run `logging_bio_args.py`, located in the `Case1/` folder. You can train policies with any of the five RL algorithms considered by passing the algorithm name as a command-line argument, e.g. `--algo_name TRPO`. You can also control the total number of training timesteps, the random seed, and the timesteps per batch via command-line arguments, e.g. `--total_timesteps 1E6`, `--SEED 0`, `--timesteps_per_batch 2048`. In addition, you can choose a different number of control points or torque scaling factor by changing the `number_of_control_points` and `alpha` variables inside `logging_bio_args.py`, respectively. A sample invocation is shown after this list.
- To replicate the hyperparameter tuning, run `Case1/policy_training_script.py`. Note that the number of CPUs should be edited appropriately in the script. Runtime is 12-24 hours per individual case.
- Code for initializing the Elastica simulation environment is located in `Case1/set_environment.py`. Specific details on how Case 1 was implemented are in this file.
- Post-processing scripts are located in `Case1/post_processing.py`.
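As a concrete example, a Case 1 training run with TRPO might be launched as follows (assuming the script is invoked directly with Python; the flag values are the illustrative ones from above, not tuned settings):

```bash
cd Case1/
python logging_bio_args.py --algo_name TRPO --total_timesteps 1E6 --SEED 0 --timesteps_per_batch 2048
```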
## Case 2: Reaching and orienting to a stationary target

In this case, the arm reaches a randomly positioned stationary target while re-orienting itself to match the orientation of the target. Actuation is allowed in the normal, binormal, and tangent directions, with 6 control points in each direction.
- To replicate training using different RL algorithms, run `logging_bio_args.py`, located in the `Case2/` folder. You can train policies with any of the five RL algorithms considered by passing the algorithm name as a command-line argument, e.g. `--algo_name TRPO`. You can also control the total number of training timesteps, the random seed, and the timesteps per batch via command-line arguments, e.g. `--total_timesteps 1E6`, `--SEED 0`, `--timesteps_per_batch 2048`. In addition, you can choose a different number of control points or torque scaling factor by changing the `number_of_control_points` and `alpha` variables inside `logging_bio_args.py`, respectively. A sample invocation is shown after this list.
- To replicate the hyperparameter tuning, run `Case2/policy_training_script.py`. Note that the number of CPUs should be edited appropriately in the script. Runtime is 12-24 hours per individual case.
- Code for initializing the Elastica simulation environment is located in `Case2/set_environment.py`. Specific details on how Case 2 was implemented are in this file.
- Post-processing scripts are located in `Case2/post_processing.py`.
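For example, an off-policy Case 2 run with SAC might look like the following (illustrative flag values; the accepted algorithm name strings are defined in the script):

```bash
cd Case2/
python logging_bio_args.py --algo_name SAC --total_timesteps 1E6 --SEED 0 --timesteps_per_batch 2048
```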
## Case 3: Maneuvering through a structured obstacle array

In this case, the arm reaches a stationary target placed behind an array of eight obstacles with an opening through which the arm must maneuver to reach the target. The target is placed in the normal plane so that only in-plane actuation is required; thus, only actuation in the normal direction is allowed. Case 3 has two subcases: the first trains using 2 manually placed control points, at 40% and 90% of the arm, and the second trains using 2, 4, 6, and 8 equidistant control points. Code for the manually selected control points is located in the `Case3/Case3_main-text/` folder, and code for the equidistant control points is located in `Case3/Case3_SI-ctrl_pts/`.
To replicate the manually placed two control points training:

- Run `Case3/Case3_main-text/logging_bio_args_OnPolicy.py` for on-policy algorithms and `Case3/Case3_main-text/logging_bio_args_OffPolicy.py` for off-policy algorithms. You can train policies with any of the five RL algorithms considered by passing the algorithm name as a command-line argument, e.g. `--algo_name TRPO`. You can also control the total number of training timesteps, the random seed, and the timesteps per batch via command-line arguments, e.g. `--total_timesteps 1E6`, `--SEED 0`, `--timesteps_per_batch 2048`. In addition, you can choose a different torque scaling factor by changing the `alpha` variable inside `logging_bio_args_OnPolicy.py` or `logging_bio_args_OffPolicy.py`, depending on the policy type. The number of control points is fixed at two for this subcase. Sample invocations are shown after this list.
- To replicate the hyperparameter tuning, run `Case3/Case3_main-text/policy_training_script_OnPolicy.py` or `Case3/Case3_main-text/policy_training_script_OffPolicy.py`. Note that the number of CPUs should be edited appropriately in the script. Runtime is 3-4 hours per individual case.
- Code for initializing the Elastica simulation environment is located in `Case3/Case3_main-text/set_environment.py`. Specific details on how the first subcase of Case 3 was implemented are in this file.
- Post-processing scripts are located in `Case3/Case3_main-text/post_processing.py`.
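For example (illustrative flag values, assuming direct Python invocation; the accepted algorithm name strings are defined in the scripts):

```bash
cd Case3/Case3_main-text/
# On-policy example
python logging_bio_args_OnPolicy.py --algo_name TRPO --total_timesteps 1E6 --SEED 0 --timesteps_per_batch 2048
# Off-policy example
python logging_bio_args_OffPolicy.py --algo_name SAC --total_timesteps 1E6 --SEED 0
```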
To replicate the equidistantly placed control points training:

- Run `Case3/Case3_SI-ctrl_pts/logging_bio_args.py`. You can train policies with any of the five RL algorithms considered by passing the algorithm name as a command-line argument, e.g. `--algo_name TRPO`. You can also control the total number of training timesteps, the random seed, and the timesteps per batch via command-line arguments, e.g. `--total_timesteps 1E6`, `--SEED 0`, `--timesteps_per_batch 2048`. In addition, you can choose a different number of control points or torque scaling factor by changing the `number_of_control_points` and `alpha` variables inside `logging_bio_args.py`, respectively. A sample invocation is shown after this list.
- To replicate the hyperparameter tuning, run `Case3/Case3_SI-ctrl_pts/policy_training_script.py`. Note that the number of CPUs should be edited appropriately in the script. Runtime is 3-4 hours per individual case.
- Code for initializing the Elastica simulation environment is located in `Case3/Case3_SI-ctrl_pts/set_environment.py`. Specific details on how the second subcase of Case 3 was implemented are in this file.
- Post-processing scripts are located in `Case3/Case3_SI-ctrl_pts/post_processing.py`.
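For example (illustrative flag values, assuming direct Python invocation):

```bash
cd Case3/Case3_SI-ctrl_pts/
python logging_bio_args.py --algo_name TD3 --total_timesteps 1E6 --SEED 0 --timesteps_per_batch 2048
```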
## Case 4: Maneuvering through an unstructured nest of obstacles

In this case, the arm reaches a stationary target by maneuvering around an unstructured nest of twelve randomly located obstacles. Actuation is similar to Case 3, using two manually placed control points at 40% and 90% of the arm; unlike Case 3, however, actuation is allowed in both the normal and binormal directions.
- To replicate training using different RL algorithms, run `logging_bio_args.py`, located in the `Case4/` folder. You can train policies with any of the five RL algorithms considered by passing the algorithm name as a command-line argument, e.g. `--algo_name TRPO`. You can also control the total number of training timesteps, the random seed, and the timesteps per batch via command-line arguments, e.g. `--total_timesteps 1E6`, `--SEED 0`, `--timesteps_per_batch 2048`. In addition, you can choose a different torque scaling factor by changing the `alpha` variable inside `logging_bio_args.py`. The number of control points is fixed at two for this case. A sample invocation is shown after this list.
- To replicate the hyperparameter tuning, run `Case4/policy_training_script.py`. Note that the number of CPUs should be edited appropriately in the script. Runtime is 6-8 hours per individual case.
- Code for initializing the Elastica simulation environment is located in `Case4/set_environment.py`. Specific details on how Case 4 was implemented are in this file.
- Post-processing scripts are located in `Case4/post_processing.py`.
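For example (illustrative flag values, assuming direct Python invocation):

```bash
cd Case4/
python logging_bio_args.py --algo_name DDPG --total_timesteps 1E6 --SEED 0 --timesteps_per_batch 2048
```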
## Citation

We ask that any publications that use these benchmark cases cite the original paper:
Naughton, Sun, Tekinalp, Parthasarathy, Chowdhary and Gazzola, Elastica: A compliant mechanics environment for soft robotic control, IEEE Robotics and Automation Letters, 2021. doi: 10.1109/LRA.2021.3063698
@article{Naughton2021,
  author={Naughton, Noel and Sun, Jiarui and Tekinalp, Arman and Parthasarathy, Tejaswin and Chowdhary, Girish and Gazzola, Mattia},
  journal={IEEE Robotics and Automation Letters},
  title={Elastica: A compliant mechanics environment for soft robotic control},
  year={2021},
  volume={6},
  number={2},
  pages={3389-3396},
  doi={10.1109/LRA.2021.3063698}
}