Add sim-to-sim validation
diegoferigo committed Jun 3, 2023
1 parent e3cb259 commit 233041d
Showing 12 changed files with 4,910 additions and 41 deletions.
1 change: 1 addition & 0 deletions Chapters/Part_1/chapter_3.tex
@@ -593,6 +593,7 @@ \subsection{Policy Gradient}
\end{remark*}

\subsection{Generalized Advantage Estimation}
\label{sec:gae}

Policy gradient methods are not uniquely defined by the final forms of Equation~\eqref{equation:policy_gradient_final} and Equation~\eqref{equation:policy_gradient_reward_to_go}.
They are just two specific cases of a more general formulation expressed in the following form:
2 changes: 2 additions & 0 deletions Chapters/Part_2/chapter_7.tex
@@ -594,6 +594,7 @@ \section{Validation}
The box should start accelerating only when the applied force is able to overcome the opposing effects due to friction.
We conclude this section by validating the contact model on non-flat terrain.
We simulate a falling box over an inclined plane characterised by different coefficients of friction, and compare its trajectory with the Mujoco simulator~\parencite{todorov_mujoco_2012}.
The specifications of the machine used to execute the validation experiments are reported in Table~\ref{tab:laptop_specifications_validation}.
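
For the inclined-plane case mentioned above, the condition under which friction can no longer hold the box can be sketched numerically. This is a minimal illustration and not the contact model validated in the chapter: it assumes a box starting at rest, plain Coulomb friction with a single coefficient `mu` used for both the static and kinetic regimes, and a hypothetical helper name `box_acceleration_on_incline`.

```python
import jax.numpy as jnp


def box_acceleration_on_incline(theta, mu, g=9.81):
    """Acceleration along the slope of a box released at rest (illustrative helper).

    Assumes Coulomb friction with one coefficient `mu` for both the static and
    kinetic regimes. `theta` is the plane inclination in radians.
    """
    # Component of gravity pulling the box down the slope (per unit mass).
    driving = g * jnp.sin(theta)
    # Largest friction force the contact can oppose (per unit mass).
    max_friction = mu * g * jnp.cos(theta)
    # The box stays still until the driving term exceeds the friction limit,
    # i.e. until tan(theta) > mu.
    return jnp.maximum(driving - max_friction, 0.0)


theta = jnp.deg2rad(30.0)
a_stuck = box_acceleration_on_incline(theta, mu=1.0)  # tan(30 deg) < 1.0: box holds
a_slide = box_acceleration_on_incline(theta, mu=0.2)  # tan(30 deg) > 0.2: box slides
```

Sweeping `mu` for a fixed inclination gives the kind of qualitative reference (stick versus slide) against which a simulated trajectory could be compared.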

\subsection{Bouncing Ball}

@@ -674,6 +675,7 @@ \subsection{Sliding Box on Flat Terrain}
\end{figure}

\subsection{Sliding Box on Inclined Plane}
\label{sec:sliding_box_inclined_plane}

\begin{figure}
\centering
294 changes: 274 additions & 20 deletions Chapters/Part_2/chapter_8.tex

Large diffs are not rendered by default.

7 changes: 4 additions & 3 deletions Chapters/epilogue.tex
@@ -125,10 +125,11 @@ \subsection*{\autoref{ch:scaling_rigid_body_simulations}: Scaling Rigid Body Sim
To conclude the discussion on the features related to \jax, our algorithms are not yet compatible with its \ac{AD} capability.
The activities to assess the support and implement \ac{AD} support are ongoing, and we expect they will enable us to start investigating all the new emerging methodologies involving differentiable simulations.

- Other activities planned for the near future involve the \ac{RL} stack built over \jaxsim.
- The combination of an environment interfacing with \jaxsim and \ac{RL} algorithms implemented in \jax would result in a single application whose data never leaves the hardware accelerator.
+ Other activities planned for the near future involve enhancing the \ac{RL} stack built over \jaxsim.
+ The combination of an environment interfacing with \jaxsim and \ac{RL} algorithms implemented in \jax results in a single application whose data never leaves the hardware accelerator.
Therefore, beyond the sampling performance of parallel simulations, the complete pipeline would also prevent the data transfer overhead that is always present when some computation has to happen on \acp{CPU}.
- We already implemented a \jax version of \ac{PPO} and tested on the canonical examples of inverted pendulum and cartpole swing-up, but the results are too preliminary and have not been included in this thesis.
+ In Section~\ref{sec:jaxsim_validation}, we provided a continuous control validation by sampling from a cartpole environment simulated entirely on \ac{GPU}.
+ However, we used an existing \ac{PPO} implementation not developed in \jax, therefore it was not possible to compile in \ac{JIT} the entire collection of the batch but only an individual parallelized sample.
Future work will continue this activity, extending the investigation to contact-rich locomotion problems.
Finally, we would like to embed these environments in Gym-Ignition, creating a new \jaxsim \scenario component, so that all the benefits of future real-time backends could be applicable on \jaxsim experiments.
Towards this goal, Gym-Ignition should switch to the upcoming functional version of \verb|gym.Env| that has been recently proposed upstream.
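
The distinction drawn in this hunk, between compiling a single parallelized step and compiling the whole batch collection, can be sketched in JAX as follows. This is only an illustration under stated assumptions: `env_step` is a hypothetical toy point-mass dynamics standing in for a simulator step (it is not the JaxSim API), and `policy` is a hypothetical linear placeholder for a PPO actor. Only `jax.jit`, `jax.vmap`, and `jax.lax.scan` are real library calls.

```python
import jax
import jax.numpy as jnp


def env_step(state, action, dt=0.01):
    """Hypothetical toy dynamics (1D point mass), NOT the JaxSim API."""
    pos, vel = state[0], state[1]
    vel = vel + dt * action
    pos = pos + dt * vel
    reward = -(pos**2)  # penalize distance from the origin
    return jnp.stack([pos, vel]), reward


def policy(params, state):
    """Hypothetical linear policy, a placeholder for a learned actor."""
    return jnp.dot(params, state)


def rollout(params, initial_state, horizon=100):
    """Collect one trajectory with lax.scan so the time loop is traceable."""

    def body(state, _):
        action = policy(params, state)
        next_state, reward = env_step(state, action)
        return next_state, (state, action, reward)

    _, trajectory = jax.lax.scan(body, initial_state, None, length=horizon)
    return trajectory


# vmap parallelizes over initial states; jit compiles the entire batch
# collection (all environments, all time steps) into a single computation.
collect_batch = jax.jit(jax.vmap(rollout, in_axes=(None, 0)))

params = jnp.array([-1.0, -0.5])
initial_states = jax.random.normal(jax.random.PRNGKey(0), (8, 2))
states, actions, rewards = collect_batch(params, initial_states)
```

When the policy lives outside JAX, only the equivalent of `jax.vmap(env_step)` can be compiled, and each step returns control to the host. With both policy and environment traceable, the full collection can stay on the accelerator.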
2 changes: 1 addition & 1 deletion FrontBackmatter/contents.tex
@@ -210,7 +210,7 @@
\nomenclature[L, 17]{$\langle \mathcal{S}, \mathcal{A}, \mathcal{R}, \mathcal{P}, \mathcal{\rho}_0 \rangle$}{Tuple defining a Markov Decision Process}
\nomenclature[L, 18]{$V^\pi(s)$}{State-value function for policy $\pi$ at state $s$}
\nomenclature[L, 18]{$Q^\pi(s, a)$}{Action-value function for policy $\pi$ at state-action pair $(s, a)$}
- \nomenclature[L, 19]{$A^\pi(s, a)$}{Advantage function for policy $pi$ at state-action pair $(s, a)$}
+ \nomenclature[L, 19]{$A^\pi(s, a)$}{Advantage function for policy $\pi$ at state-action pair $(s, a)$}
\nomenclature[L, 20]{$\mathbb{E}[\cdot]$}{Expected value of a random variable}
\nomenclature[L, 21]{$\hat{\mathbb{E}}[\cdot]$}{Empirical average estimating the expected value of a random variable from samples}

1 change: 1 addition & 0 deletions classicthesis-config.tex
@@ -128,6 +128,7 @@
% 4. Setup floats: tables, (sub)figures, and captions
% ****************************************************************************************************

\usepackage{pdflscape}
\usepackage{rotating}
\usepackage{tabularx} % better tables
\setlength{\extrarowheight}{3pt} % increase table row height
Binary file added images/contributions/chapter_8/cartpole.png