Where are the 2-3 most computationally intensive loops in the Fortran code that could be parallelized using OpenMP?