diff --git a/.gitignore b/.gitignore index d7ce66f1..6544fa8b 100644 --- a/.gitignore +++ b/.gitignore @@ -1,10 +1,17 @@ # Emacs *~ \#*\# +.auctex-auto/ +.dir-locals.el # Generated by test plot_*.html +# Outputs from analysis scripts +*.png +out.txt +*.npz + # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] @@ -93,3 +100,23 @@ ENV/ # Rope project settings .ropeproject + +# Job scheduler output +################ +# Slurm +*.out + +# Cobalt +*.output +*.error +*.cobaltlog + +# PBS +# *.o* +# *.e* + +# Etc +*.local + +# TF Profiler output +*.json \ No newline at end of file diff --git a/.travis.yml b/.travis.yml index fc139ca0..9d93f9e6 100644 --- a/.travis.yml +++ b/.travis.yml @@ -1,10 +1,26 @@ language: python - +branches: + only: + - disable # master os: - linux -python: - - 2.7 +dist: xenial + +#python: +# - 2.7 +# - 3.6 + +matrix: + include: + - env: MPI_LIBRARY=openmpi MPI_LIBRARY_VERSION=2.0.0 + python: 2.7 + - env: MPI_LIBRARY=openmpi MPI_LIBRARY_VERSION=2.0.0 + python: 3.6 + - stage: python linter + env: + install: pip install flake8 + script: python -m flake8 && echo "Finished linting Python files with flake8" addons: apt: @@ -12,8 +28,38 @@ addons: - python-numpy - python-setuptools +env: + - TEST_DIR=.; TEST_SCRIPT="python setup.py test" + +# before_install: + install: + - sh ci/travis/install-mpi.sh + - export MPI_PREFIX="${HOME}/opt/${MPI_LIBRARY}-${MPI_LIBRARY_VERSION}" + - export PATH="${HOME}/.local/bin:${MPI_PREFIX}/bin${PATH:+":${PATH}"}" + - export LD_LIBRARY_PATH="${MPI_PREFIX}/lib${LD_LIBRARY_PATH:+":${LD_LIBRARY_PATH}"}" - pip install --upgrade pip + - pip install -r envs/pip-requirements-travis.txt + +# before_script: script: - python setup.py test + - cd $TEST_DIR && $TEST_SCRIPT && cd .. + +# Specify order of stages; build matrix of installation and regression tests +# will only run if linter stage passes +stages: + - python linter + - test + +notifications: + email: + recipients: + - felker@anl.gov + on_success: change + on_failure: never # always + slack: + rooms: + - secure: "NBjGYpIF2VO/GvhbC7XVPIfi0WLGFEuVhi51UbZjfHg4IOv1UuCF0fImi8GN2UZZBFRnZcbtPVffZnyMUyJI0Krw0M1ropkjA4YDaJx1JUGnYuQLxNmX+jbQmN61Usjg5MQOgcRnAn1bdMN1ttWInqkKejpV5buCjZbt8SZbDePfXz4U3No68P/pRsDTVXSy0xLTtRacuEITJjwxfjp6phbmR3qs127MZMRbVYDC2HA6KsoJW6YSKF1vFyHqnFMl7GSavxYw/XQpqFLJkGKXfnNgPZV6qAbVk5+bzyytAbbwLGvgbuFpnJsGPvAyebV8wVYaIPg7OeK+Sm1A3q0jt774NnqFp2AZ+pSKrxbxIkygDM0zoWLm4i3pt6ToJ6dKcqSbCKbELnSY5NphcyYuiJ8uhLVpAR0Y+vp+fOvhb8td/nH2AWkxFpp6xwOHPXvtodBsyPMkiaeKoVElYBfbfOhDSYH2KafADECTX75S63A9KleeNZh0DSImfFdQPaN3GFLEL8Z9UFABzTkM9eYKP9pyEP82Wh/JGDaqbARazEgNzy8rwghomsEguV247XHdOx32PSd0att541gsRFZ5uyMuVFKzI0jiLijekibY2I1c5b6dDeuK4O8uBiq4FTS+bM55Rj4Job01kdxSCKw3RwOh0amzITRTQvEWKpTAekg=" + on_success: never # always + on_failure: never # always diff --git a/README.md b/README.md index 2b9d1fd2..47001529 100644 --- a/README.md +++ b/README.md @@ -1,108 +1,47 @@ -## FRNN - PPPL deep learning disruption prediction package +# FRNN -The FRNN code workflow is similar to that characteristic of typical distributed deep learning projects. -First, the raw data is preprocessed and normalized. The pre-processing step involves cutting, resampling, -and structuring the data - as well as determining and validating the disruptive properties of -the shots considered. Various options for normalization are implemented. 
+[![Build Status](https://travis-ci.com/PPPLDeepLearning/plasma-python.svg?branch=master)](https://travis-ci.com/PPPLDeepLearning/plasma-python)
+[![Build Status](https://jenkins.princeton.edu/buildStatus/icon?job=FRNM/PPPL)](https://jenkins.princeton.edu/job/FRNM/job/PPPL/)

-Secondly, with respect to distributed data-parallel training of the model, the associated parameters are check-pointed after each epoch on the disk, in HDF5 file format. Finally – regarding the cross validation and prediction step on
-unlabeled data, it is planned to also implement a hyper-parameter tuning; approach using a random search algorithm.
+## Package description

-The results are stored as HDF5 files, including the final neural network model parameters together with
-statistical summaries of the variables used during training to allow researchers to produce learning
-curves and performance summary plots.
+The Fusion Recurrent Neural Net (FRNN) software is a Python package that implements deep learning models for disruption prediction in tokamak fusion plasmas.

-The Fusion Recurrent Neural Net (FRNN) deep learning code is implemented as a Python package
-consisting of 4 core modules:
+It consists of 4 core modules:

-- models: Python classes necessary to construct, train and optimize deep RNN models. Including a distributed data-parallel implementation of mini-batch gradient descent with MPI
+- `models`: Python classes necessary to construct, train and optimize deep RNN models, including a distributed data-parallel synchronous implementation of mini-batch gradient descent. FRNN uses MPI for communication and supports TensorFlow via the high-level Keras API. FRNN offers the built-in ability to run hyperparameter search optimizations.

-- preprocessors: signal preprocessing and normalization classes, including the methods necessary to prepare physical data for stateful RNN training.
+- `preprocessors`: signal preprocessing and normalization classes, including the methods necessary to prepare physical data for stateful LSTM training.

-- primitives: contains abstractions specific to the domain implemented as Python classes. For instance: Shot - a measurement of plasma current as a function of time. The Shot object contains attributes corresponding to unique identifier of a shot, disruption time in milliseconds, time profile of the shot converted to time-to- disruption values, validity of a shot (whether plasma current reaches a certain value during the shot), etc
+- `primitives`: contains abstractions specific to the domain, implemented as Python classes. For instance, `Shot`: a measurement of plasma current as a function of time. The Shot object contains attributes corresponding to the unique identifier of a shot, the disruption time in milliseconds, the time profile of the shot converted to time-to-disruption values, the validity of a shot (whether plasma current reaches a certain value during the shot), etc. Other primitives include `Machines` and `Signals`, which carry the relevant information necessary for incorporating physics data into the overall pipeline. Signals know the Machine they live on, their mds+ paths, the code for downloading them, their preprocessing approaches, their dimensionality, etc. Machines know which Signals are defined on them, which mds+ server houses the data, etc.

-- utilities: a set of auxiliary functions for preprocessing, performance evaluation and learning curves analysis
+- `utilities`: a set of auxiliary functions for preprocessing, performance evaluation and learning curves analysis.
-This is a pure Python implementation for Python versions 2.6 and 2.7.
+In addition to the `utilities`, FRNN supports TensorBoard scalar variable summaries; histograms of layers, activations, and gradients; and graph visualizations.

-## Installation
+This is a pure Python implementation for Python versions 3.6+.

-The package comes with a standard setup script and a list of dependencies which include: mpi4py, Theano,
-Keras, h5py, Pathos. It also requires a standard set of CUDA drivers to run on GPU.
+## Installation

-Run:
-```bash
-pip install -i https://testpypi.python.org/pypi plasma
-```
-optionally add `--user` to install in a home directory.
+The package comes with a standard setup script and a list of dependencies which include: mpi4py, TensorFlow, h5py, Pathos. It also requires a standard set of CUDA drivers to run on GPU.

-Alternatively, use the setup script:
+Then check out the repository and use the setup script:
 ```bash
-python setup.py install
+git clone https://github.com/PPPLDeepLearning/plasma-python
+cd plasma-python
+pip install -e .
 ```
 with `sudo` if superuser permissions are needed or `--home=~` to install in a home directory. The latter option requires an appropriate `PYTHONPATH`.

-## Module index
-
-## Tutorials
-
-### Sample usage on Tiger
-
+Alternatively, run (no need to check out the repository in that case):
 ```bash
-module load anaconda cudatoolkit/7.5 cudann openmpi/intel-16.0/1.8.8/64
-source activate environment
-python setup.py install
-```
-
-Where `environment` should contain the Python packages as per `requirements.txt` file.
-
-#### Preprocessing
-
-```bash
-python guarantee_preprocessed.py
-```
-
-#### Training and inference
-
-Use Slurm scheduler to perform batch or interactive analysis on Tiger cluster.
-
-##### Batch analysis
-
-For batch analysis, make sure to allocate 1 process per GPU:
-
-```bash
-#SBATCH -N X
-#SBATCH --ntasks-per-node=4
-#SBATCH --ntasks-per-socket=2
-#SBATCH --gres=gpu:4
-```
-where X is the number of nodes for distibuted data parallel training.
-
-
-```bash
-sbatch slurm.cmd
-```
-
-##### Interactive analysis
-
-The workflow is to request an interactive session:
-
-```bash
-salloc -N [X] --ntasks-per-node=16 --ntasks-per-socket=8 --gres=gpu:4 -t 0-6:00
-```
-where the number of GPUs is X * 4.
-
-
-Then launch the application from the command line:
-
-```bash
-cd plasma-python
-mpirun -npernode 4 python examples/mpi_learn.py
+pip install -i https://testpypi.python.org/pypi plasma
 ```
+optionally add `--user` to install in a home directory.

-Note: there is Theano compilation going on in the 1st epoch which will distort timing. It is recommended to perform testing setting `num_epochs >= 2` in `conf.py`.
+## Tutorials

-## Status

+For a tutorial, check out: [PrincetonUTutorial.md](docs/PrincetonUTutorial.md)
diff --git a/ci/jenkins/jenkins.sh b/ci/jenkins/jenkins.sh
new file mode 100644
index 00000000..6dbf1170
--- /dev/null
+++ b/ci/jenkins/jenkins.sh
@@ -0,0 +1,33 @@
+#!/bin/bash
+
+export OMPI_MCA_btl="tcp,self,sm"
+
+echo ${PWD}
+
+echo "Jenkins test Python3.6"
+rm /tigress/alexeys/model_checkpoints/*
+rm -rf /tigress/alexeys/processed_shots
+rm -rf /tigress/alexeys/processed_shotlists
+rm -rf /tigress/alexeys/normalization
+module load anaconda3/4.4.0
+source activate /tigress/alexeys/jenkins/.conda/envs/jenkins3
+module load cudatoolkit/8.0
+module load cudnn/cuda-8.0/6.0
+module load openmpi/cuda-8.0/intel-17.0/2.1.0/64
+module load intel/17.0/64/17.0.4.196
+
+echo ${PWD}
+cd /home/alexeys/jenkins/workspace/FRNM/PPPL
+echo ${PWD}
+python setup.py install
+
+echo $SLURM_NODELIST
+cd examples
+echo ${PWD}
+ls ${PWD}
+sed -i -e 's/num_epochs: 1000/num_epochs: 2/g' conf.yaml
+sed -i -e 's/data: jet_data/data: jenkins_jet/g' conf.yaml
+
+srun python mpi_learn.py
+
+echo "Jenkins test Python2.7"
diff --git a/ci/jenkins/run_jenkins.py b/ci/jenkins/run_jenkins.py
new file mode 100644
index 00000000..b5ae4714
--- /dev/null
+++ b/ci/jenkins/run_jenkins.py
@@ -0,0 +1,75 @@
+from plasma.utils.batch_jobs import (
+    generate_working_dirname, copy_files_to_environment, start_jenkins_job
+    )
+import yaml
+import os
+import getpass
+import plasma.conf
+
+num_nodes = 4  # Set in the Jenkins project area!!
+test_matrix = [("Python3", "jet_data"), ("Python2", "jet_data")]
+
+run_directory = "{}/{}/jenkins/".format(
+    plasma.conf.conf['fs_path'], getpass.getuser())
+template_path = os.environ['PWD']
+conf_name = "conf.yaml"
+executable_name = "mpi_learn.py"
+
+
+def generate_conf_file(
+        test_configuration,
+        template_path="../",
+        save_path="./",
+        conf_name="conf.yaml"):
+    assert(template_path != save_path)
+    with open(os.path.join(template_path, conf_name), 'r') as yaml_file:
+        conf = yaml.load(yaml_file, Loader=yaml.SafeLoader)
+    conf['training']['num_epochs'] = 2
+    conf['paths']['data'] = test_configuration[1]
+    # index 0 of a test_matrix entry holds the Python version label
+    if test_configuration[0] == "Python3":
+        conf['env']['name'] = "PPPL_dev3"
+        conf['env']['type'] = "anaconda3"
+    else:
+        conf['env']['name'] = "PPPL"
+        conf['env']['type'] = "anaconda"
+
+    with open(os.path.join(save_path, conf_name), 'w') as outfile:
+        yaml.dump(conf, outfile, default_flow_style=False)
+    return conf
+
+
+working_directory = generate_working_dirname(run_directory)
+os.makedirs(working_directory)
+
+os.system(
+    " ".join(["cp -p", os.path.join(template_path, conf_name),
+              working_directory]))
+os.system(" ".join(
+    ["cp -p", os.path.join(template_path, executable_name),
+     working_directory]))
+
+# os.chdir(working_directory)
+# print("Going into {}".format(working_directory))
+
+for ci in test_matrix:
+    subdir = working_directory + "/{}/".format(ci[0])
+    os.makedirs(subdir)
+    copy_files_to_environment(subdir)
+    print("Making modified conf")
+    conf = generate_conf_file(ci, working_directory, subdir, conf_name)
+    print("Starting job")
+    # index 0 of a test_matrix entry holds the Python version label
+    if ci[0] == "Python3":
+        env_name = "PPPL_dev3"
+        env_type = "anaconda3"
+    else:
+        env_name = "PPPL"
+        env_type = "anaconda"
+    start_jenkins_job(
+        subdir,
+        num_nodes,
+        executable_name,
+        ci,
+        env_name,
+        env_type)
+
+print("submitted jobs.")
diff --git a/ci/jenkins/validate_jenkins.py b/ci/jenkins/validate_jenkins.py
new file mode 100644
index 00000000..abed06f4
--- /dev/null
+++ b/ci/jenkins/validate_jenkins.py
@@ -0,0 +1,19 @@
+#!/usr/bin/env python + +import sys +from mpi4py import MPI +import tensorflow as tf +import keras as kk +import mpi4py as mmm + +print(mmm.__version__) +print(kk.__version__) +print(tf.__version__) + +size = MPI.COMM_WORLD.Get_size() +rank = MPI.COMM_WORLD.Get_rank() +name = MPI.Get_processor_name() + +sys.stdout.write( + "Hello, World! I am process %d of %d on %s.\n" + % (rank, size, name)) diff --git a/ci/jenkins/validate_jenkins.sh b/ci/jenkins/validate_jenkins.sh new file mode 100644 index 00000000..f7d6b0d5 --- /dev/null +++ b/ci/jenkins/validate_jenkins.sh @@ -0,0 +1,25 @@ +#!/bin/bash + +export OMPI_MCA_btl="tcp,self,sm" + +module load anaconda3/4.4.0 +source activate /tigress/alexeys/jenkins/.conda/envs/jenkins3 +module load cudatoolkit/8.0 +module load cudnn/cuda-8.0/6.0 +module load openmpi/cuda-8.0/intel-17.0/2.1.0/64 +module load intel/17.0/64/17.0.4.196 + +cd /home/alexeys/jenkins/workspace/FRNM/PPPL +python setup.py install + +echo `which python` +echo `which mpicc` + +echo ${PWD} +echo $SLURM_NODELIST + +cd jenkins-ci +echo ${PWD} +ls ${PWD} + +srun python validate_jenkins.py diff --git a/ci/travis/install-mpi.sh b/ci/travis/install-mpi.sh new file mode 100755 index 00000000..b509269f --- /dev/null +++ b/ci/travis/install-mpi.sh @@ -0,0 +1,44 @@ +#!/bin/sh + +set -e + +DOWNLOAD_DIR="${HOME}/tmp" +INSTALL_PREFIX="${HOME}/opt" + +PACKAGE_NAME="${MPI_LIBRARY:?"MPI_LIBRARY not set!"}-${MPI_LIBRARY_VERSION:?"MPI_LIBRARY_VERSION not set!"}" +INSTALL_PREFIX="${INSTALL_PREFIX}/${PACKAGE_NAME}" +TARBALL_NAME="${PACKAGE_NAME}.tar.gz" + +if [ -d "${INSTALL_PREFIX}" ] +then + echo "MPI library already installed: ${PACKAGE_NAME}" + exit 0 +fi + +case "$MPI_LIBRARY" in + openmpi) + OPENMPI_SHORTVERSION=$(expr "${MPI_LIBRARY_VERSION}" ":" "\(.\{1,\}\..\{1,\}\)\..\{1,\}") + SOURCE_URL="http://www.open-mpi.org/software/ompi/v${OPENMPI_SHORTVERSION}/downloads/${TARBALL_NAME}" + ;; + mpich) + SOURCE_URL="http://www.mpich.org/static/downloads/${MPI_LIBRARY_VERSION}/${TARBALL_NAME}" + ;; +esac + +echo "Installing MPI library: ${PACKAGE_NAME}" +echo "into: ${INSTALL_PREFIX}" +echo "Download URL: ${SOURCE_URL}" +echo "Tarball name: ${TARBALL_NAME}" + +mkdir -p "${DOWNLOAD_DIR}" +cd "${DOWNLOAD_DIR}" +rm -rf "${TARBALL_NAME}" +wget --no-check-certificate -O "${TARBALL_NAME}" "${SOURCE_URL}" +rm -rf "${PACKAGE_NAME}" +tar -xzf "${TARBALL_NAME}" + +cd "${PACKAGE_NAME}" +mkdir -p "${INSTALL_PREFIX}" +./configure --enable-shared --prefix="${INSTALL_PREFIX}" +make -j 2 +make -j 2 install diff --git a/data/__init__.py b/data/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/data/d3d_signals.py b/data/d3d_signals.py deleted file mode 100644 index 1ae4b1a7..00000000 --- a/data/d3d_signals.py +++ /dev/null @@ -1,99 +0,0 @@ -#JET signal hierarchy -#------------------------------------------------------------------------# -#User only needs to look at 1st and last sections -# - conf.py only needs to import signals_dirs and signals_masks -# - get_mdsplus_data.py only needs signals_dirs and download_masks -# - performance_analysis_utils.py needs : -# - signals_dirs, plot_masks, ppf_labels, jpf_labels -#------------------------------------------------------------------------# -################ -# Signal names # -################ -#This section contains all the exact JET signal strings and their -#groupings by type and dimensionality. -#User should not touch this. 
Use for reference - - -### 0D signals ### -signal_paths = [ -'efsli', #Internal Inductance -'ipsip', #Plasma Current -'efsbetan', #Normalized Beta -'efswmhd', #Stored Energy -'nssampn1l', #Tearing Mode Amplitude (rotating 2/1) -'nssfrqn1l', #Tearing Mode Frequency (rotating 2/1) -'nssampn2l', #Tearing Mode Amplitude (rotating 3/2) -'nssfrqn2l', #Tearing Mode Frequency (rotating 3/2) -'dusbradial', #LM Amplitude -'dssdenest', #Plasma Density -r'\bol_l15_p', #Radiated Power core -r'\bol_l03_p', #Radiated Power Edge -'bmspinj', #Total Beam Power -'bmstinj',] #Total Beam Torque -#'pcechpwrf'] #Total ECH Power Not always on! - -signal_paths = ['d3d/' + path for path in signal_paths] - -### 0D EFIT signals ### -signal_paths += ['EFIT01/RESULTS.AEQDSK.Q95'] - -### 1D EFIT signals ### -#signal_paths += [ -#'AOT/EQU.t_e', #electron temperature profile vs rho (uniform mapping over time) -#'AOT/EQU.dens_e'] #electron density profile vs rho (uniform mapping over time) - -#these signals seem to give more reliable data -signal_paths += [ -'ZIPFIT01/PROFILES.ETEMPFIT', #electron temperature profile vs rho (uniform mapping over time) -'ZIPFIT01/PROFILES.EDENSFIT'] #electron density profile vs rho (uniform mapping over time) - -#make into list of lists format to be consistent with jet_signals.py -signal_paths = [[path] for path in signal_paths] - -#format : 'tree/signal_path' for each path -signals_dirs = signal_paths - - -################################################## -# USER SELECTIONS # -################################################## - - -################################## -# Select signals for downloading # -################################## - -#Default pass to get_mdsplus_data.py: download all above signals -download_masks = [[True]*len(sig_list) for sig_list in signals_dirs] -# download_masks[-1] = [False] # enable/disable temperature profile -# download_masks[-2] = [False] # enable/disable density profile - -####################################### -# Select signals for training/testing # -####################################### - -#Default pass to conf.py: train with all above signals -signals_masks = [[True]*len(sig_list) for sig_list in signals_dirs] -signals_masks[-1] = [False] # enable/disable temperature profile -signals_masks[-2] = [False] # enable/disable density profile - -#num_signals = sum([group.count(True) for i,group in enumerate(jet_signals.signals_masks)] -########################################### -# Select signals for performance analysis # -########################################### - -#User selects these by signal name -plot_masks = [[True]*len(sig_list) for sig_list in signals_dirs] - -#LaTeX strings for performance analysis, sorted in lists by signal_group -group_labels = [[r' $I_{plasma}$ [A]'], - [r' Mode L. A. 
[A]'], - [r' $P_{radiated}$ [W]'], #0d radiation, db/ - [r' $P_{radiated}$ [W]'],#1d radiation, db/ - [r' $\rho_{plasma}$ [m^-2]'], - [r' $L_{plasma,internal}$'], - [r'$\frac{d}{dt} E_{D}$ [W]'], - [r' $P_{input}$ [W]'], - [r'$E_{D}$'], -#ppf signal labels - [r'ECE unit?']] diff --git a/data/gadata.py b/data/gadata.py index d7c66fd5..1906ea58 100644 --- a/data/gadata.py +++ b/data/gadata.py @@ -1,99 +1,108 @@ -import MDSplus +import MDSplus import numpy -import time -import sys +# import time + class gadata: - """GA Data Obj""" - def __init__(self,signal,shot,tree=None,connection=None,nomds=False): + """GA Data Obj""" + + def __init__(self, signal, shot, tree=None, connection=None, nomds=False): - # Save object values - self.signal = signal - self.shot = shot - self.zdata = -1 - self.xdata = -1 - self.ydata = -1 - self.zunits = '' - self.xunits = '' - self.yunits = '' - self.rank = -1 - self.connection = connection - + # Save object values + self.signal = signal + self.shot = shot + self.zdata = -1 + self.xdata = -1 + self.ydata = -1 + self.zunits = '' + self.xunits = '' + self.yunits = '' + self.rank = -1 + self.connection = connection - ## Retrieve Data - t0 = time.time() - self.found = False + # Retrieve Data + # t0 = time.time() + self.found = False - # Create the MDSplus connection (thin) if not passed in - if self.connection is None: - self.connection = MDSplus.Connection('atlas.gat.com') + # Create the MDSplus connection (thin) if not passed in + if self.connection is None: + self.connection = MDSplus.Connection('atlas.gat.com') - # Retrieve data from MDSplus (thin) - if nomds == False: - #first try, retrieve directly from tree and tag - try: - #print 'trying direct using tree and tag' - if tree != None: - tag = self.signal - fstree = tree - else: - tag = self.connection.get('findsig("'+self.signal+'",_fstree)').value - fstree = self.connection.get('_fstree').value + # Retrieve data from MDSplus (thin) + if not nomds: + # first try, retrieve directly from tree and tag + try: + # print('trying direct using tree and tag') + if tree is not None: + tag = self.signal + fstree = tree + else: + tag = self.connection.get( + 'findsig("' + self.signal + '",_fstree)').value + fstree = self.connection.get('_fstree').value - self.connection.openTree(fstree,shot) - self.zdata = self.connection.get('_s = '+tag).data() - self.zunits = self.connection.get('units_of(_s)').data() - self.rank = numpy.ndim(self.zdata) - if self.rank > 1: - self.xdata = self.connection.get('dim_of(_s,1)').data() - self.xunits = self.connection.get('units_of(dim_of(_s,1))').data() - if self.xunits == '' or self.xunits == ' ': - self.xunits = self.connection.get('units(dim_of(_s,1))').data() + self.connection.openTree(fstree, shot) + self.zdata = self.connection.get('_s = ' + tag).data() + self.zunits = self.connection.get('units_of(_s)').data() + self.rank = numpy.ndim(self.zdata) + if self.rank > 1: + self.xdata = self.connection.get('dim_of(_s,1)').data() + self.xunits = self.connection.get( + 'units_of(dim_of(_s,1))').data() + if self.xunits == '' or self.xunits == ' ': + self.xunits = self.connection.get( + 'units(dim_of(_s,1))').data() - self.ydata = self.connection.get('dim_of(_s)').data() - self.yunits = self.connection.get('units_of(dim_of(_s))').data() - if self.yunits == '' or self.yunits == ' ': - self.yunits = self.connection.get('units(dim_of(_s))').data() - else: - self.xdata = self.connection.get('dim_of(_s)').data() - self.xunits = self.connection.get('units_of(dim_of(_s))').data() - if self.xunits == '' 
or self.xunits == ' ': - self.xunits = self.connection.get('units(dim_of(_s))').data() - #print 'zdata: ' + str(self.zdata) - self.found = True + self.ydata = self.connection.get('dim_of(_s)').data() + self.yunits = self.connection.get( + 'units_of(dim_of(_s))').data() + if self.yunits == '' or self.yunits == ' ': + self.yunits = self.connection.get( + 'units(dim_of(_s))').data() + else: + self.xdata = self.connection.get('dim_of(_s)').data() + self.xunits = self.connection.get( + 'units_of(dim_of(_s))').data() + if self.xunits == '' or self.xunits == ' ': + self.xunits = self.connection.get( + 'units(dim_of(_s))').data() + # print('zdata: ' + str(self.zdata)) + self.found = True - # MDSplus seems to return 2-D arrays transposed. Change them back. - if numpy.ndim(self.zdata) == 2: self.zdata = numpy.transpose(self.zdata) - if numpy.ndim(self.ydata) == 2: self.ydata = numpy.transpose(self.ydata) - if numpy.ndim(self.xdata) == 2: self.xdata = numpy.transpose(self.xdata) + # MDSplus seems to return 2-D arrays transposed. Change them + # back. + if numpy.ndim(self.zdata) == 2: + self.zdata = numpy.transpose(self.zdata) + if numpy.ndim(self.ydata) == 2: + self.ydata = numpy.transpose(self.ydata) + if numpy.ndim(self.xdata) == 2: + self.xdata = numpy.transpose(self.xdata) - except Exception,e: - #node not found -# print ' Signal not in MDSplus: %s' % (signal,) - pass + except Exception as e: + print(e) + pass - # Retrieve data from PTDATA if node not found - if not self.found: - #print 'Trying ptdata: %s' % (signal,) - self.zdata = self.connection.get('_s = ptdata2("'+signal+'",'+str(shot)+')') - if len(self.zdata) != 1: - self.xdata = self.connection.get('dim_of(_s)') - self.rank = 1 - self.found = True + # Retrieve data from PTDATA if node not found + if not self.found: + # print('Trying ptdata: %s' % (signal,)) + self.zdata = self.connection.get( + '_s = ptdata2("' + signal+'",' + str(shot)+')') + if len(self.zdata) != 1: + self.xdata = self.connection.get('dim_of(_s)') + self.rank = 1 + self.found = True - # Retrieve data from Pseudo-pointname if not in ptdata - if not self.found: - #print ' Signal not in PTDATA: %s' % (signal,) - self.zdata = self.connection.get('_s = pseudo("'+signal+'",'+str(shot)+')') - if len(self.zdata) != 1: - self.xdata = self.connection.get('dim_of(_s)') - self.rank = 1 - self.found = True + # Retrieve data from Pseudo-pointname if not in ptdata + if not self.found: + # print(' Signal not in PTDATA: %s' % (signal,)) + self.zdata = self.connection.get( + '_s = pseudo("' + signal+'",' + str(shot)+')') + if len(self.zdata) != 1: + self.xdata = self.connection.get('dim_of(_s)') + self.rank = 1 + self.found = True - if not self.found: #this means the signal wasn't found - pass - # print ' Signal not in pseudo-pointname: %s' % (signal,) - #print " No such signal: %s" % (signal,) + if not self.found: # this means the signal wasn't found + pass - #print ' GADATA Retrieval Time : ',time.time() - t0 - return + return diff --git a/data/get_mdsplus_data.py b/data/get_mdsplus_data.py index 94777ae8..81f14c53 100644 --- a/data/get_mdsplus_data.py +++ b/data/get_mdsplus_data.py @@ -1,244 +1,18 @@ -from __future__ import print_function -''' -http://www.mdsplus.org/index.php?title=Documentation:Tutorial:RemoteAccess&open=76203664636339686324830207&page=Documentation%2FThe+MDSplus+tutorial%2FRemote+data+access+in+MDSplus -http://piscope.psfc.mit.edu/index.php/MDSplus_%26_python#Simple_example_of_reading_MDSplus_data -http://www.mdsplus.org/documentation/beginners/expressions.shtml 
-http://www.mdsplus.org/index.php?title=Documentation:Tutorial:MdsObjects&open=76203664636339686324830207&page=Documentation%2FThe+MDSplus+tutorial%2FThe+Object+Oriented+interface+of+MDSPlus -''' +from plasma.utils.downloading import download_all_shot_numbers +from plasma.conf import conf -'''TODO -- mapping to flux surfaces: its not always [0,1]! -- handling of 1D signals during preprocessing & normalization -- handling of 1D signals for feeding into RNN (convolutional layers) -- handling of missing data in shots? -- TEST -''' -from MDSplus import * -#from pylab import * -import numpy as np -import sys -import multiprocessing as mp -from functools import partial -import Queue -import os -import errno -# import gadata - -from plasma.primitives.shots import ShotList - -from signals import * - -#print("Importing numpy version"+np.__version__) - -def create_missing_value_filler(): - time = np.linspace(0,100,100) - vals = np.zeros_like(time) - return time,vals - -def mkdirdepth(filename): - folder=os.path.dirname(filename) - if not os.path.exists(folder): - os.makedirs(folder) - -def format_save_path(prepath,signal_path,shot_num): - return prepath + signal_path + '/{}.txt'.format(shot_num) - - -def save_shot(shot_num_queue,c,signals,save_prepath,machine,sentinel=-1): - missing_values = 0 - # if machine == 'd3d': - # reload(gadata) #reloads Gadata object with connection - while True: - shot_num = shot_num_queue.get() - if shot_num == sentinel: - break - shot_complete = True - for signal in signals: - signal_path = signal.get_path(machine) - save_path_full = format_save_path(save_prepath,signal_path,shot_num) - success = False - if os.path.isfile(save_path_full): - print('-',end='') - success = True - else: - try: - try: - time,data,mapping,success = machine.fetch_data(signal,shot_num,c) - except: - #missing_values += 1 - print('Signal {}, shot {} missing. Filling with zeros'.format(signal_path,shot_num)) - time,data = create_missing_value_filler() - - data_two_column = np.vstack((np.atleast_2d(time),np.atleast_2d(data))).transpose() - try: #can lead to race condition - mkdirdepth(save_path_full) - except OSError, e: - if e.errno == errno.EEXIST: - # File exists, and it's a directory, another process beat us to creating this dir, that's OK. - pass - else: - # Our target dir exists as a file, or different error, reraise the error! - raise - np.savetxt(save_path_full,data_two_column,fmt = '%.5e')#fmt = '%f %f') - print('.',end='') - except: - print('Could not save shot {}, signal {}'.format(shot_num,signal_path)) - print('Warning: Incomplete!!!') - raise - sys.stdout.flush() - if not success: - missing_values += 1 - shot_complete = False - #only add shot to list if it was complete - if shot_complete: - print('saved shot {}'.format(shot_num)) - #complete_queue.put(shot_num) - else: - print('shot {} not complete. removing from list.'.format(shot_num)) - print('Finished with {} missing values total'.format(missing_values)) - return - - -def download_shot_numbers(shot_numbers,save_prepath,machine,max_cores = 2): - sentinel = -1 - fn = partial(save_shot,signals=signals,save_prepath=save_prepath,machine=machine,sentinel=sentinel) - num_cores = min(mp.cpu_count(),max_cores) #can only handle 8 connections at once :( - queue = mp.Queue() - #complete_shots = Array('i',zeros(len(shot_numbers)))# = mp.Queue() - - assert(len(shot_numbers) < 30000) # mp.queue can't handle larger queues yet! 
- for shot_num in shot_numbers: - queue.put(shot_num) - for i in range(num_cores): - queue.put(sentinel) - connections = [Connection(machine.server) for _ in range(num_cores)] - processes = [mp.Process(target=fn,args=(queue,connections[i])) for i in range(num_cores)] - - print('running in parallel on {} processes'.format(num_cores)) - - for p in processes: - p.start() - for p in processes: - p.join() - - -def download_all_shot_numbers(prepath,save_path,shot_numbers_path,shot_numbers_files,machine,max_cores): - max_len = 30000 - save_prepath = prepath+save_path + '/' + machine.name + '/' - shot_numbers,_ = ShotList.get_multiple_shots_and_disruption_times(prepath + shot_numbers_path,shot_numbers_files) - shot_numbers_chunks = [shot_numbers[i:i+max_len] for i in xrange(0,len(shot_numbers),max_len)]#can only use queue of max size 30000 - start_time = time.time() - for shot_numbers_chunk in shot_numbers_chunks: - download_shot_numbers(shot_numbers_chunk,save_prepath,machine,max_cores) - - print('Finished downloading {} shots in {} seconds'.format(len(shot_numbers),time.time()-start_time)) - - - - - - -prepath = '/cscratch/share/frnn/' +prepath = '/p/datad2/' # '/cscratch/share/frnn/' shot_numbers_path = 'shot_lists/' -save_path = 'signal_data/' -machine = d3d -signals = d3d_signals - -#nstx -# shot_numbers_files = ['disrupt_nstx.txt'] #nstx - -#d3d -shot_numbers_files = ['shotlist_JaysonBarr_clear.txt'] -shot_numbers_files += ['shotlist_JaysonBarr_disrupt.txt'] -# #shot_numbers_files = ['d3d_short_clear.txt']# ,'d3d_clear.txt', 'd3d_disrupt.txt'] - -#jet -# shot_numbers_files = ['CWall_clear.txt','CFC_unint.txt','BeWall_clear.txt','ILW_unint.txt']#jet - -max_cores = 32 -download_all_shot_numbers(prepath,save_path,shot_numbers_path,shot_numbers_files,machine,max_cores) - - -#complete_shot_numbers = [] -#print(complete_queue) -#print(complete_queue.qsize()) -#for i in range(len(complete_shots)): -# if complete_shots[i]: -# complete_shot_numbers.append(shot_numbers[i]) -#while not complete_queue.empty(): -# complete_shot_numbers.append(complete_queue.get(False)) - - - -# if machine == 'nstx': -# shot_numbers_files = ['disrupt_nstx.txt'] -# server_path = "skylark.pppl.gov:8501::" -# signal_paths = ['engineering/ip1/', -# 'operations/rwmef_plas_n1_amp_br/', -# 'efit02/li/', -# 'activespec/ts_ld/', -# 'passivespec/bolom_totpwr/', -# 'nbi/nb_p_inj/', -# 'efit02/wpdot/'] - -# elif machine == 'd3d': -# shot_numbers_files = ['shotlist_JaysonBarr_clear.txt'] -# shot_numbers_files += ['shotlist_JaysonBarr_disrupt.txt'] -# #shot_numbers_files = ['d3d_short_clear.txt']# ,'d3d_clear.txt', 'd3d_disrupt.txt'] -# server_path = 'atlas.gat.com' -# from d3d_signals import signal_paths -# import itertools -# signal_paths = list(itertools.chain.from_iterable(signal_paths)) -# # signal_paths = ['PINJ','IP','Q95','DENSITY','WMHD'] #,'PRAD'] #PRAD returns a 2D xdata -# # Recommended signals from DIII-D -# # signal_paths = ['efsli','ipsip','efsbetan','efswmhd','nssampn1l','nssfrqn1l', -# # 'nssampn2l','nssfrqn2l', -# # 'dusbradial','dssdenest',r'\bol_l15_p',r'\bol_l03_p','bmspinj','bmstinj','pcechpwrf'] -# # signal_paths = ['d3d/' + path for path in signal_paths] - - - -# elif machine == 'jet': -# shot_numbers_files = ['CWall_clear.txt','CFC_unint.txt','BeWall_clear.txt','ILW_unint.txt'] -# server_path = 'mdsplus.jet.efda.org' - -# #plasma current, locked mode, output power -# signal_paths = ['jpf/da/c2-ipla', -# 'jpf/da/c2-loca', -# 'jpf/db/b5r-ptot>out'] - - - -# #internal inductance, time derivative of stored energy, 
input power, total diamagnetic energy -# signal_paths += ['jpf/gs/bl-liout'] #Radiated Power [W] -#1D vertical signals, don't use signal 16 and 23 -db =[] -db += ['b5vr-pbol:{:03d}'.format(i) for i in range(1,28) if (i != 16 and i != 23)] -#1D horizontal signals -db += ['b5hr-pbol:{:03d}'.format(i) for i in range(1,24)] - -#### 1D density signals [m^-2] #### -df = [] -#4 vertical channels and 4 horizontal channels, dont use signal 1 -df += ['g1r-lid:{:03d}'.format(i) for i in range(2,9)] - -#### 0D signals et al #### -gs_inductance = ['bl-li 1: - xdata = c.get('dim_of(_s,1)').data() - xunits = get_units('dim_of(_s,1)') - ydata = c.get('dim_of(_s)').data() - yunits = get_units('dim_of(_s)') - else: - xdata = c.get('dim_of(_s)').data() - xunits = get_units('dim_of(_s)') - found = True - # MDSplus seems to return 2-D arrays transposed. Change them back. - if np.ndim(data) == 2: data = np.transpose(data) - if np.ndim(ydata) == 2: ydata = np.transpose(ydata) - if np.ndim(xdata) == 2: xdata = np.transpose(xdata) - - except Exception,e: - #print(e) - #sys.stdout.flush() - pass - - # Retrieve data from PTDATA if node not found - if not found: - #print("not in full path {}".format(signal)) - data = c.get('_s = ptdata2("'+signal+'",'+str(shot)+')') - if len(data) != 1: - xdata = c.get('dim_of(_s)') - rank = 1 - found = True - # Retrieve data from Pseudo-pointname if not in ptdata - if not found: - #print("not in PTDATA {}".format(signal)) - data = c.get('_s = pseudo("'+signal+'",'+str(shot)+')') - if len(data) != 1: - xdata = c.get('dim_of(_s)') - rank = 1 - found = True - #this means the signal wasn't found - if not found: - print " No such signal: %s" % (signal,) - pass - - # print ' GADATA Retrieval Time : ',time.time() - t0 - return xdata,data,ydata,found - - -def fetch_jet_data(signal_path,shot_num,c): - data = c.get('_sig=jet("{}/",{})'.format(signal_path,shot_num)).data() - time = c.get('_sig=dim_of(jet("{}/",{}))'.format(signal_path,shot_num)).data() - found = True - return time,data,None,found - -def fetch_nstx_data(signal_path,shot_num,c): - tree,tag = get_tree_and_tag(signal_path) - c.openTree(tree,shot_num) - data = c.get(tag).data() - time = c.get('dim_of('+tag+')').data() - found = True - return time,data,None,found - -d3d = Machine("d3d","atlas.gat.com",fetch_d3d_data) -jet = Machine("jet","mdsplus.jet.efda.org",fetch_jet_data) -nstk = Machine("nstx","skylark.pppl.gov:8501::",fetch_nstx_data) - - -all_machines = [d3d,jet] - -etemp_profile = Signal("Electron temperature profile",["ZIPFIT01/PROFILES.ETEMPFIT"],[d3d],causal_shifts=[10]) -edens_profile = Signal("Electron density profile",["ZIPFIT01/PROFILES.EDENSFIT"],[d3d],causal_shifts=[10]) - -q95 = Signal("q95 safety factor",["EFIT01/RESULTS.AEQDSK.Q95"],[d3d],causal_shifts=[10]) - -li = Signal("locked mode amplitude",["jpf/gs/bl-liout'],[jet]) -pin = Signal("Input Power (beam for d3d)",['jpf/gs/bl-ptot 1: + xdata = c.get('dim_of(_s,1)').data() + ydata = c.get('dim_of(_s)').data() + else: + xdata = c.get('dim_of(_s)').data() + + # MDSplus seems to return 2-D arrays transposed. Change them back. 
+    if np.ndim(data) == 2:
+        data = np.transpose(data)
+    if np.ndim(ydata) == 2:
+        ydata = np.transpose(ydata)
+    if np.ndim(xdata) == 2:
+        xdata = np.transpose(xdata)
+
+    # print(' GADATA Retrieval Time : ', time.time() - t0)
+    xdata = xdata*1e-3  # time is measured in ms
+    return xdata, data, ydata, found
+
+
+def fetch_jet_data(signal_path, shot_num, c):
+    found = False
+    time = np.array([0])
+    ydata = None
+    data = np.array([0])
+    try:
+        data = c.get('_sig=jet("{}/",{})'.format(signal_path, shot_num)).data()
+        if np.ndim(data) == 2:
+            data = np.transpose(data)
+            time = c.get('_sig=dim_of(jet("{}/",{}),1)'.format(
+                signal_path, shot_num)).data()
+            ydata = c.get('_sig=dim_of(jet("{}/",{}),0)'.format(
+                signal_path, shot_num)).data()
+        else:
+            time = c.get('_sig=dim_of(jet("{}/",{}))'.format(
+                signal_path, shot_num)).data()
+        found = True
+    except Exception as e:
+        g.print_unique(e)
+        sys.stdout.flush()
+        # pass
+    return time, data, ydata, found
+
+
+def fetch_nstx_data(signal_path, shot_num, c):
+    tree, tag = get_tree_and_tag(signal_path)
+    c.openTree(tree, shot_num)
+    data = c.get(tag).data()
+    time = c.get('dim_of(' + tag + ')').data()
+    found = True
+    return time, data, None, found
+
+
+d3d = Machine("d3d", "atlas.gat.com", fetch_d3d_data, max_cores=32,
+              current_threshold=2e-1)
+jet = Machine("jet", "mdsplus.jet.efda.org", fetch_jet_data, max_cores=8,
+              current_threshold=1e5)
+nstx = Machine("nstx", "skylark.pppl.gov:8501::", fetch_nstx_data, max_cores=8)
+
+all_machines = [d3d, jet]
+
+profile_num_channels = 64
+
+# The "data_avail_tolerances" parameter in the Signal class initializer relaxes
+# the cutoff for the signal around the defined t_disrupt (provided in the
+# disruptive shot list). The latter definition (based on current quench) may
+# vary depending on who supplied the shot list and computed t_disrupt, since
+# quench may last for O(10 ms). E.g. C. Rea may have taken t_disrupt = midpoint
+# of start and end of quench for later D3D shots after 2016 in
+# d3d_disrupt_since_2016.txt, whereas J. Barr and semi-/automatic methods for
+# calculating t_disrupt may use t_disrupt = start of current quench.
+
+# Early-terminating signals will be implicitly padded with zeros when t_disrupt
+# still falls within the tolerance (see shots.py,
+# Shot.get_signals_and_times_from_file). Even tols > 30 ms are fine (do not
+# violate causality), but the ML method may start to base predictions on the
+# disappearance of signals.
+
+# "t" subscripted variants of signal variables increase the tolerance to 29 ms
+# on D3D, the maximum value possible without violating causality for the min
+# T_warn=30 ms. This is important for the signals of newer shots in
+# d3d_disrupt_since_2016.txt; many of them would cause the entire shot to be
+# omitted because their values end shortly before t_disrupt (possibly due to a
+# different t_disrupt label calculation).
+
+# See conf_parser.py dataset definitions of d3d_data_max_tol, d3d_data_garbage
+# which use these signal variants.
+
+# For non-t-subscripted profile signals (and q95), a positive tolerance of
+# 20 ms on D3D (and 30-50 ms on JET) is used to account for the causal shifting
+# of the delayed "real-time processing".
+
+# List ---> individual tolerance for each machine when signal definitions are
+# shared in cross-machine studies.
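+
+# Worked example of the tolerance semantics above (illustrative numbers, not
+# part of the shipped configuration): with a data_avail_tolerances entry of
+# 0.029 (29 ms), a D3D signal whose data ends 25 ms before t_disrupt is kept
+# and zero-padded over the final 25 ms; with a 0.02 (20 ms) tolerance, the
+# same signal is deemed unavailable and the whole shot is omitted. Causality
+# is preserved as long as the tolerance stays below T_warn = 30 ms.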
+
+# ZIPFIT comes from actual measurements
+# jet and d3d:
+etemp_profile = ProfileSignal(
+    "Electron temperature profile",
+    ["ppf/hrts/te", "ZIPFIT01/PROFILES.ETEMPFIT"], [jet, d3d],
+    mapping_paths=["ppf/hrts/rho", None], causal_shifts=[0, 10],
+    mapping_range=(0, 1), num_channels=profile_num_channels,
+    data_avail_tolerances=[0.05, 0.02])
+edens_profile = ProfileSignal(
+    "Electron density profile",
+    ["ppf/hrts/ne", "ZIPFIT01/PROFILES.EDENSFIT"], [jet, d3d],
+    mapping_paths=["ppf/hrts/rho", None], causal_shifts=[0, 10],
+    mapping_range=(0, 1), num_channels=profile_num_channels,
+    data_avail_tolerances=[0.05, 0.02])
+
+etemp_profilet = ProfileSignal(
+    "Electron temperature profile tol",
+    ["ppf/hrts/te", "ZIPFIT01/PROFILES.ETEMPFIT"], [jet, d3d],
+    mapping_paths=["ppf/hrts/rho", None], causal_shifts=[0, 10],
+    mapping_range=(0, 1), num_channels=profile_num_channels,
+    data_avail_tolerances=[0.05, 0.029])
+edens_profilet = ProfileSignal(
+    "Electron density profile tol",
+    ["ppf/hrts/ne", "ZIPFIT01/PROFILES.EDENSFIT"], [jet, d3d],
+    mapping_paths=["ppf/hrts/rho", None], causal_shifts=[0, 10],
+    mapping_range=(0, 1), num_channels=profile_num_channels,
+    data_avail_tolerances=[0.05, 0.029])
+# d3d only:
+# etemp_profile = ProfileSignal(
+#     "Electron temperature profile", ["ZIPFIT01/PROFILES.ETEMPFIT"], [d3d],
+#     mapping_paths=[None], causal_shifts=[10], mapping_range=(0, 1),
+#     num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+# edens_profile = ProfileSignal(
+#     "Electron density profile", ["ZIPFIT01/PROFILES.EDENSFIT"], [d3d],
+#     mapping_paths=[None], causal_shifts=[10], mapping_range=(0, 1),
+#     num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+
+itemp_profile = ProfileSignal(
+    "Ion temperature profile", ["ZIPFIT01/PROFILES.ITEMPFIT"], [d3d],
+    causal_shifts=[10], mapping_range=(0, 1),
+    num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+zdens_profile = ProfileSignal(
+    "Impurity density profile", ["ZIPFIT01/PROFILES.ZDENSFIT"], [d3d],
+    causal_shifts=[10], mapping_range=(0, 1),
+    num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+trot_profile = ProfileSignal(
+    "Rotation profile", ["ZIPFIT01/PROFILES.TROTFIT"], [d3d],
+    causal_shifts=[10], mapping_range=(0, 1),
+    num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+# note, thermal pressure doesn't include fast ions
+pthm_profile = ProfileSignal(
+    "Thermal pressure profile", ["ZIPFIT01/PROFILES.PTHMFIT"], [d3d],
+    causal_shifts=[10], mapping_range=(0, 1),
+    num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+neut_profile = ProfileSignal(
+    "Neutrals profile", ["ZIPFIT01/PROFILES.NEUTFIT"], [d3d],
+    causal_shifts=[10], mapping_range=(0, 1),
+    num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+# compare the following profile to just q95 0D signal
+q_profile = ProfileSignal(
+    "Q profile", ["ZIPFIT01/PROFILES.BOOTSTRAP.QRHO"], [d3d],
+    causal_shifts=[10], mapping_range=(0, 1),
+    num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+bootstrap_current_profile = ProfileSignal(
+    "Bootstrap current profile", ["ZIPFIT01/PROFILES.BOOTSTRAP.JBS_SAUTER"],
+    [d3d], causal_shifts=[10], mapping_range=(0, 1),
+    num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+
+# equilibrium_image = 2DSignal(
+#     "2D Magnetic Equilibrium", ["EFIT01/RESULTS.GEQDSK.PSIRZ"], [d3d],
+#     causal_shifts=[10], mapping_range=(0, 1),
+#     num_channels=profile_num_channels, data_avail_tolerances=[0.02])
+
+# EFIT is the solution to the inverse problem from
external magnetic +# measurements + +# pressure might be unphysical since it is not constrained by measurements, +# only the EFIT which does not know about density and temperature + +# pressure_profile = ProfileSignal( +# "Pressure profile", ["EFIT01/RESULTS.GEQDSK.PRES"], [d3d], +# causal_shifts=[10], mapping_range=(0, 1), +# num_channels=profile_num_channels, data_avail_tolerances=[0.02]) + +q_psi_profile = ProfileSignal( + "Q(psi) profile", ["EFIT01/RESULTS.GEQDSK.QPSI"], [d3d], + causal_shifts=[10], mapping_range=(0, 1), + num_channels=profile_num_channels, data_avail_tolerances=[0.02]) + +# epress_profile_spatial = ProfileSignal( +# "Electron pressure profile", ["ppf/hrts/pe/"], [jet], causal_shifts=[25], +# mapping_range=(2, 4), num_channels=profile_num_channels) + +etemp_profile_spatial = ProfileSignal( + "Electron temperature profile", ["ppf/hrts/te"], [jet], + causal_shifts=[0], mapping_range=(2, 4), + num_channels=profile_num_channels, data_avail_tolerances=[0.05]) +edens_profile_spatial = ProfileSignal( + "Electron density profile", ["ppf/hrts/ne"], [jet], + causal_shifts=[0], mapping_range=(2, 4), + num_channels=profile_num_channels, data_avail_tolerances=[0.05]) +rho_profile_spatial = ProfileSignal( + "Rho at spatial positions", ["ppf/hrts/rho"], [jet], + causal_shifts=[0], mapping_range=(2, 4), + num_channels=profile_num_channels, data_avail_tolerances=[0.05]) + +etemp = Signal("electron temperature", ["ppf/hrtx/te0"], + [jet], causal_shifts=[25], data_avail_tolerances=[0.05]) +# epress = Signal("electron pressure", ["ppf/hrtx/pe0/"], [jet], +# causal_shifts=[25]) + +q95 = Signal( + "q95 safety factor", ['ppf/efit/q95', "EFIT01/RESULTS.AEQDSK.Q95"], + [jet, d3d], causal_shifts=[15, 10], normalize=False, + data_avail_tolerances=[0.03, 0.02]) +q95t = Signal( + "q95 safety factor tol", ['ppf/efit/q95', "EFIT01/RESULTS.AEQDSK.Q95"], + [jet, d3d], causal_shifts=[15, 10], normalize=False, + data_avail_tolerances=[0.03, 0.029]) + +# "d3d/ipsip" was used before, ipspr15V seems to be available for a +# superset of shots. 
+ip = Signal("plasma current", ["jpf/da/c2-ipla", "ipspr15V"], + [jet, d3d], is_ip=True) + +ipt = Signal("plasma current tol", ["jpf/da/c2-ipla", "ipspr15V"], + [jet, d3d], is_ip=True, data_avail_tolerances=[0.029, 0.029]) +iptarget = Signal("plasma current target", ["ipsiptargt"], [d3d]) +iptargett = Signal("plasma current target tol", ["ipsiptargt"], [d3d], + data_avail_tolerances=[0.029]) +iperr = Signal("plasma current error", ["ipeecoil"], [d3d]) +iperrt = Signal("plasma current error tol", ["ipeecoil"], [d3d], + data_avail_tolerances=[0.029]) + +li = Signal("internal inductance", ["jpf/gs/bl-liout'], [jet]) +pradtott = Signal("Radiated Power tol", ['jpf/db/b5r-ptot>out'], [jet], + data_avail_tolerances=[0.029]) +# pradtot = Signal("Radiated Power", ['jpf/db/b5r-ptot>out', +# r'\prad_tot'], [jet,d3d]) +# pradcore = ChannelSignal("Radiated Power Core", [r'\bol_l15_p'] +# ,[d3d]) +# pradedge = ChannelSignal("Radiated Power Edge", [r'\bol_l03_p'], +# [d3d]) +pradcore = ChannelSignal("Radiated Power Core", + ['ppf/bolo/kb5h/channel14', r'\bol_l15_p'], + [jet, d3d]) +pradedge = ChannelSignal("Radiated Power Edge", + ['ppf/bolo/kb5h/channel10', r'\bol_l03_p'], + [jet, d3d]) +pradcoret = ChannelSignal("Radiated Power Core tol", + ['ppf/bolo/kb5h/channel14', r'\bol_l15_p'], + [jet, d3d], data_avail_tolerances=[0.029, 0.029]) +pradedget = ChannelSignal("Radiated Power Edge tol", + ['ppf/bolo/kb5h/channel10', r'\bol_l03_p'], + [jet, d3d], data_avail_tolerances=[0.029, 0.029]) +# pechin = Signal("ECH input power, not always on", ['pcechpwrf'], [d3d]) +pechin = Signal("ECH input power, not always on", + ['RF/ECH.TOTAL.ECHPWRC'], [d3d]) +pechint = Signal("ECH input power, not always on tol", + ['RF/ECH.TOTAL.ECHPWRC'], [d3d], + data_avail_tolerances=[0.029]) + +# betan = Signal("Normalized Beta", ['jpf/gs/bl-bndia 1) +} +d3d_signals = { + sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if ( + sig.is_defined_on_machine(d3d)) +} +d3d_signals_max_tol = { + sig_name: sig for (sig_name, sig) in all_signals_max_tol.items() if ( + sig.is_defined_on_machine(d3d)) +} +d3d_signals_0D = { + sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if ( + (sig.is_defined_on_machine(d3d) and sig.num_channels == 1)) +} +d3d_signals_1D = { + sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if ( + (sig.is_defined_on_machine(d3d) and sig.num_channels > 1)) +} + +jet_signals = { + sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if ( + sig.is_defined_on_machine(jet)) +} +jet_signals_0D = { + sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if ( + (sig.is_defined_on_machine(jet) and sig.num_channels == 1)) +} +jet_signals_1D = { + sig_name: sig for (sig_name, sig) in all_signals_restricted.items() if ( + (sig.is_defined_on_machine(jet) and sig.num_channels > 1)) +} + +# ['pcechpwrf'] #Total ECH Power Not always on! +# ## 0D EFIT signals ### +# signal_paths += ['EFIT02/RESULTS.AEQDSK.Q95'] + +# ## 1D EFIT signals ### +# the other signals give more reliable data +# signal_paths += [ +# # Note, the following signals are uniformly mapped over time +# 'AOT/EQU.t_e', # electron temperature profile vs rho +# 'AOT/EQU.dens_e'] # electron density profile vs rho + + +# [[' $I_{plasma}$ [A]'], +# [' Mode L. A. 
[A]'],
+# [' $P_{radiated}$ [W]'],
+# [' $P_{radiated}$ [W]'],
+# [' $\rho_{plasma}$ [m^-2]'],
+# [' $L_{plasma,internal}$'],
+# ['$\frac{d}{dt} E_{D}$ [W]'],
+# [' $P_{input}$ [W]'],
+# ['$E_{D}$'],
+# ppf signal labels
+# ['ECE unit?']]
diff --git a/docs/A21-DL-Workload-README.md b/docs/A21-DL-Workload-README.md
new file mode 100644
index 00000000..0951372b
--- /dev/null
+++ b/docs/A21-DL-Workload-README.md
@@ -0,0 +1,94 @@
+# DeepFusionAESP (FRNN)
+
+*Last updated 2020-01-23.*
+
+## Description
+The Fusion Recurrent Neural Net (FRNN) software is a Python package that
+implements deep learning models for disruption prediction in tokamak fusion
+plasmas. Using diagnostic time-series data collected over years of tokamak pulses, or
+"shots", combined with labels of the disruption time (or "nondisruptive")
+provided by machine experts, our software aims to predict disruptions at least
+30 ms before their onset. Future extensions of the framework will attempt to
+explain and classify the precursors to disruptions well in advance of the event.
+
+The main machine learning model used in the module (and behind the project's name) is
+based on a recurrent neural network (RNN), specifically the long short-term memory (LSTM)
+architecture. 1D (radial) profile data, if available and enabled by the user, is
+incorporated through a series of `Conv1D` layers and concatenated to the 0D scalar signal
+data at every timestep before being fed to the RNN layers. Future work will involve
+incorporating 2D (radial x toroidal) data from the electron cyclotron emission imaging
+(ECEi) advanced diagnostic in operation.
+
+The implementation is built on Keras with
+a pre-v2.x TensorFlow backend. A second implementation in PyTorch is also
+available. The software can optionally employ data parallelism with distributed
+synchronous training through `mpi4py`. We do not currently support Horovod, but
+we expect to implement it soon.
+
+Despite the name, the codebase also contains implementations of a variety of
+"shallow" ML methods for comparison: random forests, SVMs, single hidden layer
+perceptrons, and gradient-boosted trees. We have added, or are about to add,
+deep learning methods that are not based on RNNs, e.g. TCN and Transformers.
+
+Refer to Kates-Harbeck et al. (2019) for additional information.
+
+## Code and datasets
+The public software repository is hosted on GitHub:
+https://github.com/PPPLDeepLearning/plasma-python
+
+The master host for the raw data used in the project is located on the
+[`/tigress`](https://researchcomputing.princeton.edu/storage/tigress) GPFS file
+system hosted by Princeton Research Computing. At the time of writing, it
+consists of:
+- 1.8 TB of `SHOT_ID.txt` files, each containing the data from an individual
+diagnostic signal during a single shot, for the JET, DIII-D, and NSTX
+tokamaks.
+```
+/tigress/FRNN/signal_data/
+/tigress/FRNN/signal_data_new_nov2019/
+```
+- A few MB of two-column (three-space-delimited) `.txt` files containing ID
+numbers of shots and the labeled disruption time (or `-1.000000` if the shot is
+nondisruptive).
+```
+/tigress/FRNN/shot_lists/
+```
+
+The data is accessible from the ALCF Theta and Cooley systems via the Lustre
+file system.
Periodically, the data from Princeton is copied via Globus to this
+secondary store:
+```
+/lus/theta-fs0/projects/fusiondl_aesp/shot_lists/
+/lus/theta-fs0/projects/fusiondl_aesp/signal_data/
+```
+
+
+### Processed signal data
+
+Although the subset of raw data that is used to train a single FRNN
+configuration is typically ~100s of GBs, the raw data must first be preprocessed
+into individual `SHOT_ID.npz` files, each containing a pickled `Shot` class
+object that combines the resampled, cut, clipped, normalized, and transformed
+diagnostic data that was originally distributed among multiple `SHOT_ID.txt`
+files.
+
+An example of such a set of preprocessed shot data files can be found here:
+```
+/lus/theta-fs0/projects/fusiondl_aesp/felker/processed_shots/signal_group_250640798211266795112500621861190558178
+```
+
+This pipeline greatly reduces the size of the input data used during training or
+inference. For the 0D FRNN model applied to our original set of 3449 shots from DIII-D,
+the preprocessed `SHOT_ID.npz` files occupy only a few GB.
+
+## Building and running the software
+
+A comprehensive tutorial for building and running the main FRNN model on the P100 GPUs of
+the [`Tiger` cluster](https://researchcomputing.princeton.edu/systems-and-services/available-systems/tiger)
+can be found in the main repository:
+https://github.com/PPPLDeepLearning/plasma-python/blob/master/docs/PrincetonUTutorial.md
+
+Similarly, we maintain documentation for a tutorial on ALCF Theta:
+https://github.com/PPPLDeepLearning/plasma-python/blob/master/docs/ALCF.md
+
+However, it is generally less up-to-date than the GPU documentation.
diff --git a/docs/ALCF.md b/docs/ALCF.md
new file mode 100644
index 00000000..2f7cb9d9
--- /dev/null
+++ b/docs/ALCF.md
@@ -0,0 +1,385 @@
+# ALCF Theta `plasma-python` FRNN Notes
+
+**Original author: Rick Zamora (rzamora@anl.gov)**
+
+This document is intended to act as a tutorial for running the [plasma-python](https://github.com/PPPLDeepLearning/plasma-python) implementation of the fusion recurrent neural network (FRNN) on the ALCF Theta supercomputer (Cray XC40; Intel KNL processors). The steps followed in these notes are based on the Princeton [Tiger-GPU tutorial](https://github.com/PPPLDeepLearning/plasma-python/blob/master/docs/PrincetonUTutorial.md#location-of-the-data-on-tigress), hosted within the main GitHub repository for the project.
+
+## Environment Setup
+
+Choose a root project directory for FRNN-related installations on Theta:
+
+```
+export FRNN_ROOT=<project-root-directory>
+cd $FRNN_ROOT
+```
+
+*Personal note:* I am using `FRNN_ROOT=/home/zamora/ESP`
+
+Create a simple directory structure allowing experimental *builds* of the `plasma-python` python code/library:
+
+```
+mkdir build
+mkdir build/miniconda-3.6-4.5.4
+cd build/miniconda-3.6-4.5.4
+```
+
+### Custom Miniconda Environment Setup
+
+Copy the miniconda installation script to the working directory (and install):
+
+```
+cp /lus/theta-fs0/projects/fusiondl_aesp/FRNN/rzamora/scripts/install_miniconda-3.6-4.5.4.sh .
+./install_miniconda-3.6-4.5.4.sh
+```
+
+The `install_miniconda-3.6-4.5.4.sh` script will install `miniconda-4.5.4` (using `Python-3.6`), as well as `Tensorflow-1.12.0` and `Keras 2.2.4`.
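+
+A quick sanity check of the installed versions (a minimal check, assuming the
+default install prefix above; run it after the `PATH`/`PYTHONPATH` updates
+below are in place):
+
+```
+python -c "import tensorflow as tf, keras; print(tf.__version__, keras.__version__)"
+```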
+
+
+Update your environment variables to use miniconda:
+
+```
+export PATH=${FRNN_ROOT}/build/miniconda-3.6-4.5.4/miniconda3/4.5.4/bin:$PATH
+export PYTHONPATH=${FRNN_ROOT}/build/miniconda-3.6-4.5.4/miniconda3/4.5.4/lib/python3.6/site-packages/:$PYTHONPATH
+```
+
+Note that the previous lines (as well as the definition of `FRNN_ROOT`) can be appended to your `$HOME/.bashrc` file if you want to use this environment on Theta by default.
+
+
+## Installing `plasma-python`
+
+Here, we assume the installation is within the custom miniconda environment installed in the previous steps. We also assume the following commands have already been executed:
+
+```
+export FRNN_ROOT=<project-root-directory>
+export PATH=${FRNN_ROOT}/build/miniconda-3.6-4.5.4/miniconda3/4.5.4/bin:$PATH
+export PYTHONPATH=${FRNN_ROOT}/build/miniconda-3.6-4.5.4/miniconda3/4.5.4/lib/python3.6/site-packages/:$PYTHONPATH
+```
+
+*Personal note:* I am using `export FRNN_ROOT=/lus/theta-fs0/projects/fusiondl_aesp/zamora/FRNN_project`
+
+If the environment is set up correctly, installation of `plasma-python` is straightforward:
+
+```
+cd ${FRNN_ROOT}/build/miniconda-3.6-4.5.4
+git clone https://github.com/PPPLDeepLearning/plasma-python.git
+cd plasma-python
+python setup.py build
+python setup.py install
+```
+
+## Data Access
+
+Sample data and metadata are available in `/lus/theta-fs0/projects/FRNN/tigress/alexeys/signal_data` and `/lus/theta-fs0/projects/FRNN/tigress/alexeys/shot_lists`, respectively. It is recommended that users create their own symbolic links to these directories. I recommend that you do this within a directory called `/lus/theta-fs0/projects/fusiondl_aesp/<username>/`. For example:
+
+```
+ln -s /lus/theta-fs0/projects/fusiondl_aesp/FRNN/tigress/alexeys/shot_lists /lus/theta-fs0/projects/fusiondl_aesp/<username>/shot_lists
+ln -s /lus/theta-fs0/projects/fusiondl_aesp/FRNN/tigress/alexeys/signal_data /lus/theta-fs0/projects/fusiondl_aesp/<username>/signal_data
+```
+
+For the examples included in `plasma-python`, there is a configuration file that specifies the root directory of the raw data. Change the `fs_path: '/tigress'` line in `examples/conf.yaml` to reflect the following:
+
+```
+fs_path: '/lus/theta-fs0/projects/fusiondl_aesp'
+```
+
+It's also a good idea to change `num_gpus: 4` to `num_gpus: 1`. I am also using the `jet_data_0D` dataset:
+
+```
+paths:
+  data: jet_data_0D
+```
+
+
+### Data Preprocessing
+
+#### The SLOW Way (On Theta)
+
+Theta is KNL-based, and is **not** the best resource for processing many text files in python. However, the preprocessing step *can* be performed on Theta using the following steps (although it may need to be repeated many times to get through the whole dataset in 60-minute debug-queue windows):
+
+```
+cd ${FRNN_ROOT}/build/miniconda-3.6-4.5.4/plasma-python/examples
+cp /lus/theta-fs0/projects/fusiondl_aesp/FRNN/rzamora/scripts/submit_guarantee_preprocessed.sh .
+```
+
+Modify the paths defined in `submit_guarantee_preprocessed.sh` to match your environment.
+
+Note that the preprocessing module will use Pathos multiprocessing (not MPI/mpi4py). Therefore, the script will see every compute core (all 256 per node) as an available resource.
Since the Lustre file system is unlikely to perform well with 256 processes (on the same node) opening/closing/creating files at once, it might improve performance to make a slight change to line 85 of `plasma/preprocessor/preprocess.py`:

```
line 85: use_cores = min(<max-cores>, max(1, mp.cpu_count() - 2))
```

Here, `<max-cores>` is a cap of your choosing on the size of the multiprocessing pool. After optionally re-building and installing `plasma-python` with this change, submit the preprocessing job:

```
qsub submit_guarantee_preprocessed.sh
```

#### The FAST Way (On Cooley)

You will find it much less painful to preprocess the data on Cooley, because its Haswell processors are much better suited to this task. Log on to the ALCF Cooley machine:

```
ssh <username>@cooley.alcf.anl.gov
```

Copy my `cooley_preprocess` example directory to whatever directory you choose to work in:

```
cp -r /lus/theta-fs0/projects/fusiondl_aesp/FRNN/rzamora/scripts/cooley_preprocess .
cd cooley_preprocess
```

This directory has a Singularity image with everything you need to run your code on Cooley. Assuming you have created symbolic links to the `shot_lists` and `signal_data` directories in `/lus/theta-fs0/projects/fusiondl_aesp/<username>/`, you can just submit the included Cobalt script (to specify the data you want to process, just modify the included `conf.yaml` file):

```
qsub submit.sh
```

For me, this finishes in less than 10 minutes, and creates 5523 `.npz` files in the `/lus/theta-fs0/projects/fusiondl_aesp/<username>/processed_shots/` directory. However, the output file of the Cobalt submission ends with the following message:

```
5522/5523Finished Preprocessing 5523 files in 406.94421911239624 seconds
Omitted 5523 shots of 5523 total.
0/0 disruptive shots
WARNING: All shots were omitted, please ensure raw data is complete and available at /lus/theta-fs0/projects/fusiondl_aesp/zamora/signal_data/.
4327 1196
```

# Notes on Revisiting Preprocessing

## Preprocessing Information

To understand what might be going wrong with the preprocessing step, let's investigate what the code is actually doing.

**Step 1:** Call `guarantee_preprocessed(conf)`, which is defined in `plasma/preprocessor/preprocess.py`. This function first initializes a `Preprocessor()` object (whose class definition is in the same file), and then checks whether the preprocessing was already done (by looking for a file). The preprocessor object is called `pp`.

**Step 2:** Assuming preprocessing is needed, we call `pp.clean_shot_lists()`, which loops through each file in the `shot_lists` directory and calls `self.clean_shot_list()` (not plural) for each text-file item. I do not believe this function is doing anything when I run it, because all the shot-list files have already been "cleaned." Cleaning a shot-list file just means the data is corrected to have two columns, and the file is renamed (to have "clear" in the name).

**Step 3:** We call `pp.preprocess_all()`, which parses some of the config file, and ultimately calls `self.preprocess_from_files(shot_files_all, use_shots)` (where `shot_files_all` appears to be the combined list of shot-list files, and `use_shots` is the number of shots to use).

**Step 4:** The `preprocess_from_files()` function does the actual preprocessing. It creates a multiprocessing pool and maps the worker processes onto the `self.preprocess_single_file` function (note that the code for the `ShotList` class is in `plasma/primitives/shots.py`, while the preprocessing code is still in `plasma/preprocessor/preprocess.py`).
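One way to ground this investigation is to compare the signal directories that actually exist on disk against the paths the code expects. For example, a quick (hypothetical) spot check, with `<username>` substituted:

```
# List the JET PPF signal directories that were actually downloaded
ls /lus/theta-fs0/projects/fusiondl_aesp/<username>/signal_data/jet/ppf/
# Check for one specific signal path that the preprocessor expects
ls /lus/theta-fs0/projects/fusiondl_aesp/<username>/signal_data/jet/ppf/bolo/kb5h/channel14
```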
**Important:** It looks like the code uses the path definitions in `data/shot_lists/signals.py` to locate the signal data. I believe that some of the signal data is missing, which is causing every "shot" to be labeled as incomplete (and consequently thrown out).

### Possible Issues

From the preprocessing output, it is clear that the *Radiated Power Core* signal data was not downloaded correctly. According to the `data/shot_lists/signals.py` file, the data *should* be in `/lus/theta-fs0/projects/fusiondl_aesp/<username>/signal_data/jet/ppf/bolo/kb5h/channel14`. However, the only subdirectory of `~/jet/ppf/` is `~/jet/ppf/efit`.

Another possible issue is that the `data/shot_lists/signals.py` file specifies the **name** of the directory containing the *Radiated Power* data incorrectly (*I THINK*). Instead of the following line:

`pradtot = Signal("Radiated Power",['jpf/db/b5r-ptot>out'],[jet])`

We might need this:

`pradtot = Signal("Radiated Power",['jpf/db/b5r-ptot\>out'],[jet])`

The issue has to do with the `>` character in the directory name (without the proper `\` escape character, Python may be looking in the wrong path). **NOTE: I still need to confirm that there is actually an issue with the way the code uses this string.**

## Singularity/Docker Notes

Recall that the data preprocessing step was PAINFULLY slow on Theta, so I decided to use Cooley. To simplify the process of using Cooley, I created a Docker image with the necessary environment.

*Personal note:* I performed this work on my local machine (Mac) in `/Users/rzamora/container-recipes`.

In order to use a Docker image within a Singularity container (required on ALCF machines), it is useful to build the image on your local machine and push it to "Docker Hub":

**Step 1:** Install Docker if you don't have it. [Docker for Mac](https://www.docker.com/docker-mac) works well.

**Step 2:** Build a Docker image using the recipe discussed below.

```
export IMAGENAME="test_image"
export RECIPENAME="Docker.centos7-cuda-tf1.12.0"
docker build -t $IMAGENAME -f $RECIPENAME .
```

You can check that the image is functional by starting an interactive shell session and checking that the necessary Python modules are available. For example (using `-it` for an interactive session):

```
docker run --rm -it -v $PWD:/tmp -w /tmp $IMAGENAME:latest bash
# python -c "import keras; import plasma; print(plasma.__file__)"
```

Note that the `plasma-python` source code will be located in `/root/plasma-python/` for the recipe described below.

**Step 3:** Push the image to [Docker Hub](https://hub.docker.com/).

Using your Docker Hub username:

```
docker login --username=<username>
```

Then, "tag" the image using the `IMAGE ID` value displayed by `docker image ls`:

```
docker tag <IMAGE ID> <username>/<repository>:<tag>
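# e.g., with hypothetical values (copy the IMAGE ID from `docker image ls`):
#   docker tag 9c27e219663c myuser/test_image:latest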