Update example access result part and update documentation (NVIDIA#681)

YanxuanLiu · Jun 17, 2022 · commit 11df63f
1 parent a335a30

Showing 18 changed files with 467 additions and 461 deletions.
diff --git a/.gitignore b/.gitignore
@@ -141,6 +141,3 @@ variables.*
 *.hdf5
 
 wrksp/*
-
-# poc run data
-*run_*
diff --git a/docs/example_applications.rst b/docs/example_applications.rst
@@ -10,7 +10,7 @@ NVIDIA FLARE has several examples to help you get started with federated learnin
    :maxdepth: 1
    :hidden:
 
-   examples/hello_numpy
+   examples/hello_scatter_and_gather
    examples/hello_cross_val
    examples/hello_pt 
    examples/hello_pt_tb

diff --git a/docs/examples/access_result.rst b/docs/examples/access_result.rst
@@ -0,0 +1,12 @@
+Accessing the results
+^^^^^^^^^^^^^^^^^^^^^
+
+Once the job is finished, you can issue the ``download_job [JOB_ID]`` command
+in the admin client to download the results.
+
+``[JOB_ID]`` is the ID assigned by the system when the job is submitted.
+
+The result will be downloaded to your admin workspace
+(the exact download path will be displayed when running the command).
+
+The download workspace will be in ``[DOWNLOAD_DIR]/[JOB_ID]/workspace/``.
diff --git a/docs/examples/hello_cross_val.rst b/docs/examples/hello_cross_val.rst
@@ -6,40 +6,49 @@ Hello Cross-Site Validation
 Before You Start
 ----------------
 
-Before jumping into this QuickStart guide, make sure you have an environment with `NVIDIA FLARE <https://pypi.org/project/nvflare/>`_
-installed. You can follow :doc:`installation <../installation>` on the general concept of setting up a Python virtual
-environment (the recommended environment) and how to install NVIDIA FLARE.
+Before jumping into this guide, make sure you have an environment
+with `NVIDIA FLARE <https://pypi.org/project/nvflare/>`_ installed.
+
+You can follow the :ref:`installation <installation>` guide on the general concept of setting up a
+Python virtual environment (the recommended environment) and how to install NVIDIA FLARE.
 
 Prerequisite
 -------------
 
-This example builds on the :doc:`Hello Numpy <hello_numpy>` example based on the :class:`ScatterAndGather<nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather>`
-workflow. Please make sure you go through it completely as the concepts are heavily tied.
+This example builds on the :doc:`Hello Scatter and Gather <hello_scatter_and_gather>` example
+based on the :class:`ScatterAndGather<nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather>` workflow.
+
+Please make sure you go through it completely as the concepts are heavily tied.
 
 Introduction
 -------------
 
-This tutorial is meant to solely demonstrate how the NVIDIA FLARE system works, without introducing any actual deep learning concepts.
-Through this exercise, you will learn how to use NVIDIA FLARE with numpy to perform cross site validation after training.
-The training process is explained in the :doc:`Hello Numpy <hello_numpy>` example.
-Using simplified weights and metrics, you will be able to clearly see how NVIDIA FLARE performs validation across different
-sites with little extra work.
+This tutorial is meant to solely demonstrate how the NVIDIA FLARE system works,
+without introducing any actual deep learning concepts.
+
+Through this exercise, you will learn how to use NVIDIA FLARE with numpy to perform cross site validation
+after training.
+
+The training process is explained in the :doc:`Hello Scatter and Gather <hello_scatter_and_gather>` example.
+
+Using simplified weights and metrics, you will be able to clearly see how NVIDIA FLARE performs
+validation across different sites with little extra work.
 
-The design of this exercise follows on the :doc:`Hello Numpy <hello_numpy>` example which consists of one **server** and
-two **clients** starting with weights ``[[1, 2, 3], [4, 5, 6], [7, 8, 9]]``.
+The setup of this exercise consists of one **server** and two **clients**.
+The server-side model starts with weights ``[[1, 2, 3], [4, 5, 6], [7, 8, 9]]``.
 
 Cross site validation consists of the following steps:
 
     - During the initial phase of training with the :class:`ScatterAndGather<nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather>`
       workflow, NPTrainer saves the local model to disk for the clients.
-    - The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>` workflow gets
-      the client models with the ``submit_model`` task.
+    - The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>` workflow
+      gets the client models with the ``submit_model`` task.
     - The ``validate`` task is broadcast to the all participating clients with the model shareable containing the model data,
       and results from the ``validate`` task are saved.
 
 During this exercise, we will see how NVIDIA FLARE takes care of most of the above steps with little work from the user.
-We will be working with the ``hello-numpy-cross-val`` application in the examples folder. Custom FL applications
-can contain the folders:
+We will be working with the ``hello-numpy-cross-val`` application in the examples folder.
+Custom FL applications can contain the folders:
 
  #. **custom**: contains the custom components (``np_trainer.py``, ``np_model_persistor.py``, ``np_validator.py``, ``np_model_locator``, ``np_formatter``)
  #. **config**: contains client and server configurations (``config_fed_client.json``, ``config_fed_server.json``)
@@ -51,7 +60,8 @@ Let's get started. First clone the repo, if you haven't already:
 
   $ git clone https://github.com/NVIDIA/NVFlare.git
 
-Remember to activate your NVIDIA FLARE Python virtual environment from the installation guide. Ensure numpy is installed.
+Remember to activate your NVIDIA FLARE Python virtual environment from the installation guide.
+Ensure numpy is installed.
 
 .. code-block:: shell
 
@@ -63,20 +73,24 @@ Now that you have all your dependencies installed, let's implement the Federated
 Training
 --------------------------------
 
-In the :doc:`Hello Numpy <hello_numpy>` example, we implemented the ``NPTrainer`` object. In this example, we use the
-same ``NPTrainer`` but extend it to process the ``submit_model`` task to work with the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
+In the :doc:`Hello Scatter and Gather <hello_scatter_and_gather>` example, we implemented the ``NPTrainer`` object.
+In this example, we use the same ``NPTrainer`` but extend it to process the ``submit_model`` task to
+work with the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
 workflow to get the client models.
 
-The code in *np_trainer.py* saves the model to disk after each step of training in the model.
+The code in ``np_trainer.py`` saves the model to disk after each step of training in the model.
 
-Note that the server also produces a global model. The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
+Note that the server also produces a global model.
+The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
 workflow submits the server model for evaluation after the client models.
 
 Implementing the Validator
 --------------------------
 
-The validator is an Executor that is called for validating the models received from the server during the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
-workflow. These models could be from other clients or models generated on server.
+The validator is an Executor that is called for validating the models received from the server during
+the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>` workflow.
+
+These models could be from other clients or models generated on server.
 
 .. literalinclude:: ../../nvflare/app_common/np/np_validator.py
    :language: python
@@ -85,9 +99,10 @@ workflow. These models could be from other clients or models generated on server
    :linenos:
    :caption: np_validator.py
 
-The validator is an Executor and implements the **execute** function which receives a Shareable. It handles the ``validate``
-task by performing a calculation to find the sum divided by the max of the data and adding a random random_epsilon before
-returning the results packaged with a DXO into a Shareable.
+The validator is an Executor and implements the **execute** function which receives a Shareable.
+
+It handles the ``validate`` task by performing a calculation to find the sum divided by the max of the data
+and adding a ``random_epsilon`` before returning the results packaged with a DXO into a Shareable.
 
 .. note::
 
@@ -106,8 +121,9 @@ Inside the config folder there are two files, ``config_fed_client.json`` and ``c
    :caption: config_fed_server.json
 
 The server now has a second workflow configured after Scatter and Gather, :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`.
+
 The components "model_locator" and "formatter" have been added to work with the cross site model evaluation workflow,
-and the rest is the same as in :doc:`Hello Numpy <hello_numpy>`.
+and the rest is the same as in :doc:`Hello Scatter and Gather <hello_scatter_and_gather>`.
 
 
 .. literalinclude:: ../../examples/hello-numpy-cross-val/config/config_fed_client.json
@@ -122,62 +138,43 @@ workflow to get the client models.
 Cross site validation!
 ----------------------
 
-Now you can use admin command prompt to submit and start this example app. To do this on a proof of concept local
-FL system, follow the sections :ref:`setting_up_poc` and :ref:`starting_poc` if you have not already.
+.. |ExampleApp| replace:: hello-numpy-cross-val
+.. include:: run_fl_system.rst
 
-Running the FL System
-^^^^^^^^^^^^^^^^^^^^^
+During the first phase, the model will be trained.
 
-.. include:: run_example.rst
-
-.. code-block:: shell
+During the second phase, cross site validation will happen.
 
-    > submit_job hello-numpy-cross-val
-
-This command uploads the job configuration from the admin client to the server. A job id will be returned, and we can
-use that id to access job information.
-
-From time to time, you can issue ``check_status server`` in the admin client to check the entire training progress. During the first phase,
-the model will be trained. During the second phase, cross site validation will happen. The workflow on the
-client will change to :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
+The workflow on the client will change to :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
 as it enters this second phase.
 
-Accessing the results
----------------------
-
 During cross site model evaluation, every client validates other clients' models and server models (if present).
-This can produce a lot of results. All the results are kept on the server in 
-*<run_dir>/cross_val_results.json*. All the models sent to the server are also present in the
-*<run_dir>/<client_uid>/* directory.
-
-The results will be in the json format.
-
+This can produce a lot of results. All the results will be kept in the job's workspace when it is completed.
 
 Understanding the Output
-^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^
 
-After starting the server and clients, you should begin to see 
-some outputs in each terminal tracking the progress of the FL run. 
+After starting the server and clients, you should begin to see
+some outputs in each terminal tracking the progress of the FL run.
 As each client finishes training, it will start the cross site validation process.
-Druing this you'll see several important outputs the track the progress 
-of cross site validation.
+During this you'll see several important outputs that track the progress of cross site validation.
 
-The server shows the log of each client requesting models, the models it sends and the
-results received. Since the server could be responding to many clients at the same time, it may 
+The server shows the log of each client requesting models, the models it sends and the results received.
+Since the server could be responding to many clients at the same time, it may
 require careful examination to make proper sense of events from the jumbled logs.
 
-Once the FL run is complete and the server has successfully aggregated the client's results after all the rounds, and
-cross site model evaluation is finished, run the following commands in the fl_admin to shutdown the system (while
-inputting ``admin`` when prompted with password):
 
-.. code-block:: shell
+.. include:: access_result.rst
 
-    > shutdown client
-    > shutdown server
-    > bye
+.. note::
+    You can find the cross-site validation results
+    at ``[DOWNLOAD_DIR]/[JOB_ID]/workspace/cross_site_val/cross_val_results.json``.
 
-In order to stop all processes, run ``./stop_fl.sh``. 
+.. include:: shutdown_fl_system.rst
 
 Congratulations!
+
 You've successfully run your numpy federated learning system with cross site validation.
-The full `source code <https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-numpy-cross-val/>`_ for this exercise can be found in ``examples/hello-numpy-cross-val``.
+
+The full source code for this exercise can be found in
+`examples/hello-numpy-cross-val <https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-numpy-cross-val/>`_.
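
Reviewer sketch (not part of the commit): the note added in this diff places the cross-site validation results at ``[DOWNLOAD_DIR]/[JOB_ID]/workspace/cross_site_val/cross_val_results.json``. The snippet below shows one way to build that path and read the file with the Python standard library. The function names are hypothetical, and the download directory and job ID are whatever the ``download_job`` command reports. The commit does not describe the JSON layout, so the sketch only loads and returns the parsed content.

```python
import json
from pathlib import Path


def cross_val_results_path(download_dir, job_id):
    """Build the results path described in the documentation note.

    download_dir and job_id are placeholders for the values shown by the
    admin client; this helper is illustrative, not part of NVIDIA FLARE.
    """
    return (
        Path(download_dir)
        / job_id
        / "workspace"
        / "cross_site_val"
        / "cross_val_results.json"
    )


def load_cross_val_results(download_dir, job_id):
    """Load the downloaded results file and return the parsed JSON."""
    with cross_val_results_path(download_dir, job_id).open() as f:
        return json.load(f)
```

Calling ``load_cross_val_results(download_dir, job_id)`` after the download completes returns whatever structure the file contains, which can then be inspected interactively.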
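
Reviewer sketch (not part of the commit): the updated documentation says the validator handles the ``validate`` task by computing the sum of the data divided by its max and adding a ``random_epsilon``. The snippet below illustrates that arithmetic in plain Python on the starting weights from the example. The function name, the ``epsilon_scale`` parameter, and the way the epsilon is drawn are assumptions made for illustration; the actual logic lives in ``np_validator.py`` and may differ.

```python
import random


def validation_metric(weights, epsilon_scale=0.01, rng=None):
    """Return sum(weights) / max(weights) plus a small random epsilon.

    epsilon_scale is a hypothetical knob; the real validator's epsilon
    range is not stated in this commit.
    """
    rng = rng or random.Random()
    flat = [value for row in weights for value in row]
    random_epsilon = rng.random() * epsilon_scale
    return sum(flat) / max(flat) + random_epsilon


weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# The sum is 45 and the max is 9, so the metric is 5.0 plus the epsilon.
metric = validation_metric(weights, rng=random.Random(0))
```

With ``epsilon_scale=0`` the result is exactly ``5.0``, which makes the deterministic part of the calculation easy to check.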