Update example access result part and update documentation (NVIDIA#681)
YuanTingHsieh authored Jun 17, 2022
1 parent a335a30 commit 11df63f
Showing 18 changed files with 467 additions and 461 deletions.
3 changes: 0 additions & 3 deletions .gitignore
@@ -141,6 +141,3 @@ variables.*
*.hdf5

wrksp/*

# poc run data
*run_*
2 changes: 1 addition & 1 deletion docs/example_applications.rst
@@ -10,7 +10,7 @@ NVIDIA FLARE has several examples to help you get started with federated learnin
:maxdepth: 1
:hidden:

examples/hello_numpy
examples/hello_scatter_and_gather
examples/hello_cross_val
examples/hello_pt
examples/hello_pt_tb
12 changes: 12 additions & 0 deletions docs/examples/access_result.rst
@@ -0,0 +1,12 @@
Accessing the results
^^^^^^^^^^^^^^^^^^^^^

Once the job is finished, you can issue the ``download_job [JOB_ID]`` command
in the admin client to download the results.

``[JOB_ID]`` is the ID assigned by the system when the job is submitted.

The result will be downloaded to your admin workspace
(the exact download path will be displayed when running the command).

The download workspace will be in ``[DOWNLOAD_DIR]/[JOB_ID]/workspace/``.
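
As a minimal sketch of the layout described above, the helper below builds the workspace path from a download directory and a job ID. The function name ``job_workspace`` and the sample values are hypothetical and only for illustration; the real download directory is whatever the admin client prints when the command runs.

```python
from pathlib import Path


def job_workspace(download_dir: str, job_id: str) -> Path:
    """Return [DOWNLOAD_DIR]/[JOB_ID]/workspace for a downloaded job."""
    return Path(download_dir) / job_id / "workspace"


# Hypothetical values; substitute the path and job ID shown by the admin client.
workspace = job_workspace("/tmp/admin/transfer", "my-job-id")
print(workspace)
```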
131 changes: 64 additions & 67 deletions docs/examples/hello_cross_val.rst
@@ -6,40 +6,49 @@ Hello Cross-Site Validation
Before You Start
----------------

Before jumping into this QuickStart guide, make sure you have an environment with `NVIDIA FLARE <https://pypi.org/project/nvflare/>`_
installed. You can follow :doc:`installation <../installation>` on the general concept of setting up a Python virtual
environment (the recommended environment) and how to install NVIDIA FLARE.
Before jumping into this guide, make sure you have an environment
with `NVIDIA FLARE <https://pypi.org/project/nvflare/>`_ installed.

You can follow the :ref:`installation <installation>` guide for the general concept of setting up a
Python virtual environment (the recommended environment) and for how to install NVIDIA FLARE.

Prerequisite
-------------

This example builds on the :doc:`Hello Numpy <hello_numpy>` example based on the :class:`ScatterAndGather<nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather>`
workflow. Please make sure you go through it completely as the concepts are heavily tied.
This example builds on the :doc:`Hello Scatter and Gather <hello_scatter_and_gather>` example
based on the :class:`ScatterAndGather<nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather>` workflow.

Please make sure you go through it completely as the concepts are heavily tied.

Introduction
-------------

This tutorial is meant to solely demonstrate how the NVIDIA FLARE system works, without introducing any actual deep learning concepts.
Through this exercise, you will learn how to use NVIDIA FLARE with numpy to perform cross site validation after training.
The training process is explained in the :doc:`Hello Numpy <hello_numpy>` example.
Using simplified weights and metrics, you will be able to clearly see how NVIDIA FLARE performs validation across different
sites with little extra work.
This tutorial is meant solely to demonstrate how the NVIDIA FLARE system works,
without introducing any actual deep learning concepts.

Through this exercise, you will learn how to use NVIDIA FLARE with numpy to perform cross site validation
after training.

The training process is explained in the :doc:`Hello Scatter and Gather <hello_scatter_and_gather>` example.

Using simplified weights and metrics, you will be able to clearly see how NVIDIA FLARE performs
validation across different sites with little extra work.

The design of this exercise follows on the :doc:`Hello Numpy <hello_numpy>` example which consists of one **server** and
two **clients** starting with weights ``[[1, 2, 3], [4, 5, 6], [7, 8, 9]]``.
The setup of this exercise consists of one **server** and two **clients**.
The server-side model starts with weights ``[[1, 2, 3], [4, 5, 6], [7, 8, 9]]``.

Cross site validation consists of the following steps:

- During the initial phase of training with the :class:`ScatterAndGather<nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather>`
workflow, NPTrainer saves the local model to disk for the clients.
- The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>` workflow gets
the client models with the ``submit_model`` task.
- The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>` workflow
gets the client models with the ``submit_model`` task.
- The ``validate`` task is broadcast to all participating clients with the model shareable containing the model data,
and results from the ``validate`` task are saved.
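
The steps above can be pictured with a small plain-Python simulation. This is only a sketch of the idea that every client scores every submitted model; it does not use the NVIDIA FLARE API, and the names ``evaluate`` and ``cross_site_validate``, the site names, and the model values are all hypothetical.

```python
import numpy as np


def evaluate(weights: np.ndarray) -> float:
    """Placeholder metric used for illustration only (mean of the weights)."""
    return float(np.mean(weights))


def cross_site_validate(models: dict, clients: list) -> dict:
    """Have every client score every submitted model.

    Returns results[validating_client][model_owner] -> metric.
    """
    return {
        client: {owner: evaluate(weights) for owner, weights in models.items()}
        for client in clients
    }


start = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
# Hypothetical models: in a real run the client models are collected by the
# submit_model task and the server contributes its global model.
models = {"site-1": start + 1, "site-2": start + 2, "server": start + 1.5}
results = cross_site_validate(models, clients=["site-1", "site-2"])
for client, scores in results.items():
    print(client, scores)
```

In the real system each client runs its own validator on its own data, so the same model can receive different scores from different sites; the shared ``evaluate`` function here hides that detail.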

During this exercise, we will see how NVIDIA FLARE takes care of most of the above steps with little work from the user.
We will be working with the ``hello-numpy-cross-val`` application in the examples folder. Custom FL applications
can contain the folders:
We will be working with the ``hello-numpy-cross-val`` application in the examples folder.
Custom FL applications can contain the folders:

#. **custom**: contains the custom components (``np_trainer.py``, ``np_model_persistor.py``, ``np_validator.py``, ``np_model_locator``, ``np_formatter``)
#. **config**: contains client and server configurations (``config_fed_client.json``, ``config_fed_server.json``)
@@ -51,7 +60,8 @@ Let's get started. First clone the repo, if you haven't already:
$ git clone https://github.com/NVIDIA/NVFlare.git
Remember to activate your NVIDIA FLARE Python virtual environment from the installation guide. Ensure numpy is installed.
Remember to activate your NVIDIA FLARE Python virtual environment from the installation guide.
Ensure numpy is installed.

.. code-block:: shell
@@ -63,20 +73,24 @@ Now that you have all your dependencies installed, let's implement the Federated
Training
--------------------------------

In the :doc:`Hello Numpy <hello_numpy>` example, we implemented the ``NPTrainer`` object. In this example, we use the
same ``NPTrainer`` but extend it to process the ``submit_model`` task to work with the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
In the :doc:`Hello Scatter and Gather <hello_scatter_and_gather>` example, we implemented the ``NPTrainer`` object.
In this example, we use the same ``NPTrainer`` but extend it to process the ``submit_model`` task to
work with the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
workflow to get the client models.

The code in *np_trainer.py* saves the model to disk after each step of training in the model.
The code in ``np_trainer.py`` saves the model to disk after each step of training in the model.

Note that the server also produces a global model. The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
Note that the server also produces a global model.
The :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
workflow submits the server model for evaluation after the client models.

Implementing the Validator
--------------------------

The validator is an Executor that is called for validating the models received from the server during the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
workflow. These models could be from other clients or models generated on server.
The validator is an Executor that is called for validating the models received from the server during
the :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>` workflow.

These models could be from other clients or models generated on the server.

.. literalinclude:: ../../nvflare/app_common/np/np_validator.py
:language: python
@@ -85,9 +99,10 @@ workflow. These models could be from other clients or models generated on server
:linenos:
:caption: np_validator.py

The validator is an Executor and implements the **execute** function which receives a Shareable. It handles the ``validate``
task by performing a calculation to find the sum divided by the max of the data and adding a random random_epsilon before
returning the results packaged with a DXO into a Shareable.
The validator is an Executor and implements the **execute** function which receives a Shareable.

It handles the ``validate`` task by performing a calculation to find the sum divided by the max of the data
and adding a ``random_epsilon`` before returning the results packaged with a DXO into a Shareable.
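
The calculation described above can be sketched with numpy alone. This follows the prose description only (sum divided by max, plus a random epsilon); the shipped ``np_validator.py`` may differ in detail, and the function name ``dummy_validate`` and the ``rng`` parameter are hypothetical. Packaging the result into a DXO and a Shareable is left out.

```python
import numpy as np


def dummy_validate(weights, rng=None):
    """Return sum(data) / max(data) plus a random epsilon drawn from [0, 1)."""
    if rng is None:
        rng = np.random.default_rng()
    data = np.asarray(weights, dtype=float)
    random_epsilon = rng.random()
    return float(np.sum(data) / np.max(data) + random_epsilon)


weights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
score = dummy_validate(weights, rng=np.random.default_rng(0))
print(score)  # 45 / 9 = 5.0, plus an epsilon in [0, 1)
```

Because of the random epsilon, two clients validating the same model will usually report slightly different values.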

.. note::

@@ -106,8 +121,9 @@ Inside the config folder there are two files, ``config_fed_client.json`` and ``c
:caption: config_fed_server.json

The server now has a second workflow configured after Scatter and Gather, :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`.

The components "model_locator" and "formatter" have been added to work with the cross site model evaluation workflow,
and the rest is the same as in :doc:`Hello Numpy <hello_numpy>`.
and the rest is the same as in :doc:`Hello Scatter and Gather <hello_scatter_and_gather>`.


.. literalinclude:: ../../examples/hello-numpy-cross-val/config/config_fed_client.json
@@ -122,62 +138,43 @@ workflow to get the client models.
Cross site validation!
----------------------

Now you can use admin command prompt to submit and start this example app. To do this on a proof of concept local
FL system, follow the sections :ref:`setting_up_poc` and :ref:`starting_poc` if you have not already.
.. |ExampleApp| replace:: hello-numpy-cross-val
.. include:: run_fl_system.rst

Running the FL System
^^^^^^^^^^^^^^^^^^^^^
During the first phase, the model will be trained.

.. include:: run_example.rst

.. code-block:: shell
During the second phase, cross site validation will happen.

> submit_job hello-numpy-cross-val
This command uploads the job configuration from the admin client to the server. A job id will be returned, and we can
use that id to access job information.

From time to time, you can issue ``check_status server`` in the admin client to check the entire training progress. During the first phase,
the model will be trained. During the second phase, cross site validation will happen. The workflow on the
client will change to :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
The workflow on the client will change to :class:`CrossSiteModelEval<nvflare.app_common.workflows.cross_site_model_eval.CrossSiteModelEval>`
as it enters this second phase.

Accessing the results
---------------------

During cross site model evaluation, every client validates other clients' models and server models (if present).
This can produce a lot of results. All the results are kept on the server in
*<run_dir>/cross_val_results.json*. All the models sent to the server are also present in the
*<run_dir>/<client_uid>/* directory.

The results will be in the json format.

This can produce a lot of results. All the results will be kept in the job's workspace when it is completed.

Understanding the Output
^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^

After starting the server and clients, you should begin to see
some outputs in each terminal tracking the progress of the FL run.
After starting the server and clients, you should begin to see
some outputs in each terminal tracking the progress of the FL run.
As each client finishes training, it will start the cross site validation process.
Druing this you'll see several important outputs the track the progress
of cross site validation.
During this, you'll see several important outputs that track the progress of cross site validation.

The server shows the log of each client requesting models, the models it sends and the
results received. Since the server could be responding to many clients at the same time, it may
The server shows the log of each client requesting models, the models it sends and the results received.
Since the server could be responding to many clients at the same time, it may
require careful examination to make proper sense of events from the jumbled logs.

Once the FL run is complete and the server has successfully aggregated the client's results after all the rounds, and
cross site model evaluation is finished, run the following commands in the fl_admin to shutdown the system (while
inputting ``admin`` when prompted with password):

.. code-block:: shell
.. include:: access_result.rst

> shutdown client
> shutdown server
> bye
.. note::
    You can find the cross-site validation results
at ``[DOWNLOAD_DIR]/[JOB_ID]/workspace/cross_site_val/cross_val_results.json``
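
Once the job is downloaded, the results file can be read with the standard ``json`` module. The sketch below assumes the file is a two-level mapping from validating site to model owner to metrics; that layout, the helper names, and the sample data are assumptions for illustration, so check them against your own ``cross_val_results.json``.

```python
import json
import tempfile
from pathlib import Path


def load_cross_val_results(workspace: Path) -> dict:
    """Load cross_site_val/cross_val_results.json from a job workspace."""
    results_file = workspace / "cross_site_val" / "cross_val_results.json"
    with results_file.open() as f:
        return json.load(f)


def flatten(results: dict):
    """Yield (validating_site, model_owner, metrics) rows.

    Assumes a two-level nesting; adjust if the file is laid out differently.
    """
    for site, per_model in results.items():
        for owner, metrics in per_model.items():
            yield site, owner, metrics


# Demo on a made-up file so the sketch runs without a finished job.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    (workspace / "cross_site_val").mkdir(parents=True)
    sample = {"site-1": {"site-2": {"accuracy": 5.3}}}  # made-up data
    results_path = workspace / "cross_site_val" / "cross_val_results.json"
    results_path.write_text(json.dumps(sample))
    rows = list(flatten(load_cross_val_results(workspace)))

print(rows)
```

For a real job, pass ``Path("[DOWNLOAD_DIR]") / "[JOB_ID]" / "workspace"`` with the placeholders replaced by the values shown in the admin client.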

In order to stop all processes, run ``./stop_fl.sh``.
.. include:: shutdown_fl_system.rst

Congratulations!

You've successfully run your numpy federated learning system with cross site validation.
The full `source code <https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-numpy-cross-val/>`_ for this exercise can be found in ``examples/hello-numpy-cross-val``.

The full source code for this exercise can be found in
`examples/hello-numpy-cross-val <https://github.com/NVIDIA/NVFlare/tree/main/examples/hello-numpy-cross-val/>`_.