feat: add local env setup documentation #199

Merged
merged 15 commits into from
Nov 24, 2022
@@ -1,9 +1,9 @@
*****************************
Deploying locally - dev setup
*****************************
************************
Test locally - dev setup
************************


This page gives you directions to locally run the Substra stack. This deployment is made of:
This page gives the directions to locally run the Substra stack. This deployment is made of:

* 1 orchestrator (running in standalone mode, i.e. storing data in its own local database)
* 2 backends (running in two organisations, ``org-1`` and ``org-2``)
@@ -72,7 +72,7 @@ First, install `Homebrew <https://brew.sh/>`_, then run the following commands:
First time configuration
========================

1. Execute the script :download:`k3-create.sh<./getting-started/k3-create.sh>`. This script deletes the existing cluster, recreates a new one and applies a patch for SSL.
1. Create a Kubernetes cluster, create and patch the Nginx ingress to enable SSL passthrough:

1. Download :download:`k3-create.sh<./getting-started/k3-create.sh>`.
2. Make the script executable.
@@ -90,7 +90,7 @@ First time configuration
.. tip::
This script can be used to reset your development environment.

2. Add the following line to ``/etc/hosts`` to allow the communication between your local cluster and the host (your machine):
2. Add the following line to the ``/etc/hosts`` file to allow the communication between your local cluster and the host (your machine):

.. code-block:: text

@@ -101,7 +101,6 @@ First time configuration
.. code-block:: bash

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add stable https://charts.helm.sh/stable
helm repo add twuni https://helm.twun.io
helm repo add jetstack https://charts.jetstack.io

@@ -113,6 +112,12 @@ First time configuration

git clone https://github.com/Substra/substra.git

* `substrafl <https://github.com/substra/substrafl>`_

.. code-block:: bash

git clone https://github.com/Substra/substrafl.git

* `orchestrator <https://github.com/substra/orchestrator>`_

.. code-block:: bash
@@ -166,7 +171,7 @@ Launching
skaffold run

.. caution::
On arm64 architecture (e.g. Apple silicon chips M1 & M2), you need to add the profiles ``dev``and ``arm64``.
On arm64 architecture (e.g. Apple silicon chips M1 & M2), you need to add the profiles ``dev`` and ``arm64``.

.. code-block:: bash

@@ -181,7 +186,7 @@ Launching

* Deploy the frontend. You can use two methods (described below)

a. local server: Execute the following command:
a. Local server: Execute the following command:

.. code-block:: bash

@@ -210,27 +215,27 @@ To stop the Substra stack, you need to stop the 3 components (backend, orchestra

* Stop the frontend: This action depends on which option you chose during the launch:

a. local server: Stop the process running the local server (usually using CONTROL + C)
a. Local server: Stop the process running the local server (usually with *Control+C*, including on macOS)
b. Docker:

.. code-block:: bash

docker stop DOCKER_FRONTEND_CONTAINER_NAME
docker stop DOCKER_FRONTEND_CONTAINER_NAME

| with ``DOCKER_FRONTEND_CONTAINER_NAME`` the name of the frontend container you chose during the launch
* Stop the orchestrator:

.. code-block:: bash

cd orchestrator
skaffold delete
cd orchestrator
skaffold delete

* Stop the backend:

.. code-block:: bash

cd substra-backend
skaffold delete
cd substra-backend
skaffold delete

If this command fails and you still have pods up, you can use the following command to remove the ``org-1`` and ``org-2`` namespaces entirely.

@@ -241,9 +246,9 @@ If this command fails and you still have pods up, you can use the following comm
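
The orchestrator and backend teardown steps above can be chained in a small helper. This is only a sketch: the function name ``stop_components``, the ``DRY_RUN`` variable and the assumption that both repositories are cloned side by side under one root directory are illustrative and are not part of the Substra tooling.

```shell
#!/bin/sh
# Run `skaffold delete` in each component checkout, orchestrator first,
# then the backend, as in the manual steps.
# $1: directory that contains the cloned repositories (assumed layout).
# Set DRY_RUN=1 to print the commands instead of running them (hypothetical switch).
stop_components() {
    root="$1"
    for component in orchestrator substra-backend; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            printf 'cd %s/%s && skaffold delete\n' "$root" "$component"
        else
            # Subshell so the caller's working directory is left untouched.
            (cd "$root/$component" && skaffold delete) || return 1
        fi
    done
}
```

For example, ``DRY_RUN=1 stop_components ~/src`` lists what would be executed. The frontend is left out because stopping it depends on how it was launched.
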
Next steps
==========

Now you are ready to go, you are ready to run either the :doc:`/auto_examples/index` or the :doc:`Substrafl (low-level library) examples </substrafl_doc/examples/index>` (low-level library).
You are now ready to run either the :doc:`/auto_examples/index` or the :doc:`Substrafl examples </substrafl_doc/examples/index>`.

If you are interested in more deployment options or more customised set-up, you can have a look at :doc:`/operations/deploy` or at the documentation included in the repo of substra_, substra-backend_, orchestrator_ or substra-frontend_.
If you are interested in more deployment options or more customised set-up, you can have a look at :doc:`/operations/overview` or at the documentation included in the repo of substra_, substra-backend_, orchestrator_ or substra-frontend_.

Troubleshooting
===============
@@ -256,7 +261,7 @@ Troubleshooting
* if you are using a release, you can use :ref:`the compatibility table <additional/release:Compatibility table>`.
* if you are using the latest commit from the ``main`` git branch, check that you are up to date and look for open issues in the repositories or bug fixes in the latest commits.

You can also go through :doc:`the instructions one more time </operations/getting-started>`, maybe they changed since you last saw them.
You can also go through :doc:`the instructions one more time </contributing/getting-started>`; they may have changed since you last read them.

Troubleshooting prerequisites
-----------------------------
@@ -266,35 +271,38 @@ This section summarize errors happening when you are not meeting the hardware re
.. note::
The instructions are targeted to some specific platforms (Docker for Windows in certain cases and Docker for Mac), where you can set the resources allowed to Docker in the configuration panel (information available `here for Mac <https://docs.docker.com/desktop/settings/mac/>`__ and `here for Windows <https://docs.docker.com/desktop/settings/windows/>`__).


The following list describes errors that have already occurred, and their resolutions.

* .. code-block:: pycon

<ERROR:substra.sdk.backends.remote.rest_client:Requests error status 502: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>
<ERROR:substra.sdk.backends.remote.rest_client:Requests error status 502: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx</center>
</body>
</html>

WARNING:root:Function _request failed: retrying in 1s>
WARNING:root:Function _request failed: retrying in 1s>

You may have to increase the number of CPUs available in the settings panel.
You may have to increase the number of CPUs available in the settings panel.

* .. code-block:: go

Unable to connect to the server: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Unable to connect to the server: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

.. code-block:: go

Unable to connect to the server: net/http: TLS handshake timeout
Unable to connect to the server: net/http: TLS handshake timeout

You may have to increase the RAM available in the settings panel.

* If you've got a task with ``FAILED`` status and the logs in the worker are of this form:

.. code-block:: py3

substrapp.exceptions.PodReadinessTimeoutError: Pod substra.ai/pod-name=substra-***-compute-*** failed to reach the \"Running\" phase after 300 seconds."
substrapp.exceptions.PodReadinessTimeoutError: Pod substra.ai/pod-name=substra-***-compute-*** failed to reach the \"Running\" phase after 300 seconds."

Your Docker disk image might be full: increase its size or clean it up with ``docker system prune -a``.

@@ -304,34 +312,12 @@ Troubleshooting deployment
Skaffold version 1.31.0
^^^^^^^^^^^^^^^^^^^^^^^

Status check is broken in version 1.31.0 and kubectl secret manifests are not applied until helm deploy is done, but helm deploy depends on kubectl secret manifests.
It has been fixed in `Skaffold 1.32.0 (PR #6574) <https://github.com/GoogleContainerTools/skaffold/releases/tag/v1.32.0>`__.

The solution for the version 1.31.0 is to add ``--status-check=false`` when running Skaffold:
Due to a change in the deployment sequence in Skaffold 1.31.x, our components cannot be deployed with this version using only ``skaffold run``. Either upgrade to `Skaffold 1.32.0 <https://github.com/GoogleContainerTools/skaffold/releases/tag/v1.32.0>`__ or add the ``--status-check=false`` flag.

.. code-block:: bash

skaffold dev/run/deploy --status-check=false
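
One way to apply the workaround only when it is needed is to derive the flag from the installed version. The sketch below is illustrative: the helper names ``is_skaffold_1_31`` and ``skaffold_extra_flags`` are hypothetical, and it assumes that ``skaffold version`` prints a bare version string such as ``v1.31.0``.

```shell
#!/bin/sh
# Return 0 (true) when the given version string belongs to the 1.31.x
# release line, with or without a leading "v".
is_skaffold_1_31() {
    case "${1#v}" in
        1.31|1.31.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the extra flag needed for the given Skaffold version, or nothing
# for any other version.
skaffold_extra_flags() {
    if is_skaffold_1_31 "$1"; then
        printf '%s\n' '--status-check=false'
    fi
}

# Usage (assumption: `skaffold version` outputs e.g. v1.31.0):
#   skaffold run $(skaffold_extra_flags "$(skaffold version)")
```

Matching on the ``1.31.`` prefix rather than on ``1.31.0`` alone follows the "1.31.x" wording; upgrading Skaffold remains the simpler fix.
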

Failed calling webhook ``validate.nginx.ingress.kubernetes.io``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you encounter the following error message when deploying the backend(s):


.. code-block:: bash

Error: UPGRADE FAILED: failed to create resource: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": an error on the server ("") has prevented the request from succeeding
failed to deploy: install: exit status 1

As a workaround, you can delete the failing webhook by launching the following command:

.. code-block:: bash

kubectl delete Validatingwebhookconfigurations ingress-nginx-admission

You should now be able to :ref:`deploy the backend(s) again<Deploy the backend>`.

Other errors during backend deployment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -347,11 +333,3 @@ If you encounter one of the following errors while deploying the backend:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority

Check that the orchestrator is deployed and relaunch the command ``skaffold run``.
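
To check that the orchestrator is deployed, you can list its pods and look for any that are not running yet. The filter below is a minimal sketch: the function name ``pods_not_ready`` is hypothetical, and the namespace in the usage comment is an assumption that depends on your setup.

```shell
#!/bin/sh
# Read the output of `kubectl get pods --no-headers` on stdin
# (columns: NAME READY STATUS RESTARTS AGE) and print the names of the
# pods whose STATUS is neither Running nor Completed.
pods_not_ready() {
    awk 'NF >= 3 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage (the namespace name is an assumption, adjust it to your cluster):
#   kubectl get pods --no-headers -n orchestrator | pods_not_ready
```

An empty output suggests that every listed pod is up, at which point ``skaffold run`` can be relaunched for the backend.
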

Troubleshooting monitoring
--------------------------

k9s limits on log lines
^^^^^^^^^^^^^^^^^^^^^^^

By default, k9s limits the log to the last 200 lines. To increase this value, set ``logger.tail`` and ``logger.buffer`` to the desired number (e.g. 5000) in the `k9s config file <https://github.com/derailed/k9s#k9s-configuration>`_.
16 changes: 12 additions & 4 deletions docs/source/index.rst
@@ -58,7 +58,7 @@ Some quick links:
* :ref:`MNIST federated learning example <substrafl_doc/examples/get_started/plot_substrafl_torch_fedavg:Using Torch FedAvg on MNIST dataset>`
* :ref:`Substrafl overview <substrafl_doc/substrafl_overview:Overview>`
* :ref:`Compatibility table <additional/release:Compatibility table>`
* :ref:`How to deploy Substra for Site Reliability Engineers <operations/deploy:Deploying Substra>`
* :ref:`How to deploy Substra for Site Reliability Engineers <operations/howto:How-To>`
* :ref:`Community <additional/community:Community>`


@@ -83,15 +83,23 @@ Some quick links:
documentation/api_reference.rst


.. toctree::
:glob:
:maxdepth: 2
:caption: Contributing to Substra
:hidden:

contributing/components.rst
contributing/getting-started.rst


.. toctree::
:glob:
:maxdepth: 1
:caption: Deploying Substra
:hidden:

operations/index.rst
operations/getting-started.rst
operations/deploy.rst
operations/overview.rst
operations/howto.rst
operations/upgrade_notes.rst

@@ -1,6 +1,6 @@
*****************
Deploying Substra
*****************
********
Overview
********

Requirements
============
@@ -75,3 +75,6 @@ User access are created by a dedicated pod (``account-operator``), credentials a

There are also shared credentials to allow direct backend to backend communication.
They are listed under ``addAccountOperator.incomingOrganizations`` or ``addAccountOperator.outgoingOrganizations``.


Now that you understand some of the concepts, you can read :doc:`how to deploy Substra </operations/howto>`.