Commit

Merge branch 'dev' into features/#101-import-heat-demand-data

EvaWie committed Feb 5, 2021
2 parents 5bf23d8 + 30bff3b commit 727d149
Showing 11 changed files with 3,940 additions and 144 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.rst
@@ -8,6 +8,9 @@ Unreleased
Added
-----


* Include description of the egon-data workflow in our documentation
`#23 <https://github.com/openego/eGon-data/issues/23>`_
* There's now a wrapper around `subprocess.run` in
`egon.data.subprocess.run`. This wrapper catches errors better and
displays better error messages than Python's built-in function. Use
@@ -28,6 +31,8 @@ Added
`#3 <https://github.com/openego/eGon-data/issues/3>`_
* Zensus population data import
`#2 <https://github.com/openego/eGon-data/issues/2>`_
* Zensus data import for households, apartments and buildings
`#91 <https://github.com/openego/eGon-data/issues/91>`_

Changed
-------
113 changes: 91 additions & 22 deletions README.rst
@@ -78,45 +78,114 @@ The data used in the eGo^N project along with the code importing, generating and

* Free software: GNU Affero General Public License v3 or later (AGPLv3+)

.. begin-getting-started-information
Pre-requisites
==============

In addition to the installation of Python packages, some non-Python
packages are required too. Right now these are:

* `Docker <https://docs.docker.com/get-started/>`_: Docker is used to provide
a PostgreSQL database (in the default case).

Docker provides extensive installation instructions. It's best to consult
`their docs <https://docs.docker.com/get-docker/>`_ and choose the
appropriate installation method for your OS.

Docker is not required if you use a local PostgreSQL installation.

* The `psql` executable. On Ubuntu, this is provided by the
`postgresql-client-common` package.

* Header files for the :code:`libpq5` PostgreSQL library. These are necessary
to build the :code:`psycopg2` package from source and are provided by the
:code:`libpq-dev` package on Ubuntu.

* `osm2pgsql <https://osm2pgsql.org/>`_
On recent Ubuntu versions you can install it via
:code:`sudo apt install osm2pgsql`.
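
On Ubuntu, the non-Python prerequisites above (except Docker) can be
installed in one go. A minimal sketch, assuming the package names
listed above:

.. code-block:: bash

   # install the PostgreSQL client, the libpq headers and osm2pgsql via apt
   sudo apt install postgresql-client-common libpq-dev osm2pgsql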


Installation
============

Since no release is available on PyPI and installations are probably
used for development, cloning it via

.. code-block:: bash

   git clone git@github.com:openego/eGon-data.git

and installing it in editable mode via

.. code-block:: bash

   pip install -e eGon-data/

are recommended.

In order to keep the package installation isolated, we recommend
installing the package in a dedicated virtual environment. There are
both an `external tool`_ and a `builtin module`_ which help in doing
so. We also highly recommend spending the time to set up
`virtualenvwrapper`_ to manage your virtual environments if you start
having to keep multiple ones around.
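
As a sketch of one possible setup using the `builtin module`_ (the
environment name ``egon-data-env`` is just an example):

.. code-block:: bash

   # create and activate a dedicated virtual environment
   python3 -m venv egon-data-env
   source egon-data-env/bin/activate

   # clone and install egon.data in editable mode, as described above
   git clone git@github.com:openego/eGon-data.git
   pip install -e eGon-data/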

If you run into any problems during the installation of ``egon.data``,
try looking into the list of `known installation problems`_ we have
collected. Maybe we already know of your problem and also of a solution
to it.

.. _external tool: https://virtualenv.pypa.io/en/latest/
.. _builtin module: https://docs.python.org/3/tutorial/venv.html#virtual-environments-and-packages
.. _virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/index.html
.. _known installation problems: https://eGon-data.readthedocs.io/en/latest/troubleshooting.html#installation-errors

Run the workflow
================

The :py:mod:`egon.data` package installs a command line application
called :code:`egon-data` with which you can control the workflow. Once
the installation is successful, you can explore the command line
interface starting with :code:`egon-data --help`.

The most useful subcommand is probably :code:`egon-data serve`. After
running this command, you can open your browser and point it to
`localhost:8080`, after which you will see the web interface of `Apache
Airflow`_ with which you can control the :math:`eGo^n` data processing
pipeline.
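
For example, a first session might look like this, assuming the
installation succeeded:

.. code-block:: bash

   # show the available subcommands and options
   egon-data --help

   # start the Airflow web interface on localhost:8080
   egon-data serve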

If running :code:`egon-data` results in an error, we have also
collected a list of `known runtime errors`_, which you can consult in
search of a solution.

.. _Apache Airflow: https://airflow.apache.org/docs/apache-airflow/stable/ui.html#ui-screenshots
.. _known runtime errors: https://eGon-data.readthedocs.io/en/latest/troubleshooting.html#runtime-errors

.. warning::

   A complete run of the workflow might require a lot of computing
   power and can't be run on a laptop. Use the :ref:`test mode <Test
   mode>` for experimenting.

Test mode
---------

The workflow can be tested on a smaller subset of data using the
example of the federal state of Bremen.

.. warning::

   Right now, only OSM data for Bremen gets imported. This is
   hard-wired in `egon.data/data_sets.yml`.

.. end-getting-started-information

Further Reading
===============

You can find more in-depth documentation at
https://eGon-data.readthedocs.io.
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -26,8 +26,8 @@
pygments_style = 'trac'
templates_path = ['.']
extlinks = {
'issue': ('https://github.com/openego/eGon-data/issues/%s', '#'),
'pr': ('https://github.com/openego/eGon-data/pull/%s', 'PR #'),
"issue": ("https://github.com/openego/eGon-data/issues/%s", "issue #"),
"pr": ("https://github.com/openego/eGon-data/pull/%s", "PR #"),
}
# on_rtd is whether we are on readthedocs.org
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
64 changes: 3 additions & 61 deletions docs/getting_started.rst
@@ -2,64 +2,6 @@
Getting Started
***************

.. include:: ../README.rst
   :start-after: .. begin-getting-started-information
   :end-before: .. end-getting-started-information
3,526 changes: 3,526 additions & 0 deletions docs/images/DP_Workflow_15012021.svg
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -7,6 +7,7 @@ Contents

getting_started
workflow
troubleshooting
data
literature
contributing
@@ -20,4 +21,3 @@ Indices and tables
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

91 changes: 91 additions & 0 deletions docs/troubleshooting.rst
@@ -0,0 +1,91 @@
***************
Troubleshooting
***************

Having trouble installing or running ``eGon-data``? Here's a list of
known issues, each including a solution.


Installation Errors
===================

These are some errors you might encounter while trying to install
:py:mod:`egon.data`.

``importlib_metadata.PackageNotFoundError: No package metadata ...``
--------------------------------------------------------------------

It might happen that you have installed `importlib-metadata==3.1.0` for
some reason, which will lead to this error. Make sure you have
`importlib-metadata>=3.1.1` installed. For more information read the
discussion in :issue:`60`.
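
A minimal sketch of the fix, run inside the virtual environment in
which ``egon.data`` is installed:

.. code-block:: bash

   # upgrade importlib-metadata to a version without the bug
   pip install -U "importlib-metadata>=3.1.1"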


Runtime Errors
==============

These are some of the errors you might encounter while trying to run
:code:`egon-data`.

``ERROR: Couldn't connect to Docker daemon ...``
------------------------------------------------

To verify, please execute :code:`docker-compose -f <(echo '{"service":
{"image": "hello-world"}}') ps` and you should see something like

.. code-block:: none

   ERROR: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?
   If it's at a non-standard location, specify the URL with the DOCKER_HOST environment
   variable.

This can have at least two possible reasons. First, the docker daemon
might not be running. On Linux systems, you can check for this by
running :code:`ps -e | grep dockerd`. If this generates no output, you
have to start the docker daemon, which you can do via :code:`sudo
systemctl start docker.service` on recent Ubuntu systems.
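
Putting both steps together:

.. code-block:: bash

   # check whether the docker daemon is running
   ps -e | grep dockerd

   # if the above produces no output, start the daemon
   sudo systemctl start docker.service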

Second, your current user might not be a member of the `docker` group. On
Linux, you can check this by running :code:`groups $(whoami)`. If the
output does not contain the word `docker`, you have to add your current
user to the `docker` group. You can find more information on how to do
this in the `docker documentation`_. Read the :issue:`initial discussion
<33>` for more context.

.. _docker documentation: https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user
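
As a sketch, the usual fix from the linked docker documentation looks
like this; note that you have to log out and back in for the new group
membership to take effect:

.. code-block:: bash

   # check whether the current user is in the "docker" group
   groups $(whoami)

   # if not, add the current user to the "docker" group
   sudo usermod -aG docker $(whoami)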


``[ERROR] Connection in use ...``
---------------------------------

This error might arise when running :code:`egon-data serve`, making it
shut down early with :code:`ERROR - Shutting down webserver`. The reason
for this is that the local webserver from a previous :code:`egon-data
serve` run didn't shut down properly and is still running. This can be
fixed by running :code:`ps -eo pid,command | grep "gunicorn: master" |
grep -v grep` which should lead to output like :code:`NUMBER gunicorn:
master [airflow-webserver]` where :code:`NUMBER` is a varying number.
Once you have this, run :code:`kill -s INT NUMBER`, substituting
:code:`NUMBER` with what you got previously. After this,
:code:`egon-data serve` should run without errors again.
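
Putting the steps above together:

.. code-block:: bash

   # find the PID of the stale Airflow webserver
   ps -eo pid,command | grep "gunicorn: master" | grep -v grep

   # replace NUMBER with the PID printed by the previous command
   kill -s INT NUMBER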


Other import or incompatible package version errors
===================================================

If you get an :py:class:`ImportError` when trying to run ``egon-data``,
or the installation complains with something like

.. code-block:: none

   first-package a.b.c requires second-package>=q.r.r, but you'll have
   second-package x.y.z which is incompatible.

you might have run into a problem with earlier ``pip`` versions. Either
upgrade to a ``pip`` version >=20.3 and reinstall ``egon.data``, or
reinstall the package via ``pip install -U --use-feature=2020-resolver``.
The ``-U`` flag is important to actually force a reinstall. For more
information read the discussions in issues :issue:`#36 <36>` and
:issue:`#37 <37>`.
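
In bash, the two alternatives look like this, assuming ``eGon-data/``
is your local clone of the repository:

.. code-block:: bash

   # option 1: upgrade pip to a version with the new resolver, then reinstall
   pip install -U "pip>=20.3"
   pip install -U -e eGon-data/

   # option 2: keep the old pip, but opt into the new resolver explicitly
   pip install -U --use-feature=2020-resolver -e eGon-data/
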
42 changes: 42 additions & 0 deletions docs/workflow.rst
@@ -1,3 +1,45 @@
********
Workflow
********

Project background
------------------

egon-data provides a transparent and reproducible open data based data
processing pipeline for generating data models suitable for energy
system modeling. The data is customized for the requirements of the
research project eGo_n, which aims to develop tools for an open and
cross-sectoral planning of transmission and distribution grids. For
further information please visit the `eGo_n project website
<https://ego-n.org/>`_.

egon-data is a further development of the `Data processing
<https://github.com/openego/data_processing>`_ created in the former
research project `open_eGo <https://openegoproject.wordpress.com/>`_.
It aims for an extension of the data models as well as for better
replicability and manageability of the data preparation and processing.

The resulting data set serves as an input for the optimization tools
`eTraGo <https://github.com/openego/eTraGo>`_, `ding0
<https://github.com/openego/ding0>`_ and `eDisGo
<https://github.com/openego/eDisGo>`_ and delivers, for example, data
on grid topologies, demands/demand curves and generation capacities in
a high spatial resolution. The outputs of egon-data are published under
open source and open data licenses.

Data
----

egon-data retrieves and processes data from several different external input sources which are all freely available and published under an open data license. The process handles data with different data types, such as spatial data with a high geographical resolution or load/generation time series with an hourly resolution.

Execution
---------

In principle egon-data is not limited to the use of a specific
programming language, as the workflow integrates different scripts
using Apache Airflow, but Python and SQL are widely used within the
process. Apache Airflow organizes the order of execution of processing
steps through so-called operators. In the default case the SQL
processing is executed on a containerized local PostgreSQL database
using Docker. For further information on Docker and its installation
please refer to their `documentation <https://docs.docker.com/>`_.
Connection information for our local Docker database is defined in the
corresponding `docker-compose.yml
<https://github.com/openego/eGon-data/blob/dev/src/egon/data/airflow/docker-compose.yml>`_.
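
For illustration only, you could bring up the containerized database
manually with the compose file linked above; normally :code:`egon-data`
takes care of this for you:

.. code-block:: bash

   # start the project's local PostgreSQL container in the background
   docker-compose -f src/egon/data/airflow/docker-compose.yml up -d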

The egon-data workflow is composed of four different sections: database setup, data import, data processing and data export to the OpenEnergy Platform. Each section consists of different tasks, which are managed by Apache Airflow and communicate with the local database.
Only final datasets which function as an input for the optimization tools or selected interim results are uploaded to the `Open Energy Platform <https://openenergy-platform.org/>`_.
The data processing in egon-data needs to be performed locally as calculations on the Open Energy Platform are prohibited.
More information on how to run the workflow can be found in the `getting started section <https://egon-data.readthedocs.io/en/latest/getting_started.html#run-the-workflow>`_ of our documentation.

.. _DP_workflow_sketch:
.. figure:: images/DP_Workflow_15012021.svg

   Sketch of the egon-data processing workflow.

Versioning
----------

.. warning::

   Please note, the following is not implemented yet, but we are
   working on it.

Source code and data are versioned independently from each other. Every data table uploaded to the Open Energy Platform contains a column 'version' which is used to identify different versions of the same data set. The version number is maintained for every table separately. This is a major difference to the versioning concept applied in the former data processing, where all (interim) results were versioned under the same version number.







