Commit

Merge branch 'dev' into features/#101-import-heat-demand-data

EvaWie committed Feb 5, 2021
2 parents 5bf23d8 + 30bff3b commit 727d149
Showing 11 changed files with 3,940 additions and 144 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.rst
@@ -8,6 +8,9 @@ Unreleased
Added
-----


* Include description of the egon-data workflow in our documentation
`#23 <https://github.com/openego/eGon-data/issues/23>`_
* There's now a wrapper around `subprocess.run` in
`egon.data.subprocess.run`. This wrapper catches errors better and
displays better error messages than Python's built-in function. Use
@@ -28,6 +31,8 @@ Added
`#3 <https://github.com/openego/eGon-data/issues/3>`_
* Zensus population data import
`#2 <https://github.com/openego/eGon-data/issues/2>`_
* Zensus data import for households, apartments and buildings
`#91 <https://github.com/openego/eGon-data/issues/91>`_

Changed
-------
113 changes: 91 additions & 22 deletions README.rst
@@ -78,45 +78,114 @@ The data used in the eGo^N project along with the code importing, generating and

* Free software: GNU Affero General Public License v3 or later (AGPLv3+)

.. begin-getting-started-information
Pre-requisites
==============

In addition to the installation of Python packages, some non-Python
packages are required too. Right now these are:

* `Docker <https://docs.docker.com/get-started/>`_: Docker is used to provide
a PostgreSQL database (in the default case).

Docker provides extensive installation instructions. It's best to consult
`their docs <https://docs.docker.com/get-docker/>`_ and choose the
appropriate installation method for your OS.

Docker is not required if you use a local PostgreSQL installation.

* The `psql` executable. On Ubuntu, this is provided by the
`postgresql-client-common` package.

* Header files for the :code:`libpq5` PostgreSQL library. These are necessary
to build the :code:`psycopg2` package from source and are provided by the
:code:`libpq-dev` package on Ubuntu.

* `osm2pgsql <https://osm2pgsql.org/>`_
On recent Ubuntu versions you can install it via
:code:`sudo apt install osm2pgsql`.
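
On Ubuntu, the non-Python prerequisites above (except Docker) can be
installed in one go. A minimal sketch, assuming the package names
listed above:

.. code-block:: bash

   # install the PostgreSQL client, the libpq headers and osm2pgsql via apt
   sudo apt install postgresql-client-common libpq-dev osm2pgsql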


Installation
============

Since no release is available on PyPI and installations are probably
used for development, cloning it via

.. code-block:: bash

   git clone git@github.com:openego/eGon-data.git

and installing it in editable mode via

.. code-block:: bash

   pip install -e eGon-data/

are recommended.

In order to keep the package installation isolated, we recommend
installing the package in a dedicated virtual environment. There are
both an `external tool`_ and a `builtin module`_ which help in doing
so. We also highly recommend spending the time to set up
`virtualenvwrapper`_ to manage your virtual environments if you start
having to keep multiple ones around.
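
As a sketch of one possible setup using the `builtin module`_ (the
environment name ``egon-data-env`` is just an example):

.. code-block:: bash

   # create and activate a dedicated virtual environment
   python3 -m venv egon-data-env
   source egon-data-env/bin/activate

   # clone and install egon.data in editable mode, as described above
   git clone git@github.com:openego/eGon-data.git
   pip install -e eGon-data/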

If you run into any problems during the installation of ``egon.data``,
try looking into the list of `known installation problems`_ we have
collected. Maybe we already know of your problem and also of a solution
to it.

.. _external tool: https://virtualenv.pypa.io/en/latest/
.. _builtin module: https://docs.python.org/3/tutorial/venv.html#virtual-environments-and-packages
.. _virtualenvwrapper: https://virtualenvwrapper.readthedocs.io/en/latest/index.html
.. _known installation problems: https://eGon-data.readthedocs.io/en/latest/troubleshooting.html#installation-errors

Run the workflow
================

The :py:mod:`egon.data` package installs a command line application
called :code:`egon-data` with which you can control the workflow. Once
the installation is successful, you can explore the command line
interface starting with :code:`egon-data --help`.

The most useful subcommand is probably :code:`egon-data serve`. After
running this command, you can open your browser and point it to
`localhost:8080`, after which you will see the web interface of `Apache
Airflow`_ with which you can control the :math:`eGo^n` data processing
pipeline.
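
For example, a first session might look like this, assuming the
installation succeeded:

.. code-block:: bash

   # show the available subcommands and options
   egon-data --help

   # start the Airflow web interface on localhost:8080
   egon-data serve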

If running :code:`egon-data` results in an error, we have also
collected a list of `known runtime errors`_, which you can consult in
search of a solution.

.. _Apache Airflow: https://airflow.apache.org/docs/apache-airflow/stable/ui.html#ui-screenshots
.. _known runtime errors: https://eGon-data.readthedocs.io/en/latest/troubleshooting.html#runtime-errors

.. warning::

   A complete run of the workflow might require a lot of computing
   power and can't be run on a laptop. Use the :ref:`test mode <Test
   mode>` for experimenting.

Test mode
---------

The workflow can be tested on a smaller subset of data using the
example of the federal state of Bremen.

.. warning::

   Right now, only OSM data for Bremen gets imported. This is
   hard-wired in `egon.data/data_sets.yml`.

.. end-getting-started-information

Further Reading
===============

You can find more in-depth documentation at
https://eGon-data.readthedocs.io.
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -26,8 +26,8 @@
pygments_style = 'trac'
templates_path = ['.']
extlinks = {
'issue': ('https://github.com/openego/eGon-data/issues/%s', '#'),
'pr': ('https://github.com/openego/eGon-data/pull/%s', 'PR #'),
"issue": ("https://github.com/openego/eGon-data/issues/%s", "issue #"),
"pr": ("https://github.com/openego/eGon-data/pull/%s", "PR #"),
}
# on_rtd is whether we are on readthedocs.org
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
64 changes: 3 additions & 61 deletions docs/getting_started.rst
@@ -2,64 +2,6 @@
Getting Started
***************

.. include:: ../README.rst
   :start-after: .. begin-getting-started-information
   :end-before: .. end-getting-started-information
3,526 changes: 3,526 additions & 0 deletions docs/images/DP_Workflow_15012021.svg
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -7,6 +7,7 @@ Contents

getting_started
workflow
troubleshooting
data
literature
contributing
@@ -20,4 +21,3 @@ Indices and tables
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

91 changes: 91 additions & 0 deletions docs/troubleshooting.rst
@@ -0,0 +1,91 @@
***************
Troubleshooting
***************

Having trouble installing or running ``eGon-data``? Here's a list of
known issues, each including a solution.


Installation Errors
===================

These are some errors you might encounter while trying to install
:py:mod:`egon.data`.

``importlib_metadata.PackageNotFoundError: No package metadata ...``
--------------------------------------------------------------------

It might happen that you have installed `importlib-metadata==3.1.0` for
some reason, which will lead to this error. Make sure you have
`importlib-metadata>=3.1.1` installed. For more information read the
discussion in :issue:`60`.
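
A minimal sketch of the fix, run inside the virtual environment in
which ``egon.data`` is installed:

.. code-block:: bash

   # upgrade importlib-metadata to a version without the bug
   pip install -U "importlib-metadata>=3.1.1"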


Runtime Errors
==============

These are some of the errors you might encounter while trying to run
:code:`egon-data`.

``ERROR: Couldn't connect to Docker daemon ...``
------------------------------------------------

To verify, please execute :code:`docker-compose -f <(echo '{"service":
{"image": "hello-world"}}') ps` and you should see something like

.. code-block:: none

   ERROR: Couldn't connect to Docker daemon at http+docker://localunixsocket - is it running?
   If it's at a non-standard location, specify the URL with the DOCKER_HOST environment
   variable.

This can have at least two possible reasons. First, the docker daemon
might not be running. On Linux systems, you can check for this by
running :code:`ps -e | grep dockerd`. If this generates no output, you
have to start the docker daemon, which you can do via :code:`sudo
systemctl start docker.service` on recent Ubuntu systems.
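
Putting both steps together:

.. code-block:: bash

   # check whether the docker daemon is running
   ps -e | grep dockerd

   # if the above produces no output, start the daemon
   sudo systemctl start docker.service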

Second, your current user might not be a member of the `docker` group. On
Linux, you can check this by running :code:`groups $(whoami)`. If the
output does not contain the word `docker`, you have to add your current
user to the `docker` group. You can find more information on how to do
this in the `docker documentation`_. Read the :issue:`initial discussion
<33>` for more context.

.. _docker documentation: https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user
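
As a sketch, the usual fix from the linked docker documentation looks
like this; note that you have to log out and back in for the new group
membership to take effect:

.. code-block:: bash

   # check whether the current user is in the "docker" group
   groups $(whoami)

   # if not, add the current user to the "docker" group
   sudo usermod -aG docker $(whoami)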


``[ERROR] Connection in use ...``
---------------------------------

This error might arise when running :code:`egon-data serve`, making it
shut down early with :code:`ERROR - Shutting down webserver`. The reason
for this is that the local webserver from a previous :code:`egon-data
serve` run didn't shut down properly and is still running. This can be
fixed by running :code:`ps -eo pid,command | grep "gunicorn: master" |
grep -v grep` which should lead to output like :code:`NUMBER gunicorn:
master [airflow-webserver]` where :code:`NUMBER` is a varying number.
Once you have this, run :code:`kill -s INT NUMBER`, substituting
:code:`NUMBER` with what you got previously. After this,
:code:`egon-data serve` should run without errors again.
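
Putting the steps above together:

.. code-block:: bash

   # find the PID of the stale Airflow webserver
   ps -eo pid,command | grep "gunicorn: master" | grep -v grep

   # replace NUMBER with the PID printed by the previous command
   kill -s INT NUMBER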


Other import or incompatible package version errors
===================================================

If you get an :py:class:`ImportError` when trying to run ``egon-data``,
or the installation complains with something like

.. code-block:: none

   first-package a.b.c requires second-package>=q.r.r, but you'll have
   second-package x.y.z which is incompatible.

you might have run into a problem with earlier ``pip`` versions. Either
upgrade to a ``pip`` version >=20.3 and reinstall ``egon.data``, or
reinstall the package via ``pip install -U --use-feature=2020-resolver``.
The ``-U`` flag is important to actually force a reinstall. For more
information read the discussions in issues :issue:`#36 <36>` and
:issue:`#37 <37>`.
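
In bash, the two alternatives look like this, assuming ``eGon-data/``
is your local clone of the repository:

.. code-block:: bash

   # option 1: upgrade pip to a version with the new resolver, then reinstall
   pip install -U "pip>=20.3"
   pip install -U -e eGon-data/

   # option 2: keep the old pip, but opt into the new resolver explicitly
   pip install -U --use-feature=2020-resolver -e eGon-data/
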
42 changes: 42 additions & 0 deletions docs/workflow.rst
@@ -1,3 +1,45 @@
********
Workflow
********

Project background
------------------

egon-data provides a transparent and reproducible open data based data
processing pipeline for generating data models suitable for energy
system modeling. The data is customized for the requirements of the
research project eGo_n, which aims to develop tools for an open and
cross-sectoral planning of transmission and distribution grids. For
further information please visit the `eGo_n project website
<https://ego-n.org/>`_.

egon-data is a further development of the `Data processing
<https://github.com/openego/data_processing>`_ created in the former
research project `open_eGo <https://openegoproject.wordpress.com/>`_.
It aims for an extension of the data models as well as for better
replicability and manageability of the data preparation and processing.

The resulting data set serves as an input for the optimization tools
`eTraGo <https://github.com/openego/eTraGo>`_, `ding0
<https://github.com/openego/ding0>`_ and `eDisGo
<https://github.com/openego/eDisGo>`_ and delivers, for example, data
on grid topologies, demands/demand curves and generation capacities in
a high spatial resolution. The outputs of egon-data are published under
open source and open data licenses.

Data
----

egon-data retrieves and processes data from several different external input sources which are all freely available and published under an open data license. The process handles data with different data types, such as spatial data with a high geographical resolution or load/generation time series with an hourly resolution.

Execution
---------

In principle egon-data is not limited to the use of a specific
programming language, as the workflow integrates different scripts
using Apache Airflow, but Python and SQL are widely used within the
process. Apache Airflow organizes the order of execution of processing
steps through so-called operators. In the default case the SQL
processing is executed on a containerized local PostgreSQL database
using Docker. For further information on Docker and its installation
please refer to their `documentation <https://docs.docker.com/>`_.
Connection information for our local Docker database is defined in the
corresponding `docker-compose.yml
<https://github.com/openego/eGon-data/blob/dev/src/egon/data/airflow/docker-compose.yml>`_.
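
For illustration only, you could bring up the containerized database
manually with the compose file linked above; normally :code:`egon-data`
takes care of this for you:

.. code-block:: bash

   # start the project's local PostgreSQL container in the background
   docker-compose -f src/egon/data/airflow/docker-compose.yml up -d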

The egon-data workflow is composed of four different sections: database setup, data import, data processing and data export to the OpenEnergy Platform. Each section consists of different tasks, which are managed by Apache Airflow and communicate with the local database.
Only final datasets which function as an input for the optimization tools or selected interim results are uploaded to the `Open Energy Platform <https://openenergy-platform.org/>`_.
The data processing in egon-data needs to be performed locally as calculations on the Open Energy Platform are prohibited.
More information on how to run the workflow can be found in the `getting started section <https://egon-data.readthedocs.io/en/latest/getting_started.html#run-the-workflow>`_ of our documentation.

.. _DP_workflow_sketch:
.. figure:: images/DP_Workflow_15012021.svg

   Sketch of the egon-data processing workflow.

Versioning
----------

.. warning::

   Please note, the following is not implemented yet, but we are
   working on it.

Source code and data are versioned independently from each other. Every data table uploaded to the Open Energy Platform contains a column 'version' which is used to identify different versions of the same data set. The version number is maintained for every table separately. This is a major difference to the versioning concept applied in the former data processing, where all (interim) results were versioned under the same version number.







