Merge branch 'dev' into features/#259-dsm-cts

openego · Oct 4, 2021 · 65278fd
2 parents b7677bb + eb377ae
commit 65278fd
Showing 23 changed files with 897 additions and 399 deletions.
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -137,6 +137,10 @@ Added
 * Extend zensus by a combined table with all cells where
   there's either building, apartment or population data
   `#359 <https://github.com/openego/eGon-data/issues/359>`_
+* Add example metadata for OSM, VG250 and Zensus VG250.
+  Add metadata templates for licences, context and some helper
+  functions. Extend docs on how to create metadata for tables.
+  `#139 <https://github.com/openego/eGon-data/issues/139>`_
 * Integrate DSM potentials for CTS and industry
   `#259 <https://github.com/openego/eGon-data/issues/259>`_
 
@@ -206,18 +210,17 @@ Changed
   `#397 <https://github.com/openego/eGon-data/issues/397>`_
 * Rename columns gid to id
   `#169 <https://github.com/openego/eGon-data/issues/169>`_
-* Limit number of parallel proccesses per task
-  `#265 <https://github.com/openego/eGon-data/issues/265>`_
 * Remove upper version limit of pandas
   `#383 <https://github.com/openego/eGon-data/issues/383>`_
-* Rename columns gid to id
-  `#169 <https://github.com/openego/eGon-data/issues/169>`_
+* Use random seed from CLI parameters for CHP and society prognosis functions
+  `#351 <https://github.com/openego/eGon-data/issues/351>`_
 * Changed demand.egon_schmidt_industrial_sites - table and merged table (industrial_sites)
   `#423 <https://github.com/openego/eGon-data/issues/423>`_
-  * Use random seed from CLI parameters for CHP and society prognosis functions
-  `#351 <https://github.com/openego/eGon-data/issues/351>`_
 * Adjust file path for industrial sites import
   `#397 <https://github.com/openego/eGon-data/issues/418>`_
+* Rename columns subst_id to bus_id
+  `#335 <https://github.com/openego/eGon-data/issues/335>`_
+
 
 Bug fixes
 ---------
@@ -270,11 +273,12 @@ Bug fixes
   `#398 <https://github.com/openego/eGon-data/issues/398>`_
 * Add missing dependency in pipeline.py
   `#412 <https://github.com/openego/eGon-data/issues/412>`_
-* Replace NAN by 0 to avoid empty p_set column in DB
-  `#414 <https://github.com/openego/eGon-data/issues/414>`_
 * Add prefix egon to MV grid district tables
   `#349 <https://github.com/openego/eGon-data/issues/349>`_
 * Bump MV grid district version no
   `#432 <https://github.com/openego/eGon-data/issues/432>`_
 * Add curl to prerequisites in the docs
   `#440 <https://github.com/openego/eGon-data/issues/440>`_
+* Replace NAN by 0 to avoid empty p_set column in DB
+  `#414 <https://github.com/openego/eGon-data/issues/414>`_
+
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
@@ -275,6 +275,105 @@ be saved locally, please use `CWD` to store the data. This is achieved by using
   filepath = Path(".") / "filename.csv"
   urlretrieve("https://url/to/file", filepath)
 
+Add metadata
+------------
+
+Add metadata to every dataset you create to describe the data with
+machine-readable information. Adhere to the OEP Metadata v1.4.1; you can
+follow
+`the example <https://github.com/OpenEnergyPlatform/oemetadata/blob/develop/metadata/latest/example.json>`_
+to understand how the fields are used. Fields are described in detail in the
+`Open Energy Metadata Description`_.
+
+You can obtain the metadata string from a table you created in SQL via
+
+.. code-block:: sql
+
+  SELECT obj_description('<SCHEMA>.<TABLE>'::regclass);
+
+Alternatively, you can write the table comment directly to a JSON file by
+
+.. code-block:: bash
+
+  psql -h <HOST> -p <PORT> -d <DB> -U <USER> -c "\COPY (SELECT obj_description('<SCHEMA>.<TABLE>'::regclass)) TO '/PATH/TO/FILE.json';"
+
+For a bulk export of all table comments in the DB you can use `this script
+<https://gist.github.com/nesnoj/86145999eca8182f43c2bca36bcc984f>`_.
+Please verify that your metadata string is in compliance with the OEP Metadata
+standard version 1.4.1 using the `OMI tool
+<https://github.com/OpenEnergyPlatform/omi>`_ (tool is shipped with eGon-data):
+
+.. code-block:: bash
+
+  omi translate -f oep-v1.4 metadata_file.json
+
+If your metadata string is correct, OMI puts the keys in the correct order and
+prints the full string (use `-o` option for export).
+
+You may omit the fields `id` and `publicationDate` in your string as they are
+set automatically at the end of the pipeline, but you're required to set them
+to some value for a complete validation with OMI. For datasets published on
+the OEP, `id` is the URL which points to the table; it follows the pattern
+`https://openenergy-platform.org/dataedit/view/SCHEMA/TABLE`.
+
+For previous discussions on metadata, you may want to check
+`PR 176 <https://github.com/openego/eGon-data/pull/176>`_.
+
+Helpers
+^^^^^^^
+
+There are some **licence templates** provided in :py:mod:`egon.data.metadata`
+you can make use of for fields 11.4 and 12 of the
+`Open Energy Metadata Description`_. Also, there's a template for the
+**metaMetadata** (field 16).
+
+There are some functions to quickly generate a template for the
+**resource fields** (field 14.6.1 in `Open Energy Metadata Description`_) from
+a SQLA table class or a DB table. This might be especially helpful if your
+table has plenty of columns.
+
+* From SQLA table class:
+  :py:func:`egon.data.metadata.generate_resource_fields_from_sqla_model`
+* From database table:
+  :py:func:`egon.data.metadata.generate_resource_fields_from_db_table`
+
+Sources
+^^^^^^^
+
+The **sources** (field 11) are the most important parts of the metadata which
+need to be filled manually. You may also add references to tables in eGon-data
+(e.g. from an upstream task) so you don't have to list all original sources
+again. Make sure you include all upstream attribution requirements.
+
+The following example uses various input datasets whose attribution must be
+retained:
+
+.. code-block:: python
+
+  "sources": [
+      {
+          "title": "eGo^n - Medium voltage grid districts",
+          "description": (
+              "Medium-voltage grid districts describe the area supplied by "
+              "one MV grid. Medium-voltage grid districts are defined by one "
+              "polygon that represents the supply area. Each MV grid district "
+              "is connected to the HV grid via a single substation."
+          ),
+          "path": "https://openenergy-platform.org/dataedit/view/"
+                  "grid/egon_mv_grid_district", # "id" in the source dataset
+          "licenses": [
+              license_odbl(attribution=
+                  "© OpenStreetMap contributors, 2021; "
+                  "© Statistische Ämter des Bundes und der Länder, 2014; "
+                  "© Statistisches Bundesamt, Wiesbaden 2015; "
+                  "(Daten verändert)"
+              )
+          ]
+      },
+      # more sources...
+  ]
+
+.. _Open Energy Metadata Description: https://github.com/OpenEnergyPlatform/oemetadata/blob/develop/metadata/v141/metadata_key_description.md
 
 Adjusting test mode data
 ------------------------
@@ -301,7 +400,7 @@ How to document Python scripts
 
 Use docstrings to document your Python code. Note that PEP 8 also
 contains a `section <PEP8-docstrings_>`_ on docstrings and that there is
-a whole `PEP <PEP257_>`_ dedicated to docstring convetions. Try to
+a whole `PEP <PEP257_>`_ dedicated to docstring conventions. Try to
 adhere to both of them.
 Additionally every Python script needs to contain a header describing
 the general functionality and objective and including information on
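
The "Add metadata" section added above says that `id` and `publicationDate` may be omitted but need some value for a complete OMI validation. A minimal Python sketch of filling such placeholders follows. It is not part of this commit: the helper name `fill_placeholders` and the sample metadata are hypothetical, and only the URL pattern is taken from the CONTRIBUTING text.

```python
import json
from datetime import date

# URL pattern quoted in the CONTRIBUTING text for datasets published on the OEP.
OEP_VIEW_URL = "https://openenergy-platform.org/dataedit/view/{schema}/{table}"


def fill_placeholders(metadata, schema, table):
    """Return a copy of `metadata` with `id` and `publicationDate` set.

    Hypothetical helper: values that are already present are kept, missing
    ones get a placeholder so that a validation run does not fail on them.
    """
    filled = dict(metadata)
    filled.setdefault("id", OEP_VIEW_URL.format(schema=schema, table=table))
    # Assumption: an ISO date string is an acceptable placeholder value.
    filled.setdefault("publicationDate", date.today().isoformat())
    return filled


# Hypothetical, heavily shortened metadata dictionary.
metadata = {"name": "egon_mv_grid_district", "title": "Example"}
filled = fill_placeholders(metadata, "grid", "egon_mv_grid_district")
print(json.dumps(filled, indent=2))
```

The printed string could be saved to a file and checked with the `omi translate` call shown in the section above.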

diff --git a/setup.py b/setup.py
@@ -102,6 +102,7 @@ def read(*names, **kwargs):
         "xarray",
         "xlrd",
         "rioxarray",
+        "omi"
     ],
     extras_require={
         "dev": ["black", "flake8", "isort>=5", "pre-commit", "pytest", "tox"]

diff --git a/src/egon/data/datasets/electricity_demand/__init__.py b/src/egon/data/datasets/electricity_demand/__init__.py
@@ -16,7 +16,7 @@ class HouseholdElectricityDemand(Dataset):
     def __init__(self, dependencies):
         super().__init__(
             name="HouseholdElectricityDemand",
-            version="0.0.1",
+            version="0.0.2",
             dependencies=dependencies,
             tasks=(create_tables,
                    distribute_household_demands)
@@ -26,7 +26,7 @@ class CtsElectricityDemand(Dataset):
     def __init__(self, dependencies):
         super().__init__(
             name="CtsElectricityDemand",
-            version="0.0.1",
+            version="0.0.2",
             dependencies=dependencies,
             tasks=(distribute_cts_demands,
                    insert_cts_load)

diff --git a/src/egon/data/datasets/electricity_demand/temporal.py b/src/egon/data/datasets/electricity_demand/temporal.py
@@ -16,7 +16,7 @@ class EgonEtragoElectricityCts(Base):
     __tablename__ = "egon_etrago_electricity_cts"
     __table_args__ = {"schema": "demand"}
 
-    subst_id = Column(Integer, primary_key=True)
+    bus_id = Column(Integer, primary_key=True)
     scn_name = Column(String, primary_key=True)
     p_set = Column(ARRAY(Float))
     q_set = Column(ARRAY(Float))
@@ -113,7 +113,7 @@ def calc_load_curves_cts(scenario):
     Returns
     -------
     pandas.DataFrame
-        Demand timeseries of cts per substation id
+        Demand timeseries of cts per bus id
 
     """
 
@@ -138,7 +138,7 @@ def calc_load_curves_cts(scenario):
     demands_zensus = db.select_dataframe(
             f"""SELECT a.zensus_population_id, a.demand,
             b.vg250_nuts3 as nuts3,
-            c.subst_id
+            c.bus_id
             FROM {sources['zensus_electricity']['schema']}.
             {sources['zensus_electricity']['table']} a
             INNER JOIN
@@ -168,10 +168,10 @@ def calc_load_curves_cts(scenario):
 
     # Calculate shares of cts branches per hvmv substation
     share_subst = demands_zensus.drop(
-        'demand', axis=1).groupby('subst_id').mean()
+        'demand', axis=1).groupby('bus_id').mean()
 
     # Calculate cts annual demand per hvmv substation
-    annual_demand_subst = demands_zensus.groupby('subst_id').demand.sum()
+    annual_demand_subst = demands_zensus.groupby('bus_id').demand.sum()
 
     # Return electrical load curves per hvmv substation
     return calc_load_curve(share_subst, annual_demand_subst)
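
The two `groupby('bus_id')` calls in the hunk above aggregate per bus: a mean over the share columns and a sum over `demand`. The sketch below reproduces that aggregation in plain Python to show what is computed. It is an illustration only, not the project's code: the rows, the column name `share_retail` and the function `aggregate_per_bus` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for the `demands_zensus` query result.
rows = [
    {"bus_id": 1, "demand": 10.0, "share_retail": 0.25},
    {"bus_id": 1, "demand": 30.0, "share_retail": 0.75},
    {"bus_id": 2, "demand": 5.0, "share_retail": 1.0},
]


def aggregate_per_bus(rows):
    """Return (mean share per bus, summed demand per bus) as two dicts."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["bus_id"]].append(row)

    # Corresponds to `.groupby('bus_id').mean()` on the share column.
    share = {
        bus: mean(r["share_retail"] for r in group)
        for bus, group in grouped.items()
    }
    # Corresponds to `.groupby('bus_id').demand.sum()`.
    annual_demand = {
        bus: sum(r["demand"] for r in group)
        for bus, group in grouped.items()
    }
    return share, annual_demand


share, annual_demand = aggregate_per_bus(rows)
print(share, annual_demand)
```

After the rename, the grouping key is `bus_id` in both aggregations, so the two results share the same index and can be passed on together.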

diff --git a/src/egon/data/datasets/electricity_demand_etrago.py b/src/egon/data/datasets/electricity_demand_etrago.py
@@ -29,11 +29,11 @@ def demands_per_bus(scenario):
 
     # Select data on CTS electricity demands per bus
     cts_curves = db.select_dataframe(
-        f"""SELECT subst_id, p_set FROM
+        f"""SELECT bus_id, p_set FROM
                 {sources['cts_curves']['schema']}.
                 {sources['cts_curves']['table']}
                 WHERE scn_name = '{scenario}'""",
-        index_col="subst_id",
+        index_col="bus_id",
     )
 
     # Rename index
@@ -187,7 +187,7 @@ class ElectricalLoadEtrago(Dataset):
     def __init__(self, dependencies):
         super().__init__(
             name="Electrical_load_etrago",
-            version="0.0.1",
+            version="0.0.2",
             dependencies=dependencies,
             tasks=(export_to_db,),
         )
diff --git a/src/egon/data/datasets/heat_etrago/__init__.py b/src/egon/data/datasets/heat_etrago/__init__.py
@@ -228,7 +228,7 @@ class HeatEtrago(Dataset):
     def __init__(self, dependencies):
         super().__init__(
             name="HeatEtrago",
-            version="0.0.2",
+            version="0.0.3",
             dependencies=dependencies,
             tasks=(buses, supply),
         )
diff --git a/src/egon/data/datasets/heat_etrago/power_to_heat.py b/src/egon/data/datasets/heat_etrago/power_to_heat.py
@@ -301,7 +301,7 @@ def assign_electrical_bus(heat_pumps, multiple_per_mv_grid=False):
     # Select mv grid distrcits
     mv_grid_district = db.select_geodataframe(
         f"""
-        SELECT subst_id, geom FROM
+        SELECT bus_id, geom FROM
         {sources['egon_mv_grid_district']['schema']}.
         {sources['egon_mv_grid_district']['table']}
         """
@@ -339,7 +339,7 @@ def assign_electrical_bus(heat_pumps, multiple_per_mv_grid=False):
     # Assign power bus per zensus cell
     cells["power_bus"] = gpd.sjoin(
         cells, mv_grid_district, how="inner", op="intersects"
-    ).subst_id
+    ).bus_id
 
     # Calclate district heating demand per substaion
     demand_per_substation = pd.DataFrame(

diff --git a/src/egon/data/datasets/heat_supply/__init__.py b/src/egon/data/datasets/heat_supply/__init__.py
@@ -134,7 +134,7 @@ class HeatSupply(Dataset):
     def __init__(self, dependencies):
         super().__init__(
             name="HeatSupply",
-            version="0.0.1",
+            version="0.0.2",
             dependencies=dependencies,
             tasks=(create_tables,
                 district_heating, individual_heating, potential_germany),

diff --git a/src/egon/data/datasets/heat_supply/individual_heating.py b/src/egon/data/datasets/heat_supply/individual_heating.py
@@ -124,7 +124,7 @@ def cascade_heat_supply_indiv(scenario, distribution_level, plotting=True):
     # Select residential heat demand per mv grid district and federal state
     heat_per_mv = db.select_geodataframe(
         f"""
-        SELECT d.subst_id as bus_id, SUM(demand) as demand,
+        SELECT d.bus_id as bus_id, SUM(demand) as demand,
         c.vg250_lan as state, d.geom
         FROM {sources['heat_demand']['schema']}.
         {sources['heat_demand']['table']} a
@@ -133,17 +133,17 @@ def cascade_heat_supply_indiv(scenario, distribution_level, plotting=True):
         ON a.zensus_population_id = b.zensus_population_id
         JOIN {sources['map_vg250_grid']['schema']}.
         {sources['map_vg250_grid']['table']} c
-        ON b.subst_id = c.bus_id
+        ON b.bus_id = c.bus_id
         JOIN {sources['mv_grids']['schema']}.
         {sources['mv_grids']['table']} d
-        ON d.subst_id = c.bus_id
+        ON d.bus_id = c.bus_id
         WHERE scenario = '{scenario}'
         AND sector = 'residential'
         AND a.zensus_population_id NOT IN (
             SELECT zensus_population_id
             FROM {sources['map_dh']['schema']}.{sources['map_dh']['table']}
             WHERE scenario = '{scenario}')
-        GROUP BY d.subst_id, vg250_lan, geom
+        GROUP BY d.bus_id, vg250_lan, geom
         """,
         index_col = 'bus_id')
 
@@ -191,7 +191,7 @@ def plot_heat_supply(resulting_capacities):
     mv_grids = db.select_geodataframe(
         """
         SELECT * FROM grid.egon_mv_grid_district
-        """, index_col='subst_id')
+        """, index_col='bus_id')
 
     for c in ['CHP', 'heat_pump']:
         mv_grids[c] = resulting_capacities[

diff --git a/src/egon/data/datasets/hh_demand_profiles.py b/src/egon/data/datasets/hh_demand_profiles.py
@@ -226,7 +226,7 @@ class EgonEtragoElectricityHouseholds(Base):
     __table_args__ = {"schema": "demand"}
 
     version = Column(String, primary_key=True)
-    subst_id = Column(Integer, primary_key=True)
+    bus_id = Column(Integer, primary_key=True)
     scn_name = Column(String, primary_key=True)
     p_set = Column(ARRAY(Float))
     q_set = Column(ARRAY(Float))
@@ -235,7 +235,7 @@ class EgonEtragoElectricityHouseholds(Base):
 hh_demand_setup = partial(
     Dataset,
     name="HH Demand",
-    version="0.0.1",
+    version="0.0.2",
     dependencies=[],
     # Tasks are declared in pipeline as function is used multiple times with different args
     # To differentiate these tasks PythonOperator with specific id-names are used
@@ -1451,15 +1451,15 @@ def mv_grid_district_HH_electricity_load(
     Returns
     -------
     pd.DataFrame
-        Multiindexed dataframe with `timestep` and `subst_id` as indexers.
+        Multiindexed dataframe with `timestep` and `bus_id` as indexers.
         Demand is given in kWh.
     """
     engine = db.engine()
 
     with db.session_scope() as session:
         cells_query = session.query(
             HouseholdElectricityProfilesInCensusCells,
-            MapZensusGridDistricts.subst_id,
+            MapZensusGridDistricts.bus_id,
         ).join(
             MapZensusGridDistricts,
             HouseholdElectricityProfilesInCensusCells.cell_id
@@ -1481,7 +1481,7 @@ def mv_grid_district_HH_electricity_load(
 
     # Create aggregated load profile for each MV grid district
     mvgd_profiles_dict = {}
-    for grid_district, data in cells.groupby("subst_id"):
+    for grid_district, data in cells.groupby("bus_id"):
         mvgd_profile = get_load_timeseries(
             df_profiles=df_profiles,
             df_cell_demand_metadata=data,
@@ -1494,7 +1494,7 @@ def mv_grid_district_HH_electricity_load(
 
     # Reshape data: put MV grid ids in columns to a single index column
     mvgd_profiles = mvgd_profiles.reset_index()
-    mvgd_profiles.columns = ["subst_id", "p_set"]
+    mvgd_profiles.columns = ["bus_id", "p_set"]
 
     # Add remaining columns
     mvgd_profiles["version"] = version