Merge branch 'dev' into features/#259-dsm-cts
KathiEsterl committed Oct 4, 2021
2 parents b7677bb + eb377ae commit 65278fd
Showing 23 changed files with 897 additions and 399 deletions.
20 changes: 12 additions & 8 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
@@ -137,6 +137,10 @@ Added
* Extend zensus by a combined table with all cells where
there's either building, apartment or population data
`#359 <https://github.com/openego/eGon-data/issues/359>`_
* Add example metadata for OSM, VG250 and Zensus VG250.
Add metadata templates for licences, context and some helper
functions. Extend docs on how to create metadata for tables.
`#139 <https://github.com/openego/eGon-data/issues/139>`_
* Integrate DSM potentials for CTS and industry
`#259 <https://github.com/openego/eGon-data/issues/259>`_

@@ -206,18 +210,17 @@ Changed
`#397 <https://github.com/openego/eGon-data/issues/397>`_
* Rename columns gid to id
`#169 <https://github.com/openego/eGon-data/issues/169>`_
* Limit number of parallel processes per task
`#265 <https://github.com/openego/eGon-data/issues/265>`_
* Remove upper version limit of pandas
`#383 <https://github.com/openego/eGon-data/issues/383>`_
* Rename columns gid to id
`#169 <https://github.com/openego/eGon-data/issues/169>`_
* Use random seed from CLI parameters for CHP and society prognosis functions
`#351 <https://github.com/openego/eGon-data/issues/351>`_
* Changed demand.egon_schmidt_industrial_sites - table and merged table (industrial_sites)
`#423 <https://github.com/openego/eGon-data/issues/423>`_
* Use random seed from CLI parameters for CHP and society prognosis functions
`#351 <https://github.com/openego/eGon-data/issues/351>`_
* Adjust file path for industrial sites import
`#397 <https://github.com/openego/eGon-data/issues/418>`_
* Rename columns subst_id to bus_id
`#335 <https://github.com/openego/eGon-data/issues/335>`_


Bug fixes
---------
@@ -270,11 +273,12 @@ Bug fixes
`#398 <https://github.com/openego/eGon-data/issues/398>`_
* Add missing dependency in pipeline.py
`#412 <https://github.com/openego/eGon-data/issues/412>`_
* Replace NAN by 0 to avoid empty p_set column in DB
`#414 <https://github.com/openego/eGon-data/issues/414>`_
* Add prefix egon to MV grid district tables
`#349 <https://github.com/openego/eGon-data/issues/349>`_
* Bump MV grid district version no
`#432 <https://github.com/openego/eGon-data/issues/432>`_
* Add curl to prerequisites in the docs
`#440 <https://github.com/openego/eGon-data/issues/440>`_
* Replace NAN by 0 to avoid empty p_set column in DB
`#414 <https://github.com/openego/eGon-data/issues/414>`_

101 changes: 100 additions & 1 deletion CONTRIBUTING.rst
@@ -275,6 +275,105 @@ be saved locally, please use `CWD` to store the data. This is achieved by using
filepath = Path(".") / "filename.csv"
urlretrieve("https://url/to/file", filepath)

Add metadata
------------

Add metadata to every dataset you create to describe the data with
machine-readable information. Adhere to the OEP Metadata v1.4.1; you can
follow `the example
<https://github.com/OpenEnergyPlatform/oemetadata/blob/develop/metadata/latest/example.json>`_
to understand how the fields are used. The fields are described in detail in
the `Open Energy Metadata Description`_.

You can obtain the metadata string of a table you created from SQL via

.. code-block:: sql

    SELECT obj_description('<SCHEMA>.<TABLE>'::regclass);

Alternatively, you can write the table comment directly to a JSON file with

.. code-block:: bash

    psql -h <HOST> -p <PORT> -d <DB> -U <USER> -c "\COPY (SELECT obj_description('<SCHEMA>.<TABLE>'::regclass)) TO '/PATH/TO/FILE.json';"

For a bulk export of all table comments in the database, you can use `this
script <https://gist.github.com/nesnoj/86145999eca8182f43c2bca36bcc984f>`_.
Please verify that your metadata string complies with version 1.4.1 of the
OEP Metadata standard using the `OMI tool
<https://github.com/OpenEnergyPlatform/omi>`_ (the tool ships with eGon-data):

.. code-block:: bash

    omi translate -f oep-v1.4 metadata_file.json

If your metadata string is correct, OMI puts the keys in the correct order and
prints the full string (use the `-o` option for export).
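OMI reads the metadata from a JSON file. If you assemble the metadata as a Python dict, a minimal sketch for writing that file could look like the following. The helper name `write_metadata_file` is hypothetical (it is not part of eGon-data), and the keys in the demo are only an illustrative subset of OEP v1.4.1, not a complete string. Writing with `ensure_ascii=False` keeps characters such as `©` or `Ä` from attribution strings readable.

```python
import json
from pathlib import Path


def write_metadata_file(metadata: dict, filepath: Path) -> Path:
    """Write a metadata dict to a UTF-8 JSON file that can be passed to OMI.

    Hypothetical helper for illustration; not part of egon.data.metadata.
    """
    filepath = Path(filepath)
    # ensure_ascii=False keeps non-ASCII characters (e.g. "©", "Ä") as-is
    with filepath.open("w", encoding="utf-8") as f:
        json.dump(metadata, f, ensure_ascii=False, indent=4)
    return filepath


if __name__ == "__main__":
    # Illustrative subset of keys only; a real string needs all v1.4.1 fields
    example = {
        "name": "egon_mv_grid_district",
        "title": "eGo^n - Medium voltage grid districts",
    }
    write_metadata_file(example, Path("metadata_file.json"))
```

The resulting file can then be checked with the `omi translate` command above.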

You may omit the fields `id` and `publicationDate` in your string, as they
are set automatically at the end of the pipeline, but you are required to set
them to some value for a complete validation with OMI. For datasets published
on the OEP, `id` will be the URL pointing to the table, following the pattern
`https://openenergy-platform.org/dataedit/view/SCHEMA/TABLE`.
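For a local OMI run, these two fields can be filled with throwaway values. The sketch below uses a hypothetical helper, `add_validation_placeholders`, which is not part of :py:mod:`egon.data.metadata`. It only fills fields that are missing or empty, derives `id` from the OEP URL pattern, and uses today's date as an assumed placeholder for `publicationDate`; both values would be overwritten at the end of the pipeline anyway.

```python
from datetime import date

# URL pattern for tables published on the OEP
OEP_TABLE_URL = "https://openenergy-platform.org/dataedit/view/{schema}/{table}"


def add_validation_placeholders(metadata: dict, schema: str, table: str) -> dict:
    """Return a copy of ``metadata`` with ``id`` and ``publicationDate`` set.

    Hypothetical helper: existing non-empty values are kept, only missing
    ones are filled so the string passes a complete OMI validation.
    """
    result = dict(metadata)
    if not result.get("id"):
        # Placeholder id following the OEP URL pattern
        result["id"] = OEP_TABLE_URL.format(schema=schema, table=table)
    if not result.get("publicationDate"):
        # Placeholder date in ISO 8601 format (YYYY-MM-DD)
        result["publicationDate"] = date.today().isoformat()
    return result
```

For example, ``add_validation_placeholders({}, "grid", "egon_mv_grid_district")["id"]`` yields ``https://openenergy-platform.org/dataedit/view/grid/egon_mv_grid_district``.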

For previous discussions on metadata, you may want to check
`PR 176 <https://github.com/openego/eGon-data/pull/176>`_.

Helpers
^^^^^^^

Some **licence templates** are provided in :py:mod:`egon.data.metadata` that
you can use for fields 11.4 and 12 of the
`Open Energy Metadata Description`_. There is also a template for the
**metaMetadata** (field 16).

There are some functions to quickly generate a template for the
**resource fields** (field 14.6.1 in the `Open Energy Metadata Description`_)
from an SQLA table class or a DB table. This is especially helpful if your
table has many columns.

* From an SQLA table class:
  :py:func:`egon.data.metadata.generate_resource_fields_from_sqla_model`
* From a database table:
  :py:func:`egon.data.metadata.generate_resource_fields_from_db_table`
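To illustrate the idea behind deriving resource fields from a table (this is not the implementation of the eGon-data helpers listed above), here is a stdlib-only sketch. It uses SQLite's `PRAGMA table_info` as a stand-in for the PostgreSQL database, and the function name `resource_fields_from_table` is hypothetical. Only `name` and `type` can be read from the schema; `description` and `unit` stay empty for manual completion.

```python
import sqlite3


def resource_fields_from_table(conn: sqlite3.Connection, table: str) -> list:
    """Return a skeleton of resource ``fields`` for the columns of ``table``.

    Hypothetical helper using SQLite in place of PostgreSQL.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk);
    # the table name is interpolated directly, so only pass trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {
            "name": row[1],
            "description": None,  # to be filled in manually
            "type": row[2].lower(),
            "unit": None,  # to be filled in manually
        }
        for row in rows
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # SQLite stand-in for a table like demand.egon_etrago_electricity_cts
    conn.execute(
        "CREATE TABLE egon_etrago_electricity_cts "
        "(bus_id INTEGER, scn_name TEXT, p_set TEXT)"
    )
    for field in resource_fields_from_table(conn, "egon_etrago_electricity_cts"):
        print(field)
```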

Sources
^^^^^^^

The **sources** (field 11) are the most important part of the metadata that
needs to be filled in manually. You may also add references to tables in
eGon-data (e.g. from an upstream task) so you don't have to list all original
sources again. Make sure you include all upstream attribution requirements.

The following example uses various input datasets whose attribution must be
retained:

.. code-block:: python

    "sources": [
        {
            "title": "eGo^n - Medium voltage grid districts",
            "description": (
                "Medium-voltage grid districts describe the area supplied by "
                "one MV grid. Medium-voltage grid districts are defined by one "
                "polygon that represents the supply area. Each MV grid district "
                "is connected to the HV grid via a single substation."
            ),
            "path": "https://openenergy-platform.org/dataedit/view/"
                    "grid/egon_mv_grid_district",  # "id" in the source dataset
            "licenses": [
                license_odbl(attribution=
                    "© OpenStreetMap contributors, 2021; "
                    "© Statistische Ämter des Bundes und der Länder, 2014; "
                    "© Statistisches Bundesamt, Wiesbaden 2015; "
                    "(Daten verändert)"
                )
            ]
        },
        # more sources...
    ]

.. _Open Energy Metadata Description: https://github.com/OpenEnergyPlatform/oemetadata/blob/develop/metadata/v141/metadata_key_description.md
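Before publishing, it can help to check that no upstream attribution got lost along the way. A minimal sketch, assuming each entry in ``"licenses"`` is a dict with an ``"attribution"`` key as in the OEP v1.4.1 sources structure (presumably what the licence templates return). The helper names are hypothetical, and the check uses plain substring matching, which is an assumption rather than a project rule.

```python
def collect_attributions(sources: list) -> list:
    """Return the unique attribution strings of all licences in ``sources``.

    Hypothetical helper; order of first appearance is preserved.
    """
    seen = []
    for source in sources:
        for licence in source.get("licenses", []):
            attribution = licence.get("attribution")
            if attribution and attribution not in seen:
                seen.append(attribution)
    return seen


def missing_attributions(sources: list, required: list) -> list:
    """Return the required attribution strings not contained in any licence."""
    present = collect_attributions(sources)
    # Substring match, so "© OpenStreetMap contributors" matches a longer
    # string such as "© OpenStreetMap contributors, 2021; ..."
    return [req for req in required if not any(req in a for a in present)]
```

An empty result from ``missing_attributions`` means every required attribution appears in at least one licence entry.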

Adjusting test mode data
------------------------
@@ -301,7 +400,7 @@ How to document Python scripts

Use docstrings to document your Python code. Note that PEP 8 also
contains a `section <PEP8-docstrings_>`_ on docstrings and that there is
a whole `PEP <PEP257_>`_ dedicated to docstring convetions. Try to
a whole `PEP <PEP257_>`_ dedicated to docstring conventions. Try to
adhere to both of them.
Additionally every Python script needs to contain a header describing
the general functionality and objective and including information on
1 change: 1 addition & 0 deletions setup.py
@@ -102,6 +102,7 @@ def read(*names, **kwargs):
"xarray",
"xlrd",
"rioxarray",
"omi"
],
extras_require={
"dev": ["black", "flake8", "isort>=5", "pre-commit", "pytest", "tox"]
4 changes: 2 additions & 2 deletions src/egon/data/datasets/electricity_demand/__init__.py
@@ -16,7 +16,7 @@ class HouseholdElectricityDemand(Dataset):
def __init__(self, dependencies):
super().__init__(
name="HouseholdElectricityDemand",
version="0.0.1",
version="0.0.2",
dependencies=dependencies,
tasks=(create_tables,
distribute_household_demands)
@@ -26,7 +26,7 @@ class CtsElectricityDemand(Dataset):
def __init__(self, dependencies):
super().__init__(
name="CtsElectricityDemand",
version="0.0.1",
version="0.0.2",
dependencies=dependencies,
tasks=(distribute_cts_demands,
insert_cts_load)
10 changes: 5 additions & 5 deletions src/egon/data/datasets/electricity_demand/temporal.py
@@ -16,7 +16,7 @@ class EgonEtragoElectricityCts(Base):
__tablename__ = "egon_etrago_electricity_cts"
__table_args__ = {"schema": "demand"}

subst_id = Column(Integer, primary_key=True)
bus_id = Column(Integer, primary_key=True)
scn_name = Column(String, primary_key=True)
p_set = Column(ARRAY(Float))
q_set = Column(ARRAY(Float))
@@ -113,7 +113,7 @@ def calc_load_curves_cts(scenario):
Returns
-------
pandas.DataFrame
Demand timeseries of cts per substation id
Demand timeseries of cts per bus id
"""

@@ -138,7 +138,7 @@ def calc_load_curves_cts(scenario):
demands_zensus = db.select_dataframe(
f"""SELECT a.zensus_population_id, a.demand,
b.vg250_nuts3 as nuts3,
c.subst_id
c.bus_id
FROM {sources['zensus_electricity']['schema']}.
{sources['zensus_electricity']['table']} a
INNER JOIN
@@ -168,10 +168,10 @@ def calc_load_curves_cts(scenario):

# Calculate shares of cts branches per hvmv substation
share_subst = demands_zensus.drop(
'demand', axis=1).groupby('subst_id').mean()
'demand', axis=1).groupby('bus_id').mean()

# Calculate cts annual demand per hvmv substation
annual_demand_subst = demands_zensus.groupby('subst_id').demand.sum()
annual_demand_subst = demands_zensus.groupby('bus_id').demand.sum()

# Return electrical load curves per hvmv substation
return calc_load_curve(share_subst, annual_demand_subst)
6 changes: 3 additions & 3 deletions src/egon/data/datasets/electricity_demand_etrago.py
@@ -29,11 +29,11 @@ def demands_per_bus(scenario):

# Select data on CTS electricity demands per bus
cts_curves = db.select_dataframe(
f"""SELECT subst_id, p_set FROM
f"""SELECT bus_id, p_set FROM
{sources['cts_curves']['schema']}.
{sources['cts_curves']['table']}
WHERE scn_name = '{scenario}'""",
index_col="subst_id",
index_col="bus_id",
)

# Rename index
@@ -187,7 +187,7 @@ class ElectricalLoadEtrago(Dataset):
def __init__(self, dependencies):
super().__init__(
name="Electrical_load_etrago",
version="0.0.1",
version="0.0.2",
dependencies=dependencies,
tasks=(export_to_db,),
)
2 changes: 1 addition & 1 deletion src/egon/data/datasets/heat_etrago/__init__.py
@@ -228,7 +228,7 @@ class HeatEtrago(Dataset):
def __init__(self, dependencies):
super().__init__(
name="HeatEtrago",
version="0.0.2",
version="0.0.3",
dependencies=dependencies,
tasks=(buses, supply),
)
4 changes: 2 additions & 2 deletions src/egon/data/datasets/heat_etrago/power_to_heat.py
@@ -301,7 +301,7 @@ def assign_electrical_bus(heat_pumps, multiple_per_mv_grid=False):
# Select mv grid districts
mv_grid_district = db.select_geodataframe(
f"""
SELECT subst_id, geom FROM
SELECT bus_id, geom FROM
{sources['egon_mv_grid_district']['schema']}.
{sources['egon_mv_grid_district']['table']}
"""
@@ -339,7 +339,7 @@ def assign_electrical_bus(heat_pumps, multiple_per_mv_grid=False):
# Assign power bus per zensus cell
cells["power_bus"] = gpd.sjoin(
cells, mv_grid_district, how="inner", op="intersects"
).subst_id
).bus_id

# Calculate district heating demand per substation
demand_per_substation = pd.DataFrame(
2 changes: 1 addition & 1 deletion src/egon/data/datasets/heat_supply/__init__.py
@@ -134,7 +134,7 @@ class HeatSupply(Dataset):
def __init__(self, dependencies):
super().__init__(
name="HeatSupply",
version="0.0.1",
version="0.0.2",
dependencies=dependencies,
tasks=(create_tables,
district_heating, individual_heating, potential_germany),
10 changes: 5 additions & 5 deletions src/egon/data/datasets/heat_supply/individual_heating.py
@@ -124,7 +124,7 @@ def cascade_heat_supply_indiv(scenario, distribution_level, plotting=True):
# Select residential heat demand per mv grid district and federal state
heat_per_mv = db.select_geodataframe(
f"""
SELECT d.subst_id as bus_id, SUM(demand) as demand,
SELECT d.bus_id as bus_id, SUM(demand) as demand,
c.vg250_lan as state, d.geom
FROM {sources['heat_demand']['schema']}.
{sources['heat_demand']['table']} a
@@ -133,17 +133,17 @@ def cascade_heat_supply_indiv(scenario, distribution_level, plotting=True):
ON a.zensus_population_id = b.zensus_population_id
JOIN {sources['map_vg250_grid']['schema']}.
{sources['map_vg250_grid']['table']} c
ON b.subst_id = c.bus_id
ON b.bus_id = c.bus_id
JOIN {sources['mv_grids']['schema']}.
{sources['mv_grids']['table']} d
ON d.subst_id = c.bus_id
ON d.bus_id = c.bus_id
WHERE scenario = '{scenario}'
AND sector = 'residential'
AND a.zensus_population_id NOT IN (
SELECT zensus_population_id
FROM {sources['map_dh']['schema']}.{sources['map_dh']['table']}
WHERE scenario = '{scenario}')
GROUP BY d.subst_id, vg250_lan, geom
GROUP BY d.bus_id, vg250_lan, geom
""",
index_col = 'bus_id')

@@ -191,7 +191,7 @@ def plot_heat_supply(resulting_capacities):
mv_grids = db.select_geodataframe(
"""
SELECT * FROM grid.egon_mv_grid_district
""", index_col='subst_id')
""", index_col='bus_id')

for c in ['CHP', 'heat_pump']:
mv_grids[c] = resulting_capacities[
12 changes: 6 additions & 6 deletions src/egon/data/datasets/hh_demand_profiles.py
@@ -226,7 +226,7 @@ class EgonEtragoElectricityHouseholds(Base):
__table_args__ = {"schema": "demand"}

version = Column(String, primary_key=True)
subst_id = Column(Integer, primary_key=True)
bus_id = Column(Integer, primary_key=True)
scn_name = Column(String, primary_key=True)
p_set = Column(ARRAY(Float))
q_set = Column(ARRAY(Float))
@@ -235,7 +235,7 @@ class EgonEtragoElectricityHouseholds(Base):
hh_demand_setup = partial(
Dataset,
name="HH Demand",
version="0.0.1",
version="0.0.2",
dependencies=[],
# Tasks are declared in pipeline as function is used multiple times with different args
# To differentiate these tasks PythonOperator with specific id-names are used
@@ -1451,15 +1451,15 @@ def mv_grid_district_HH_electricity_load(
Returns
-------
pd.DataFrame
Multiindexed dataframe with `timestep` and `subst_id` as indexers.
Multiindexed dataframe with `timestep` and `bus_id` as indexers.
Demand is given in kWh.
"""
engine = db.engine()

with db.session_scope() as session:
cells_query = session.query(
HouseholdElectricityProfilesInCensusCells,
MapZensusGridDistricts.subst_id,
MapZensusGridDistricts.bus_id,
).join(
MapZensusGridDistricts,
HouseholdElectricityProfilesInCensusCells.cell_id
@@ -1481,7 +1481,7 @@ def mv_grid_district_HH_electricity_load(

# Create aggregated load profile for each MV grid district
mvgd_profiles_dict = {}
for grid_district, data in cells.groupby("subst_id"):
for grid_district, data in cells.groupby("bus_id"):
mvgd_profile = get_load_timeseries(
df_profiles=df_profiles,
df_cell_demand_metadata=data,
@@ -1494,7 +1494,7 @@ def mv_grid_district_HH_electricity_load(

# Reshape data: put MV grid ids in columns to a single index column
mvgd_profiles = mvgd_profiles.reset_index()
mvgd_profiles.columns = ["subst_id", "p_set"]
mvgd_profiles.columns = ["bus_id", "p_set"]

# Add remaining columns
mvgd_profiles["version"] = version