diff --git a/CHANGELOG.rst b/CHANGELOG.rst index e2c772944..fee9eddd3 100644 --- a/CHANGELOG.rst +++ b/CHANGELOG.rst @@ -137,6 +137,10 @@ Added * Extend zensus by a combined table with all cells where there's either building, apartment or population data `#359 `_ +* Add example metadata for OSM, VG250 and Zensus VG250. + Add metadata templates for licences, context and some helper + functions. Extend docs on how to create metadata for tables. + `#139 `_ * Integrate DSM potentials for CTS and industry `#259 `_ @@ -206,18 +210,17 @@ Changed `#397 `_ * Rename columns gid to id `#169 `_ -* Limit number of parallel proccesses per task - `#265 `_ * Remove upper version limit of pandas `#383 `_ -* Rename columns gid to id - `#169 `_ +* Use random seed from CLI parameters for CHP and society prognosis functions + `#351 `_ * Changed demand.egon_schmidt_industrial_sites - table and merged table (industrial_sites) `#423 `_ - * Use random seed from CLI parameters for CHP and society prognosis functions - `#351 `_ * Adjust file path for industrial sites import `#397 `_ +* Rename columns subst_id to bus_id + `#335 `_ + Bug fixes --------- @@ -270,11 +273,12 @@ Bug fixes `#398 `_ * Add missing dependency in pipeline.py `#412 `_ -* Replace NAN by 0 to avoid empty p_set column in DB - `#414 `_ * Add prefix egon to MV grid district tables `#349 `_ * Bump MV grid district version no `#432 `_ * Add curl to prerequisites in the docs `#440 `_ +* Replace NAN by 0 to avoid empty p_set column in DB + `#414 `_ + diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst index 4d0dce337..a20a0f443 100644 --- a/CONTRIBUTING.rst +++ b/CONTRIBUTING.rst @@ -275,6 +275,105 @@ be saved locally, please use `CWD` to store the data. This is achieved by using filepath = Path(".") / "filename.csv" urlretrieve("https://url/to/file", filepath) +Add metadata +------------ + +Add metadata for every dataset you create to describe the data with +machine-readable information. 
Adhere to the OEP Metadata v1.4.1; you can +follow +`the example `_ +to understand how the fields are used. Fields are described in detail in the +`Open Energy Metadata Description`_. + +You can obtain the metadata string from a table you created in SQL via + +.. code-block:: sql + + SELECT obj_description('<schema>.<table>'::regclass); + +Alternatively, you can write the table comment directly to a JSON file by + +.. code-block:: bash + + psql -h <host> -p <port> -d <database> -U <user> -c "\COPY (SELECT obj_description('<schema>.<table>
'::regclass)) TO '/PATH/TO/FILE.json';" + +For a bulk export of all of a DB's table comments, you can use `this script +`_. +Please verify that your metadata string is in compliance with the OEP Metadata +standard version 1.4.1 using the `OMI tool +`_ (the tool is shipped with eGon-data): + +.. code-block:: bash + + omi translate -f oep-v1.4 metadata_file.json + +If your metadata string is correct, OMI puts the keys in the correct order and +prints the full string (use the `-o` option for export). + +You may omit the fields `id` and `publicationDate` in your string, as they will be +automatically set at the end of the pipeline, but you're required to set them to +some value for a complete validation with OMI. For datasets published on the +OEP, `id` will be the URL which points to the table, following the pattern +`https://openenergy-platform.org/dataedit/view/SCHEMA/TABLE`. + +For previous discussions on metadata, you may want to check +`PR 176 `_. + +Helpers +^^^^^^^ + +There are some **licence templates** provided in :py:mod:`egon.data.metadata` +you can make use of for fields 11.4 and 12 of the +`Open Energy Metadata Description`_. Also, there's a template for the +**metaMetadata** (field 16). + +There are some functions to quickly generate a template for the +**resource fields** (field 14.6.1 in `Open Energy Metadata Description`_) from +a SQLA table class or a DB table. This might be especially helpful if your +table has many columns. + +* From SQLA table class: + :py:func:`egon.data.metadata.generate_resource_fields_from_sqla_model` +* From database table: + :py:func:`egon.data.metadata.generate_resource_fields_from_db_table` + +Sources +^^^^^^^ + +The **sources** (field 11) are the most important parts of the metadata that +need to be filled in manually. You may also add references to tables in eGon-data +(e.g. from an upstream task) so you don't have to list all original sources +again. Make sure you include all upstream attribution requirements. 
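Before filling in sources, it may help to see how a finished metadata dictionary ends up in the database: the JSON string is simply written as the table's comment at the end of the pipeline. The following is a minimal, self-contained sketch of that step — the table name and field values are illustrative placeholders, not actual eGon-data output:

```python
import json

# Illustrative metadata stub -- a real dataset fills all OEP Metadata
# v1.4.1 fields (sources, licenses, resources, ...); the values here
# are placeholders.
meta = {
    "name": "grid.egon_mv_grid_district",
    "title": "eGo^n - Medium voltage grid districts",
    "id": "WILL_BE_SET_AT_PUBLICATION",
    "sources": [],
}

# The metadata string is stored as the table comment. Single quotes in
# the JSON must be doubled so the string survives as a SQL literal.
escaped = json.dumps(meta).replace("'", "''")
comment_sql = f"COMMENT ON TABLE grid.egon_mv_grid_district IS '{escaped}';"
print(comment_sql)
```

In eGon-data itself this is handled by the dataset's `add_metadata` task; the sketch only shows the escaping needed when writing the comment by hand.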
+ +The following example uses various input datasets whose attribution must be +retained: + +.. code-block:: python + + "sources": [ + { + "title": "eGo^n - Medium voltage grid districts", + "description": ( + "Medium-voltage grid districts describe the area supplied by " + "one MV grid. Medium-voltage grid districts are defined by one " + "polygon that represents the supply area. Each MV grid district " + "is connected to the HV grid via a single substation." + ), + "path": "https://openenergy-platform.org/dataedit/view/" + "grid/egon_mv_grid_district", # "id" in the source dataset + "licenses": [ + license_odbl(attribution= + "© OpenStreetMap contributors, 2021; " + "© Statistische Ämter des Bundes und der Länder, 2014; " + "© Statistisches Bundesamt, Wiesbaden 2015; " + "(Daten verändert)" + ) + ] + }, + # more sources... + ] + +.. _Open Energy Metadata Description: https://github.com/OpenEnergyPlatform/oemetadata/blob/develop/metadata/v141/metadata_key_description.md Adjusting test mode data ------------------------ @@ -301,7 +400,7 @@ How to document Python scripts Use docstrings to document your Python code. Note that PEP 8 also contains a `section `_ on docstrings and that there is -a whole `PEP `_ dedicated to docstring convetions. Try to +a whole `PEP `_ dedicated to docstring conventions. Try to adhere to both of them. 
Additionally every Python script needs to contain a header describing the general functionality and objective and including information on diff --git a/setup.py b/setup.py index 50456e5e2..2a9a2a773 100755 --- a/setup.py +++ b/setup.py @@ -102,6 +102,7 @@ def read(*names, **kwargs): "xarray", "xlrd", "rioxarray", + "omi" ], extras_require={ "dev": ["black", "flake8", "isort>=5", "pre-commit", "pytest", "tox"] diff --git a/src/egon/data/datasets/electricity_demand/__init__.py b/src/egon/data/datasets/electricity_demand/__init__.py index 1c4fb5a8a..f0ab90558 100644 --- a/src/egon/data/datasets/electricity_demand/__init__.py +++ b/src/egon/data/datasets/electricity_demand/__init__.py @@ -16,7 +16,7 @@ class HouseholdElectricityDemand(Dataset): def __init__(self, dependencies): super().__init__( name="HouseholdElectricityDemand", - version="0.0.1", + version="0.0.2", dependencies=dependencies, tasks=(create_tables, distribute_household_demands) @@ -26,7 +26,7 @@ class CtsElectricityDemand(Dataset): def __init__(self, dependencies): super().__init__( name="CtsElectricityDemand", - version="0.0.1", + version="0.0.2", dependencies=dependencies, tasks=(distribute_cts_demands, insert_cts_load) diff --git a/src/egon/data/datasets/electricity_demand/temporal.py b/src/egon/data/datasets/electricity_demand/temporal.py index 29b421ad1..3ae8bc3b4 100644 --- a/src/egon/data/datasets/electricity_demand/temporal.py +++ b/src/egon/data/datasets/electricity_demand/temporal.py @@ -16,7 +16,7 @@ class EgonEtragoElectricityCts(Base): __tablename__ = "egon_etrago_electricity_cts" __table_args__ = {"schema": "demand"} - subst_id = Column(Integer, primary_key=True) + bus_id = Column(Integer, primary_key=True) scn_name = Column(String, primary_key=True) p_set = Column(ARRAY(Float)) q_set = Column(ARRAY(Float)) @@ -113,7 +113,7 @@ def calc_load_curves_cts(scenario): Returns ------- pandas.DataFrame - Demand timeseries of cts per substation id + Demand timeseries of cts per bus id """ @@ 
-138,7 +138,7 @@ def calc_load_curves_cts(scenario): demands_zensus = db.select_dataframe( f"""SELECT a.zensus_population_id, a.demand, b.vg250_nuts3 as nuts3, - c.subst_id + c.bus_id FROM {sources['zensus_electricity']['schema']}. {sources['zensus_electricity']['table']} a INNER JOIN @@ -168,10 +168,10 @@ def calc_load_curves_cts(scenario): # Calculate shares of cts branches per hvmv substation share_subst = demands_zensus.drop( - 'demand', axis=1).groupby('subst_id').mean() + 'demand', axis=1).groupby('bus_id').mean() # Calculate cts annual demand per hvmv substation - annual_demand_subst = demands_zensus.groupby('subst_id').demand.sum() + annual_demand_subst = demands_zensus.groupby('bus_id').demand.sum() # Return electrical load curves per hvmv substation return calc_load_curve(share_subst, annual_demand_subst) diff --git a/src/egon/data/datasets/electricity_demand_etrago.py b/src/egon/data/datasets/electricity_demand_etrago.py index eb3b694ef..cf3a30571 100644 --- a/src/egon/data/datasets/electricity_demand_etrago.py +++ b/src/egon/data/datasets/electricity_demand_etrago.py @@ -29,11 +29,11 @@ def demands_per_bus(scenario): # Select data on CTS electricity demands per bus cts_curves = db.select_dataframe( - f"""SELECT subst_id, p_set FROM + f"""SELECT bus_id, p_set FROM {sources['cts_curves']['schema']}. 
{sources['cts_curves']['table']} WHERE scn_name = '{scenario}'""", - index_col="subst_id", + index_col="bus_id", ) # Rename index @@ -187,7 +187,7 @@ class ElectricalLoadEtrago(Dataset): def __init__(self, dependencies): super().__init__( name="Electrical_load_etrago", - version="0.0.1", + version="0.0.2", dependencies=dependencies, tasks=(export_to_db,), ) diff --git a/src/egon/data/datasets/heat_etrago/__init__.py b/src/egon/data/datasets/heat_etrago/__init__.py index 758b694db..417661f3e 100644 --- a/src/egon/data/datasets/heat_etrago/__init__.py +++ b/src/egon/data/datasets/heat_etrago/__init__.py @@ -228,7 +228,7 @@ class HeatEtrago(Dataset): def __init__(self, dependencies): super().__init__( name="HeatEtrago", - version="0.0.2", + version="0.0.3", dependencies=dependencies, tasks=(buses, supply), ) diff --git a/src/egon/data/datasets/heat_etrago/power_to_heat.py b/src/egon/data/datasets/heat_etrago/power_to_heat.py index 726472125..0fb03a02a 100644 --- a/src/egon/data/datasets/heat_etrago/power_to_heat.py +++ b/src/egon/data/datasets/heat_etrago/power_to_heat.py @@ -301,7 +301,7 @@ def assign_electrical_bus(heat_pumps, multiple_per_mv_grid=False): # Select mv grid distrcits mv_grid_district = db.select_geodataframe( f""" - SELECT subst_id, geom FROM + SELECT bus_id, geom FROM {sources['egon_mv_grid_district']['schema']}. 
{sources['egon_mv_grid_district']['table']} """ @@ -339,7 +339,7 @@ def assign_electrical_bus(heat_pumps, multiple_per_mv_grid=False): # Assign power bus per zensus cell cells["power_bus"] = gpd.sjoin( cells, mv_grid_district, how="inner", op="intersects" - ).subst_id + ).bus_id # Calclate district heating demand per substaion demand_per_substation = pd.DataFrame( diff --git a/src/egon/data/datasets/heat_supply/__init__.py b/src/egon/data/datasets/heat_supply/__init__.py index a7d46b707..b742b3052 100644 --- a/src/egon/data/datasets/heat_supply/__init__.py +++ b/src/egon/data/datasets/heat_supply/__init__.py @@ -134,7 +134,7 @@ class HeatSupply(Dataset): def __init__(self, dependencies): super().__init__( name="HeatSupply", - version="0.0.1", + version="0.0.2", dependencies=dependencies, tasks=(create_tables, district_heating, individual_heating, potential_germany), diff --git a/src/egon/data/datasets/heat_supply/individual_heating.py b/src/egon/data/datasets/heat_supply/individual_heating.py index ca00a433c..726bf3739 100644 --- a/src/egon/data/datasets/heat_supply/individual_heating.py +++ b/src/egon/data/datasets/heat_supply/individual_heating.py @@ -124,7 +124,7 @@ def cascade_heat_supply_indiv(scenario, distribution_level, plotting=True): # Select residential heat demand per mv grid district and federal state heat_per_mv = db.select_geodataframe( f""" - SELECT d.subst_id as bus_id, SUM(demand) as demand, + SELECT d.bus_id as bus_id, SUM(demand) as demand, c.vg250_lan as state, d.geom FROM {sources['heat_demand']['schema']}. {sources['heat_demand']['table']} a @@ -133,17 +133,17 @@ def cascade_heat_supply_indiv(scenario, distribution_level, plotting=True): ON a.zensus_population_id = b.zensus_population_id JOIN {sources['map_vg250_grid']['schema']}. {sources['map_vg250_grid']['table']} c - ON b.subst_id = c.bus_id + ON b.bus_id = c.bus_id JOIN {sources['mv_grids']['schema']}. 
{sources['mv_grids']['table']} d - ON d.subst_id = c.bus_id + ON d.bus_id = c.bus_id WHERE scenario = '{scenario}' AND sector = 'residential' AND a.zensus_population_id NOT IN ( SELECT zensus_population_id FROM {sources['map_dh']['schema']}.{sources['map_dh']['table']} WHERE scenario = '{scenario}') - GROUP BY d.subst_id, vg250_lan, geom + GROUP BY d.bus_id, vg250_lan, geom """, index_col = 'bus_id') @@ -191,7 +191,7 @@ def plot_heat_supply(resulting_capacities): mv_grids = db.select_geodataframe( """ SELECT * FROM grid.egon_mv_grid_district - """, index_col='subst_id') + """, index_col='bus_id') for c in ['CHP', 'heat_pump']: mv_grids[c] = resulting_capacities[ diff --git a/src/egon/data/datasets/hh_demand_profiles.py b/src/egon/data/datasets/hh_demand_profiles.py index 63d2b9388..96e6ced0a 100644 --- a/src/egon/data/datasets/hh_demand_profiles.py +++ b/src/egon/data/datasets/hh_demand_profiles.py @@ -226,7 +226,7 @@ class EgonEtragoElectricityHouseholds(Base): __table_args__ = {"schema": "demand"} version = Column(String, primary_key=True) - subst_id = Column(Integer, primary_key=True) + bus_id = Column(Integer, primary_key=True) scn_name = Column(String, primary_key=True) p_set = Column(ARRAY(Float)) q_set = Column(ARRAY(Float)) @@ -235,7 +235,7 @@ class EgonEtragoElectricityHouseholds(Base): hh_demand_setup = partial( Dataset, name="HH Demand", - version="0.0.1", + version="0.0.2", dependencies=[], # Tasks are declared in pipeline as function is used multiple times with different args # To differentiate these tasks PythonOperator with specific id-names are used @@ -1451,7 +1451,7 @@ def mv_grid_district_HH_electricity_load( Returns ------- pd.DataFrame - Multiindexed dataframe with `timestep` and `subst_id` as indexers. + Multiindexed dataframe with `timestep` and `bus_id` as indexers. Demand is given in kWh. 
""" engine = db.engine() @@ -1459,7 +1459,7 @@ def mv_grid_district_HH_electricity_load( with db.session_scope() as session: cells_query = session.query( HouseholdElectricityProfilesInCensusCells, - MapZensusGridDistricts.subst_id, + MapZensusGridDistricts.bus_id, ).join( MapZensusGridDistricts, HouseholdElectricityProfilesInCensusCells.cell_id @@ -1481,7 +1481,7 @@ def mv_grid_district_HH_electricity_load( # Create aggregated load profile for each MV grid district mvgd_profiles_dict = {} - for grid_district, data in cells.groupby("subst_id"): + for grid_district, data in cells.groupby("bus_id"): mvgd_profile = get_load_timeseries( df_profiles=df_profiles, df_cell_demand_metadata=data, @@ -1494,7 +1494,7 @@ def mv_grid_district_HH_electricity_load( # Reshape data: put MV grid ids in columns to a single index column mvgd_profiles = mvgd_profiles.reset_index() - mvgd_profiles.columns = ["subst_id", "p_set"] + mvgd_profiles.columns = ["bus_id", "p_set"] # Add remaining columns mvgd_profiles["version"] = version diff --git a/src/egon/data/datasets/industry/temporal.py b/src/egon/data/datasets/industry/temporal.py index feeadb7ac..0dc1592b7 100644 --- a/src/egon/data/datasets/industry/temporal.py +++ b/src/egon/data/datasets/industry/temporal.py @@ -43,7 +43,7 @@ def identify_bus(load_curves, demand_area): # Select mv griddistrict griddistrict = db.select_geodataframe( - f"""SELECT subst_id, geom FROM + f"""SELECT bus_id, geom FROM {sources['egon_mv_grid_district']['schema']}. 
{sources['egon_mv_grid_district']['table']}""", geom_col="geom", @@ -86,7 +86,7 @@ def identify_bus(load_curves, demand_area): # Combine dataframes to bring loadcurves and bus id together curves_da = pd.merge( load_curves.T, - peak_bus[["subst_id", "id"]], + peak_bus[["bus_id", "id"]], left_index=True, right_on="id", ) @@ -165,7 +165,7 @@ def calc_load_curves_ind_osm(scenario): curves_da = identify_bus(load_curves, demand_area) # Group all load curves per bus - curves_bus = curves_da.drop(["id"], axis=1).fillna(0).groupby("subst_id").sum() + curves_bus = curves_da.drop(["id"], axis=1).fillna(0).groupby("bus_id").sum() # Initalize pandas.DataFrame for export to database load_ts_df = pd.DataFrame(index=curves_bus.index, columns=["p_set"]) @@ -288,7 +288,7 @@ def calc_load_curves_ind_sites(scenario): # Group all load curves per bus and wz curves_bus = ( - curves_da.fillna(0).groupby(["subst_id", "wz"]).sum().drop(["id"], axis=1) + curves_da.fillna(0).groupby(["bus_id", "wz"]).sum().drop(["id"], axis=1) ) # Initalize pandas.DataFrame for pf table load timeseries diff --git a/src/egon/data/datasets/mv_grid_districts.py b/src/egon/data/datasets/mv_grid_districts.py index 0371e39c1..8396e2c57 100644 --- a/src/egon/data/datasets/mv_grid_districts.py +++ b/src/egon/data/datasets/mv_grid_districts.py @@ -1,16 +1,63 @@ """ -Implements the methods for creating medium-voltage grid district areas from -HV-MV substation locations and municipality borders +Medium-voltage grid districts describe the area supplied by one MV grid -Methods are heavily inspired by `Hülk et al. (2017) -`_ -(section 2.3), but differ in detail. -Direct adjacency is preferred over proximity. For polygons of municipalities -without a substation inside it is iteratively checked for direct adjacent -other polygons that have a substation inside. Hence, MV grid districts grow -around a polygon with a substation inside. +Medium-voltage grid districts are defined by one polygon that represents the +supply area. 
Each MV grid district is connected to the HV grid via a single +substation. -See :func:`define_mv_grid_districts` for more details. +The methods used for identifying the MV grid districts are heavily inspired +by `Hülk et al. (2017) +`_ +(section 2.3), but the implementation differs in detail. +The main difference is that direct adjacency is preferred over proximity. +For polygons of municipalities +without a substation inside, directly adjacent +polygons that have a substation inside are checked iteratively. Speaking visually, an MV grid +district grows around a polygon with a substation inside. + +The grid districts are identified using three data sources: + +1. Polygons of municipalities (:class:`Vg250GemClean`) +2. HV-MV substations (:class:`EgonHvmvSubstation`) +3. HV-MV substation voronoi polygons (:class:`EgonHvmvSubstationVoronoi`) + +Fundamentally, it is assumed that grid districts (supply areas) often follow +the borders of administrative units, in particular the borders of +municipalities, due to the concession levy. +Furthermore, it is assumed that one grid district is supplied via a single +substation and that the locations of substations and grid districts are designed +to minimize the length of grid lines and cables. + +With these assumptions, the three data sources from above are processed as +follows: + +* Find the number of substations inside each municipality +* Split municipalities with more than one substation inside + * Cut polygons of municipalities with voronoi polygons of respective + substations + * Assign resulting municipality polygon fragments to nearest substation +* Assign municipalities without a single substation to the nearest substation in + the neighborhood +* Merge all municipality polygons and parts of municipality polygons to a + single polygon grouped by the assigned substation + +For finding the nearest substation, as already said, direct adjacency is +preferred over closest distance. 
This means the nearest substation is not +necessarily the closest substation in the sense of beeline distance, +but it is guaranteed to be located in a neighboring polygon. This +prevents the algorithm from finding solutions where an MV grid district consists of +multi-polygons with some space in between. +Nevertheless, beeline distance still plays an important role, as the algorithm +acts in two steps: + +1. Iteratively look for neighboring polygons until there are no further + polygons +2. Find a polygon to assign to by minimum beeline distance + +The second step is required in order to cover edge cases, such as islands. + +To understand how this is implemented in separate functions, please +see :func:`define_mv_grid_districts`. """ from functools import partial @@ -81,7 +128,7 @@ class HvmvSubstPerMunicipality(Base): class VoronoiMunicipalityCutsBase(object): - subst_id = Column(Integer) + bus_id = Column(Integer) municipality_id = Column(Integer) voronoi_id = Column(Integer) ags_0 = Column(String) @@ -122,7 +169,7 @@ class MvGridDistrictsDissolved(Base): Sequence(f"{__tablename__}_id_seq", schema="grid"), primary_key=True, ) - subst_id = Column(Integer) + bus_id = Column(Integer) geom = Column(Geometry("MultiPolygon", 3035)) area = Column(Float) @@ -131,7 +178,7 @@ class MvGridDistricts(Base): __tablename__ = "egon_mv_grid_district" __table_args__ = {"schema": "grid"} - subst_id = Column(Integer, primary_key=True) + bus_id = Column(Integer, primary_key=True) geom = Column(Geometry("MultiPolygon", 3035)) area = Column(Float) @@ -262,7 +309,7 @@ def split_multi_substation_municipalities(): VoronoiMunicipalityCuts.municipality_id, VoronoiMunicipalityCuts.ags_0, VoronoiMunicipalityCuts.geom, - VoronoiMunicipalityCuts.subst_id, + VoronoiMunicipalityCuts.bus_id, VoronoiMunicipalityCuts.voronoi_id, ], q, @@ -298,7 +345,7 @@ def split_multi_substation_municipalities(): ).update( { "subst_count": cuts_substation_subquery.c.subst_count, - 
"subst_id": cuts_substation_subquery.c.bus_id, + "bus_id": cuts_substation_subquery.c.bus_id, "geom_sub": cuts_substation_subquery.c.geom_sub, }, synchronize_session="fetch", @@ -397,7 +444,7 @@ def assign_substation_municipality_fragments( with_substation, without_substation, strategy, session ): """ - Assign subst_id from next neighboring polygon to municipality fragment + Assign bus_id from next neighboring polygon to municipality fragment For parts municipalities without a substation inside their polygon the next municipality polygon part is found and assigned. @@ -427,7 +474,7 @@ def assign_substation_municipality_fragments( different in detail. """ # Determine nearest neighboring polygon that has a substation - columns_from_cut1_subst = ["subst_id", "subst_count", "geom_sub"] + columns_from_cut1_subst = ["bus_id", "subst_count", "geom_sub"] if strategy == "touches": neighboring_criterion = func.ST_Touches( @@ -526,14 +573,14 @@ def merge_polygons_to_grid_district(): # Step 1: Merge municipality parts cut by voronoi polygons according # to prior determined associated substation joined_municipality_parts = session.query( - VoronoiMunicipalityCutsAssigned.subst_id, + VoronoiMunicipalityCutsAssigned.bus_id, func.ST_Multi( func.ST_Union(VoronoiMunicipalityCutsAssigned.geom) ).label("geom"), func.sum(func.ST_Area(VoronoiMunicipalityCutsAssigned.geom)).label( "area" ), - ).group_by(VoronoiMunicipalityCutsAssigned.subst_id) + ).group_by(VoronoiMunicipalityCutsAssigned.bus_id) joined_municipality_parts_insert = ( MvGridDistrictsDissolved.__table__.insert().from_select( @@ -585,7 +632,7 @@ def merge_polygons_to_grid_district(): while True: previous_ids_length = len(already_assigned) with_substation = session.query( - MvGridDistrictsDissolved.subst_id, + MvGridDistrictsDissolved.bus_id, MvGridDistrictsDissolved.geom, MvGridDistrictsDissolved.id, ).subquery() @@ -615,7 +662,7 @@ def merge_polygons_to_grid_district(): # Step 4: Merge MV grid district parts # Forms one 
(multi-)polygon for each substation joined_mv_grid_district_parts = session.query( - MvGridDistrictsDissolved.subst_id, + MvGridDistrictsDissolved.bus_id, func.ST_Multi( func.ST_Buffer( func.ST_Buffer( @@ -625,7 +672,7 @@ def merge_polygons_to_grid_district(): ) ).label("geom"), func.sum(MvGridDistrictsDissolved.area).label("area"), - ).group_by(MvGridDistrictsDissolved.subst_id) + ).group_by(MvGridDistrictsDissolved.bus_id) joined_mv_grid_district_parts_insert = ( MvGridDistricts.__table__.insert().from_select( @@ -688,7 +735,7 @@ def nearest_polygon_with_substation( session.query( without_substation.c.id, func.ST_Multi(without_substation.c.geom).label("geom"), - with_substation.c.subst_id, + with_substation.c.bus_id, func.ST_Area(func.ST_Multi(without_substation.c.geom)).label( "area" ), @@ -718,7 +765,7 @@ def nearest_polygon_with_substation( # Take only one candidate polygon for assgning it nearest_neighbors = session.query( - all_nearest_neighbors.c.subst_id, + all_nearest_neighbors.c.bus_id, all_nearest_neighbors.c.geom, all_nearest_neighbors.c.area, ).distinct(all_nearest_neighbors.c.id) @@ -772,7 +819,7 @@ def define_mv_grid_districts(): mv_grid_districts_setup = partial( Dataset, name="MvGridDistricts", - version="0.0.1", + version="0.0.2", dependencies=[], tasks=(define_mv_grid_districts), ) diff --git a/src/egon/data/datasets/osm/__init__.py b/src/egon/data/datasets/osm/__init__.py index faee42367..664a06eb5 100644 --- a/src/egon/data/datasets/osm/__init__.py +++ b/src/egon/data/datasets/osm/__init__.py @@ -11,15 +11,21 @@ from pathlib import Path from urllib.request import urlretrieve +import datetime import json import os import shutil import time +import re import importlib_resources as resources from egon.data import db from egon.data.config import settings +from egon.data.metadata import (context, + license_odbl, + meta_metadata, + generate_resource_fields_from_db_table) from egon.data.datasets import Dataset import egon.data.config import 
egon.data.subprocess as subprocess @@ -114,7 +120,9 @@ def to_postgres(num_processes=1, cache_size=4096): def add_metadata(): - """Writes metadata JSON string into table comment.""" + """Writes metadata JSON string into table comment. + + """ # Prepare variables osm_config = egon.data.config.datasets()["openstreetmap"] @@ -125,66 +133,56 @@ def add_metadata(): osm_url = osm_config["original_data"]["source"]["url_testmode"] input_filename = osm_config["original_data"]["target"]["file_testmode"] - spatial_and_date = Path(input_filename).name.split("-") - spatial_extend = spatial_and_date[0] - osm_data_date = ( - "20" - + spatial_and_date[1][0:2] - + "-" - + spatial_and_date[1][2:4] - + "-" - + spatial_and_date[1][4:6] - ) + # Extract spatial extend and date + (spatial_extend, osm_data_date) = re.compile( + "^([\\w-]*).*-(\\d+)$").findall( + Path(input_filename).name.split('.')[0] + )[0] + osm_data_date = datetime.datetime.strptime( + osm_data_date, '%y%m%d').strftime('%y-%m-%d') # Insert metadata for each table - licenses = [ - { - "name": "Open Data Commons Open Database License 1.0", - "title": "", - "path": "https://opendatacommons.org/licenses/odbl/1.0/", - "instruction": ( - "You are free: To Share, To Create, To Adapt;" - " As long as you: Attribute, Share-Alike, Keep open!" - ), - "attribution": "© Reiner Lemoine Institut", - } - ] + licenses = [license_odbl(attribution="© OpenStreetMap contributors")] + for table in osm_config["processed"]["tables"]: + schema_table = ".".join([osm_config["processed"]["schema"], table]) table_suffix = table.split("_")[1] meta = { + "name": schema_table, "title": f"OpenStreetMap (OSM) - Germany - {table_suffix}", + "id": "WILL_BE_SET_AT_PUBLICATION", "description": ( "OpenStreetMap is a free, editable map of the" " whole world that is being built by volunteers" " largely from scratch and released with" - " an open-content license." 
+ " an open-content license.\n\n" + "The OpenStreetMap data here is the result of a PostgreSQL " + "database import using osm2pgsql with a custom style file." ), - "language": ["EN", "DE"], + "language": ["en-EN", "de-DE"], + "publicationDate": datetime.date.today().isoformat(), + "context": context(), "spatial": { - "location": "", + "location": None, "extent": f"{spatial_extend}", - "resolution": "", + "resolution": None, }, "temporal": { "referenceDate": f"{osm_data_date}", "timeseries": { - "start": "", - "end": "", - "resolution": "", - "alignment": "", - "aggregationType": "", + "start": None, + "end": None, + "resolution": None, + "alignment": None, + "aggregationType": None, }, }, "sources": [ { - "title": ( - "Geofabrik - Download - OpenStreetMap Data Extracts" - ), + "title": "OpenStreetMap Data Extracts (Geofabrik)", "description": ( - 'Data dump taken on "referenceDate",' - f" i.e. {osm_data_date}." - " A subset of this is selected using osm2pgsql" - ' using the style file "oedb.style".' 
+ "Full data extract of OpenStreetMap data for defined " + "spatial extent at ''referenceDate''" ), "path": f"{osm_url}", "licenses": licenses, @@ -196,20 +194,38 @@ def add_metadata(): "title": "Guido Pleßmann", "email": "http://github.com/gplssm", "date": time.strftime("%Y-%m-%d"), - "object": "", + "object": None, "comment": "Imported data", + }, + { + "title": "Jonathan Amme", + "email": "http://github.com/nesnoj", + "date": time.strftime("%Y-%m-%d"), + "object": None, + "comment": "Metadata extended", } ], - "metaMetadata": { - "metadataVersion": "OEP-1.4.0", - "metadataLicense": { - "name": "CC0-1.0", - "title": "Creative Commons Zero v1.0 Universal", - "path": ( - "https://creativecommons.org/publicdomain/zero/1.0/" - ), - }, - }, + "resources": [ + { + "profile": "tabular-data-resource", + "name": schema_table, + "path": None, + "format": "PostgreSQL", + "encoding": "UTF-8", + "schema": { + "fields": generate_resource_fields_from_db_table( + osm_config["processed"]["schema"], + table), + "primaryKey": ["id"], + "foreignKeys": [] + }, + "dialect": { + "delimiter": None, + "decimalSeparator": "." 
+ } + } + ], + "metaMetadata": meta_metadata(), } meta_json = "'" + json.dumps(meta) + "'" @@ -285,7 +301,7 @@ class OpenStreetMap(Dataset): def __init__(self, dependencies): super().__init__( name="OpenStreetMap", - version="0.0.2", + version="0.0.3", dependencies=dependencies, tasks=(download, to_postgres, modify_tables, add_metadata), ) diff --git a/src/egon/data/datasets/power_plants/__init__.py b/src/egon/data/datasets/power_plants/__init__.py index bd6ab4dbc..963378e03 100755 --- a/src/egon/data/datasets/power_plants/__init__.py +++ b/src/egon/data/datasets/power_plants/__init__.py @@ -48,7 +48,7 @@ class PowerPlants(Dataset): def __init__(self, dependencies): super().__init__( name="PowerPlants", - version="0.0.1", + version="0.0.2", dependencies=dependencies, tasks=( create_tables, @@ -509,7 +509,7 @@ def assign_bus_id(power_plants, cfg): power_plants.loc[power_plants_hv, "bus_id"] = gpd.sjoin( power_plants[power_plants.index.isin(power_plants_hv)], mv_grid_districts, - ).subst_id + ).bus_id # Assign power plants in ehv to ehv bus power_plants_ehv = power_plants[power_plants.voltage_level < 3].index diff --git a/src/egon/data/datasets/power_plants/pv_ground_mounted.py b/src/egon/data/datasets/power_plants/pv_ground_mounted.py index 9f8da5abd..84848f0c1 100644 --- a/src/egon/data/datasets/power_plants/pv_ground_mounted.py +++ b/src/egon/data/datasets/power_plants/pv_ground_mounted.py @@ -451,9 +451,9 @@ def build_additional_pv(potentials, pv, pow_per_area, con): """ # get MV grid districts - sql = "SELECT subst_id, geom FROM grid.egon_mv_grid_district" + sql = "SELECT bus_id, geom FROM grid.egon_mv_grid_district" distr = gpd.GeoDataFrame.from_postgis(sql, con) - distr = distr.set_index("subst_id") + distr = distr.set_index("bus_id") # identify potential areas where there are no PV parks yet for index, pv in pv.iterrows(): @@ -910,9 +910,9 @@ def run_methodology( # 1) eGon2035 # get MV grid districts - sql = "SELECT subst_id, geom FROM 
grid.egon_mv_grid_district" + sql = "SELECT bus_id, geom FROM grid.egon_mv_grid_district" distr = gpd.GeoDataFrame.from_postgis(sql, con) - distr = distr.set_index("subst_id") + distr = distr.set_index("bus_id") # assign pv_per_distr-power to districts distr["capacity"] = pd.Series() @@ -959,9 +959,9 @@ def run_methodology( # 2) eGon100RE # get MV grid districts - sql = "SELECT subst_id, geom FROM grid.egon_mv_grid_district" + sql = "SELECT bus_id, geom FROM grid.egon_mv_grid_district" distr = gpd.GeoDataFrame.from_postgis(sql, con) - distr = distr.set_index("subst_id") + distr = distr.set_index("bus_id") # assign pv_per_distr-power to districts distr["capacity"] = pd.Series() diff --git a/src/egon/data/datasets/power_plants/pv_rooftop.py b/src/egon/data/datasets/power_plants/pv_rooftop.py index b78d90a07..2d6d69eb9 100644 --- a/src/egon/data/datasets/power_plants/pv_rooftop.py +++ b/src/egon/data/datasets/power_plants/pv_rooftop.py @@ -79,7 +79,7 @@ def pv_rooftop_per_mv_grid(scenario="eGon2035", level="federal_state"): demand = db.select_dataframe( f""" SELECT SUM(demand) as demand, - subst_id as bus_id, vg250_lan + b.bus_id, vg250_lan FROM {sources['electricity_demand']['schema']}. {sources['electricity_demand']['table']} a JOIN {sources['map_zensus_grid_districts']['schema']}. @@ -87,9 +87,9 @@ def pv_rooftop_per_mv_grid(scenario="eGon2035", level="federal_state"): ON a.zensus_population_id = b.zensus_population_id JOIN {sources['map_grid_boundaries']['schema']}. {sources['map_grid_boundaries']['table']} c - ON c.bus_id = b.subst_id + ON c.bus_id = b.bus_id WHERE scenario = 'eGon2035' - GROUP BY (subst_id, vg250_lan) + GROUP BY (b.bus_id, vg250_lan) """ ) @@ -163,7 +163,7 @@ def pv_rooftop_per_mv_grid(scenario="eGon2035", level="federal_state"): mv_grid_districts = db.select_geodataframe( f""" - SELECT subst_id as bus_id, ST_Centroid(geom) as geom + SELECT bus_id as bus_id, ST_Centroid(geom) as geom FROM {sources['egon_mv_grid_district']['schema']}. 
{sources['egon_mv_grid_district']['table']} """, diff --git a/src/egon/data/datasets/vg250/__init__.py b/src/egon/data/datasets/vg250/__init__.py index ca8d12be2..944e00c80 100644 --- a/src/egon/data/datasets/vg250/__init__.py +++ b/src/egon/data/datasets/vg250/__init__.py @@ -11,6 +11,9 @@ from pathlib import Path from urllib.request import urlretrieve +import time +import datetime +import codecs import json import os @@ -21,6 +24,9 @@ from egon.data.config import settings from egon.data.datasets import Dataset import egon.data.config +from egon.data.metadata import (context, + meta_metadata, + licenses_datenlizenz_deutschland) def download_files(): @@ -149,124 +155,85 @@ def add_metadata(): }, } - url = vg250_config["original_data"]["source"]["url"] - - # Insert metadata for each table - licenses = [ - { - "title": "Datenlizenz Deutschland – Namensnennung – Version 2.0", - "path": "www.govdata.de/dl-de/by-2-0", - "instruction": ( - "Jede Nutzung ist unter den Bedingungen dieser „Datenlizenz " - "Deutschland - Namensnennung - Version 2.0 zulässig.\nDie " - "bereitgestellten Daten und Metadaten dürfen für die " - "kommerzielle und nicht kommerzielle Nutzung insbesondere:" - "(1) vervielfältigt, ausgedruckt, präsentiert, verändert, " - "bearbeitet sowie an Dritte übermittelt werden;\n " - "(2) mit eigenen Daten und Daten Anderer zusammengeführt und " - "zu selbständigen neuen Datensätzen verbunden werden;\n " - "(3) in interne und externe Geschäftsprozesse, Produkte und " - "Anwendungen in öffentlichen und nicht öffentlichen " - "elektronischen Netzwerken eingebunden werden." 
- ), - "attribution": "© Bundesamt für Kartographie und Geodäsie", - } - ] + licenses = [licenses_datenlizenz_deutschland( + attribution="© Bundesamt für Kartographie und Geodäsie " + "2020 (Daten verändert)" + )] + + vg250_source = { + "title": "Verwaltungsgebiete 1:250 000 (Ebenen)", + "description": + "Der Datenbestand umfasst sämtliche Verwaltungseinheiten der " + "hierarchischen Verwaltungsebenen vom Staat bis zu den Gemeinden " + "mit ihren Grenzen, statistischen Schlüsselzahlen, Namen der " + "Verwaltungseinheit sowie die spezifische Bezeichnung der " + "Verwaltungsebene des jeweiligen Landes.", + "path": vg250_config["original_data"]["source"]["url"], + "licenses": licenses + } + for table in vg250_config["processed"]["file_table_map"].values(): + schema_table = ".".join([vg250_config["processed"]["schema"], table]) meta = { + "name": schema_table, "title": title_and_description[table]["title"], + "id": "WILL_BE_SET_AT_PUBLICATION", "description": title_and_description[table]["title"], - "language": ["DE"], + "language": ["de-DE"], + "publicationDate": datetime.date.today().isoformat(), + "context": context(), "spatial": { - "location": "", + "location": None, "extent": "Germany", - "resolution": "vector", + "resolution": "1:250000", }, "temporal": { "referenceDate": "2020-01-01", "timeseries": { - "start": "", - "end": "", - "resolution": "", - "alignment": "", - "aggregationType": "", + "start": None, + "end": None, + "resolution": None, + "alignment": None, + "aggregationType": None, }, }, - "sources": [ - { - "title": "Dienstleistungszentrum des Bundes für " - "Geoinformation und Geodäsie - Open Data", - "description": "Dieser Datenbestand steht über " - "Geodatendienste gemäß " - "Geodatenzugangsgesetz (GeoZG) " - "(http://www.geodatenzentrum.de/auftrag/pdf" - "/geodatenzugangsgesetz.pdf) für die " - "kommerzielle und nicht kommerzielle " - "Nutzung geldleistungsfrei zum Download " - "und zur Online-Nutzung zur Verfügung. 
Die " - "Nutzung der Geodaten und Geodatendienste " - "wird durch die Verordnung zur Festlegung " - "der Nutzungsbestimmungen für die " - "Bereitstellung von Geodaten des Bundes " - "(GeoNutzV) (http://www.geodatenzentrum.de" - "/auftrag/pdf/geonutz.pdf) geregelt. " - "Insbesondere hat jeder Nutzer den " - "Quellenvermerk zu allen Geodaten, " - "Metadaten und Geodatendiensten erkennbar " - "und in optischem Zusammenhang zu " - "platzieren. Veränderungen, Bearbeitungen, " - "neue Gestaltungen oder sonstige " - "Abwandlungen sind mit einem " - "Veränderungshinweis im Quellenvermerk zu " - "versehen. Quellenvermerk und " - "Veränderungshinweis sind wie folgt zu " - "gestalten. Bei der Darstellung auf einer " - "Webseite ist der Quellenvermerk mit der " - "URL http://www.bkg.bund.de zu verlinken. " - "© GeoBasis-DE / BKG © GeoBasis-DE / BKG " - " " - "(Daten verändert) Beispiel: " - "© GeoBasis-DE / BKG 2013", - "path": url, - "licenses": "Geodatenzugangsgesetz (GeoZG)", - "copyright": "© GeoBasis-DE / BKG 2016 (Daten verändert)", - }, - { - "title": "BKG - Verwaltungsgebiete 1:250.000 (vg250)", - "description": "Der Datenbestand umfasst sämtliche " - "Verwaltungseinheiten aller hierarchischen " - "Verwaltungsebenen vom Staat bis zu den " - "Gemeinden mit ihren Verwaltungsgrenzen, " - "statistischen Schlüsselzahlen und dem " - "Namen der Verwaltungseinheit sowie der " - "spezifischen Bezeichnung der " - "Verwaltungsebene des jeweiligen " - "Bundeslandes.", - "path": "http://www.bkg.bund.de", - "licenses": licenses, - }, - ], + "sources": [vg250_source], "licenses": licenses, "contributors": [ { "title": "Guido Pleßmann", "email": "http://github.com/gplssm", - "date": "2020-12-04", - "object": "", + "date": time.strftime("%Y-%m-%d"), + "object": None, "comment": "Imported data", + }, + { + "title": "Jonathan Amme", + "email": "http://github.com/nesnoj", + "date": time.strftime("%Y-%m-%d"), + "object": None, + "comment": "Metadata extended", } ], - "metaMetadata": { - 
"metadataVersion": "OEP-1.4.0", - "metadataLicense": { - "name": "CC0-1.0", - "title": "Creative Commons Zero v1.0 Universal", - "path": ( - "https://creativecommons.org/publicdomain/zero/1.0/" - ), - }, - }, + "resources": [ + { + "profile": "tabular-data-resource", + "name": schema_table, + "path": None, + "format": "PostgreSQL", + "encoding": "UTF-8", + "schema": { + "fields": vg250_metadata_resources_fields(), + "primaryKey": ["id"], + "foreignKeys": [] + }, + "dialect": { + "delimiter": None, + "decimalSeparator": "." + } + } + ], + "metaMetadata": meta_metadata(), } meta_json = "'" + json.dumps(meta) + "'" @@ -290,6 +257,43 @@ def cleaning_and_preperation(): ) +def vg250_metadata_resources_fields(): + + return [ + {'description': 'Index', 'name': 'id', 'type': 'integer', 'unit': 'none'}, + {'description': 'Administrative level', 'name': 'ade', 'type': 'integer', 'unit': 'none'}, + {'description': 'Geofactor', 'name': 'gf', 'type': 'integer', 'unit': 'none'}, + {'description': 'Particular areas', 'name': 'bsg', 'type': 'integer', 'unit': 'none'}, + {'description': 'Territorial code', 'name': 'ars', 'type': 'string', 'unit': 'none'}, + {'description': 'Official Municipality Key', 'name': 'ags', 'type': 'string', 'unit': 'none'}, + {'description': 'Seat of the administration (territorial code)', 'name': 'sdv_ars', 'type': 'string', 'unit': 'none'}, + {'description': 'Geographical name', 'name': 'gen', 'type': 'string', 'unit': 'none'}, + {'description': 'Designation of the administrative unit', 'name': 'bez', 'type': 'string', 'unit': 'none'}, + {'description': 'Identifier', 'name': 'ibz', 'type': 'integer', 'unit': 'none'}, + {'description': 'Note', 'name': 'bem', 'type': 'string', 'unit': 'none'}, + {'description': 'Name generation', 'name': 'nbd', 'type': 'string', 'unit': 'none'}, + {'description': 'Land (state)', 'name': 'sn_l', 'type': 'string', 'unit': 'none'}, + {'description': 'Administrative district', 'name': 'sn_r', 'type': 'string', 'unit': 'none'}, 
+ {'description': 'District', 'name': 'sn_k', 'type': 'string', 'unit': 'none'}, + {'description': 'Administrative association – front part', 'name': 'sn_v1', 'type': 'string', 'unit': 'none'}, + {'description': 'Administrative association – rear part', 'name': 'sn_v2', 'type': 'string', 'unit': 'none'}, + {'description': 'Municipality', 'name': 'sn_g', 'type': 'string', 'unit': 'none'}, + {'description': 'Function of the 3rd key digit', 'name': 'fk_s3', 'type': 'string', 'unit': 'none'}, + {'description': 'European statistics key', 'name': 'nuts', 'type': 'string', 'unit': 'none'}, + {'description': 'Filled territorial code', 'name': 'ars_0', 'type': 'string', 'unit': 'none'}, + {'description': 'Filled Official Municipality Key', 'name': 'ags_0', 'type': 'string', 'unit': 'none'}, + {'description': 'Effectiveness', 'name': 'wsk', 'type': 'string', 'unit': 'none'}, + {'description': 'DLM identifier', 'name': 'debkg_id', 'type': 'string', 'unit': 'none'}, + {'description': 'Territorial code (deprecated column)', 'name': 'rs', 'type': 'string', 'unit': 'none'}, + {'description': 'Seat of the administration (territorial code, deprecated column)', 'name': 'sdv_rs', 'type': 'string', 'unit': 'none'}, + {'description': 'Filled territorial code (deprecated column)', 'name': 'rs_0', 'type': 'string', 'unit': 'none'}, + {'description': 'Geometry of areas as WKB', + 'name': 'geometry', + 'type': "Geometry(Polygon, srid=4326)", + 'unit': 'none'} + ] + + class Vg250(Dataset): filename = egon.data.config.datasets()["vg250"]["original_data"]["source"][ @@ -299,7 +303,7 @@ class Vg250(Dataset): def __init__(self, dependencies): super().__init__( name="VG250", - version=self.filename + "-0.0.3", + version=self.filename + "-0.0.4", dependencies=dependencies, tasks=( download_files, diff --git a/src/egon/data/datasets/vg250_mv_grid_districts.py b/src/egon/data/datasets/vg250_mv_grid_districts.py index f8dcf25ed..d6f556b37 100644 --- 
a/src/egon/data/datasets/vg250_mv_grid_districts.py +++ b/src/egon/data/datasets/vg250_mv_grid_districts.py @@ -15,7 +15,7 @@ class Vg250MvGridDistricts(Dataset): def __init__(self, dependencies): super().__init__( name="Vg250MvGridDistricts", - version="0.0.0", + version="0.0.1", dependencies=dependencies, tasks=(mapping), ) @@ -62,7 +62,7 @@ def mapping(): # Select sources from database mv_grid_districts = db.select_geodataframe( f""" - SELECT subst_id as bus_id, ST_Centroid(geom) as geom + SELECT bus_id as bus_id, ST_Centroid(geom) as geom FROM {sources['egon_mv_grid_district']['schema']}. {sources['egon_mv_grid_district']['table']} """, diff --git a/src/egon/data/datasets/zensus_mv_grid_districts.py b/src/egon/data/datasets/zensus_mv_grid_districts.py index bf11b9229..7c8d98c79 100644 --- a/src/egon/data/datasets/zensus_mv_grid_districts.py +++ b/src/egon/data/datasets/zensus_mv_grid_districts.py @@ -17,7 +17,7 @@ class ZensusMvGridDistricts(Dataset): def __init__(self, dependencies): super().__init__( name="ZensusMvGridDistricts", - version="0.0.0", + version="0.0.1", dependencies=dependencies, tasks=(mapping), ) @@ -37,7 +37,7 @@ class MapZensusGridDistricts(Base): primary_key=True, index=True, ) - subst_id = Column(Integer, ForeignKey(MvGridDistricts.subst_id)) + bus_id = Column(Integer, ForeignKey(MvGridDistricts.bus_id)) def mapping(): @@ -64,7 +64,7 @@ def mapping(): ) grid_districts = db.select_geodataframe( - f"""SELECT subst_id, geom + f"""SELECT bus_id, geom FROM {cfg['sources']['egon_mv_grid_district']['schema']}. 
{cfg['sources']['egon_mv_grid_district']['table']}""", geom_col="geom", @@ -75,7 +75,7 @@ def mapping(): join = gpd.sjoin(zensus, grid_districts, how="inner", op="intersects") # Insert results to database - join[["zensus_population_id", "subst_id"]].to_sql( + join[["zensus_population_id", "bus_id"]].to_sql( cfg["targets"]["map"]["table"], schema=cfg["targets"]["map"]["schema"], con=db.engine(), diff --git a/src/egon/data/datasets/zensus_vg250.py b/src/egon/data/datasets/zensus_vg250.py index da7c7db7e..a5e4b1943 100644 --- a/src/egon/data/datasets/zensus_vg250.py +++ b/src/egon/data/datasets/zensus_vg250.py @@ -1,4 +1,6 @@ import json +import time +import datetime from geoalchemy2 import Geometry from sqlalchemy import ( @@ -16,15 +18,20 @@ from egon.data import db import egon.data.config +from egon.data.metadata import (context, + licenses_datenlizenz_deutschland, + meta_metadata) from egon.data.datasets import Dataset +from egon.data.datasets.vg250 import vg250_metadata_resources_fields Base = declarative_base() + class ZensusVg250(Dataset): def __init__(self, dependencies): super().__init__( name="ZensusVg250", - version="0.0.1", + version="0.0.2", dependencies=dependencies, tasks=( map_zensus_vg250, inside_germany, @@ -33,6 +40,7 @@ def __init__(self, dependencies): ) ) + class Vg250Sta(Base): __tablename__ = "vg250_sta" __table_args__ = {"schema": "boundaries"} @@ -330,115 +338,155 @@ def add_metadata_zensus_inside_ger(): Creates a metdadata JSON string and writes it to the database table comment """ + schema_table = ".".join([ + DestatisZensusPopulationPerHaInsideGermany.__table__.schema, + DestatisZensusPopulationPerHaInsideGermany.__table__.name, + ]) + metadata = { + "name": schema_table, "title": "DESTATIS - Zensus 2011 - Population per hectar", - "description": "National census in Germany in 2011 with the bounds on Germanys boarders.", - "language": ["eng", "ger"], + "id": "WILL_BE_SET_AT_PUBLICATION", + "description": ( + "National census in Germany in 
2011, clipped to Germany's " + "borders." + ), + "language": ["en-GB", "de-DE"], + "publicationDate": datetime.date.today().isoformat(), + "context": context(), "spatial": { - "location": "none", + "location": None, "extent": "Germany", "resolution": "1 ha", }, "temporal": { - "reference_date": "2011", - "start": "none", - "end": "none", - "resolution": "none", + "referenceDate": "2011-12-31", + "timeseries": { + "start": None, + "end": None, + "resolution": None, + "alignment": None, + "aggregationType": None, + }, }, "sources": [ { - "name": "Statistisches Bundesamt (Destatis) - Ergebnisse des " - "Zensus 2011 zum Download", - "description": "Als Download bieten wir Ihnen auf dieser Seite " - "zusätzlich zur Zensusdatenbank CSV- und " - "teilweise Excel-Tabellen mit umfassenden " - "Personen-, Haushalts- und Familien- sowie " - "Gebäude- und Wohnungsmerkmalen. Die " - "Ergebnisse liegen auf Bundes-, Länder-, Kreis- " - "und Gemeindeebene vor. Außerdem sind einzelne " - "Ergebnisse für Gitterzellen verfügbar.", - "url": "https://www.zensus2011.de/SharedDocs/Aktuelles/Ergebnis" - "se/DemografischeGrunddaten.html;jsessionid=E0A2B4F894B2" - "58A3B22D20448F2E4A91.2_cid380?nn=3065474", - "license": "", - "copyright": "© Statistische Ämter des Bundes und der Länder 2014", + "title": "Statistisches Bundesamt (Destatis) - Ergebnisse des " + "Zensus 2011 zum Download", + "description": ( + "Als Download bieten wir Ihnen auf dieser Seite " + "zusätzlich zur Zensusdatenbank CSV- und " + "teilweise Excel-Tabellen mit umfassenden " + "Personen-, Haushalts- und Familien- sowie " + "Gebäude- und Wohnungsmerkmalen. Die " + "Ergebnisse liegen auf Bundes-, Länder-, Kreis- " + "und Gemeindeebene vor. Außerdem sind einzelne " + "Ergebnisse für Gitterzellen verfügbar." 
+ ), + "path": "https://www.zensus2011.de/DE/Home/Aktuelles/" + "DemografischeGrunddaten.html", + "licenses": [ + licenses_datenlizenz_deutschland( + attribution="© Statistische Ämter des Bundes und der " + "Länder 2014" + ) + ], }, { - "name": "Dokumentation - Zensus 2011 - Methoden und Verfahren", - "description": "Diese Publikation beschreibt ausführlich die " - "Methoden und Verfahren des registergestützten " - "Zensus 2011; von der Datengewinnung und " - "-aufbereitung bis hin zur Ergebniserstellung" - " und Geheimhaltung. Der vorliegende Band wurde " - "von den Statistischen Ämtern des Bundes und " - "der Länder im Juni 2015 veröffentlicht.", - "url": "https://www.destatis.de/DE/Publikationen/Thematisch/Be" - "voelkerung/Zensus/ZensusBuLaMethodenVerfahren51211051" - "19004.pdf?__blob=publicationFile", - "license": "Vervielfältigung und Verbreitung, auch " - "auszugsweise, mit Quellenangabe gestattet.", - "copyright": "© Statistisches Bundesamt, Wiesbaden, 2015 " - "(im Auftrag der Herausgebergemeinschaft)", + "title": "Dokumentation - Zensus 2011 - Methoden und Verfahren", + "description": ( + "Diese Publikation beschreibt ausführlich die " + "Methoden und Verfahren des registergestützten " + "Zensus 2011; von der Datengewinnung und " + "-aufbereitung bis hin zur Ergebniserstellung" + " und Geheimhaltung. Der vorliegende Band wurde " + "von den Statistischen Ämtern des Bundes und " + "der Länder im Juni 2015 veröffentlicht." 
+ ), + "path": "https://www.destatis.de/DE/Publikationen/Thematisch/Be" + "voelkerung/Zensus/ZensusBuLaMethodenVerfahren51211051" + "19004.pdf?__blob=publicationFile", + "licenses": [ + licenses_datenlizenz_deutschland( + attribution="© Statistisches Bundesamt, Wiesbaden " + "2015 (im Auftrag der " + "Herausgebergemeinschaft)" + ) + ], }, ], - "license": { - "id": "dl-de/by-2-0", - "name": "Datenlizenz by-2-0", - "version": "2.0", - "url": "www.govdata.de/dl-de/by-2-0", - "instruction": "Empfohlene Zitierweise des Quellennachweises: " - "Datenquelle: Statistisches Bundesamt, Wiesbaden, " - "Genesis-Online, Abrufdatum; Datenlizenz " - "by-2-0. Quellenvermerk bei eigener Berechnung / " - "Darstellung: Datenquelle: Statistisches Bundesamt, " - "Wiesbaden, Genesis-Online, Abrufdatum; " - "Datenlizenz by-2-0; eigene Berechnung/eigene " - "Darstellung. In elektronischen Werken ist im " - "Quellenverweis dem Begriff (Datenlizenz by-2-0) " - "der Link www.govdata.de/dl-de/by-2-0 als " - "Verknüpfung zu hinterlegen.", - "copyright": "Statistisches Bundesamt, Wiesbaden, Genesis-Online; " - "Datenlizenz by-2-0; eigene Berechnung", - }, + "licenses": [ + licenses_datenlizenz_deutschland( + attribution="© Statistische Ämter des Bundes und der Länder " + "2014; © Statistisches Bundesamt, Wiesbaden 2015 " + "(Daten verändert)" + ) + ], "contributors": [ { "title": "Guido Pleßmann", "email": "http://github.com/gplssm", - "date": "2021-03-11", - "object": "", - "comment": "Created processing ", + "date": time.strftime("%Y-%m-%d"), + "object": None, + "comment": "Imported data", + }, + { + "title": "Jonathan Amme", + "email": "http://github.com/nesnoj", + "date": time.strftime("%Y-%m-%d"), + "object": None, + "comment": "Metadata extended", } ], "resources": [ { - "name": "society.destatis_zensus_population_per_ha", + "profile": "tabular-data-resource", + "name": schema_table, + "path": None, "format": "PostgreSQL", - "fields": [ - { - "name": "id", - "description": "Unique 
identifier", - "unit": "none", - }, - { - "name": "grid_id", - "description": "Grid number of source", - "unit": "none", - }, - { - "name": "population", - "description": "Number of registred residents", - "unit": "resident", - }, - { - "name": "geom_point", - "description": "Geometry centroid", - "unit": "none", - }, - {"name": "geom", "description": "Geometry", "unit": ""}, - ], + "encoding": "UTF-8", + "schema": { + "fields": [ + { + "name": "id", + "description": "Unique identifier", + "type": "integer", + "unit": "none", + }, + { + "name": "grid_id", + "description": "Grid number of source", + "type": "string", + "unit": "none", + }, + { + "name": "population", + "description": "Number of registered residents", + "type": "integer", + "unit": "resident", + }, + { + "name": "geom_point", + "description": "Geometry centroid", + "type": "Geometry", + "unit": "none", + }, + { + "name": "geom", + "description": "Geometry", + "type": "Geometry", + "unit": ""}, + ], + "primaryKey": ["id"], + "foreignKeys": [] + }, + "dialect": { + "delimiter": None, + "decimalSeparator": "." 
+ } } ], - "metadata_version": "1.3", + "metaMetadata": meta_metadata(), } meta_json = "'" + json.dumps(metadata) + "'" @@ -457,107 +505,113 @@ def add_metadata_vg250_gem_pop(): Creates a metdadata JSON string and writes it to the database table comment """ vg250_config = egon.data.config.datasets()["vg250"] + schema_table = ".".join([Vg250GemPopulation.__table__.schema, + Vg250GemPopulation.__table__.name]) + + licenses = [licenses_datenlizenz_deutschland( + attribution="© Bundesamt für Kartographie und Geodäsie " + "2020 (Daten verändert)" + )] + + vg250_source = { + "title": "Verwaltungsgebiete 1:250 000 (Ebenen)", + "description": + "Der Datenbestand umfasst sämtliche Verwaltungseinheiten der " + "hierarchischen Verwaltungsebenen vom Staat bis zu den Gemeinden " + "mit ihren Grenzen, statistischen Schlüsselzahlen, Namen der " + "Verwaltungseinheit sowie die spezifische Bezeichnung der " + "Verwaltungsebene des jeweiligen Landes.", + "path": vg250_config["original_data"]["source"]["url"], + "licenses": licenses + } - licenses = [ - { - "title": "Datenlizenz Deutschland – Namensnennung – Version 2.0", - "path": "www.govdata.de/dl-de/by-2-0", - "instruction": ( - "Jede Nutzung ist unter den Bedingungen dieser „Datenlizenz " - "Deutschland - Namensnennung - Version 2.0 zulässig.\nDie " - "bereitgestellten Daten und Metadaten dürfen für die " - "kommerzielle und nicht kommerzielle Nutzung insbesondere:" - "(1) vervielfältigt, ausgedruckt, präsentiert, verändert, " - "bearbeitet sowie an Dritte übermittelt werden;\n " - "(2) mit eigenen Daten und Daten Anderer zusammengeführt und " - "zu selbständigen neuen Datensätzen verbunden werden;\n " - "(3) in interne und externe Geschäftsprozesse, Produkte und " - "Anwendungen in öffentlichen und nicht öffentlichen " - "elektronischen Netzwerken eingebunden werden." 
- ), - "attribution": "© Bundesamt für Kartographie und Geodäsie", - } - ] + resources_fields = vg250_metadata_resources_fields() + resources_fields.extend([ + {'name': 'area_ha', + 'description': 'Area in ha', + 'type': 'float', + 'unit': 'ha'}, + {'name': 'area_km2', + 'description': 'Area in km2', + 'type': 'float', + 'unit': 'km2'}, + {'name': 'population_total', + 'description': 'Number of inhabitants', + 'type': 'integer', + 'unit': 'none'}, + {'name': 'cell_count', + 'description': 'Number of Zensus cells', + 'type': 'integer', + 'unit': 'none'}, + {'name': 'population_density', + 'description': 'Number of inhabitants per km2', + 'type': 'float', + 'unit': 'inhabitants/km²'}, + ]) metadata = { - "title": "Municipalities (BKG Verwaltungsgebiete 250) and population " - "(Destatis Zensus)", + "name": schema_table, + "title": ( + "Municipalities (BKG Verwaltungsgebiete 250) and population " + "(Destatis Zensus)" + ), + "id": "WILL_BE_SET_AT_PUBLICATION", "description": "Municipality data enriched by population data", - "language": ["DE"], + "language": ["de-DE"], + "publicationDate": datetime.date.today().isoformat(), + "context": context(), "spatial": { - "location": "", + "location": None, "extent": "Germany", - "resolution": "vector", + "resolution": "1:250000", }, "temporal": { "referenceDate": "2020-01-01", "timeseries": { - "start": "", - "end": "", - "resolution": "", - "alignment": "", - "aggregationType": "", + "start": None, + "end": None, + "resolution": None, + "alignment": None, + "aggregationType": None, }, }, - "sources": [ - { - "title": "Dienstleistungszentrum des Bundes für " - "Geoinformation und Geodäsie - Open Data", - "description": "Dieser Datenbestand steht über " - "Geodatendienste gemäß " - "Geodatenzugangsgesetz (GeoZG) " - "(http://www.geodatenzentrum.de/auftrag/pdf" - "/geodatenzugangsgesetz.pdf) für die " - "kommerzielle und nicht kommerzielle " - "Nutzung geldleistungsfrei zum Download " - "und zur Online-Nutzung zur Verfügung. 
Die " - "Nutzung der Geodaten und Geodatendienste " - "wird durch die Verordnung zur Festlegung " - "der Nutzungsbestimmungen für die " - "Bereitstellung von Geodaten des Bundes " - "(GeoNutzV) (http://www.geodatenzentrum.de" - "/auftrag/pdf/geonutz.pdf) geregelt. " - "Insbesondere hat jeder Nutzer den " - "Quellenvermerk zu allen Geodaten, " - "Metadaten und Geodatendiensten erkennbar " - "und in optischem Zusammenhang zu " - "platzieren. Veränderungen, Bearbeitungen, " - "neue Gestaltungen oder sonstige " - "Abwandlungen sind mit einem " - "Veränderungshinweis im Quellenvermerk zu " - "versehen. Quellenvermerk und " - "Veränderungshinweis sind wie folgt zu " - "gestalten. Bei der Darstellung auf einer " - "Webseite ist der Quellenvermerk mit der " - "URL http://www.bkg.bund.de zu verlinken. " - "© GeoBasis-DE / BKG © GeoBasis-DE / BKG " - " " - "(Daten verändert) Beispiel: " - "© GeoBasis-DE / BKG 2013", - "path": vg250_config["original_data"]["source"]["url"], - "licenses": licenses, - "copyright": "© GeoBasis-DE / BKG 2016 (Daten verändert)", - }, - ], + "sources": [vg250_source], "licenses": licenses, "contributors": [ { "title": "Guido Pleßmann", "email": "http://github.com/gplssm", - "date": "2021-03-11", - "object": "", + "date": time.strftime("%Y-%m-%d"), + "object": None, "comment": "Imported data", + }, + { + "title": "Jonathan Amme", + "email": "http://github.com/nesnoj", + "date": time.strftime("%Y-%m-%d"), + "object": None, + "comment": "Metadata extended", } ], - "metaMetadata": { - "metadataVersion": "OEP-1.4.0", - "metadataLicense": { - "name": "CC0-1.0", - "title": "Creative Commons Zero v1.0 Universal", - "path": ("https://creativecommons.org/publicdomain/zero/1.0/"), - }, - }, + "resources": [ + { + "profile": "tabular-data-resource", + "name": schema_table, + "path": None, + "format": "PostgreSQL", + "encoding": "UTF-8", + "schema": { + "fields": resources_fields, + "primaryKey": ["id"], + "foreignKeys": [] + }, + "dialect": { + "delimiter": 
None, + "decimalSeparator": "." + } + } + ], + "metaMetadata": meta_metadata(), } meta_json = "'" + json.dumps(metadata) + "'" diff --git a/src/egon/data/db.py b/src/egon/data/db.py index 4bc5be3ba..b7816e2dd 100644 --- a/src/egon/data/db.py +++ b/src/egon/data/db.py @@ -68,8 +68,8 @@ def submit_comment(json, schema, table): """Add comment to table. We use `Open Energy Metadata `_ - standard for describging our data. Metadata is stored as JSON in the table + oemetadata/blob/develop/metadata/v141/metadata_key_description.md>`_ + standard for describing our data. Metadata is stored as JSON in the table comment. Parameters @@ -95,6 +95,7 @@ def submit_comment(json, schema, table): # The query throws an error if JSON is invalid execute_sql(check_json_str) + def execute_sql_script(script, encoding="utf-8-sig"): """Execute a SQL script given as a file name. @@ -115,6 +116,7 @@ def execute_sql_script(script, encoding="utf-8-sig"): execute_sql(sqlfile) + @contextmanager def session_scope(): """Provide a transactional scope around a series of operations.""" @@ -181,6 +183,7 @@ def select_dataframe(sql, index_col=None): return df + def select_geodataframe(sql, index_col=None, geom_col='geom', epsg=3035): """ Select data from local database as geopandas.GeoDataFrame @@ -211,6 +214,7 @@ def select_geodataframe(sql, index_col=None, geom_col='geom', epsg=3035): return gdf + def next_etrago_id(component): """ Select next id value for components in etrago tables diff --git a/src/egon/data/metadata.py b/src/egon/data/metadata.py new file mode 100644 index 000000000..53ad6a446 --- /dev/null +++ b/src/egon/data/metadata.py @@ -0,0 +1,269 @@ +from sqlalchemy import MetaData, Table +from sqlalchemy.dialects.postgresql.base import ischema_names +from geoalchemy2 import Geometry + +from egon.data.db import engine + + +def context(): + """ + Project context information for metadata + + Returns + ------- + dict + OEP metadata conform data license information + """ + + return { + 
"homepage": "https://ego-n.org/", + "documentation": "https://egon-data.readthedocs.io/en/latest/", + "sourceCode": "https://github.com/openego/eGon-data", + "contact": "https://ego-n.org/partners/", + "grantNo": "03EI1002", + "fundingAgency": "Bundesministerium für Wirtschaft und Energie", + "fundingAgencyLogo": "https://www.innovation-beratung-" + "foerderung.de/INNO/Redaktion/DE/Bilder/" + "Titelbilder/titel_foerderlogo_bmwi.jpg?" + "__blob=normal&v=3", + "publisherLogo": "https://ego-n.org/images/eGon_logo_" + "noborder_transbg.svg" + } + + +def meta_metadata(): + """ + Meta data on metadata + + Returns + ------- + dict + OEP metadata conform metadata on metadata + """ + + return { + "metadataVersion": "OEP-1.4.1", + "metadataLicense": { + "name": "CC0-1.0", + "title": "Creative Commons Zero v1.0 Universal", + "path": ( + "https://creativecommons.org/publicdomain/zero/1.0/" + ), + }, + } + + +def licenses_datenlizenz_deutschland(attribution): + """ + License information for Datenlizenz Deutschland + + Parameters + ---------- + attribution : str + Attribution for the dataset incl. © symbol, e.g. 
'© GeoBasis-DE / BKG' + + Returns + ------- + dict + OEP metadata conform data license information + """ + + return { + "name": "dl-by-de/2.0", + "title": "Datenlizenz Deutschland – Namensnennung – Version 2.0", + "path": "www.govdata.de/dl-de/by-2-0", + "instruction": ( + "Jede Nutzung ist unter den Bedingungen dieser „Datenlizenz " + "Deutschland - Namensnennung - Version 2.0 zulässig.\nDie " + "bereitgestellten Daten und Metadaten dürfen für die " + "kommerzielle und nicht kommerzielle Nutzung insbesondere:" + "(1) vervielfältigt, ausgedruckt, präsentiert, verändert, " + "bearbeitet sowie an Dritte übermittelt werden;\n " + "(2) mit eigenen Daten und Daten Anderer zusammengeführt und " + "zu selbständigen neuen Datensätzen verbunden werden;\n " + "(3) in interne und externe Geschäftsprozesse, Produkte und " + "Anwendungen in öffentlichen und nicht öffentlichen " + "elektronischen Netzwerken eingebunden werden.\n" + "Bei der Nutzung ist sicherzustellen, dass folgende Angaben " + "als Quellenvermerk enthalten sind:\n" + "(1) Bezeichnung des Bereitstellers nach dessen Maßgabe,\n" + "(2) der Vermerk Datenlizenz Deutschland – Namensnennung – " + "Version 2.0 oder dl-de/by-2-0 mit Verweis auf den Lizenztext " + "unter www.govdata.de/dl-de/by-2-0 sowie\n" + "(3) einen Verweis auf den Datensatz (URI)." + "Dies gilt nur soweit die datenhaltende Stelle die Angaben" + "(1) bis (3) zum Quellenvermerk bereitstellt.\n" + "Veränderungen, Bearbeitungen, neue Gestaltungen oder " + "sonstige Abwandlungen sind im Quellenvermerk mit dem Hinweis " + "zu versehen, dass die Daten geändert wurden." + ), + "attribution": attribution + } + + +def license_odbl(attribution): + """ + License information for Open Data Commons Open Database License (ODbL-1.0) + + Parameters + ---------- + attribution : str + Attribution for the dataset incl. © symbol, e.g. 
+ '© OpenStreetMap contributors' + + Returns + ------- + dict + OEP metadata conform data license information + """ + return { + "name": "ODbL-1.0", + "title": "Open Data Commons Open Database License 1.0", + "path": "https://opendatacommons.org/licenses/odbl/1.0/index.html", + "instruction": "You are free: To Share, To Create, To Adapt; " + "As long as you: Attribute, Share-Alike, Keep open!", + "attribution": attribution + } + + +def license_ccby(attribution): + """ + License information for Creative Commons Attribution 4.0 International + (CC-BY-4.0) + + Parameters + ---------- + attribution : str + Attribution for the dataset incl. © symbol, e.g. '© GeoBasis-DE / BKG' + + Returns + ------- + dict + OEP metadata conform data license information + """ + return { + "name": "CC-BY-4.0", + "title": "Creative Commons Attribution 4.0 International", + "path": "https://creativecommons.org/licenses/by/4.0/legalcode", + "instruction": "You are free: To Share, To Create, To Adapt; " + "As long as you: Attribute.", + "attribution": attribution + } + + +def license_geonutzv(attribution): + """ + License information for GeoNutzV + + Parameters + ---------- + attribution : str + Attribution for the dataset incl. © symbol, e.g. 
'© GeoBasis-DE / BKG' + + Returns + ------- + dict + OEP metadata conform data license information + """ + return { + "name": "geonutzv-de-2013-03-19", + "title": "Verordnung zur Festlegung der Nutzungsbestimmungen für die " + "Bereitstellung von Geodaten des Bundes", + "path": "https://www.gesetze-im-internet.de/geonutzv/", + "instruction": "Geodaten und Geodatendienste, einschließlich " + "zugehöriger Metadaten, werden für alle derzeit " + "bekannten sowie für alle zukünftig bekannten Zwecke " + "kommerzieller und nicht kommerzieller Nutzung " + "geldleistungsfrei zur Verfügung gestellt, soweit " + "durch besondere Rechtsvorschrift nichts anderes " + "bestimmt ist oder vertragliche oder gesetzliche " + "Rechte Dritter dem nicht entgegenstehen.", + "attribution": attribution + } + + + def generate_resource_fields_from_sqla_model(model): + """ Generate a template for the resource fields for metadata from a + SQLAlchemy model. + + For details on the fields see field 14.6.1 of `Open Energy Metadata + `_ standard. + The fields `name` and `type` are automatically filled, the `description` + and `unit` must be filled manually. + + Examples + -------- + >>> from egon.data.metadata import generate_resource_fields_from_sqla_model + >>> from egon.data.datasets.zensus_vg250 import Vg250Sta + >>> resources = generate_resource_fields_from_sqla_model(Vg250Sta) + + Parameters + ---------- + model : sqlalchemy.ext.declarative.declarative_base() + SQLA model + + Returns + ------- + list of dict + Resource fields + """ + + return [{'name': col.name, + 'description': '', + 'type': str(col.type).lower(), + 'unit': 'none'} + for col in model.__table__.columns] + + + def generate_resource_fields_from_db_table(schema, table, geom_columns=None): + """ Generate a template for the resource fields for metadata from a + database table. + + For details on the fields see field 14.6.1 of `Open Energy Metadata + `_ standard. 
+ The fields `name` and `type` are automatically filled, the `description` + and `unit` must be filled manually. + + Examples + -------- + >>> from egon.data.metadata import generate_resource_fields_from_db_table + >>> resources = generate_resource_fields_from_db_table( + ... 'openstreetmap', 'osm_point', ['geom', 'geom_centroid'] + ... ) + + Parameters + ---------- + schema : str + The target table's database schema + table : str + Name of the database table to inspect + geom_columns : list of str + Names of all geometry columns in the table. This is required to return + Geometry data type for those columns as SQLAlchemy does not recognize + them correctly. Defaults to ['geom']. + + Returns + ------- + list of dict + Resource fields + """ + + # handle geometry columns + if geom_columns is None: + geom_columns = ['geom'] + for col in geom_columns: + ischema_names[col] = Geometry + + table = Table(table, + MetaData(), + schema=schema, + autoload=True, + autoload_with=engine()) + + return [{'name': col.name, + 'description': '', + 'type': str(col.type).lower(), + 'unit': 'none'} + for col in table.c]
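The two `generate_resource_fields_from_*` helpers added above share one idea: emit a list of OEP v1.4.1 resource-field stubs in which only `name` and `type` are derived from the table, while `description` and `unit` are left for the dataset author to fill in. A minimal sketch of that template logic, using a plain list of `(name, sql_type)` pairs as a stand-in for the SQLAlchemy model columns (the `resource_fields_template` name and the stand-in input are illustrative, not part of the patch):

```python
# Sketch of the resource-field template logic from egon.data.metadata.
# Assumption: a plain (name, sql_type) list stands in for the column
# objects that the real helpers read from a SQLAlchemy model or table.
def resource_fields_template(columns):
    """Build OEP v1.4.1 resource-field stubs; 'description' and 'unit'
    are placeholders to be filled in manually afterwards."""
    return [
        {
            "name": name,
            "description": "",         # fill in manually
            "type": sql_type.lower(),  # e.g. "INTEGER" -> "integer"
            "unit": "none",            # fill in manually
        }
        for name, sql_type in columns
    ]


fields = resource_fields_template([("id", "INTEGER"), ("gen", "VARCHAR")])
print(fields)
```

The returned stubs drop straight into the `"schema": {"fields": ...}` slot of a resource entry, as the vg250 and zensus metadata in this patch do.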