Eurostat 2.0 #1041

adrian-wojcik · 2024-09-19T08:16:35Z

Summary

Adding Eurostat connector

Importance

To use Eurostat in Prefect 2.0

Checklist

This PR:

follows the guidelines laid out in CONTRIBUTING.md
links relevant issue(s)
adds/updates tests (if appropriate)
adds/updates docstrings (if appropriate)
adds an entry in CHANGELOG.md

… eurostat_2.0

trymzet

Added my comments

CHANGELOG.md

trymzet · 2024-09-20T14:17:19Z

src/viadot/sources/eurostat.py

+
+Structure for the Eurostat API connector.
+
+This module provides functionalities for connecting to Eurostat  API and download
+the datasets. It includes the following features:
+- Pulling json file with all data from specific dataset.
+- Creating pandas Data Frame from pulled json file.
+- Creating dataset parameters validation if specified.
+
+Typical usage example:
+
+    eurostat = Eurostat()
+
+    eurostat.to_df(
+        dataset_code: str,
+        params: dict = None,
+        columns: list = None,
+        tests: dict = None,
+    )
+
+Functions:
+
+    get_parameters_codes(dataset_code: str, url: str): Validate available API request
+        parameters and their codes.
+    validate_params(dataset_code: str, url: str, params: dict): Validates given
+        parameters against the available parameters in the dataset
+    eurostat_dictionary_to_df(*signals: list): Function for creating DataFrame from
+        JSON pulled from Eurostat
+    to_df(dataset_code: str, params: dict = None, columns: list = None,
+        tests: dict = None): Function responsible for getting response and creating
+        DataFrame using method 'eurostat_dictionary_to_df' with validation of provided
+        parameters and their codes if needed.
+


All this info should be part of relevant docstrings (of the class and its methods); no need to repeat this info here.

Suggested change

-Structure for the Eurostat API connector.
-
-This module provides functionalities for connecting to Eurostat API and download
-the datasets. It includes the following features:
-- Pulling json file with all data from specific dataset.
-- Creating pandas Data Frame from pulled json file.
-- Creating dataset parameters validation if specified.
-
-Typical usage example:
-
-    eurostat = Eurostat()
-
-    eurostat.to_df(
-        dataset_code: str,
-        params: dict = None,
-        columns: list = None,
-        tests: dict = None,
-    )
-
-Functions:
-
-    get_parameters_codes(dataset_code: str, url: str): Validate available API request
-        parameters and their codes.
-    validate_params(dataset_code: str, url: str, params: dict): Validates given
-        parameters against the available parameters in the dataset
-    eurostat_dictionary_to_df(*signals: list): Function for creating DataFrame from
-        JSON pulled from Eurostat
-    to_df(dataset_code: str, params: dict = None, columns: list = None,
-        tests: dict = None): Function responsible for getting response and creating
-        DataFrame using method 'eurostat_dictionary_to_df' with validation of provided
-        parameters and their codes if needed.
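
To illustrate the review comment above, here is a minimal sketch of what moving the module-level description into class and method docstrings might look like. ExampleSource and to_records are hypothetical names made up for this illustration; they are not the viadot Eurostat class or its methods, and the Google-style docstring layout is an assumption about the project's conventions.

```python
from __future__ import annotations


class ExampleSource:
    """Hypothetical connector used only to illustrate docstring placement.

    The class docstring says what the connector is for, so the module
    docstring does not need to repeat it.
    """

    def __init__(self, dataset_code: str) -> None:
        """Store the dataset code.

        Args:
            dataset_code (str): Identifier of the dataset to pull.
        """
        self.dataset_code = dataset_code

    def to_records(self, columns: list[str] | None = None) -> list[dict[str, str]]:
        """Return the dataset as a list of records.

        Args:
            columns (list[str] | None, optional): Columns to keep.
                Defaults to None (keep all columns).

        Returns:
            list[dict[str, str]]: One dictionary per row.
        """
        # Placeholder data; a real connector would fetch it from an API.
        row = {"dataset": self.dataset_code, "value": "1"}
        if columns is not None:
            row = {key: val for key, val in row.items() if key in columns}
        return [row]
```

With this layout, help(ExampleSource) and IDE tooltips show the parameter descriptions next to the code they describe.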

trymzet · 2024-09-20T14:18:21Z

src/viadot/orchestration/prefect/tasks/eurostat.py

+
+This module provides an intermediate wrapper between the prefect flow and the connector:
+- Generate the Eurostat Cloud API connector.
+- Create and return a pandas Data Frame with the response of the API.
+
+Typical usage example:
+
+    data_frame = eurostat_to_df(
+        dataset_code: str,
+        params: dict = None,
+        columns: list = None,
+        tests: dict = None,
+    )
+
+Functions:
+
+    eurostat_to_df(
+        dataset_code: str,
+        params: dict = None,
+        columns: list = None,
+        tests: dict = None,
+    ):
+    Task to download data from Eurostat Cloud API.
+


This info should be part of relevant function docstrings, no need to repeat this info here.

Suggested change

-This module provides an intermediate wrapper between the prefect flow and the connector:
-- Generate the Eurostat Cloud API connector.
-- Create and return a pandas Data Frame with the response of the API.
-
-Typical usage example:
-
-    data_frame = eurostat_to_df(
-        dataset_code: str,
-        params: dict = None,
-        columns: list = None,
-        tests: dict = None,
-    )
-
-Functions:
-
-    eurostat_to_df(
-        dataset_code: str,
-        params: dict = None,
-        columns: list = None,
-        tests: dict = None,
-    ):
-    Task to download data from Eurostat Cloud API.

trymzet · 2024-09-20T14:19:16Z

src/viadot/orchestration/prefect/flows/eurostat_to_adls.py

+
+This module provides a prefect flow function to use the Eurostat connector:
+- Call to the prefect task wrapper to get a final Data Frame from the connector.
+- Upload that data to Azure Data Lake Storage.
+
+Typical usage example:
+
+    eurostat_to_adls(
+        dataset_code: str,
+        params: dict = None,
+        columns: list = None,
+        tests: dict = None,
+        adls_path: str = None,
+        adls_credentials_secret: str = None,
+        overwrite_adls: bool = False,
+        adls_config_key: str = None,
+    )
+
+Functions:
+
+    eurostat_to_adls(
+        dataset_code: str,
+        params: dict = None,
+        columns: list = None,
+        tests: dict = None,
+        adls_path: str = None,
+        adls_credentials_secret: str = None,
+        overwrite_adls: bool = False,
+        adls_config_key: str = None,
+    ):
+        Flow to download data from Eurostat Cloud API and upload to ADLS.


All this info should be part of relevant function docstrings, no need to repeat this info here.

Suggested change

-This module provides a prefect flow function to use the Eurostat connector:
-- Call to the prefect task wrapper to get a final Data Frame from the connector.
-- Upload that data to Azure Data Lake Storage.
-
-Typical usage example:
-
-    eurostat_to_adls(
-        dataset_code: str,
-        params: dict = None,
-        columns: list = None,
-        tests: dict = None,
-        adls_path: str = None,
-        adls_credentials_secret: str = None,
-        overwrite_adls: bool = False,
-        adls_config_key: str = None,
-    )
-
-Functions:
-
-    eurostat_to_adls(
-        dataset_code: str,
-        params: dict = None,
-        columns: list = None,
-        tests: dict = None,
-        adls_path: str = None,
-        adls_credentials_secret: str = None,
-        overwrite_adls: bool = False,
-        adls_config_key: str = None,
-    ):
-        Flow to download data from Eurostat Cloud API and upload to ADLS.

trymzet · 2024-09-20T14:22:16Z

src/viadot/orchestration/prefect/flows/eurostat_to_adls.py

+            A dictionary with optional URL parameters. The key represents the
+            parameter ID, while the value is the code for a specific parameter,
+            for example 'params = {'unit': 'EUR'}' where "unit" is the parameter
+            to set and "EUR" is the specific parameter code. You can add more
+            than one parameter, but only one code per parameter! So you CANNOT
+            provide a list of codes, e.g., 'params = {'unit': ['EUR', 'USD',
+            'PLN']}'. This parameter is REQUIRED in most cases to pull a specific
+            dataset from the API. Both the parameter and code must be provided
+            as a string! Defaults to None.


All this info should just be expressed in the type hint: params: dict[str, str] | None = None. BTW, you need to fix all the typing for the CI check to pass. I suggest running pre-commit locally before committing. See https://github.com/dyvenia/viadot/blob/2.0/CONTRIBUTING.md#pre-commit-hooks
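
One caveat worth keeping in mind: a hint such as dict[str, str] | None documents the "one string code per parameter" rule for static checkers, but Python does not enforce it at runtime. If a runtime guard is also wanted, it could look roughly like the sketch below. check_params is a hypothetical helper written for this comment, not a function from viadot.

```python
from __future__ import annotations


def check_params(params: dict[str, str] | None = None) -> dict[str, str]:
    """Return params unchanged if every key and code is a string.

    Hypothetical helper for illustration; not part of viadot.
    """
    if params is None:
        return {}
    for key, code in params.items():
        # A list of codes such as {"unit": ["EUR", "USD"]} is rejected here.
        if not isinstance(key, str) or not isinstance(code, str):
            msg = f"Parameter {key!r} must map to a single string code, got {code!r}."
            raise TypeError(msg)
    return params
```

Calling check_params({"unit": "EUR"}) returns the dictionary as is, while check_params({"unit": ["EUR", "USD", "PLN"]}) raises TypeError.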

trymzet · 2024-09-23T08:32:36Z

tests/unit/test_eurostat.py

+
+def test_eurostat_dictionary_to_df():
+    """Test eurostat_dictionary_to_df method from source class."""
+    eurostat = EurostatMock(dataset_code="")  # You can pass an empty string or None


Suggested change

-    eurostat = EurostatMock(dataset_code="")  # You can pass an empty string or None
+    eurostat = EurostatMock(dataset_code="")

trymzet · 2024-09-23T08:34:37Z

tests/unit/test_eurostat.py

+URL = (
+    "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0"
+    "/data/ILC_DI04?format=JSON&lang=EN"
+)


This should be at the top of the module

trymzet · 2024-09-23T08:38:45Z

tests/unit/test_eurostat.py

+    task = Eurostat(dataset_code="ILC_DI04").to_df()
+
+    assert isinstance(task, pd.DataFrame)
+    assert not task.empty


Suggested change

-    task = Eurostat(dataset_code="ILC_DI04").to_df()
-
-    assert isinstance(task, pd.DataFrame)
-    assert not task.empty
+    df = Eurostat(dataset_code="ILC_DI04").to_df()
+
+    assert isinstance(df, pd.DataFrame)
+    assert not df.empty

trymzet · 2024-09-23T08:39:17Z

tests/unit/test_eurostat.py

+    task = Eurostat(dataset_code="ILC_DI04E")
+
+    with pytest.raises(ValueError, match="DataFrame is empty!"):
+        with caplog.at_level(logging.ERROR):
+            task.to_df()


Suggested change

-    task = Eurostat(dataset_code="ILC_DI04E")
-
-    with pytest.raises(ValueError, match="DataFrame is empty!"):
-        with caplog.at_level(logging.ERROR):
-            task.to_df()
+    eurostat = Eurostat(dataset_code="ILC_DI04E")
+
+    with pytest.raises(ValueError, match="DataFrame is empty!"):
+        with caplog.at_level(logging.ERROR):
+            eurostat.to_df()

trymzet · 2024-09-23T08:40:37Z

tests/unit/test_eurostat.py

+
+    For a valid dataset code
+    """
+    task = Eurostat(dataset_code="ILC_DI04").to_df()


It looks like this test doesn't use the mocked source, so it's not a unit test. You need to either rewrite it (and the other tests below that do the same) so that it doesn't actually connect to the API, or move all these integration tests into the tests/integration directory.
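
For reference, one way to keep such a test offline is to patch the method that performs the HTTP request. The sketch below uses only the standard library. TinySource, _get_json, to_records, and the example.invalid URL are all hypothetical stand-ins made up for this comment, not the actual Eurostat source class, and it returns a list of records instead of a pandas DataFrame to stay dependency-free.

```python
import json
from unittest.mock import patch
from urllib.request import urlopen


class TinySource:
    """Hypothetical stand-in for a source class; not the viadot Eurostat class."""

    def __init__(self, dataset_code: str) -> None:
        self.dataset_code = dataset_code

    def _get_json(self) -> dict:
        # Real network call; a unit test should never reach this line.
        url = f"https://example.invalid/data/{self.dataset_code}"
        with urlopen(url) as response:
            return json.load(response)

    def to_records(self) -> list:
        data = self._get_json()
        records = [{"index": k, "value": v} for k, v in data["value"].items()]
        if not records:
            raise ValueError("No data returned!")
        return records


def test_to_records_uses_mocked_response() -> None:
    fake = {"value": {"0": 10.5, "1": 11.0}}
    # Replace the network method on the class for the duration of the test.
    with patch.object(TinySource, "_get_json", return_value=fake) as mocked:
        records = TinySource("ILC_DI04").to_records()
    mocked.assert_called_once()
    assert records == [{"index": "0", "value": 10.5}, {"index": "1", "value": 11.0}]
```

Tests that do need the live API can keep the real call and live under tests/integration instead.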

adrian-wojcik and others added 5 commits September 4, 2024 09:59

🚀 Added Eurostat connector with tests

052084c

🚀 Adding entry in changelog

ec0268a

azure dependency for prefect alligned

e46e775

🐛 Add missing filter_df_columns function to utils

c58a02d

Merge branch 'eurostat_2.0' of https://github.com/dyvenia/viadot into…

0d7989d

… eurostat_2.0

trymzet requested changes Sep 23, 2024


adrian-wojcik added 8 commits September 23, 2024 11:02

🎨 Refactor docstrings in Eurostat source class

aa96848

🎨 Add Tsignal stributes in docstring and change typing

f0faa60

🎨 Change typing in methods

27d4267

🎨 Change docstrings in Eurostat task

5a7a8f4

🎨 Refactor Eurostat flow dosctrings

b4f045c

🎨 Improve typing

6e8c5aa

🎨 Changed test function name

c374bae

♻️ Refactor unitests for Eurostat

7824be9
