Eurostat 2.0 #1041
Added my comments
src/viadot/sources/eurostat.py
Outdated
Structure for the Eurostat API connector.

This module provides functionalities for connecting to Eurostat API and download
the datasets. It includes the following features:
- Pulling json file with all data from specific dataset.
- Creating pandas Data Frame from pulled json file.
- Creating dataset parameters validation if specified.

Typical usage example:

    eurostat = Eurostat()
    eurostat.to_df(
        dataset_code: str,
        params: dict = None,
        columns: list = None,
        tests: dict = None,
    )

Functions:

    get_parameters_codes(dataset_code: str, url: str): Validate available API request
        parameters and their codes.
    validate_params(dataset_code: str, url: str, params: dict): Validates given
        parameters against the available parameters in the dataset
    eurostat_dictionary_to_df(*signals: list): Function for creating DataFrame from
        JSON pulled from Eurostat
    to_df(dataset_code: str, params: dict = None, columns: list = None,
        tests: dict = None): Function responsible for getting response and creating
        DataFrame using method 'eurostat_dictionary_to_df' with validation of provided
        parameters and their codes if needed.
All this info should be part of relevant docstrings (of the class and its methods); no need to repeat this info here.
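To illustrate the point, here is a minimal sketch of where that information could live instead. The class name `EurostatSkeleton`, its method bodies and the docstring wording are assumptions made for this example only; they are not the connector's actual code.

```python
import inspect


class EurostatSkeleton:
    """Connector for the Eurostat API (illustrative skeleton, not the real class).

    Pulls a dataset as JSON and converts it to a pandas DataFrame.
    """

    def __init__(self, dataset_code: str) -> None:
        """Store the code of the dataset to download.

        Args:
            dataset_code (str): Code of the Eurostat dataset, e.g. "ILC_DI04".
        """
        self.dataset_code = dataset_code

    def to_df(self):
        """Download the dataset and return it as a DataFrame.

        The body is intentionally omitted in this sketch.
        """
        raise NotImplementedError


# Each piece of documentation is now reachable from the object it describes.
print(inspect.getdoc(EurostatSkeleton))
print(inspect.getdoc(EurostatSkeleton.to_df))
```

With the details attached to the class and its methods, `help()` and IDE tooltips show them in context, and the module docstring can shrink to a single summary line.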
This module provides an intermediate wrapper between the prefect flow and the connector:
- Generate the Eurostat Cloud API connector.
- Create and return a pandas Data Frame with the response of the API.

Typical usage example:

    data_frame = eurostat_to_df(
        dataset_code: str,
        params: dict = None,
        columns: list = None,
        tests: dict = None,
    )

Functions:

    eurostat_to_df(
        dataset_code: str,
        params: dict = None,
        columns: list = None,
        tests: dict = None,
    ):
        Task to download data from Eurostat Cloud API.
This info should be part of relevant function docstrings, no need to repeat this info here.
This module provides a prefect flow function to use the Eurostat connector:
- Call to the prefect task wrapper to get a final Data Frame from the connector.
- Upload that data to Azure Data Lake Storage.

Typical usage example:

    eurostat_to_adls(
        dataset_code: str,
        params: dict = None,
        columns: list = None,
        tests: dict = None,
        adls_path: str = None,
        adls_credentials_secret: str = None,
        overwrite_adls: bool = False,
        adls_config_key: str = None,
    )

Functions:

    eurostat_to_adls(
        dataset_code: str,
        params: dict = None,
        columns: list = None,
        tests: dict = None,
        adls_path: str = None,
        adls_credentials_secret: str = None,
        overwrite_adls: bool = False,
        adls_config_key: str = None,
    ):
        Flow to download data from Eurostat Cloud API and upload to ADLS.
All this info should be part of relevant function docstrings, no need to repeat this info here.
A dictionary with optional URL parameters. The key represents the
parameter ID, while the value is the code for a specific parameter,
for example 'params = {'unit': 'EUR'}' where "unit" is the parameter
to set and "EUR" is the specific parameter code. You can add more
than one parameter, but only one code per parameter! So you CANNOT
provide a list of codes, e.g., 'params = {'unit': ['EUR', 'USD',
'PLN']}'. This parameter is REQUIRED in most cases to pull a specific
dataset from the API. Both the parameter and code must be provided
as a string! Defaults to None.
All this info should just be included in the typing: params: dict[str, str] | None = None. BTW, you need to fix all the typing for the CI check to pass. I suggest running pre-commit locally before committing. See https://github.com/dyvenia/viadot/blob/2.0/CONTRIBUTING.md#pre-commit-hooks
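As a rough sketch of what the annotation buys you: the type `dict[str, str] | None` already says "one string code per parameter, or nothing", so the docstring only needs a short description. The helper name `validate_param_types` below is hypothetical and only shows how the same rule could also be enforced at runtime if that is wanted.

```python
from __future__ import annotations


def validate_param_types(params: dict[str, str] | None = None) -> dict[str, str]:
    """Return the params if every parameter ID and code is a string.

    Hypothetical helper for illustration; it is not part of viadot.

    Args:
        params (dict[str, str] | None): Optional URL parameters, one code
            per parameter ID. Defaults to None.

    Raises:
        TypeError: If a key or value is not a string (e.g. a list of codes).
    """
    if params is None:
        return {}
    for key, code in params.items():
        if not isinstance(key, str) or not isinstance(code, str):
            msg = f"Parameter {key!r} must map to a single string code, got {code!r}."
            raise TypeError(msg)
    return params
```

A call such as `validate_param_types({"unit": ["EUR", "USD"]})` raises `TypeError`, and a static type checker would flag the same call before the code runs.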
tests/unit/test_eurostat.py
Outdated
def test_eurostat_dictionary_to_df():
    """Test eurostat_dictionary_to_df method from source class."""
    eurostat = EurostatMock(dataset_code="")  # You can pass an empty string or None
Suggested change:
-     eurostat = EurostatMock(dataset_code="")  # You can pass an empty string or None
+     eurostat = EurostatMock(dataset_code="")
tests/unit/test_eurostat.py
Outdated
URL = (
    "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0"
    "/data/ILC_DI04?format=JSON&lang=EN"
)
This should be at the top of the module
tests/unit/test_eurostat.py
Outdated
task = Eurostat(dataset_code="ILC_DI04").to_df()

assert isinstance(task, pd.DataFrame)
assert not task.empty
Suggested change:
- task = Eurostat(dataset_code="ILC_DI04").to_df()
- assert isinstance(task, pd.DataFrame)
- assert not task.empty
+ df = Eurostat(dataset_code="ILC_DI04").to_df()
+ assert isinstance(df, pd.DataFrame)
+ assert not df.empty
tests/unit/test_eurostat.py
Outdated
task = Eurostat(dataset_code="ILC_DI04E")

with pytest.raises(ValueError, match="DataFrame is empty!"):
    with caplog.at_level(logging.ERROR):
        task.to_df()
Suggested change:
- task = Eurostat(dataset_code="ILC_DI04E")
- with pytest.raises(ValueError, match="DataFrame is empty!"):
-     with caplog.at_level(logging.ERROR):
-         task.to_df()
+ eurostat = Eurostat(dataset_code="ILC_DI04E")
+ with pytest.raises(ValueError, match="DataFrame is empty!"):
+     with caplog.at_level(logging.ERROR):
+         eurostat.to_df()
tests/unit/test_eurostat.py
Outdated
For a valid dataset code
"""
task = Eurostat(dataset_code="ILC_DI04").to_df()
It looks like this test doesn't use the mocked source, so it's not a unit test. You need to either rewrite it (and the other tests below that do the same) so that it doesn't actually connect to the API, or move all these integration tests into the tests/integration directory.
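For reference, a minimal sketch of the first option, using only the standard library. `TinyEurostat`, its `fetch` and `to_records` methods, and the error message are hypothetical stand-ins invented for this example; the real connector's method names and return types will differ. The idea is just to isolate the one method that touches the network and patch it in the test.

```python
import json
from unittest import mock
from urllib.request import urlopen


class TinyEurostat:
    """Hypothetical stand-in for the connector, for illustration only."""

    BASE_URL = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data"

    def __init__(self, dataset_code: str) -> None:
        self.dataset_code = dataset_code

    def fetch(self) -> dict:
        """Download the dataset JSON. The only method that touches the network."""
        url = f"{self.BASE_URL}/{self.dataset_code}?format=JSON&lang=EN"
        with urlopen(url) as response:
            return json.load(response)

    def to_records(self) -> list[dict]:
        """Turn the fetched payload into a list of row dictionaries."""
        data = self.fetch()
        if not data.get("value"):
            raise ValueError("Dataset is empty!")
        return [{"index": int(k), "value": v} for k, v in data["value"].items()]


def test_to_records_with_mocked_fetch() -> None:
    """Unit test: fetch() is patched, so no request leaves the machine."""
    fake_payload = {"value": {"0": 1.5, "1": 2.5}}
    with mock.patch.object(TinyEurostat, "fetch", return_value=fake_payload) as fetch:
        records = TinyEurostat("ILC_DI04").to_records()
    fetch.assert_called_once()
    assert records == [{"index": 0, "value": 1.5}, {"index": 1, "value": 2.5}]


def test_to_records_raises_on_empty_payload() -> None:
    """Unit test for the error path, again without any network access."""
    with mock.patch.object(TinyEurostat, "fetch", return_value={"value": {}}):
        try:
            TinyEurostat("ILC_DI04E").to_records()
        except ValueError as exc:
            assert "Dataset is empty!" in str(exc)
        else:
            raise AssertionError("expected ValueError")
```

Tests that call the live API without any patching are still useful, but they belong in tests/integration so the unit suite stays fast and does not depend on the service being reachable.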
Summary
Adding Eurostat connector
Importance
To use Eurostat in Prefect 2.0
Checklist
This PR:
CONTRIBUTING.md
CHANGELOG.md