
Create a Zenodo download manager #697

Merged
merged 4 commits into from
Nov 2, 2022


@rouille (Collaborator) commented Nov 1, 2022

Pull Request doc

Purpose

Build a Zenodo class around the Zenodo REST API to efficiently handle data from different records. Partially addresses #687.

What the code is doing

  • download the data of a user-specified record
  • calculate the checksum of previously downloaded data and, if it does not match, delete the old copy and download a new one
  • print information about the record (title, version, etc.)
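The checksum step can be sketched with the standard library alone. This assumes that the Zenodo API reports each file's checksum as a string of the form `md5:<hexdigest>`; the helpers `md5sum`, `is_up_to_date` and `refresh`, and the `download` callback, are hypothetical names for illustration, not the PR's actual code.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_up_to_date(path, checksum):
    """Hypothetical helper: True if ``path`` exists and matches ``checksum``.

    ``checksum`` is assumed to look like "md5:<hexdigest>".
    """
    algorithm, _, expected = checksum.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    return os.path.isfile(path) and md5sum(path) == expected


def refresh(path, checksum, download):
    """Keep a valid local copy; otherwise delete it and call ``download(path)``."""
    if is_up_to_date(path, checksum):
        print(f"{os.path.basename(path)} has been downloaded previously")
        return False
    if os.path.exists(path):
        os.remove(path)  # stale or corrupted copy
    download(path)
    return True
```

Hashing in chunks keeps memory flat even for a file the size of the 1.7 GB networks.zip shown in the usage example below.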

Testing

Manual testing.

Where to look

  • write the Zenodo class in the powersimdata.network.zenodo module
  • update the TUB class in the powersimdata.network.europe_tub.model module
  • remove the obsolete zenodo_get package from Pipfile and requirements.txt

Usage Example/Visuals

>>> from powersimdata.network.zenodo import Zenodo
>>> z = Zenodo("3601881")
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
>>> z.load_data("powersimdata/network/europe_tub")
100% [....................................................................] 1784815481 / 1784815481
networks.zip (1702.1 MB)
>>> z.load_data("powersimdata/network/europe_tub")
networks.zip has been downloaded previously
>>> z = Zenodo("7251657")
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
>>> z.load_data("powersimdata/network/europe_tub")
networks.zip has been downloaded previously
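The metadata printed above could be obtained with one GET request to the records endpoint used in the PR's code (https://zenodo.org/api/records/<id>). The sketch below uses only the standard library's `urllib`; the JSON field names (`metadata.title`, `metadata.publication_date`, `metadata.version`, `doi`) are assumptions about the API response, and `fetch_record`/`describe` are hypothetical names.

```python
import json
import urllib.request

RECORDS_URL = "https://zenodo.org/api/records/"


def fetch_record(record_id, timeout=30):
    """Download the JSON description of a Zenodo record (needs network access)."""
    with urllib.request.urlopen(RECORDS_URL + str(record_id), timeout=timeout) as resp:
        return json.load(resp)


def describe(record):
    """Format the fields shown in the usage example from a record dict."""
    meta = record["metadata"]
    return "\n".join(
        [
            f"Title: {meta['title']}",
            f"Publication date: {meta['publication_date']}",
            # assumption: some records may not carry a version string
            f"Version: {meta.get('version', 'n/a')}",
            f"DOI: {record['doi']}",
        ]
    )
```

Usage would be `print(describe(fetch_record("7251657")))`. In the transcript above, record IDs 3601881 and 7251657 both report the same v0.6.1 dataset.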

The TUB class can now fetch its data through the Zenodo class:

>>> tub = TUB("Europe", zenodo_record_id="latest", reduction=128)
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
networks.zip has been downloaded previously
>>> tub.build()
INFO:pypsa.io:Imported network elec_s_128_ec.nc has buses, carriers, generators, lines, links, loads, storage_units, stores
>>> tub = TUB("Europe", reduction=128)
Title: PyPSA-Eur: An Open Optimisation Model of the European Transmission System (Dataset)
Publication date: 2022-09-20
Version: v0.6.1
DOI: 10.5281/zenodo.7251657
networks.zip has been downloaded previously
>>> tub.build()
INFO:pypsa.io:Imported network elec_s_128_ec.nc has buses, carriers, generators, lines, links, loads, storage_units, stores
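The two constructor calls above produce identical output, which suggests that omitting `zenodo_record_id` is equivalent to passing "latest". A minimal sketch of how such an alias might be resolved, assuming (hypothetically) that "latest" maps to the record ID 3601881 from the earlier `Zenodo` example and that the network file name follows the `elec_s_<reduction>_ec.nc` pattern seen in the log; neither `resolve_record_id` nor `network_filename` comes from the PR.

```python
# Hypothetical constant: the record ID that the usage example shows
# resolving to the newest PyPSA-Eur dataset version (v0.6.1 at the time).
LATEST_RECORD_ID = "3601881"


def resolve_record_id(zenodo_record_id="latest"):
    """Map the "latest" alias to a concrete Zenodo record ID string."""
    if zenodo_record_id == "latest":
        return LATEST_RECORD_ID
    return str(zenodo_record_id)


def network_filename(reduction):
    """File name pattern observed in the pypsa.io log line above."""
    return f"elec_s_{reduction}_ec.nc"
```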

Time estimate

30min

@jenhagg (Collaborator) commented Nov 1, 2022

We might want to update Pipfile.lock. Also, it looks like wget is only installed as a dependency of zenodo_get, so we should add it explicitly (otherwise it will be removed when the lock file is regenerated). I also noticed the wget package hasn't been updated since 2015, but if it works, that's probably fine.

@rouille (Collaborator, Author) commented Nov 1, 2022

Would you recommend using urllib instead?

powersimdata/network/europe_tub/model.py (resolved review thread)
import requests
import wget

url = "https://zenodo.org/api/records/"
A collaborator commented:

Same here. Not sure whether we should put it in a const module.
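If the records URL moves into a constants module, callers can build per-record URLs from it. A minimal sketch with a hypothetical `ZENODO_API_URL` constant and `record_url` helper; note that `urllib.parse.urljoin` appends the record ID only because the base URL ends with a slash.

```python
from urllib.parse import urljoin

# Hypothetical constant, e.g. in a const module
ZENODO_API_URL = "https://zenodo.org/api/records/"


def record_url(record_id):
    """Build the API URL of a single record.

    Without the trailing slash on the base, urljoin would replace
    the last path segment ("records") instead of appending to it.
    """
    return urljoin(ZENODO_API_URL, str(record_id))
```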

Four more review threads on powersimdata/network/zenodo.py were resolved (three of them on since-changed, outdated code).
@BainanXia (Collaborator)

I was about to suggest urllib too, since it's in the standard library.

@jenhagg (Collaborator) commented Nov 1, 2022

Besides urllib, we can also do this with requests:

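import requests
from tqdm import tqdm


def _wget(url, filename, size=None):
    """Stream ``url`` to ``filename`` while displaying a progress bar."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        if size is None:
            # the header value is a string (or absent); tqdm needs a number
            length = r.headers.get("Content-Length")
            size = int(length) if length is not None else None
        with open(filename, "wb") as f:
            with tqdm(
                unit="B",
                unit_scale=True,
                unit_divisor=1024,
                miniters=1,
                total=size,
            ) as pbar:
                # write the body in 8 KiB chunks instead of loading it into memory
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
                    pbar.update(len(chunk))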
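For comparison with the urllib question above, a standard-library version could stream the file the same way. This is a sketch only: the function name `download` and the `progress(bytes_so_far, total)` callback are hypothetical, standing in for tqdm. `urllib.request.urlopen` raises `urllib.error.HTTPError` on error statuses, playing the role of `raise_for_status()`.

```python
import urllib.request


def download(url, filename, chunk_size=8192, progress=None):
    """Stream ``url`` to ``filename`` using only the standard library.

    ``progress``, if given, is called as progress(bytes_so_far, total),
    where total is None when no Content-Length header is sent.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as resp, open(filename, "wb") as f:
        length = resp.headers.get("Content-Length")
        total = int(length) if length is not None else None
        done = 0
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty bytes means the stream is exhausted
                break
            f.write(chunk)
            done += len(chunk)
            if progress is not None:
                progress(done, total)
    return done
```

The trade-off is one less third-party dependency in exchange for writing the progress display by hand.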

@rouille (Collaborator, Author) commented Nov 1, 2022

Done

@rouille force-pushed the ben/zenodo branch 2 times, most recently from 10345b6 to 763b1ac, on November 2, 2022 05:38
@rouille (Collaborator, Author) commented Nov 2, 2022

Updating Pipfile.lock will be handled in a separate PR, since some tests fail with an updated version of pandas.

@jenhagg (Collaborator) left a comment

Nice

@rouille merged commit b2df44a into develop on Nov 2, 2022
@rouille deleted the ben/zenodo branch on November 2, 2022 20:04
@jenhagg mentioned this pull request on Dec 10, 2022
3 participants