Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Datasets with loaders #430

Open
wants to merge 27 commits into
base: main
Choose a base branch
from
Open

Conversation

Repcak2000
Copy link

No description provided.

@ofurman ofurman force-pushed the obsrai branch 2 times, most recently from f727c5b to 53b7c8a Compare April 17, 2024 14:58
@ofurman
Copy link

ofurman commented Apr 17, 2024

@RaczeQ @Calychas PR is ready, please check :)

Copy link

codecov bot commented Apr 17, 2024

Codecov Report

Attention: Patch coverage is 98.64253% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 90.62%. Comparing base (5622aa2) to head (aa686c0).

Current head aa686c0 differs from pull request most recent head f01ac4f

Please upload reports for the commit f01ac4f to get more accurate results.

Files Patch % Lines
srai/datasets/nyc_bike/dataset.py 93.93% 2 Missing ⚠️
srai/datasets/_base.py 94.44% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #430      +/-   ##
==========================================
+ Coverage   89.72%   90.62%   +0.89%     
==========================================
  Files          62       77      +15     
  Lines        2444     2635     +191     
==========================================
+ Hits         2193     2388     +195     
+ Misses        251      247       -4     
Flag Coverage Δ
macos-13-python3.12 ?
ubuntu-latest-python3.10 90.51% <98.64%> (+0.78%) ⬆️
ubuntu-latest-python3.11 90.55% <98.64%> (+0.82%) ⬆️
ubuntu-latest-python3.12 90.55% <98.64%> (+0.90%) ⬆️
ubuntu-latest-python3.9 90.53% <98.64%> (+0.82%) ⬆️
windows-latest-python3.12 90.55% <98.64%> (+0.82%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Repcak2000 Repcak2000 changed the title [WIP] HF loader addition HF loader addition Apr 17, 2024
@Repcak2000 Repcak2000 changed the title HF loader addition feat: Datasets with loaders Apr 17, 2024
Copy link
Collaborator

@piotrgramacki piotrgramacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback from meeting

srai/datasets/_base.py Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
Copy link
Collaborator

@piotrgramacki piotrgramacki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions to make datasets easier to use and reuse with different datasets (not just ours)

srai/datasets/philadelphia_crime.py Outdated Show resolved Hide resolved
srai/loaders/hf_loader.py Outdated Show resolved Hide resolved
@Calychas
Copy link
Collaborator

Calychas commented Apr 19, 2024

Try to add tests to that and make it pass in ci/cd (you might need to wait on #438 as some issues will be resolved there)

srai/loaders/hf_loader.py Outdated Show resolved Hide resolved
srai/datasets/philadelphia_crime.py Outdated Show resolved Hide resolved
@Calychas
Copy link
Collaborator

Calychas commented Apr 19, 2024

Note:

  • change HFDataset to HuggingFaceDataset
  • Better export the datasets so no ".dataset" is needed

"""
AirbnbMulticity dataset.

Dataset description will be added.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add description

Defaults to None.

Returns:
gpd.GeoDataFrame: _description_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix description

srai/datasets/chicago_crime/dataset.py Show resolved Hide resolved
tests/datasets/test_datasets.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@Calychas Calychas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure that docs are added and example notebooks are generated there correctly.

Make hf datasets public and remove getting and setting the hf_tokens in the notebooks. (dont remove the functionality of passing the hf_token in the datasets)

Make hf_token and version parameters in load() in reverse order - load(version, hf_token), not load(hf_token, version). Update all the docstrings and usages

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants