
Conversation

@m-muecke (Collaborator)

Closes: #1

m-muecke requested a review from fabian-s on November 24, 2024
@fabian-s (Contributor)

```r
covar <- readr::read_csv(here::here("data-raw", "covariate.csv"))
activity <- readr::read_csv(here::here("data-raw", "activity.csv"))
```

can you add those files to data-raw please?

@m-muecke (Collaborator Author)

> can you add those files to data-raw please?

covariate.csv is not a problem since it's only 2 KB, but activity.csv is 230 MB.

@m-muecke (Collaborator Author)


@fabian-s what's your preference, since the activity.csv file is too large?

@fabian-s (Contributor)

sorry, need to think about this a little more --
seems like a general question of how/where/whether to preserve the raw, un-preprocessed data for a data package like this.

one idea would be putting the resources that belong in /data-raw into a Zenodo or Figshare deposit and accessing them from there in a reproducible manner.
another idea is using Git LFS, but I only know that it exists, not how to use it... 🙈

@m-muecke (Collaborator Author)

> another idea is using Git LFS, but I only know that it exists, not how to use it... 🙈

Makes sense. GitHub also offers Git LFS hosting: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage

@m-muecke (Collaborator Author) commented on Jul 27, 2025

I think we should move forward with Git LFS, as it looks to be very well integrated with the GitHub repo.
Another option would be to store the files as Parquet: the activity CSV shrinks from 200 MB to 34 MB, and I don't think there is much, if any, benefit in keeping the actual CSV.
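
For illustration, a minimal sketch of the conversion, assuming the {arrow} package and the data-raw paths from earlier in the thread:

```r
library(arrow)
library(readr)
library(here)

# Read the raw CSV once and write it back out as Parquet.
activity <- read_csv(here("data-raw", "activity.csv"))
write_parquet(activity, here("data-raw", "activity.parquet"))

# Reading it back is then a drop-in replacement for read_csv():
activity <- read_parquet(here("data-raw", "activity.parquet"))
```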

Having a tidyfun Zenodo would actually also be quite lovely. Downloading the records from R is also very straightforward; for the Monash time series Zenodo I've written a script here
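
Roughly what a reproducible fetch from such a deposit could look like (a sketch only; the record ID and file name below are placeholders, not a real deposit):

```r
library(here)

# Placeholder record ID and file name; a real tidyfun deposit would
# substitute its own values here.
record_id <- "1234567"
file_name <- "activity.csv"

url <- sprintf(
  "https://zenodo.org/records/%s/files/%s?download=1",
  record_id, file_name
)
dest <- here("data-raw", file_name)

# Download only once, so rerunning the preprocessing script does not
# re-fetch 200+ MB every time.
if (!file.exists(dest)) {
  download.file(url, destfile = dest, mode = "wb")
}
```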
