Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Include generating file(s) in the repo #111

Closed
adrinjalali opened this issue Aug 23, 2022 · 9 comments
Closed

RFC Include generating file(s) in the repo #111

adrinjalali opened this issue Aug 23, 2022 · 9 comments

Comments

@adrinjalali
Copy link
Member

While thinking about #110 I realized it'd be much easier to fix existing issues in repos if the file generating the model and the modelcard was included in the repo.

The question is, do we want to include the scripts generating them in the repo, and if yes, how.

cc @skops-dev/maintainers , @julien-c, @osanseviero , @LysandreJik

@BenjaminBossan
Copy link
Collaborator

In general, I'm in favor of this, because it makes it easy to replicate everything. I would not go as far as to force this but it should be encouraged.

Maybe we can think of a standard. E.g. for models, there could be a train.py script that generates the model artifact, config, and model card when called (without arguments). The requirements file should be sufficient to run that script.

@adrinjalali
Copy link
Member Author

I agree that it shouldn't be a requirement, but I'm not sure of the API to let this happen.

@BenjaminBossan
Copy link
Collaborator

but I'm not sure of the API to let this happen.

You mean an API from the skops side to provide this functionality? I don't think it's really possible. My thought was rather to lead by example.

@adrinjalali
Copy link
Member Author

Yeah, I was thinking something like having an API which would take __file__ and save it to the model folder, but I'm not sure how clean it is.

@BenjaminBossan
Copy link
Collaborator

Hmm, wouldn't that result in users potentially uploading files that they may not have intended to be uploaded?

@merveenoyan
Copy link
Collaborator

I think it makes sense and we should have it, just like they do with Space repositories (app.py file). I also agree with @BenjaminBossan that it might cause users to potentially upload files that they don't want in the repository. Also they might be putting a couple of API keys in their scripts or notebooks if it's generated by a notebook.
We could take the file during init and by default not have it inside. Maybe also add a warning for users to check if there's any variable that shouldn't be public.

@adrinjalali
Copy link
Member Author

So instead of adding it automatically, we could have a method like hub_utils.add_file which can also accept __file__ as input value.

@BenjaminBossan
Copy link
Collaborator

So instead of adding it automatically, we could have a method like hub_utils.add_file which can also accept __file__ as input value.

I'll take a look at that

@adrinjalali
Copy link
Member Author

I think we can close this one, and try to add generating files to existing repos, we should clean them up at some point anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants