polinaeterna
Contributor

We can pass split to _split_generators().
However, I'm not sure whether the cache issues can be solved, mostly the ones involving dataset_info.json.
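
To make the idea concrete, here is a minimal sketch of a split-aware _split_generators(). It is not the library's API: ToyBuilder, its SOURCES mapping, and the extra split argument are all hypothetical, and the method returns plain (name, path) tuples instead of real split generator objects.

```python
class ToyBuilder:
    """Stand-in for a dataset builder; NOT the real `datasets` class."""

    # Hypothetical mapping from split name to its source file.
    SOURCES = {
        "train": "train.csv",
        "validation": "validation.csv",
        "test": "test.csv",
    }

    def _split_generators(self, dl_manager=None, split=None):
        # `split` is the assumed extra argument; None means "all splits".
        if split is None:
            wanted = list(self.SOURCES)
        elif isinstance(split, str):
            wanted = [split]
        else:
            wanted = list(split)

        unknown = [name for name in wanted if name not in self.SOURCES]
        if unknown:
            raise ValueError(f"Unknown split(s): {unknown}")

        # Only the requested splits are prepared, so the rest are never touched.
        return [(name, self.SOURCES[name]) for name in wanted]
```

With this shape, a call such as ToyBuilder()._split_generators(split="test") would prepare only the test split, which is exactly the situation that leaves a partially filled cache directory behind.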

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Member

@albertvillanova albertvillanova left a comment

Thank you.

As discussed in our weekly meeting:

  • I started working on this a long time ago; see the corresponding PR for context:
  • The caching issue should be addressed first:
    • Currently, the cache only checks whether the corresponding dataset directory exists
    • It should also check the contents of that directory, because the directory might be only partially filled, containing just some of the splits
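
A minimal sketch of such a content-aware cache check follows. It is not the library's implementation: the helper name missing_splits, the "splits" key read from dataset_info.json, and the one-file-per-split layout (<split>.arrow) are all assumptions made for illustration.

```python
import json
from pathlib import Path


def missing_splits(cache_dir, requested_splits):
    """Return the requested splits that are not fully present in cache_dir."""
    cache_dir = Path(cache_dir)
    info_path = cache_dir / "dataset_info.json"

    if not info_path.is_file():
        # No metadata (or no directory at all): every requested split is missing.
        return list(requested_splits)

    with info_path.open(encoding="utf-8") as f:
        info = json.load(f)
    # Assumed layout: a "splits" mapping keyed by split name.
    recorded = info.get("splits") or {}

    missing = []
    for split in requested_splits:
        # Assumed layout: one data file per split, named "<split>.arrow".
        data_file = cache_dir / f"{split}.arrow"
        if split not in recorded or not data_file.is_file():
            missing.append(split)
    return missing
```

A caller could then reuse the cache only when missing_splits(...) returns an empty list, and otherwise prepare just the splits that are reported as missing.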

@albertvillanova
Member

My previous comment didn't create the back-link in the PR, so I'm posting it here again.

You can check the context and the discussions we had about this feature enhancement in this PR:


3 participants