"Croissant is a high-level format for machine learning datasets that brings together four rich layers."
This issue tracks activities in our collaboration with the Kaggle team on Croissant.
Mission
Make datasets easier to find and work with for Machine Learning, at scale and by diverse stakeholders (e.g. AI engineers, AI ethicists, compliance managers, interested public).
Vision
Croissant is the most convenient and widely used machine-readable format for ML-ready datasets.
Issues we will probably work on
Include a facet for the Search API for datasets that only have files that are truly open (no custom terms, no guestbooks).
Let Kaggle know how many datasets and bytes to expect when copying CC0 datasets from Harvard Dataverse (see notes from 2024-07-18 meeting and Slack).
Let Kaggle know the best way to see when datasets have changed.
Commit data from Dataverse to Kaggle through CroissantML with a button, as an explicit action from the user. Is this part of a larger story around pushing data to other systems, such as data lakes?
Issues we've opened or are keeping an eye on
Depending on the outcome of these issues, we may enhance our Croissant implementation to cover additional use cases.
Related: mlcommons/croissant#646