-
-
Notifications
You must be signed in to change notification settings - Fork 378
Description
Datashader is agnostic about file formats, working with anything that can be loaded into a dataframe-like object (currently supporting Pandas and Dask dataframes). But because datashader focuses on having good performance for large datasets, the performance of the file format is a major factor in the usability of the library. Thus we should use examples that serve to guide users towards good solutions for their own problems, recommending and demonstrating approaches that we find to work well.
Right now, our examples use CSV and castra or HDF5 formats. It is of course important to show a CSV example, since nearly every dataset can be obtained in CSV for import into the library. However, CSV is highly inefficient in both file size and reading speed, and it also truncates floating-point precision in ways that are problematic when zooming in closely to a dataset.
Castra is a relatively high-performance binary format that works well for the large datasets in the examples, but it is not yet a mature project, and is not available on the main conda channel. Should we invest in making castra be more fully supported? If not, I think we should choose another binary format (HDF5?) to use for our examples.