Description
I was having a look at the size of the DIALS data-files, to see if there is anything we can do to avoid problems like the download timeouts that affect CI tests. The largest sub-directory is image_examples/. The 3 largest files alone take up over 70 MB of space:
-rw-rw-r-- 1 fcx32934 fcx32934 33M Oct 27 11:02 APS_22ID-mar300.0001
-rw-rw-r-- 1 fcx32934 fcx32934 20M Apr 7 2025 APS_19ID-q315_unbinned_a.0001.img.bz2
-rw-rw-r-- 1 fcx32934 fcx32934 18M Oct 27 11:02 MacScience-reallysurprise_001.ipf
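For anyone wanting to reproduce a listing like this for the other sub-directories, here is a minimal sketch. The function name `largest_files` and the relative path `image_examples` are illustrative assumptions, not anything that exists in the repository tooling.

```python
from pathlib import Path


def largest_files(directory, top_n=3):
    """Return the top_n largest regular files under directory.

    The result is a list of (size_in_bytes, path) pairs, largest first.
    """
    files = [
        (p.stat().st_size, p)
        for p in Path(directory).rglob("*")
        if p.is_file()
    ]
    # Sort on size only, so ties never fall back to comparing paths.
    return sorted(files, key=lambda item: item[0], reverse=True)[:top_n]


if __name__ == "__main__":
    # "image_examples" is assumed to be relative to the data-files checkout.
    for size, path in largest_files("image_examples"):
        print(f"{size / 2**20:6.1f} MiB  {path}")
```

Running the same function with a larger `top_n` would give a rough picture of how much of the directory is taken up by legacy-detector images.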
We could save a bit of space by compressing two of these, but before doing that I'd like to explore what value there is in keeping these files. They are used in test_experiment_files.py (and test_filecache.py in the case of MacScience-reallysurprise_001.ipf).
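To get a feel for how much compression would actually save before committing to anything, something like the sketch below could be used. It only measures the bz2-compressed size and does not modify the file; the helper name `bz2_size` is hypothetical, and the file path in the usage section is assumed to be relative to the data-files checkout.

```python
import bz2
import os


def bz2_size(path, chunk_size=1 << 20):
    """Return the size in bytes the file would have after bz2 compression.

    The file is read in chunks and compressed in memory; nothing is written.
    """
    compressor = bz2.BZ2Compressor()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(compressor.compress(chunk))
    total += len(compressor.flush())
    return total


if __name__ == "__main__":
    # Path is an assumption about where the file sits in a local checkout.
    path = "image_examples/APS_22ID-mar300.0001"
    original = os.path.getsize(path)
    compressed = bz2_size(path)
    print(f"{path}: {original} -> {compressed} bytes")
```

How well detector images compress depends heavily on the format, so the saving would need to be measured per file rather than assumed.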
We have good support for old file formats in dxtbx, yet it is far from complete. If we were actually aiming for comprehensive support, I would be in favour of keeping these files and finding examples from all the other missing instruments. However, I think the work involved in making dxtbx truly comprehensive is far beyond our resources. In that case, is there really any value in testing this support for a smattering of legacy file formats?
I just picked these 3 files as the largest, but there are many other images from legacy detectors in this directory that I think have limited value.