Virtual large arrays and lazy evaluation.
For example, we can combine multiple array data sources into a single virtual array:
>>> first_time_series = OrthoArrayAdapter(hdf_var_a)
>>> second_time_series = OrthoArrayAdapter(hdf_var_b)
>>> print first_time_series.shape, second_time_series.shape
(52000, 800, 600) (56000, 800, 600)
>>> time_series = biggus.LinearMosaic([first_time_series, second_time_series],
...                                   axis=0)
>>> time_series
<LinearMosaic shape=(108000, 800, 600) dtype=dtype('float32')>
Any biggus Array can then be indexed, independently of the underlying data sources:
>>> time_series[51999:52001, 10, 12]
<LinearMosaic shape=(2,) dtype=dtype('float32')>
And an Array can be converted to a numpy ndarray on demand:
>>> time_series[51999:52001, 10, 12].ndarray()
array([ 0.72151309,  0.54654914], dtype=float32)
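The deferred pipeline above can be imitated with small in-memory arrays. The following is a minimal pure-NumPy sketch, not biggus itself: the `LazyMosaic` class is hypothetical and only illustrates the idea behind `biggus.LinearMosaic`, namely that shapes combine immediately while data is touched only on request.

```python
import numpy as np

class LazyMosaic:
    """Hypothetical sketch of the LinearMosaic idea: concatenate array
    sources along one axis without reading any data up front."""

    def __init__(self, sources, axis=0):
        self.sources = sources
        self.axis = axis

    @property
    def shape(self):
        # Lengths along the mosaic axis are summed; other dims unchanged.
        shape = list(self.sources[0].shape)
        shape[self.axis] = sum(s.shape[self.axis] for s in self.sources)
        return tuple(shape)

    def ndarray(self):
        # Data is only read and concatenated when explicitly requested.
        return np.concatenate([np.asarray(s) for s in self.sources],
                              axis=self.axis)

first = np.zeros((3, 4, 5), dtype='float32')
second = np.ones((2, 4, 5), dtype='float32')
mosaic = LazyMosaic([first, second], axis=0)
print(mosaic.shape)              # (5, 4, 5) -- no data copied yet
print(mosaic.ndarray()[3, 0, 0])  # 1.0 -- realized on demand
```

A real adapter would wrap an on-disk variable (for example an HDF dataset) rather than an in-memory ndarray, so `ndarray()` is the only point at which I/O happens.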
Biggus was originally developed primarily to provide large data handling for Iris.
For most applications it has now been surpassed by Dask, which is used by the majority of other projects working with virtual arrays, and which is now generally preferred to Biggus.
Iris itself now uses Dask in place of Biggus, from version 2.0 onwards. As Biggus is little used by projects other than Iris, any future development and support will depend on interest from other groups.
We therefore recommend that new code, at least, should now use Dask instead.
Likewise, you may want to consider porting existing code to Dask, especially if it has demanding requirements, encounters bugs in Biggus, or needs features that Biggus lacks.
To get more ideas of what Biggus can do, please browse the wiki and its examples.
If you have any questions or feedback please feel free to post to the discussion group or raise an issue on the issue tracker.