Description
opened on Nov 30, 2020
Background
I've been looking at Circuitscape.jl as an interesting use case for DataSets.jl. Here's a design for how DataSets could support Circuitscape user workflows.
Circuitscape is an interesting case because it's a complete application with existing data management code: there's the Circuitscape.compute() function, which takes a config file and uses it to discover the input data and output location, and the Circuitscape.start() function, a wizard which helps users create such a config file. Because DataSets tries to do IO management and data discovery, some of the data discovery parts of Circuitscape should be replaced with a DataSets-based interface.
I think users should be able to interactively
- Manage their project datasets: provided by the data REPL (in future, perhaps some GUI data browser)
- Launch Circuitscape jobs: provided by a data REPL run command.
Workflow example
Here's a quick sketch of the workflow:
The wizard Circuitscape.start() acts as it does currently, but instead of linking to existing data in some arbitrary location in the filesystem, it copies the data into a new DataSet. The type of that dataset can be CircuitScapeInput or some such; internally it's just backed by the exact same directory structure as Circuitscape currently has.
data> run circuitscape # If run with no data, calls start (?)
# wizard steps ...
[ Info: Created new input dataset `raster_pairwise_1`
data>
I'm imagining that Circuitscape.compute() would be replaced by the data REPL run command, which would also gain functionality for listing which data is available to run with. Something like:
Available circuitscape input data:
📂 raster_pairwise_1 type=CircuitScapeInput
📂 raster_one_to_all_1 type=CircuitScapeInput
data> run circuitscape raster_pairwise_1 output_1!
[ Info: ...
data> ls
📂 output_1 type=CircuitScapeOutput
📂 raster_pairwise_1 type=CircuitScapeInput
📂 raster_one_to_all_1 type=CircuitScapeInput
For run to work, the data REPL needs to be resurrected and taught to look at the database of entry points which is currently set up by @datafunc. Then Circuitscape would declare several data entry points with @datafunc circuitscape to hook into data> run.
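To make the entry point lookup concrete, here is a rough sketch of how a run command could dispatch through a registry of named entry points. Everything in it is hypothetical: DataEntry, ENTRY_POINTS, register_entry_point!, run_command and circuitscape_entry are illustration names, not DataSets.jl or Circuitscape.jl API, and the registry is only a stand-in for whatever @datafunc actually records. It also assumes the trailing ! on a dataset name marks it as an output to be created.

```julia
# Hypothetical sketch: none of these names are DataSets.jl or Circuitscape.jl API.

# A dataset is reduced to a name and a type tag for illustration.
struct DataEntry
    name::String
    type::String
end

# Registry mapping a command name to the function that implements it.
const ENTRY_POINTS = Dict{String,Function}()

# Stand-in for what an @datafunc-style declaration might record at load time.
register_entry_point!(name::AbstractString, f::Function) =
    (ENTRY_POINTS[String(name)] = f)

# Dispatch the words that follow `run` in a REPL line such as
# "circuitscape raster_pairwise_1 output_1!".
function run_command(project::Vector{DataEntry}, line::AbstractString)
    words = String.(split(line))
    isempty(words) && error("usage: run <entry point> [datasets...]")
    name, args = words[1], words[2:end]
    haskey(ENTRY_POINTS, name) || error("no entry point named $name")
    return ENTRY_POINTS[name](project, args...)
end

# Toy entry point: with no arguments, list candidate inputs; with an input
# and an output name, record a new output dataset in the project.
function circuitscape_entry(project::Vector{DataEntry}, args::String...)
    if isempty(args)
        return [d.name for d in project if d.type == "CircuitScapeInput"]
    end
    length(args) == 2 || error("expected <input> <output!>")
    input = args[1]
    output = String(rstrip(args[2], '!'))  # assumed: `!` marks an output
    any(d -> d.name == input, project) || error("unknown dataset $input")
    push!(project, DataEntry(output, "CircuitScapeOutput"))
    return output
end

register_entry_point!("circuitscape", circuitscape_entry)
```

With a project holding the two input datasets from the listing above, run_command(project, "circuitscape") would return their names, and run_command(project, "circuitscape raster_pairwise_1 output_1!") would append an output_1 entry of type CircuitScapeOutput. A real implementation would open the datasets and call into Circuitscape rather than just updating a list.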