This repository contains Common Lisp code for exporting sample data and storing it in an LMDB database. The data to be exported can be identified using RDF and will later be used directly from GN3. We provide a basic api for reading this data into a multi-dimensional array and storing it into lmdb as json.
Features:
- Inbuilt Versioning
- Metadata storage
- Garbage collection for any unused data
Sample Data represents data from an experiment. Here’s how that would look like in from of a CSV file:
Strain Name,Value,SE,Count,Sex
BXD1,18,x,0
BXD12,16,x,x
BXD14,15,x,x
BXD15,14,x,x
In essence, the above data could be represented as a matrix with the headers being extra metadata that could be related to the above data. This data could be represented as a vector of vectors which would look like this:
(("#BXD1" 18 "x" 0)
("#BXD12" 16 "x" "x")
("#BXD14" 15 "x" "x")
("#BXD15" 14 "x" "x"))
And the metadata associated with it would be:
(("header" . #("Strain Name" "Value" "SE" "Count" "Sex")))
To enter data into a database, use “import-into-sampledata-db-data”. Here’s an example of how to do that:
(let ((data (make-sampledata
:matrix
(make-array
'(4 4)
:initial-contents
'(("#BXD1" 18 "x" 0)
("#BXD12" 16 "x" "x")
("#BXD14" 15 "x" "x")
("#BXD15" 14 "x" "x")))
:metadata
'(("header" . #("Strain Name" "Value" "SE" "Count" "Sex"))))))
(import-into-sampledata-db data "/tmp/BXD/10007/"))
To read the data back into a matrix object:
;; Retrieving the current matrix
(with-sampledata-db (db "/tmp/BXD/10007/" :write t)
(sampledata-db-current-matrix db))
which outputs the following struct:
#S(SAMPLEDATA-DB-MATRIX
:DB #<DB NIL {1004DA5B63}>
:HASH NIL
:NROWS 4
:NCOLS 4
:ROW-POINTERS NIL
:COLUMN-POINTERS NIL
:ARRAY #2A(("#BXD1" 18 "x" 0)
("#BXD12" 16 "x" "x")
("#BXD14" 15 "x" "x")
("#BXD15" 14 "x" "x"))
:TRANSPOSE #2A(("#BXD1" "#BXD12" "#BXD14" "#BXD15")
(18 16 15 14)
("x" "x" "x" "x")
(0 "x" "x" "x")))
To obtain the data:
(with-sampledata-db (db "/tmp/BXD/10007/" :write t)
(sampledata-db-matrix-array
(sampledata-db-current-matrix db)))
To print information about the database, use the following print-sampledata-db-info
:
(print-sampledata-db-info "/tmp/BXD/10007/")
which would output something like:
Path: /tmp/BXD/10007/
Versions: 4
Keys: 26
Version 1
Dimensions: 4 x 4
Version 2
Dimensions: 2 x 4
Version 3
Dimensions: 2 x 4
Version 4
Dimensions: 2 x 4
Drop into a development environment with:
guix shell -m manifest.scm
To build the dump binary, run:
sbcl --load build.lisp
Dump sampledata from /tmp/dataset-dump/BXDPublish
to the base directory: $HOME/sampledata-lmdb
:
./dump import /tmp/dataset-dump/BXDPublish $HOME/sampledata-lmdb/
If $HOME/sampledata-lmdb/
does not exist, it will be created on the fly.
To print info, say for a the matrix database $HOME/sampledata-lmdb/10007/
:
./dump info $HOME/sampledata-lmdb/10007/
- [X] Dump actual data
- [X] Make this accessible from the CLI
- [X] Read access from Python
- [ ] Previous version access