Description
Background
Presently, ChunkDB
s are built as an enumeration of "chunk" maps;
Drasil/code/drasil-database/lib/Database/Drasil/ChunkDB.hs
Lines 165 to 183 in 10f9f9e
Problem
i. We are unable to add (almost) arbitrary data into the ChunkDBs for various different systems.
ii. We manually track chunks in lists (by a single type, for later further processing) in SystemInformation
when we already have them registered in the ChunkDB.
Drasil/code/drasil-database/lib/Database/Drasil/SystemInformation.hs
Lines 45 to 53 in 10f9f9e
iii. Adding arbitrary data to a ChunkDB is impossible without editing core
drasil-database
files, and difficult when editing core drasil-database
files because it affects compilation of other examples.iv. Working with similar maps requires redundant manual recreation (swhs and nopcm has a bit of this).
v. We have occurrences of conflicting UIDs because they are registered in different maps.
Potential Solution : Merging Maps into a single one
I think we can solve these 5 or so problems by merging our maps into a single Map, and using more of the core Data.Map
functionality.
We will need to hide type information to create a homogeneous Chunk type, and then merge our maps into a single Chunk Map.
The solution would, approximately, be as follows:
(1) Assume we have the following UID type, and typeclasses:
type UID = String
-- | An interface to get the UID of a chunk.
class HasUID a where
uid :: a -> UID
-- | Dump data to Strings for debugging-purposes.
class DrasilDumpable a where
dump :: a -> String
(2) We build a Chunk
type intended to carry arbitrary data in our final ChunkDB
:
import Data.Typeable -- this is an important design decision (as opposed to Type.Reflection) for simplicity
...
data Chunk where -- Personally, I find GADTs to be more readable, but this can be done without them via ExistentialQuantification
CHUNK :: (HasUID t, DrasilDumpable t, Typeable t) => t -> Chunk
instance HasUID Chunk where uid (CHUNK t) = uid t
instance DrasilDumpable Chunk where dump (CHUNK t) = dump t
...
Here, we build a commonality between all chunks registered in Drasil (e.g., they ALL must have a UID [a design decision], they all must be Typeable [Thanks to GHC 8.2+, this is the new 'normal'], and they all must be DrasilDumpable
[not really required, but I think this would help in debugging -- this could also be something that we mass-dump into a single JSON file for our stable examples to show all raw knowledge we use to build up softifacts]). Additionally, we discard the real type information because the Typeable allows us to retrieve it later, and it will be helpful in creating a homogeneous map type.
(3) Thanks to capturing all of our chunks in a single Chunk type, we can construct a single homogeneous ChunkDB
type:
type ChunkDB = M.Map UID Chunk
(4) We build a few functions to help with registering chunks into the ChunkDB:
-- Registering a list of like-typed chunks
chunksToChunkDB :: (HasUID t, Typeable t, DrasilDumpable t) => [t] -> ChunkDB
chunksToChunkDB = M.fromList . map (\x -> (uid x, CHUNK x))
-- Registering
registerChunk :: (HasUID t, Typeable t, DrasilDumpable t) => t -> ChunkDB -> ChunkDB
registerChunk c = M.insert (uid c) (CHUNK c)
-- Retrieval (I prefer using `Maybe` for prototyping instead of forced errors, but this would likely be a strict value or an error in practice)
retrieveChunk :: Typeable a => ChunkDB -> UID -> Maybe a
retrieveChunk cdb u = do
(CHUNK r) <- M.lookup u cdb
cast r
(5) When we want to pull arbitrary lists of chunks from the db by their TypeRep
s, we can via casting:
retrieveChunksByType :: (HasUID t, DrasilDumpable t, Typeable t) => ChunkDB -> TypeRep -> [t]
retrieveChunksByType cdb tr = relevantChunks
where
allChunks = M.toList cdb
relevantChunks = mapMaybe (\(_, CHUNK c) -> if typeOf c == tr then cast c else Nothing) allChunks
NOTE: There is one problem here (which I go into small discussion into in my prototype), but it's not too important here. Essentially, the if typeOf c == tr then cast c else Nothing
seems to be too strict of a requirement, and we might want to lessen to just a cast c
(in which case, we can get rid of the then redundant TypeRep
parameter), but I'm not entirely certain yet (I will have to do more research on this).
Of course, whenever we want to do this, it will be rather inefficient, so we can create a cache:
(6) A Map of ChunkDBs by their TypeReps:
type ChunksByTypeRep = M.Map TypeRep ChunkDB
dumpChunkDBToTypeRepMap :: ChunkDB -> ChunksByTypeRep
dumpChunkDBToTypeRepMap cdb = allRegistered
where
-- gather a list of registered types
knownChunkTypes :: [TypeRep]
knownChunkTypes = nub . map (typeOf . snd) $ M.toList cdb
-- gather types with a list of all registered chunks that are of that type
chunksWithTypes :: [(TypeRep, [Chunk])]
chunksWithTypes = map (\x -> (,) x $ retrieveChunksByTypeInChunkBox cdb x) knownChunkTypes
-- reconcile everything found, into a single ChunksByTypeRep map
allRegistered :: ChunksByTypeRep
allRegistered = M.fromList $ map (second chunksToChunkDB) chunksWithTypes
This still might not be the right solution for us because it constructs the ChunksByTypeRepMap
after having fully constructed a single ChunkDB
, but we can easily create an alternative solution which registers chunks in both maps when registering them.
type ChunkDB' = (ChunkDB, ChunksByTypeRep)
-- There is one small problem here w.r.t uniqueness of UIDs and ensuring that the TypeRepMap is an accurate `view` of the `ChunkDB`
-- To fix it, we would just need to add an extra lookup to the initial chunkDB, removing it from the TypeRepMap manually before re-adding it to both maps (not a large problem, but I noticed it while typing up this ticket)
registerChunk' :: (HasUID t, Typeable t, DrasilDumpable t) => t -> ChunkDB' -> ChunkDB'
registerChunk' c (cdb, trcdb) = (cdb', trcdb')
where
u = uid c
chk = CHUNK c
cdb' = M.insert u chk cdb
trcdb' = M.alter
(maybe
(Just $ M.singleton u chk)
(Just . M.insert u chk))
(typeOf c)
trcdb
retrieveChunk' :: Typeable a => ChunkDB' -> UID -> Maybe a
retrieveChunk' (cdb, _) u = do
(CHUNK r) <- M.lookup u cdb
cast r
retrieveChunksByType' :: Typeable a => ChunkDB' -> TypeRep -> [a]
retrieveChunksByType' (_, trcdb) tr = maybe [] (mapMaybe ((\(CHUNK c) -> cast c) . snd) . M.toList) (M.lookup tr trcdb)
Does it solve all noted problems?
i. We are unable to add (almost) arbitrary data into the ChunkDBs for various different systems.
Yes, so as long all all relevant data is typeable, has a UID, and (optionally) can be dumped to a String (DrasilDumpable
).
ii. We manually track chunks in lists (by a single type, for later further processing) in SystemInformation
when we already have them in nicely placed maps.
Yes, we would have additional functionality for grabbing all chunks in a chunkdb by a particular TypeRep. However, we lose out on the ability of polymorphic types with type constraint restrictions, but it seems that we stick to a small group of types anyway, so this ability may not be needed.
iii. Adding allowed datatypes to a ChunkDB is impossible without editing core drasil-database
files, and difficult when editing core drasil-database
files because it affects compilation of other examples.
Yes, we will be solving both problems here because the data registerable is nearly arbitrary. Through this, ChunkDB
should become mostly stable and unchanging.
iv. Working with similar maps requires redundant manual recreation (swhs and nopcm has a bit of this).
Yes, since ChunkDB
is now a single type, we will be able to use Data.Map
's abilities to allow for easier merging of ChunkDBs and such.
v. We have occurrences of conflicting UIDs because they are registered in different maps.
Yes. By merging in a single map, we ensure uniqueness is upheld in UIDs.
In general, I think this would decrease LOC while allowing us to generally get more out of our ChunkDBs.
Notable issues
I can only think of 2 issues, but I'm unsure of how impactful they are.
- We're using
Data.Typeable
. It appears that this library is largely unstable pre-GHC-8.2, and previously was designed differently. Additionally, since we are discarding type information and later re-gaining it, we will have some amount of runtime impact, but I'm unsure of how much. - Retrieving chunks en masse by a single TypeRep is tricky, but I think we can sufficiently manage it. In (5), I briefly mentioned this. However, with a bit of hard work, we can mitigate the overhead -- we should be able to create functions similar to
makeLenses
which hides the difficulty associated with creating TypeReps and casting chunks/lists of chunks.
I'm posting this ticket earlier rather than later because I think it's a good proof of concept, but it still has problems. I don't think the problems are too 'breaking', but it would be best to discuss the base before I spend more time on this design.
Prototype
I've made, and, lightly, tested this in my mini-development repo: https://github.com/balacij/ExprTests/blob/f7321d55611e7721a339188c79b66a3c6258ee8c/src/Lang/ChunkDB.hs
If you'd like to open up this code locally, I've currently setup stack run
to run through a few small tests with this ChunkDB prototype.
Metadata
Metadata
Assignees
Projects
Status