Merging chunk maps into a single map via `Data.Typeable`

# Background
Presently, `ChunkDB`s are built as an enumeration of "chunk" maps;
https://github.com/JacquesCarette/Drasil/blob/10f9f9ed0e332c7a31875f6e035db96549c45945/code/drasil-database/lib/Database/Drasil/ChunkDB.hs#L165-L183

# Problem
i. We are unable to add (almost) arbitrary data into the ChunkDBs for various different systems.
ii. We manually track chunks in lists (by a single type, for later further processing) in `SystemInformation` when we already have them registered in the ChunkDB.
https://github.com/JacquesCarette/Drasil/blob/10f9f9ed0e332c7a31875f6e035db96549c45945/code/drasil-database/lib/Database/Drasil/SystemInformation.hs#L45-L53
iii. Adding arbitrary data to a ChunkDB is impossible without editing core `drasil-database` files, and difficult when editing core `drasil-database` files because it affects compilation of other examples.
iv. Working with similar maps requires redundant manual recreation (swhs and nopcm has a bit of this).
v. We have occurrences of conflicting UIDs because they are registered in different maps.

# Potential Solution : Merging Maps into a single one
I think we can solve these 5 or so problems by merging our maps into a single Map, and using more of the core `Data.Map` functionality.

We will need to hide type information to create a homogeneous Chunk type, and then merge our maps into a single Chunk Map.

The solution would, approximately, be as follows:
(1) Assume we have the following UID type, and typeclasses:
```haskell
type UID = String

-- | An interface to get the UID of a chunk.
class HasUID a where
    uid :: a -> UID

-- | Dump data to Strings for debugging-purposes.
class DrasilDumpable a where
    dump :: a -> String
```

(2) We build a `Chunk` type intended to carry arbitrary data in our final `ChunkDB`:
```haskell
import Data.Typeable -- this is an important design decision (as opposed to Type.Reflection) for simplicity
...
data Chunk where  -- Personally, I find GADTs to be more readable, but this can be done without them via ExistentialQuantification
    CHUNK :: (HasUID t, DrasilDumpable t, Typeable t) => t -> Chunk

instance HasUID         Chunk where uid  (CHUNK t) = uid t
instance DrasilDumpable Chunk where dump (CHUNK t) = dump t
...
```
Here, we build a commonality between all chunks registered in Drasil (e.g., they ALL must have a UID [a design decision], they all must be Typeable [Thanks to GHC 8.2+, this is the new 'normal'], and they all must be `DrasilDumpable` [not really required, but I think this would help in debugging -- this could also be something that we mass-dump into a single JSON file for our stable examples to show all raw knowledge we use to build up softifacts]). Additionally, we discard the real type information because the Typeable allows us to retrieve it later, and it will be helpful in creating a homogeneous map type.

(3) Thanks to capturing all of our chunks in a single Chunk type, we can construct a single homogeneous `ChunkDB` type:
```haskell
type ChunkDB = M.Map UID Chunk
```

(4) We build a few functions to help with registering chunks into the ChunkDB:
```haskell
-- Registering a list of like-typed chunks
chunksToChunkDB :: (HasUID t, Typeable t, DrasilDumpable t) => [t] -> ChunkDB
chunksToChunkDB = M.fromList . map (\x -> (uid x, CHUNK x))

-- Registering
registerChunk :: (HasUID t, Typeable t, DrasilDumpable t) => t -> ChunkDB -> ChunkDB
registerChunk c = M.insert (uid c) (CHUNK c)

-- Retrieval (I prefer using `Maybe` for prototyping instead of forced errors, but this would likely be a strict value or an error in practice)
retrieveChunk :: Typeable a => ChunkDB -> UID -> Maybe a
retrieveChunk cdb u = do
    (CHUNK r) <- M.lookup u cdb
    cast r
```

(5) When we want to pull arbitrary lists of chunks from the db by their `TypeRep`s, we can via casting:
```haskell
retrieveChunksByType :: (HasUID t, DrasilDumpable t, Typeable t) => ChunkDB -> TypeRep -> [t]
retrieveChunksByType cdb tr = relevantChunks
    where
        allChunks = M.toList cdb
        relevantChunks = mapMaybe (\(_, CHUNK c) -> if typeOf c == tr then cast c else Nothing) allChunks
```
NOTE: There is one problem here (which I go into small discussion into in my prototype), but it's not too important here. Essentially, the `if typeOf c == tr then cast c else Nothing` seems to be too strict of a requirement, and we might _want_ to lessen to just a `cast c` (in which case, we can get rid of the then redundant `TypeRep` parameter), but I'm not entirely certain yet (I will have to do more research on this).

Of course, whenever we want to do this, it will be rather inefficient, so we can create a cache:
(6) A Map of ChunkDBs by their TypeReps:
```haskell
type ChunksByTypeRep = M.Map TypeRep ChunkDB

dumpChunkDBToTypeRepMap :: ChunkDB -> ChunksByTypeRep
dumpChunkDBToTypeRepMap cdb = allRegistered
    where
        -- gather a list of registered types
        knownChunkTypes :: [TypeRep]
        knownChunkTypes = nub . map (typeOf . snd) $ M.toList cdb

        -- gather types with a list of all registered chunks that are of that type
        chunksWithTypes :: [(TypeRep, [Chunk])]
        chunksWithTypes = map (\x -> (,) x $ retrieveChunksByTypeInChunkBox cdb x) knownChunkTypes

        -- reconcile everything found, into a single ChunksByTypeRep map
        allRegistered :: ChunksByTypeRep
        allRegistered = M.fromList $ map (second chunksToChunkDB) chunksWithTypes
```

This still might not be the right solution for us because it constructs the `ChunksByTypeRepMap` after having fully constructed a single `ChunkDB`, but we can easily create an alternative solution which registers chunks in both maps when registering them.

```haskell
type ChunkDB' = (ChunkDB, ChunksByTypeRep)

-- There is one small problem here w.r.t uniqueness of UIDs and ensuring that the TypeRepMap is an accurate `view` of the `ChunkDB`
-- To fix it, we would just need to add an extra lookup to the initial chunkDB, removing it from the TypeRepMap manually before re-adding it to both maps (not a large problem, but I noticed it while typing up this ticket)
registerChunk' :: (HasUID t, Typeable t, DrasilDumpable t) => t -> ChunkDB' -> ChunkDB'
registerChunk' c (cdb, trcdb) = (cdb', trcdb')
    where
        u      = uid c
        chk    = CHUNK c
        cdb'   = M.insert u chk cdb
        trcdb' = M.alter
                    (maybe
                        (Just $ M.singleton u chk)
                        (Just . M.insert u chk))
                    (typeOf c)
                    trcdb

retrieveChunk' :: Typeable a => ChunkDB' -> UID -> Maybe a
retrieveChunk' (cdb, _) u = do
    (CHUNK r) <- M.lookup u cdb
    cast r

retrieveChunksByType' :: Typeable a => ChunkDB' -> TypeRep -> [a]
retrieveChunksByType' (_, trcdb) tr = maybe [] (mapMaybe ((\(CHUNK c) -> cast c) . snd) . M.toList) (M.lookup tr trcdb)
```

## Does it solve all noted problems?

i. We are unable to add (almost) arbitrary data into the ChunkDBs for various different systems.
    **Yes**, so as long all all relevant data is typeable, has a UID, and (optionally) can be dumped to a String (`DrasilDumpable`).

ii. We manually track chunks in lists (by a single type, for later further processing) in `SystemInformation` when we already have them in nicely placed maps.
    **Yes**, we would have additional functionality for grabbing all chunks in a chunkdb by a particular TypeRep. However, we lose out on the ability of polymorphic types with type constraint restrictions, but it seems that we stick to a small group of types anyway, so this ability may not be needed.

iii. Adding allowed datatypes to a ChunkDB is impossible without editing core `drasil-database` files, and difficult when editing core `drasil-database` files because it affects compilation of other examples.
    **Yes**, we will be solving both problems here because the data registerable is nearly arbitrary. Through this, `ChunkDB` should become _mostly_ stable and unchanging.

iv. Working with similar maps requires redundant manual recreation (swhs and nopcm has a bit of this).
    **Yes**, since `ChunkDB` is now a single type, we will be able to use `Data.Map`'s abilities to allow for easier merging of ChunkDBs and such.

v. We have occurrences of conflicting UIDs because they are registered in different maps.
    **Yes**. By merging in a single map, we ensure uniqueness is upheld in UIDs.

In general, I think this would decrease LOC while allowing us to generally get more out of our ChunkDBs.

## Notable issues
I can only think of 2 issues, but I'm unsure of how impactful they are.

1. We're using `Data.Typeable`. It appears that this library is largely unstable pre-GHC-8.2, and previously was designed differently. Additionally, since we are discarding type information and later re-gaining it, we will have _some_ amount of runtime impact, but I'm unsure of how much.
2. Retrieving chunks en masse by a single TypeRep is tricky, but I think we can sufficiently manage it. In (5), I briefly mentioned this. However, with a bit of hard work, we can mitigate the overhead -- we should be able to create functions similar to `makeLenses` which hides the difficulty associated with creating TypeReps and casting chunks/lists of chunks.

I'm posting this ticket earlier rather than later because I think it's a good proof of concept, but it still has problems. I don't think the problems are too 'breaking', but it would be best to discuss the base before I spend more time on this design.

# Prototype
I've made, and, lightly, tested this in my mini-development repo: https://github.com/balacij/ExprTests/blob/f7321d55611e7721a339188c79b66a3c6258ee8c/src/Lang/ChunkDB.hs

If you'd like to open up this code locally, I've currently setup `stack run` to run through a few small tests with this ChunkDB prototype.

	-- \| Our chunk databases. \Must contain all maps needed in an example.\
	-- In turn, these maps must contain every chunk definition or concept
	-- used in its respective example, else an error is thrown.
	data ChunkDB = CDB { symbolTable :: SymbolMap
	, termTable :: TermMap
	, defTable :: ConceptMap
	, _unitTable :: UnitMap
	, _traceTable :: TraceMap
	, _refbyTable :: RefbyMap
	, _dataDefnTable :: DatadefnMap
	, _insmodelTable :: InsModelMap
	, _gendefTable :: GendefMap
	, _theoryModelTable :: TheoryModelMap
	, _conceptinsTable :: ConceptInstanceMap
	, _sectionTable :: SectionMap
	, _labelledcontentTable :: LabelledContentMap
	, _refTable :: ReferenceMap
	} -- TODO: Expand and add more databases
	makeLenses ''ChunkDB

	, _quants :: [e]
	, _concepts :: [f]
	, _instModels :: [InstanceModel]
	, _datadefs :: [DataDefinition]
	, _configFiles :: [String]
	, _inputs :: [h]
	, _outputs :: [i]
	, _defSequence :: [Block SimpleQDef]
	, _constraints :: [j] --TODO: Add SymbolMap OR enough info to gen SymbolMap

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merging chunk maps into a single map via `Data.Typeable` #2873

Background

Problem

Potential Solution : Merging Maps into a single one

Does it solve all noted problems?

Notable issues

Prototype

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Merging chunk maps into a single map via Data.Typeable #2873

Description

Background

Problem

Potential Solution : Merging Maps into a single one

Does it solve all noted problems?

Notable issues

Prototype

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

Merging chunk maps into a single map via `Data.Typeable` #2873