Skip to content

Merging chunk maps into a single map via Data.Typeable #2873

Open
@balacij

Description

@balacij

Background

Presently, ChunkDBs are built as an enumeration of "chunk" maps;

-- | Our chunk databases. \Must contain all maps needed in an example.\
-- In turn, these maps must contain every chunk definition or concept
-- used in its respective example, else an error is thrown.
data ChunkDB = CDB { symbolTable :: SymbolMap
, termTable :: TermMap
, defTable :: ConceptMap
, _unitTable :: UnitMap
, _traceTable :: TraceMap
, _refbyTable :: RefbyMap
, _dataDefnTable :: DatadefnMap
, _insmodelTable :: InsModelMap
, _gendefTable :: GendefMap
, _theoryModelTable :: TheoryModelMap
, _conceptinsTable :: ConceptInstanceMap
, _sectionTable :: SectionMap
, _labelledcontentTable :: LabelledContentMap
, _refTable :: ReferenceMap
} -- TODO: Expand and add more databases
makeLenses ''ChunkDB

Problem

i. We are unable to add (almost) arbitrary data into the ChunkDBs for various different systems.
ii. We manually track chunks in lists (by a single type, for later further processing) in SystemInformation when we already have them registered in the ChunkDB.

, _quants :: [e]
, _concepts :: [f]
, _instModels :: [InstanceModel]
, _datadefs :: [DataDefinition]
, _configFiles :: [String]
, _inputs :: [h]
, _outputs :: [i]
, _defSequence :: [Block SimpleQDef]
, _constraints :: [j] --TODO: Add SymbolMap OR enough info to gen SymbolMap

iii. Adding arbitrary data to a ChunkDB is impossible without editing core drasil-database files, and difficult when editing core drasil-database files because it affects compilation of other examples.
iv. Working with similar maps requires redundant manual recreation (swhs and nopcm has a bit of this).
v. We have occurrences of conflicting UIDs because they are registered in different maps.

Potential Solution : Merging Maps into a single one

I think we can solve these 5 or so problems by merging our maps into a single Map, and using more of the core Data.Map functionality.

We will need to hide type information to create a homogeneous Chunk type, and then merge our maps into a single Chunk Map.

The solution would, approximately, be as follows:
(1) Assume we have the following UID type, and typeclasses:

type UID = String

-- | An interface to get the UID of a chunk.
class HasUID a where
    uid :: a -> UID

-- | Dump data to Strings for debugging-purposes.
class DrasilDumpable a where
    dump :: a -> String

(2) We build a Chunk type intended to carry arbitrary data in our final ChunkDB:

import Data.Typeable -- this is an important design decision (as opposed to Type.Reflection) for simplicity
...
data Chunk where  -- Personally, I find GADTs to be more readable, but this can be done without them via ExistentialQuantification
    CHUNK :: (HasUID t, DrasilDumpable t, Typeable t) => t -> Chunk

instance HasUID         Chunk where uid  (CHUNK t) = uid t
instance DrasilDumpable Chunk where dump (CHUNK t) = dump t
...

Here, we build a commonality between all chunks registered in Drasil (e.g., they ALL must have a UID [a design decision], they all must be Typeable [Thanks to GHC 8.2+, this is the new 'normal'], and they all must be DrasilDumpable [not really required, but I think this would help in debugging -- this could also be something that we mass-dump into a single JSON file for our stable examples to show all raw knowledge we use to build up softifacts]). Additionally, we discard the real type information because the Typeable allows us to retrieve it later, and it will be helpful in creating a homogeneous map type.

(3) Thanks to capturing all of our chunks in a single Chunk type, we can construct a single homogeneous ChunkDB type:

type ChunkDB = M.Map UID Chunk

(4) We build a few functions to help with registering chunks into the ChunkDB:

-- Registering a list of like-typed chunks
chunksToChunkDB :: (HasUID t, Typeable t, DrasilDumpable t) => [t] -> ChunkDB
chunksToChunkDB = M.fromList . map (\x -> (uid x, CHUNK x))

-- Registering
registerChunk :: (HasUID t, Typeable t, DrasilDumpable t) => t -> ChunkDB -> ChunkDB
registerChunk c = M.insert (uid c) (CHUNK c)

-- Retrieval (I prefer using `Maybe` for prototyping instead of forced errors, but this would likely be a strict value or an error in practice)
retrieveChunk :: Typeable a => ChunkDB -> UID -> Maybe a
retrieveChunk cdb u = do
    (CHUNK r) <- M.lookup u cdb
    cast r

(5) When we want to pull arbitrary lists of chunks from the db by their TypeReps, we can via casting:

retrieveChunksByType :: (HasUID t, DrasilDumpable t, Typeable t) => ChunkDB -> TypeRep -> [t]
retrieveChunksByType cdb tr = relevantChunks
    where
        allChunks = M.toList cdb
        relevantChunks = mapMaybe (\(_, CHUNK c) -> if typeOf c == tr then cast c else Nothing) allChunks

NOTE: There is one problem here (which I go into small discussion into in my prototype), but it's not too important here. Essentially, the if typeOf c == tr then cast c else Nothing seems to be too strict of a requirement, and we might want to lessen to just a cast c (in which case, we can get rid of the then redundant TypeRep parameter), but I'm not entirely certain yet (I will have to do more research on this).

Of course, whenever we want to do this, it will be rather inefficient, so we can create a cache:
(6) A Map of ChunkDBs by their TypeReps:

type ChunksByTypeRep = M.Map TypeRep ChunkDB

dumpChunkDBToTypeRepMap :: ChunkDB -> ChunksByTypeRep
dumpChunkDBToTypeRepMap cdb = allRegistered
    where
        -- gather a list of registered types
        knownChunkTypes :: [TypeRep]
        knownChunkTypes = nub . map (typeOf . snd) $ M.toList cdb

        -- gather types with a list of all registered chunks that are of that type
        chunksWithTypes :: [(TypeRep, [Chunk])]
        chunksWithTypes = map (\x -> (,) x $ retrieveChunksByTypeInChunkBox cdb x) knownChunkTypes

        -- reconcile everything found, into a single ChunksByTypeRep map
        allRegistered :: ChunksByTypeRep
        allRegistered = M.fromList $ map (second chunksToChunkDB) chunksWithTypes

This still might not be the right solution for us because it constructs the ChunksByTypeRepMap after having fully constructed a single ChunkDB, but we can easily create an alternative solution which registers chunks in both maps when registering them.

type ChunkDB' = (ChunkDB, ChunksByTypeRep)

-- There is one small problem here w.r.t uniqueness of UIDs and ensuring that the TypeRepMap is an accurate `view` of the `ChunkDB`
-- To fix it, we would just need to add an extra lookup to the initial chunkDB, removing it from the TypeRepMap manually before re-adding it to both maps (not a large problem, but I noticed it while typing up this ticket)
registerChunk' :: (HasUID t, Typeable t, DrasilDumpable t) => t -> ChunkDB' -> ChunkDB'
registerChunk' c (cdb, trcdb) = (cdb', trcdb')
    where
        u      = uid c
        chk    = CHUNK c
        cdb'   = M.insert u chk cdb
        trcdb' = M.alter
                    (maybe
                        (Just $ M.singleton u chk)
                        (Just . M.insert u chk))
                    (typeOf c)
                    trcdb

retrieveChunk' :: Typeable a => ChunkDB' -> UID -> Maybe a
retrieveChunk' (cdb, _) u = do
    (CHUNK r) <- M.lookup u cdb
    cast r

retrieveChunksByType' :: Typeable a => ChunkDB' -> TypeRep -> [a]
retrieveChunksByType' (_, trcdb) tr = maybe [] (mapMaybe ((\(CHUNK c) -> cast c) . snd) . M.toList) (M.lookup tr trcdb)

Does it solve all noted problems?

i. We are unable to add (almost) arbitrary data into the ChunkDBs for various different systems.
Yes, so as long all all relevant data is typeable, has a UID, and (optionally) can be dumped to a String (DrasilDumpable).

ii. We manually track chunks in lists (by a single type, for later further processing) in SystemInformation when we already have them in nicely placed maps.
Yes, we would have additional functionality for grabbing all chunks in a chunkdb by a particular TypeRep. However, we lose out on the ability of polymorphic types with type constraint restrictions, but it seems that we stick to a small group of types anyway, so this ability may not be needed.

iii. Adding allowed datatypes to a ChunkDB is impossible without editing core drasil-database files, and difficult when editing core drasil-database files because it affects compilation of other examples.
Yes, we will be solving both problems here because the data registerable is nearly arbitrary. Through this, ChunkDB should become mostly stable and unchanging.

iv. Working with similar maps requires redundant manual recreation (swhs and nopcm has a bit of this).
Yes, since ChunkDB is now a single type, we will be able to use Data.Map's abilities to allow for easier merging of ChunkDBs and such.

v. We have occurrences of conflicting UIDs because they are registered in different maps.
Yes. By merging in a single map, we ensure uniqueness is upheld in UIDs.

In general, I think this would decrease LOC while allowing us to generally get more out of our ChunkDBs.

Notable issues

I can only think of 2 issues, but I'm unsure of how impactful they are.

  1. We're using Data.Typeable. It appears that this library is largely unstable pre-GHC-8.2, and previously was designed differently. Additionally, since we are discarding type information and later re-gaining it, we will have some amount of runtime impact, but I'm unsure of how much.
  2. Retrieving chunks en masse by a single TypeRep is tricky, but I think we can sufficiently manage it. In (5), I briefly mentioned this. However, with a bit of hard work, we can mitigate the overhead -- we should be able to create functions similar to makeLenses which hides the difficulty associated with creating TypeReps and casting chunks/lists of chunks.

I'm posting this ticket earlier rather than later because I think it's a good proof of concept, but it still has problems. I don't think the problems are too 'breaking', but it would be best to discuss the base before I spend more time on this design.

Prototype

I've made, and, lightly, tested this in my mini-development repo: https://github.com/balacij/ExprTests/blob/f7321d55611e7721a339188c79b66a3c6258ee8c/src/Lang/ChunkDB.hs

If you'd like to open up this code locally, I've currently setup stack run to run through a few small tests with this ChunkDB prototype.

Metadata

Metadata

Labels

Projects

Status

To do

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions