Skip to content

Latest commit

 

History

History
117 lines (96 loc) · 4.34 KB

2013-12-17-24-days-of-hackage-unordered-containers.md

File metadata and controls

117 lines (96 loc) · 4.34 KB
title
24 Days of Hackage: unordered-containers

Last year we looked at the containers library - a library that provides some of the most essential data structures in programming. While containers are extremely efficient, sometimes you're just need to go the extra mile, and that's where unordered-containers comes in.

If you have a look at the insert methods in containers, you'll see type signatures like:

insert :: Ord k => k -> a -> Map k a -> Map k a
insert :: Ord a => a -> Set a -> Set a

The only requirement on insertions is that we can order the keys in a map, or the elements of a set, respectively. This gives us an indication of the underlying algorithm - the ordering is probably exploited to form some type of balanced tree. However, there's another way to build up these data structures, and that's via the familiar process of hashing.

Hashing is available to us in Haskell via the hashable library, currently maintained by Johan Tibell. hashable gives us a new type class - Hashable - which does what it says on the tin. Given some value that has a Hashable instance, we can turn those values into hashes. unordered-containers then builds on top of this to provide us with hash-maps and hash-sets.

Armed with Hashable, working with hash-maps and hash-sets in unordered-containers is a breeze. For example, we could easily store a hash-map associating children with a list of presents they want for Christmas. First, we define our data types:

data Child = Child { childName :: String
                   , childLocation :: String
                   } deriving (Eq, Generic, Show)

data Priority = Please | PrettyPlease | PleasePleasePlease
  deriving (Eq, Generic)

data Request = Request { requestPresent :: String
                       , requestPriority :: Priority
                       } deriving (Eq, Generic, Show)

Off the bat, we can't use these data types in any maps. They don't have Ord instances, and they also don't have Hashable instances. Thankfully, it's a doddle to add a Hashable instance thanks to GHC's generic deriving support1:

instance Hashable Child
instance Hashable Priority
instance Hashable Request

Now we can populate our hash-map:

olliesWishList :: HashMap Child [Request]
olliesWishList =
  let ollie = Child { childName = "ocharles"
                    , childLocation = "London"
                    }
  in fromList [(ollie, [ Request "Artisan Coffee" Please
              , Request "Dependent Types in Haskell" PleasePleasePlease
              , Request "Lambda Fridge Magnets" PrettyPlease
              ])]

And query it just as you would query a map in containers:

main = do
  void $ traverseWithKey showWishList olliesWishList
 where
  showWishList child wants = do
    putStrLn (childName child ++ " wants...")
    mapM_ (putStrLn . requestPresent) wants

Which produces the output:

ocharles wants...
Artisan Coffee
Dependent Types in Haskell
Lambda Fridge Magnets

A modest wishlist, I'm sure you'll agree.

The API provided by unordered-containers is not quite as extensive as what we have available to us in containers - but it's still perfectly usable. The real benefit from unordered-containers is, of course, in the numbers. Johan has already done a good deal of benchmarking and comparison, and wrote up his findings on his blog, and hash-maps consistently out-perform regular Ord-based maps.

As to which you should use, my personal preference is still containers as that API spoils me. However, sometimes these data structures are at the heart of what I'm building and entirely internal - in these cases I'm happy to know that a Hashable instance exists and I might as well make use of it! Another consideration is that containers requires Ord instances, and sometimes ordering just doesn't make sense for the data I'm working with. In these cases, I can usually write a Hashable instance, so in that situation the choice is clear.

Footnotes

  1. This requires hashable >= 1.2