Data.List.nub
is O(n²). This one is O(n log n) by requiring an Ord
instance.
-
Also contains a benchmark (
report.html
) that shows thatordNub
apparently is faster thannub
in all cases. -
PACKAGES_USING_NUB.txt
contains all packages which usenub
(made with a quick grep). It's not the most accurate since some packages define their ownnub
, but that's a minority.
This thing here is not a library. It is a benchmark suite. View results here.
If you are looking for a fast ordNub
function to use in your code, use:
nubOrd
fromcontainers
(stable link) which after many years was finally added (also in response to this repo).
If you don't have a new enough containers
or don't want to depend on it, use:
import qualified Data.Set as Set
ordNub :: (Ord a) => [a] -> [a]
ordNub l = go Set.empty l
where
go _ [] = []
go s (x:xs) = if x `Set.member` s then go s xs
else x : go (Set.insert x s) xs
\\
, union
, intersect
- they too are O(n²).
Also be aware that they don't work like sets. For example:
> [1,1,2] \\ [1]
[1,2]
> union [1,2,3,1] [1,4]
[1,2,3,1,4]
The current O(n log n) recommendation for \\
is:
import qualified Data.Map.Strict as Map
listDifference :: (Ord a) => [a] -> [a] -> [a]
listDifference a b = go initHist a
where
initHist = Map.fromListWith (+) [ (x, 1 :: Int) | x <- b ]
go _ [] = []
go hist (x:xs) = case Map.lookup x hist of
Just n | n > 0 -> go (Map.insert x (n-1) hist) xs
_ -> x : go hist xs
The current O(n log n) recommendation for union
is:
import qualified Data.Set as Set
listUnion :: (Ord a) => [a] -> [a] -> [a]
listUnion a b = a ++ ordNub (filter (`Set.notMember` aSet) b)
where
aSet = Set.fromList a
The current O(n log n) recommendation for intersect
is:
import qualified Data.Set as Set
listIntersect :: (Ord a) => [a] -> [a] -> [a]
listIntersect a b = filter (`Set.member` bSet) a
where
bSet = Set.fromList b
The file ordnub.hs
contains more functions you may be interested in, including:
ordNubByEq :: (Eq a, Ord b) => (a -> b) -> [a] -> [a]
ordNubBy :: (Ord b) => (a -> b) -> (a -> a -> Bool) -> [a] -> [a]
These functions have longer docs so they are not copied into this README.