add codepoint-based string functions as Data.String.CodePoints #79
@@ -6,7 +6,7 @@ module CodePoints
   , codePointToInt
   , count
   , drop
-  --, dropWhile
+  , dropWhile
   , fromCodePointArray
   --, indexOf
   --, indexOf'
@@ -16,7 +16,7 @@ module CodePoints
   , singleton
   , splitAt
   , take
-  --, takeWhile
+  , takeWhile
   , toCodePointArray
   , uncons
   ) where
@@ -31,6 +31,7 @@ import Data.Tuple (Tuple(Tuple))
 import Data.Unfoldable (unfoldr)
 import Prelude ((&&), (||), (*), (+), (-), (<$>), (<), (<=), (<<<))

+
 newtype CodePoint = CodePoint Int

 codePointFromInt :: Int -> Maybe CodePoint
@@ -42,11 +43,11 @@ codePointToInt (CodePoint n) = n

 codePointFromSurrogatePair :: Int -> Int -> Maybe CodePoint
 codePointFromSurrogatePair lead trail | isLead lead && isTrail trail
-  = Just (CodePoint (unsurrogate lead trail))
+  = Just (unsurrogate lead trail)
 codePointFromSurrogatePair _ _ = Nothing

-unsurrogate :: Int -> Int -> Int
-unsurrogate h l = (h - 0xD800) * 0x400 + (l - 0xDC00) + 0x10000
+unsurrogate :: Int -> Int -> CodePoint
+unsurrogate h l = CodePoint ((h - 0xD800) * 0x400 + (l - 0xDC00) + 0x10000)

 isLead :: Int -> Boolean
 isLead cu = 0xD800 <= cu && cu <= 0xDBFF
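To make the arithmetic in `unsurrogate` concrete, here is a small TypeScript sketch that mirrors the helpers in this hunk. It is only an illustration: code points are plain numbers rather than the `CodePoint` newtype, `null` stands in for `Nothing`, and the `isTrail` range (0xDC00 to 0xDFFF) is taken from the UTF-16 definition because its body is not shown in this diff.

```typescript
// Hypothetical TypeScript mirror of the PureScript helpers in this hunk.
// Code points are plain numbers here; the PR wraps them in a CodePoint newtype.

const isLead = (cu: number): boolean => 0xd800 <= cu && cu <= 0xdbff;

// Assumption: trail surrogates span 0xDC00..0xDFFF, as defined by UTF-16.
const isTrail = (cu: number): boolean => 0xdc00 <= cu && cu <= 0xdfff;

// Same arithmetic as `unsurrogate`: no validation, callers check first.
const unsurrogate = (h: number, l: number): number =>
  (h - 0xd800) * 0x400 + (l - 0xdc00) + 0x10000;

// Mirrors `codePointFromSurrogatePair`; `null` plays the role of `Nothing`.
const codePointFromSurrogatePair = (
  lead: number,
  trail: number
): number | null =>
  isLead(lead) && isTrail(trail) ? unsurrogate(lead, trail) : null;

// U+1F600 is encoded in UTF-16 as the pair 0xD83D 0xDE00:
// (0x3D * 0x400) + 0x200 + 0x10000 = 0x1F600.
const grinning = codePointFromSurrogatePair(0xd83d, 0xde00); // 0x1F600

// 0x0041 is not a lead surrogate, so no code point is produced.
const notAPair = codePointFromSurrogatePair(0x0041, 0xde00); // null
```

The extremes of the two surrogate ranges map to the extremes of the supplementary planes: `unsurrogate(0xD800, 0xDC00)` gives 0x10000 and `unsurrogate(0xDBFF, 0xDFFF)` gives 0x10FFFF.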
@@ -71,13 +72,25 @@ codePointAtFallback n s = Array.index (toCodePointArray s) n


 count :: (CodePoint -> Boolean) -> String -> Int
Why does […]

See how […]

It's just a really odd name for that function.

Yeah I think there was a discussion about it somewhere else. Maybe we should keep this name for now, and open an issue to rename both instances of it the next time we make a breaking change.

-count pred = Array.length <<< Array.filter pred <<< toCodePointArray
+count = _count isLead isTrail unsurrogate
+
+foreign import _count
+  :: (Int -> Boolean)
+  -> (Int -> Boolean)
+  -> (Int -> Int -> CodePoint)
+  -> (CodePoint -> Boolean)
+  -> String
+  -> Int


 drop :: Int -> String -> String
 drop n s = fromCodePointArray (Array.drop n (toCodePointArray s))


+dropWhile :: (CodePoint -> Boolean) -> String -> String
+dropWhile p s = drop (count p s) s
+
+
 foreign import fromCodePointArray :: Array CodePoint -> String


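The JavaScript behind `_count` is not part of this diff. Because `dropWhile p s = drop (count p s) s`, the count has to be the number of leading code points that satisfy the predicate, so the foreign function presumably walks the UTF-16 code units and stops at the first code point that fails. The TypeScript sketch below shows one way that could look. The name `countPrefix`, the uncurried argument list, and the use of plain numbers for code points are all assumptions made for illustration, not the PR's actual code.

```typescript
// Hypothetical sketch of a prefix counter; the PR's real foreign
// implementation of `_count` is not shown and may differ.
type Pred<A> = (a: A) => boolean;

function countPrefix(
  isLead: Pred<number>,
  isTrail: Pred<number>,
  unsurrogate: (h: number, l: number) => number,
  pred: Pred<number>,
  s: string
): number {
  let count = 0;
  let i = 0;
  while (i < s.length) {
    let cp = s.charCodeAt(i);
    let width = 1;
    // Combine a lead/trail pair into one code point; a lone surrogate
    // is passed to the predicate unchanged.
    if (isLead(cp) && i + 1 < s.length && isTrail(s.charCodeAt(i + 1))) {
      cp = unsurrogate(cp, s.charCodeAt(i + 1));
      width = 2;
    }
    if (!pred(cp)) break; // stop at the first code point that fails
    count += 1;
    i += width;
  }
  return count;
}

// Helpers passed in from the caller, as `count` does in the diff.
const isLead = (cu: number): boolean => 0xd800 <= cu && cu <= 0xdbff;
const isTrail = (cu: number): boolean => 0xdc00 <= cu && cu <= 0xdfff;
const unsurrogate = (h: number, l: number): number =>
  (h - 0xd800) * 0x400 + (l - 0xdc00) + 0x10000;

// Two astral characters (four code units) followed by "ab".
const sample = "\uD83D\uDE00\uD83D\uDE00ab";
const isAstral = (cp: number): boolean => cp > 0xffff;
const leadingAstral = countPrefix(isLead, isTrail, unsurrogate, isAstral, sample);
```

Here `leadingAstral` is 2, not 4: each surrogate pair is counted once, which is the count `drop` and `take` need when they work on code points.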
@@ -108,6 +121,10 @@ takeFallback :: Int -> String -> String
 takeFallback n s = fromCodePointArray (Array.take n (toCodePointArray s))


+takeWhile :: (CodePoint -> Boolean) -> String -> String
+takeWhile p s = take (count p s) s
+
+
 toCodePointArray :: String -> Array CodePoint
 toCodePointArray = _toCodePointArray toCodePointArrayFallback

@@ -121,7 +138,7 @@ toCodePointArrayFallback s = unfoldr decode (fromFoldable (toCharCode <$> toChar
   where
   decode :: List Int -> Maybe (Tuple CodePoint (List Int))
   decode (Cons h (Cons l rest)) | isLead h && isTrail l
This is the only use of […]

-    = Just (Tuple (CodePoint (unsurrogate h l)) rest)
+    = Just (Tuple (unsurrogate h l) rest)
   decode (Cons c rest) = Just (Tuple (CodePoint c) rest)
   decode Nil = Nothing

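The three `decode` cases amount to a single pass over the UTF-16 code units: a valid lead/trail pair becomes one code point, any other unit (including a lone surrogate) is passed through as its own code point, and an empty input ends the unfold. A rough TypeScript equivalent, using a loop in place of `unfoldr` and plain numbers in place of `CodePoint`, might look like the following. The name `toCodePoints` is made up for this sketch.

```typescript
// Hypothetical loop-based equivalent of the `decode` unfold above.
const isLead = (cu: number): boolean => 0xd800 <= cu && cu <= 0xdbff;
const isTrail = (cu: number): boolean => 0xdc00 <= cu && cu <= 0xdfff;
const unsurrogate = (h: number, l: number): number =>
  (h - 0xd800) * 0x400 + (l - 0xdc00) + 0x10000;

function toCodePoints(s: string): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < s.length) {
    const h = s.charCodeAt(i);
    if (isLead(h) && i + 1 < s.length && isTrail(s.charCodeAt(i + 1))) {
      // Case 1: lead followed by trail, consume both units.
      out.push(unsurrogate(h, s.charCodeAt(i + 1)));
      i += 2;
    } else {
      // Case 2: any other unit, including an unpaired surrogate.
      out.push(h);
      i += 1;
    }
  }
  // Case 3 (Nil) corresponds to the loop ending.
  return out;
}

// "a", U+1F600, then a lone lead surrogate with nothing after it.
const decoded = toCodePoints("a\uD83D\uDE00\uD83D");
```

For this input `decoded` holds three entries: 0x61, 0x1F600, and 0xD83D. The unpaired lead surrogate at the end survives as its own code point instead of being dropped.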
Every use of this function has an `isLead cu0 && isTrail cu1` check beforehand; do you think it might make more sense to include this check inside the `unsurrogate` function and have it return a `Maybe CodePoint`, returning `Nothing` if it is given arguments which don't form a surrogate pair?

It's only an internal function, it only has two usages, and neither would be more convenient to write that way, so I'm going to leave it as-is for now.

Ok then 👍
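
For comparison, the alternative raised in this thread can be sketched in TypeScript as follows. This is not code from the PR: `unsurrogateChecked` and `decodeAt` are hypothetical names, `null` stands in for `Nothing`, and code points are plain numbers. The sketch folds the surrogate check into the conversion and shows what a decoding step would then have to do with the optional result.

```typescript
// Hypothetical variant that validates its arguments itself.
const isLead = (cu: number): boolean => 0xd800 <= cu && cu <= 0xdbff;
const isTrail = (cu: number): boolean => 0xdc00 <= cu && cu <= 0xdfff;

const unsurrogateChecked = (h: number, l: number): number | null =>
  isLead(h) && isTrail(l)
    ? (h - 0xd800) * 0x400 + (l - 0xdc00) + 0x10000
    : null;

// One decoding step at code-unit index `i`: the decoded code point and
// how many code units it used, or null at the end of the string.
function decodeAt(s: string, i: number): { cp: number; width: number } | null {
  if (i >= s.length) return null;
  const h = s.charCodeAt(i);
  // charCodeAt returns NaN past the end, which fails the isTrail check.
  const pair = unsurrogateChecked(h, s.charCodeAt(i + 1));
  // The caller still has to branch on the result and fall back to `h`.
  return pair !== null ? { cp: pair, width: 2 } : { cp: h, width: 1 };
}

const paired = decodeAt("\uD83D\uDE00", 0); // { cp: 0x1F600, width: 2 }
const lone = decodeAt("\uD83D", 0); // { cp: 0xD83D, width: 1 }
```

With this shape the guard moves out of the call site, but the call site gains a branch on the optional result, so the amount of code is about the same either way. That may be what the reply means by neither usage being more convenient to write.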