Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Natural ASCII Builders #368

Open
chris-martin opened this issue Feb 25, 2021 · 33 comments
Open

Natural ASCII Builders #368

chris-martin opened this issue Feb 25, 2021 · 33 comments

Comments

@chris-martin
Copy link
Contributor

Currently the bytestring library has an integerDec function, but no corresponding integerHex function.

integerDec :: Integer -> Builder   -- currently exists
integerHex :: Integer -> Builder   -- doesn't (probably shouldn't) exist

I think this makes sense. I presume the idea is that hex representations of unbounded signed integers don't make much sense. It would be unclear how to represent the negatives, since it isn't traditional to express hexadecimal numbers using a negation sign ('-') and two's complement requires a fixed size. Unfortunately, this presently leaves us without any means of hex-encoding unbounded numbers.

However, it seems to me that we could include support for unbounded unsigned integers -- the Natural type.

naturalHex :: Natural -> Builder   -- doesn't (should) exist

And (by the way) I think it would also make sense to include the decimal variant as well.

naturalDec :: Natural -> Builder   -- doesn't (should) exist

Natural appeared in base-4.8 and bytestring supports all the way back to base-4.3, so I think these would have to be conditionally included #if __GLASGOW_HASKELL__ >= 7101.

@Bodigrim
Copy link
Contributor

it isn't traditional to express hexadecimal numbers using a negation sign

It's rarely needed in computer science, but mathematics is unambiguous here.
https://en.wikipedia.org/wiki/Hexadecimal#Signs
I'm in favor of integerHex, ideally sharing most of its implementation with integerDec.

With regards to natural{Hex,Dec}, I'd rather keep API concise. If a user happened to have Natural, invoking toInteger is not that difficult.

@chris-martin
Copy link
Contributor Author

I would love to encourage using Natural where appropriate, but adding integerHex (and also intHex, then?) would certainly be enough to satisfy my present need.

@Bodigrim
Copy link
Contributor

Bodigrim commented Feb 25, 2021

Could you please tell more about your use case? How big are Integers?

@chris-martin
Copy link
Contributor Author

chris-martin commented Feb 25, 2021

My integers are not large, and I must confess that wordHex would technically suffice. I'm writing an intermediate-level Haskell book about HTTP, and I'm currently demonstrating how to encode chunked message bodies. The chunk sizes must be given in hex, and the length function for strict byte strings gives me an Int, so an Int -> Builder conversion is what I ultimately need.

If there were an intHex function, my demonstration would look something like this:

{-# language OverloadedStrings #-}

import qualified Data.ByteString         as BS
import qualified Data.ByteString.Builder as BSB

encodeChunk :: BS.ByteString -> BSB.Builder
encodeChunk x =
    BSB.intHex (BS.length x) <> "\r\n" <>
    BSB.byteString x <> "\r\n"

encodeLastChunk :: BSB.Builder
encodeLastChunk = BSB.intHex 0 <> "\r\n"

Without it, I'll be making a reluctant digression onto the subject of numeric conversions. Not a lot of code ...

intHex :: Int -> BSB.Builder
intHex = BSB.wordHex . fromIntegral

But the tedium comes in the explanation.

  • Explaining why we have to do this:
    • That the library includes hex conversions for most other integral types but this one.
    • Lamenting the fact that length functions return Int instead of Word even though lengths don't need to be signed.
  • Explaining why it works:
    • Remembering that the max bound for Word is greater than the max bound for Int.
    • The caveat that we've now written a function with an Int parameter, but it only produces sensible results on nonnegative inputs.

@Bodigrim
Copy link
Contributor

I suggest we add wordHexFixed and intHexFixed, choosing between 32 and 64 bits depending on architecture.

If you do not need an arbitrarily-long hexadecimal numbers at the moment, let's leave integerHex out of scope for a while.

@chris-martin
Copy link
Contributor Author

Why "fixed"? I would much rather be able to produce strings that aren't zero-padded, and definitely much rather not have results that visibly vary by architecture.

@Bodigrim
Copy link
Contributor

There is only one possible implementation for integerHex, but there are two for intHex. One of them is intHex = wordHex . fromIntegral, and so intHex (-1) = "ffffffff" or "ffffffffffffffff" depending on architecture. Another is mathematicaly correct one, but more error prone for certain applications, such that intHex (-1) = "-1". That's essentially the reason why we have intHexFixedN functions, but not intHexN ones.

I think that for book purposes you need to explain all this mess anyways: a new version of bytestring is not coming soon, and your readers will be using older versions for several years ahead.

How is it implemented in real Haskell packages dealing with HTTP encoding? Is there a wider problem, which we can solve?

@chris-martin
Copy link
Contributor Author

there are two for intHex

It seems strange to start backwards from a function name (intHex), envision two possible implementations for it, and therefore provide neither. Wouldn't it make more sense, when envisioning two useful-but-missing functions, to then come up with a distinguishing name for each of them?

How is it implemented in real Haskell packages dealing with HTTP encoding?

The warp library depends on bsb-http-chunked. I'm not entirely sure what's going on therein. It seems like quite an ordeal.

Is there a wider problem, which we can solve?

The wider problem we can solve is that, in general, when people trying to learn Haskell need to convert from one type to another, often the desired conversion function isn't there. I accept that bytestring's past choices present obstacles for today's literature, but I hope to keep writing for a long time, and I hope that someday I can write less about how to fill in the same old gaps in the core libraries and more about how to quickly develop Haskell applications.

@Bodigrim
Copy link
Contributor

Bodigrim commented Feb 25, 2021

The wider problem we can solve is that, in general, when people trying to learn Haskell need to convert from one type to another, often the desired conversion function isn't there.

I think we have a philosophical disagreement here. As a writer you'd like to have as many functions readily available as possible. But as a library developer I cannot reasonably agree: maintaining an exponential number of functions is not sustainable. In this particular case a wanted conversion function is readily available, it is fromIntegral.

That said, I'm in favor of exploring ways to improve ergonomics of Builder functions, and I'm not opposed to adding new functions. Let's wait for others to chime in.

@chris-martin
Copy link
Contributor Author

chris-martin commented Feb 26, 2021

To clarify: I am not requesting a trivial f = g . h sort of addition to the library. (wordHex . fromIntegral) is only a sort-of sufficient workaround that I'm considering putting into the book for brevity's sake. It produces an undesirable result for negative inputs.

λ> toLazyByteString (wordHex (fromIntegral (-30 :: Int)))
"ffffffffffffffe2"

What I would like to see is a conversion that produces either "-1e" or a conversion whose domain is restricted to Natural. Ideally, both. I believe these are both nontrivial additions that do not give way to "an exponential number of functions".

@sjakobi
Copy link
Member

sjakobi commented Feb 26, 2021

Unfortunately I don't see a sufficiently convincing use case for adding anything more than naturalDec in analogy to integerDec. In particular, I fail to see the connection between the apparent need for a hex encoder for (non-negative) Ints and the various suggestions for encoders for Integer and Natural.

How is it implemented in real Haskell packages dealing with HTTP encoding?

The warp library depends on bsb-http-chunked. I'm not entirely sure what's going on therein. It seems like quite an ordeal.

bsb-http-chunked is indeed quite complex, mostly because it encodes Builders where the chunk length is unknown until the Builder has been executed.

@Kleidukos
Copy link
Member

bsb-http-chunked is indeed quite complex, mostly because it encodes Builders where the chunk length is unknown until the Builder has been executed.

Shouldn't we provide a saner alternative then? Chris' request seems extremely reasonable if the implementation is non-trivial. Also, writing doctests would provide with testable examples that would certainly ease the weight of maintenance.

@sjakobi
Copy link
Member

sjakobi commented Feb 26, 2021

bsb-http-chunked is indeed quite complex, mostly because it encodes Builders where the chunk length is unknown until the Builder has been executed.

Shouldn't we provide a saner alternative then? Chris' request seems extremely reasonable if the implementation is non-trivial. Also, writing doctests would provide with testable examples that would certainly ease the weight of maintenance.

Could you clarify what you mean with this "saner alternative"?

@Kleidukos
Copy link
Member

@sjakobi Sorry, this came out wrong, I was thinking of an implementation that would not be coupled to the HTTP transfer encoding.

@sjakobi
Copy link
Member

sjakobi commented Feb 26, 2021

No worries @Kleidukos. I simply still don't see the obvious use-case for another hex encoder. bsb-http-chunked, for one, probably wouldn't be able to use a such a function since it doesn't compose Builders, but instead uses the Builder internals and writes to the output buffer directly.

I also suspect that hex-encoding unbounded Naturals or Integers might be a rather uncommon thing to do.

@Bodigrim
Copy link
Contributor

Currently there are:

  • wordDec, which produces the shortest decimal representation of an unsigned integer,
  • intDec, which produces the shortest decimal representation of a signed integer (so that intDec (-1) = "-1"),
  • no functions for a fixed-length decimal representation (because it makes no sense),
  • wordHex, which produces the shortest hexadecimal representation of an unsigned integer,
  • no signed counterpart for wordHex,
  • word64HexFixed, which produces the fixed-size hexadecimal representation of an unsigned integer,
  • int64HexFixed, which produces the fixed-size hexadecimal representation of a signed integer (so that intDec (-1) = "ffffffff".

The proposal, as far as I understand, is to add intHex, producing the shortest hexadecimal representation of a signed integer such that intHex (-1) = "-1" and not "ffffffff". @chris-martin, is it a correct rendering?

@chris-martin
Copy link
Contributor Author

@Bodigrim If my original idea to add Integer and Natural hex functions is off the table due to the difficulty of implementation - Then yes, that is a good summary of the modified proposal.

@Bodigrim
Copy link
Contributor

Bodigrim commented Mar 3, 2021

I wonder if we can attack the problem from another side introducing Data.ByteString.genericLength

@Bodigrim
Copy link
Contributor

@sjakobi @hsyl20 how do you feel about Data.ByteString.genericLength? It would be able to return Word, so that wordHex is applicable directly.

@sjakobi
Copy link
Member

sjakobi commented Mar 18, 2021

I'm not a fan of Data.List.genericLength because its result can silently overflow:

> genericLength @Int8 (replicate 300 True)
44

I wouldn't want to have this happen with Data.ByteString.genericLength.

I'd be somewhat more open to adding e.g. wordLength :: ByteString -> Word.

@sjakobi
Copy link
Member

sjakobi commented Mar 19, 2021

I'm not a fan of Data.List.genericLength because its result can silently overflow:

(I have opened https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5306 to document this)

@kindaro
Copy link
Contributor

kindaro commented Mar 19, 2021

Fan or not fan, genericLength already exists. The simplest way to decide on this sort of issues it to follow base. If someone managed to talk the base people into adding it, which I imagine is no easy feat, we can trust that the argument was solid enough without re-visiting it. Not that I have an opinion myself other than that consistency is good.

@hsyl20
Copy link
Contributor

hsyl20 commented Mar 19, 2021

@Bodigrim I would go even further and use Word internally to store the length of a ByteString. It would allow us to remove some silly assert (l >= 0).

I'm not a fan of genericLength because we could commit to use Word and let users convert as they want. I'm also not a fan of genericLength and wordLength as names because they are too long. As we can't really change length type for backward compat, we could perhaps expose size :: ByteString -> Word instead?

@kindaro genericLength has been here since forever (at least before GHC switched from darcs to git in 1996 afaict) so I don't think there has been a lot of discussion.

@Bodigrim
Copy link
Contributor

I actually like the idea to change internal representation to use Word for length, but this is a breaking change, because internal representation is exposed.

Not sure about size, is it attested in other packages/ecosystems?

@Kleidukos
Copy link
Member

@Bodigrim pardon me if I'm wrong, but isn't internal representation exposed with an unstability warning? Like "we make no guarantee of stability whatsoever"?

@Bodigrim
Copy link
Contributor

@Kleidukos PVP does not contain any special provisions for such kind of warnings.

@sjakobi
Copy link
Member

sjakobi commented Mar 19, 2021

I actually like the idea to change internal representation to use Word for length, but this is a breaking change, because internal representation is exposed.

I'm curious what the patch would look like. Especially I'm wondering what we'd do about the various functions that take Int paramaters.

Not sure about size, is it attested in other packages/ecosystems?

containers and unordered-containers offer size, but with return type Int of course.

@Bodigrim
Copy link
Contributor

The discussion seems stuck, and I'm tempted to close it as "won't do". I think that adding intHex such that intHex (-1) = "-1" will be confusing, as many users might expect "ffffffff" instead.

@chris-martin
Copy link
Contributor Author

chris-martin commented Mar 17, 2022

Circling back around to my original motivation, I have been reluctantly convinced to let my instructional model of the HTTP protocol use Word instead of Natural to describe an HTTP chunk length, since if we have to consider numbers anywhere near 229, we would have already committed some sin in allowing the possibility of a half-gigabyte chunk. Though I am still somewhat uncomfortable with asking people to consider architecture-dependent upper bounds that the domain does not strictly necessitate, and am in general frustrated with not being able to weave this example into a larger happy story about how it is simple to do correct math in Haskell, using unbounded types, disburdened of type conversions, writing code that is easy and obviously correct, before shifting gears into the uglier world of performance optimization and arithmetic overflow.

What still requires an awkward page of explanation and workaround is why the library gives lengths as signed ints. From where I sit, even just being able to say that this was a mistake that will be fixed in an upcoming release would be an improvement.

@Bodigrim
Copy link
Contributor

Realistically I cannot imagine how Haskell could migrate to length :: a -> Word in a finite amount of time.

@chris-martin
Copy link
Contributor Author

chris-martin commented Mar 17, 2022

I don't see any reason to change the existing function rather than add a new one. Names don't matter so much as having the feature there and well-documented.

@Bodigrim
Copy link
Contributor

IMHO such new function should be added to base first.

@chris-martin
Copy link
Contributor Author

chris-martin commented Mar 17, 2022

Oh, speaking of base, it does have a precedent for the one function I actually wanted in the first place.

λ> import Numeric
λ> import Numeric.Natural

λ> showHex (256 :: Natural) ""
"100"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants