-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable ordinary-index building without a page count bound #419
base: main
Are you sure you want to change the base?
Conversation
6bc89d6
to
1db7791
Compare
1db7791
to
db4f2f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just a few small suggestions
{-| | ||
Turns a growing vector into an ordinary vector, thereby invalidating the | ||
growing vector. Executing @unsafeFreeze vec@ is only safe when @vec@ is not | ||
used afterwards. | ||
-} | ||
unsafeFreeze :: GrowingVector s a -> ST s (Vector a) | ||
unsafeFreeze (GrowingVector bufferRef lengthRef) = do | ||
buffer <- readSTRef bufferRef | ||
length <- readPrimVar lengthRef | ||
force <$> Mutable.unsafeFreeze (Mutable.take length buffer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remember correctly, force
will always copy the contents into a new vector. As such, this implementation is actually a safe freeze
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had wondered about this already before sending you this pull request. I have checked now: From a quick look at the code, I get indeed the impression that force
will always copy the contents. However, the API documentation says this:
Yield the argument, but force it not to retain any extra memory, possibly by copying it.
So the specification leaves open the possibility of force
returning the original vector if this vector uses its underlying array in full. We might hope that future versions of the vector
package won’t make use of this opportunity, but I’d like to err on the safe side and continue considering the above function unsafe.
new initialBufferSize | ||
= do | ||
buffer <- Mutable.new initialBufferSize | ||
bufferRef <- newSTRef buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bufferRef <- newSTRef buffer | |
bufferRef <- newSTRef $! buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don’t think this is necessary. Since we’re using the strict version of ST
, the computation Mutable.new initialBufferSize
will definitely be performed before newSTRef buffer
, and I’d be very surprised if new
wouldn’t always yield a fully evaluated result. Thoughts?
|
||
makeRoom :: ST s () | ||
makeRoom = do | ||
length <- readPrimVar lengthRef |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This length was just read in the do
block above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re absolutely right, and I was fully aware of that when I filed this pull request. 🙂 I’m not quite sure how to handle this situation. I would like to have exactly the code for “making room” be separated, so that it doesn’t clutter the main code. However, self-contained code for making room needs to contain the initial reading of the length, which fulfills a different purpose than the reading of the length in the main code, which is for saving the original length in order to use it later in a computation. An alternative would be to pass the length as an argument to makeRoom
, but this seems a bit awkward to me. Would you be fine with keeping the code as it is now? Do you perhaps have a different idea of how to deal with this problem?
let | ||
|
||
higherBufferSizes :: [Int] | ||
higherBufferSizes = tail (init ++ [last]) where |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maxBound
should be part of the higher buffer sizes as well. last
could be much less than maxBound
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had considered this as well but then dropped this idea. The thing is that, while maxBound
could be as “low” as 2 ^ 27 - 1
according to the Haskell Report, in practice it is likely to be 2 ^ 63 - 1
and thus so high that the number of pages will probably never get anywhere close to it. With the current implementation, the last element of higherBufferSizes
is at least 2 ^ 62
and thus still so high that it will probably never be needed. I think that overflow of buffer sizes should be prevented even if it is unlikely to ever occur, simply to be safe and for the code to be correct. However, I don’t think we have to introduce the extra, albeit small, complexity that comes from adding maxBound
to higherBufferSizes
, given that doing so probably wouldn’t make any difference in practice. What are your thoughts on this?
-> Mutable.grow buffer (bufferSize' - bufferSize) >>= | ||
writeSTRef bufferRef |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> Mutable.grow buffer (bufferSize' - bufferSize) >>= | |
writeSTRef bufferRef | |
-> Mutable.grow buffer (bufferSize' - bufferSize) >>= | |
(writeSTRef bufferRef $!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that ($!)
is not needed here, for the same reasons as in the invocation of newSTRef
above.
So far, incremental construction of an ordinary index required users to provide an upper bound of the number of pages or equivalently the number of keys stored in the index. However, such an upper bound cannot be reliably computed.
This pull request introduces an implementation of “growing vectors” and adapts ordinary-index construction to make use of it. As a result, it is not necessary anymore to provide an upper bound of the number of keys but only an initial size of the key buffer. For performance reasons, it is good for this initial size to be an upper bound of the number of keys, but index construction will also work if it is not.
This pull request does not contain tests for growing vectors. These will be added later by resolving #420.