Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement preallocated buffer #996

Merged
merged 33 commits into from
Sep 11, 2019
Merged

Implement preallocated buffer #996

merged 33 commits into from
Sep 11, 2019

Conversation

ashish-goswami
Copy link
Contributor

@ashish-goswami ashish-goswami commented Aug 20, 2019

Ran this on both branch

go test -bench BenchmarkBuilder$ -run ^$ --count=5 -benchmem

Master

goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger/table
BenchmarkBuilder-16    	       3	 478921486 ns/op	264389517 B/op	 2673998 allocs/op
BenchmarkBuilder-16    	       3	 472378778 ns/op	264389565 B/op	 2673999 allocs/op
BenchmarkBuilder-16    	       3	 476744803 ns/op	264390378 B/op	 2674000 allocs/op
BenchmarkBuilder-16    	       3	 473738428 ns/op	264388901 B/op	 2673994 allocs/op
BenchmarkBuilder-16    	       3	 472036524 ns/op	264389282 B/op	 2673996 allocs/op
PASS
ok  	github.com/dgraph-io/badger/table	14.416s

This PR

goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger/table
BenchmarkBuilder-16    	       2	 527921308 ns/op	259690376 B/op	 2659254 allocs/op
BenchmarkBuilder-16    	       2	 522870453 ns/op	259689704 B/op	 2659255 allocs/op
BenchmarkBuilder-16    	       2	 523489554 ns/op	259689704 B/op	 2659254 allocs/op
BenchmarkBuilder-16    	       2	 526129561 ns/op	259689296 B/op	 2659250 allocs/op
BenchmarkBuilder-16    	       2	 520691506 ns/op	259689976 B/op	 2659259 allocs/op
PASS
ok  	github.com/dgraph-io/badger/table	7.914s

Benchstats

name        old time/op    new time/op    delta
Builder-16     475ms ± 1%     517ms ± 2%  +8.91%  (p=0.000 n=5+20)

name        old alloc/op   new alloc/op   delta
Builder-16     264MB ± 0%     260MB ± 0%  -1.78%  (p=0.000 n=5+18)

name        old allocs/op  new allocs/op  delta
Builder-16     2.67M ± 0%     2.66M ± 0%  -0.55%  (p=0.000 n=5+18)

This change is Reviewable

y/iterator.go Outdated
// this function exists is to avoid creating byte arrays per key-value pair in
// table/builder.go.
func (v *ValueStruct) EncodeToYBuf(buf *Buffer) {
buf.WriteByte(v.Meta)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of buf.WriteByte is not checked (from errcheck)

y/iterator.go Outdated
// table/builder.go.
func (v *ValueStruct) EncodeToYBuf(buf *Buffer) {
buf.WriteByte(v.Meta)
buf.WriteByte(v.UserMeta)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of buf.WriteByte is not checked (from errcheck)

y/iterator.go Outdated
buf.WriteByte(v.UserMeta)
var enc [binary.MaxVarintLen64]byte
sz := binary.PutUvarint(enc[:], v.ExpiresAt)
buf.Write(enc[:sz])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of buf.Write is not checked (from errcheck)

y/iterator.go Outdated
var enc [binary.MaxVarintLen64]byte
sz := binary.PutUvarint(enc[:], v.ExpiresAt)
buf.Write(enc[:sz])
buf.Write(v.Value)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of buf.Write is not checked (from errcheck)

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r2, 9 of 9 files at r4.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @ashish-goswami and @jarifibrahim)


table/builder.go, line 225 at r4 (raw file):

+---------+------------+-----------+---------------+
*/
func (b *Builder) Finish() *y.Buffer {

Don't change the output from this. Keep it to []byte.


y/y.go, line 355 at r4 (raw file):

// in chunks(pages). Once a chunk(page) is full, it will allocate double the size of previous
// chunk. Its function are not thread safe.
type Buffer struct {

In the first PR, can you just introduce the Buffer: Call it PageBuffer and tests for it.


y/y.go, line 472 at r4 (raw file):

// TODO: reader can be removed.

func (b *Buffer) NewReader() io.Reader {

You can remove this, if not needed.

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 7 files reviewed, 7 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)


table/builder.go, line 225 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Don't change the output from this. Keep it to []byte.

Done.


y/y.go, line 355 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

In the first PR, can you just introduce the Buffer: Call it PageBuffer and tests for it.

Done.


y/y.go, line 472 at r4 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

You can remove this, if not needed.

Done.

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


@ashish-goswami you can click here to see the review status or cancel the code review job.

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 10 files at r5.
Reviewable status: 3 of 7 files reviewed, 10 unresolved discussions (waiting on @ashish-goswami and @manishrjain)


y/y.go, line 352 at r5 (raw file):

// PageBuffer consists of many pages. A page is a wrapper over []byte. PageBuffer can act as a
// replacement of bytes.Buffer. Instead of having single underlying buffer, it has multiple
// underlying buffers. Hence it avoids any copy during relocation(as happenes in bytes.Buffer).

Typo: happenes => happens


y/y.go, line 353 at r5 (raw file):

// replacement of bytes.Buffer. Instead of having single underlying buffer, it has multiple
// underlying buffers. Hence it avoids any copy during relocation(as happenes in bytes.Buffer).
// Buffer allocates memory in pages. Once a page is full, it will allocate page with double the size

PageBuffer allocates memory in pages...


y/y.go, line 456 at r5 (raw file):

}

// WriteTo writes whole buffer to w. It returns number of bytes returned and any error encountered.

typo: returned => written

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 7 files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)


y/y.go, line 352 at r5 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Typo: happenes => happens

Done.


y/y.go, line 353 at r5 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

PageBuffer allocates memory in pages...

Done.


y/y.go, line 456 at r5 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

typo: returned => written

Done.

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the changes here look like a good optimization. Although left one comment that you may want to double check usage or add more documentation around the shared buffer used for ReadAt to avoid any issues in the future.


Reviewed with ❤️ by PullRequest

y/y.go Outdated
if length > cap(b.readBuf) {
b.readBuf = make([]byte, length) // Allocate whole buffer at start.
} else {
b.readBuf = b.readBuf[:length]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you mentioned on the PageBuffer that its functions aren't thread safe. Although it also seems like multiple calls to ReadAt will potentially mutate the data returned by previous calls since this b.readBuf's underlying data is shared between calls. This might be worth adding to the docstring for this method and PageBuffer because I could see this being pretty surprising and have data leakage/security issues if a buffer was ever shared across accounts or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point. Thanks for pointing it out. Will add to comments.

y/y.go Outdated
// NewPageBuffer returns a new PageBuffer with first page having size pageSize.
func NewPageBuffer(pageSize int) *PageBuffer {
b := &PageBuffer{curPageSize: pageSize}
b.pages = make([]*page, 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this line is redundant currently since the append on the line below will do this even if b.pages is nil, although it is fine to keep it if you prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it. Thanks.

@@ -340,3 +341,128 @@ func BytesToU32Slice(b []byte) []uint32 {
hdr.Data = uintptr(unsafe.Pointer(&b[0]))
return u32s
}

// page struct contains one underlying buffer.
type page struct {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to add more items to this struct or would it make sense to just do something like this instead:

type page []byte

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might add some functions to page.

y/iterator.go Outdated
// EncodeToPageBuf should be kept in sync with the Encode function above. The reason
// this function exists is to avoid creating byte arrays per key-value pair in
// table/builder.go.
func (v *ValueStruct) EncodeToPageBuf(buf *PageBuffer) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish there was a bytes.Buffer interface that could be used instead of having to duplicate the code and keep these in sync but otherwise. But makes sense and thanks for adding the comment to this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. PageBuffer is experimental as of now. Once we decide to use it in Builder, we can do that.

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the usage in table builder for now, as we had discussed. Let's get the library in. Then we can figure out if it is worth using it in various places.

Reviewable status: 2 of 8 files reviewed, 13 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, and @pullrequest[bot])

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 8 files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, and @pullrequest[bot])

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 10 files at r5, 1 of 3 files at r7, 1 of 1 files at r8, 2 of 3 files at r9.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])


y/y.go, line 361 at r9 (raw file):

	length      int // Length of PageBuffer.
	curPageSize int // Size of last page allocated.

nextPageSize

Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to inactivity, PullRequest has cancelled this review job. You can reactivate the code review job from the PullRequest dashboard

Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just merge the y.PageBuffer one. :lgtm_strong:

Reviewable status: 4 of 6 files reviewed, 10 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: Thank you for adding so many tests. 👍

Reviewed 3 of 10 files at r5, 2 of 3 files at r7, 1 of 1 files at r8, 2 of 3 files at r9, 2 of 2 files at r10.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])


y/y.go, line 381 at r10 (raw file):

			break
		}
		data = data[n:]

I don't think we should do this. The caller of write would never expect the passed sliced to be modified in any way. The documentation for io.Writer interface clearly states

 Write must not modify the slice data, even temporarily.

https://golang.org/pkg/io/#Writer


y/y.go, line 392 at r10 (raw file):

// WriteByte writes data byte to PageBuffer and returns any encountered error.
func (b *PageBuffer) WriteByte(data byte) error {
	_, err := b.Write([]byte{data}) // Can be changed later.

I don't understand the comment. What can be changed later?


y/y_test.go, line 42 at r10 (raw file):

	rand.Seed(time.Now().Unix())

	var bytesBuffer bytes.Buffer // This is just of verifying result.

just of for verifying the result.


y/y_test.go, line 144 at r10 (raw file):

	require.NoError(t, err, "unable to read error")
	require.True(t, n == len(rb), "length read should be equal")
	// Read same number of bytes using ReaderAt.

This comment looks outdated.


y/y_test.go, line 152 at r10 (raw file):

	require.NoError(t, err, "unable to read error")
	require.True(t, n == 10, "length read should be equal")
	// Read same number of bytes using ReaderAt.

This comment looks outdated.


y/y_test.go, line 158 at r10 (raw file):

	// Check if EOF is returned at end or not.
	n, err = reader.Read(rb[:10])
	require.Equal(t, err, io.EOF, "EOF should be returned at end")

Can you add another check

require.Zero(t, n, "read should return zero bytes")

@ashish-goswami ashish-goswami marked this pull request as ready for review September 10, 2019 14:08
@ashish-goswami ashish-goswami requested a review from a team September 10, 2019 14:08
Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 15 unresolved discussions (waiting on @ashish-goswami, @manishrjain, and @pullrequest[bot])


y/y.go, line 381 at r10 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

I don't think we should do this. The caller of write would never expect the passed sliced to be modified in any way. The documentation for io.Writer interface clearly states

 Write must not modify the slice data, even temporarily.

https://golang.org/pkg/io/#Writer

My bad. The underlying data slice is not being modified.

Copy link
Contributor Author

@ashish-goswami ashish-goswami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 6 files reviewed, 15 unresolved discussions (waiting on @ashish-goswami, @jarifibrahim, @manishrjain, and @pullrequest[bot])


y/y.go, line 361 at r9 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

nextPageSize

Done.


y/y.go, line 392 at r10 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

I don't understand the comment. What can be changed later?

Removed it.


y/y_test.go, line 42 at r10 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

just of for verifying the result.

Done.


y/y_test.go, line 144 at r10 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

This comment looks outdated.

Done.


y/y_test.go, line 152 at r10 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

This comment looks outdated.

Done.


y/y_test.go, line 158 at r10 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

Can you add another check

require.Zero(t, n, "read should return zero bytes")

Done.

@ashish-goswami ashish-goswami merged commit 86a77bb into master Sep 11, 2019
@ashish-goswami ashish-goswami deleted the mrjn/zbuffer branch September 11, 2019 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants