Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-locking, read-only opens #432

Merged
merged 10 commits into from
Mar 8, 2018
Merged

Conversation

allenluce
Copy link
Contributor

@allenluce allenluce commented Mar 7, 2018

This introduces a read-only option on the DB struct. The main effect is to force all transactions to be read-only. It also skips anything that might write to disk.

It will put a shared lock on the directory when it's opened read-only. This will allow other read-only opens but will deny read-write opens until all readers have closed. If the database is already open read-write and an attempt is made to open it again (read-only or read-write), an error occurs.

It also:

  • Fails if the manifest doesn't exist on read-only open
  • Does not attempt to truncate the manifest
  • Skips compactors and memtable
  • All vlogs are opened read-only

This change assumes that there will be no need to write anything during read-only operations, no need to do garbage collection or compaction or revising vlogs or anything. I'm not sure that this is a 100% valid assumption.

This addresses #372


This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Mar 7, 2018

CLA assistant check
All committers have signed the CLA.

This introduces a read-only option on the DB struct. The main effect
is to force all transactions to be read-only. It also skips anything
that might write to disk.

It will put a shared lock on the directory when it's opened
read-only. This will allow other read-only opens but will deny
read-write opens until all readers have closed. If the database is
already open read-write and an attempt is made to open it
again (read-only or read-write), an error occurs.

It also:

 - Fails if the manifest doesn't exist on read-only open
 - Does not attempt to truncate the manifest
 - Skips compactors and memtable
 - All vlogs are opened read-only
There shouldn't be any need anyways.
@kataras
Copy link
Contributor

kataras commented Mar 7, 2018

Truly, nice job @allenluce, is that tested on windows operating system as well?

@manishrjain
Copy link
Contributor

Really nice change! Got a few comments. Once you address them, I'll pull this change in.

@allenluce : You can use the -t flag. The -t flag instructs get to also download the packages required to build
the tests for the specified packages.


Reviewed 11 of 11 files at r1.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


db.go, line 261 at r1 (raw file):

	}

	if !opt.ReadOnly {

If multiple processes are using the same Badger directory, then none of them should write back anything. So, also ensure that value log isn't replayed on start.

This would mean that if the DB had crashed before, then a read-only open would appear to lose some data. We must warn users about that in the documentation.

Alternatively: If ReadOnly is set, then the number of memtables above L0 should be set to a very high value, to allow value log to be replayed and stored in memtables, but without the process getting blocked on them getting compacted. This might be a better option to avoid an apparent loss of data.

In any case, all these only take effect if the last run of read-write instance Badger had crashed, and not properly closed.


db.go, line 333 at r1 (raw file):

	// Now close the value log.
	if vlogErr := db.vlog.Close(db.opt.ReadOnly); err == nil {

Value log takes a copy of options on Open. So, it should already know about readonly flag and doesn't need to be passed explicitly.

https://github.com/dgraph-io/badger/blob/e8ce3e9d89507ec60f9b1f2f4c7539ae3efe61cd/db.go#L265


db_test.go, line 1573 at r1 (raw file):

	require.Contains(t, err.Error(), "No sets or deletes are allowed in a read-only transaction")
	err = txn.Commit(nil)
	require.NoError(t, err)

LGTM Nice test.


manifest.go, line 263 at r1 (raw file):

		return nil, 0, err
	}
	fp, err = y.OpenExistingSyncedFile(manifestPath, false, false)

Instead of 2 boolean params, how about using an int based flag. And do something like:

y.OpenExistingFile(manifestPath, y.Sync | y.ReadOnly)


options.go, line 90 at r1 (raw file):

	maxBatchSize  int64 // max batch size in bytes

	// Whether the DB allows writes. With ReadOnly set, multiple Opens

// Open the DB as read-only. With this set, multiple processes can open the same Badger DB.


transaction.go, line 514 at r1 (raw file):

//  // Call various APIs.
func (db *DB) NewTransaction(update bool) *Txn {
	// Is the DB read-only?

Convert this question into a statement.


value.go, line 558 at r1 (raw file):

		if fid == maxFid {
			if lf.fd, err = y.OpenExistingSyncedFile(vlog.fpath(fid),
				vlog.opt.SyncWrites, readOnly); err != nil {

Same as above. This should be y.OpenExistingFile(path, y.Sync|y.ReadOnly).

Note the function name has been changed to reflect that this function is no longer only about sync.


value.go, line 615 at r1 (raw file):

}

func (vlog *valueLog) Close(readOnly bool) error {

This argument passing shouldn't be required because value log holds an opt as well.


Comments from Reviewable

@allenluce
Copy link
Contributor Author

I don't have easy access to a Windows build environment. I'd expect this to fail on Windows: the signature for acquireDirectoryLock in dir_windows.go is out of sync and the locking is exclusive only.


Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


Comments from Reviewable

@manishrjain
Copy link
Contributor

The -t comment is for:

The checks fail because internal references to dgraph-io/badger pull up the code on master, not the code to be tested. This is a problem with Go packaging in general.

If there is data in a vlog to be replayed, opening the DB with
ReadOnly set will return an error.
The vlog structure already carries the opts from it's Open()
To reflect that it's not just for synced files.
@allenluce
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


db.go, line 261 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

If multiple processes are using the same Badger directory, then none of them should write back anything. So, also ensure that value log isn't replayed on start.

This would mean that if the DB had crashed before, then a read-only open would appear to lose some data. We must warn users about that in the documentation.

Alternatively: If ReadOnly is set, then the number of memtables above L0 should be set to a very high value, to allow value log to be replayed and stored in memtables, but without the process getting blocked on them getting compacted. This might be a better option to avoid an apparent loss of data.

In any case, all these only take effect if the last run of read-write instance Badger had crashed, and not properly closed.

I've taken the first approach somewhat modified: if the DB is opened read-only and there's data to play back, Open() returns an error saying the database had not been properly closed.


db.go, line 333 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Value log takes a copy of options on Open. So, it should already know about readonly flag and doesn't need to be passed explicitly.

https://github.com/dgraph-io/badger/blob/e8ce3e9d89507ec60f9b1f2f4c7539ae3efe61cd/db.go#L265

Cool, I've corrected.


db_test.go, line 1573 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

LGTM Nice test.

Thanks!


manifest.go, line 263 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Instead of 2 boolean params, how about using an int based flag. And do something like:

y.OpenExistingFile(manifestPath, y.Sync | y.ReadOnly)

OK, I switched to that. Not sure the way the function is currently used makes the flags any less clunky.


options.go, line 90 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

// Open the DB as read-only. With this set, multiple processes can open the same Badger DB.

Changed.


transaction.go, line 514 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Convert this question into a statement.

Done.


value.go, line 558 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Same as above. This should be y.OpenExistingFile(path, y.Sync|y.ReadOnly).

Note the function name has been changed to reflect that this function is no longer only about sync.

Gottit. I've changed the function names.


value.go, line 615 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

This argument passing shouldn't be required because value log holds an opt as well.

Done.


Comments from Reviewable

These tests are useless on Windows until the placeholder in
dir_windows.go is removed.
@manishrjain
Copy link
Contributor

Thanks for addressing the comments. One last round of follow up comments, and then we should be good to go.


Reviewed 10 of 10 files at r2.
Review status: all files reviewed at latest revision, 8 unresolved discussions.


db.go, line 295 at r2 (raw file):

	go db.doWrites(replayCloser)

	if err = db.vlog.Replay(vptr, opt.ReadOnly, replayFunction(db)); err != nil {

You don't need to pass the opt here. vlog already has that opt.


db_test.go, line 1526 at r2 (raw file):

	_, err = Open(opts)
	require.Error(t, err)
	if err.Error() == "read-only mode is not supported on Windows" {

Instead of checking the wordings, create an error variable (errors.go), and check equality of that variable.


dir_windows.go, line 62 at r2 (raw file):

func acquireDirectoryLock(dirPath string, pidFileName string, readOnly bool) (*directoryLockGuard, error) {
	if readOnly {
		return nil, fmt.Errorf("read-only mode is not supported on Windows")

Use an error variable. You can define it in errors.go.


manifest.go, line 263 at r1 (raw file):

Previously, allenluce (Allen Luce) wrote…

OK, I switched to that. Not sure the way the function is currently used makes the flags any less clunky.

Thanks. This is more in line with how Linux flags are passed.


options.go, line 93 at r2 (raw file):

	// open the same Badger DB. Note: if the DB being opened had crashed
	// before and has vlog data to be replayed, ReadOnly will cause Open
	// to fail (with an appropriate message).

Remove the round brackets.


value.go, line 303 at r2 (raw file):

		if readOnly {
			return errors.Errorf("database was not properly closed, cannot open read-only")

Create an error variable in errors.go, and use that error.


value.go, line 672 at r2 (raw file):

// Replay replays the value log. The kv provided is only valid for the lifetime of function call.
func (vlog *valueLog) Replay(ptr valuePointer, readOnly bool, fn logEntry) error {

readOnly doesn't need to be passed. vlog should already know this via opt.


value_test.go, line 533 at r2 (raw file):

	kv, err = Open(opts)
	require.Error(t, err)
	require.Regexp(t, "database was not properly closed, cannot open read-only|read-only mode is not supported on Windows", err.Error())

You can check error variable equality instead.


Comments from Reviewable

Arg cleanups + house preferences.
@allenluce
Copy link
Contributor Author

Review status: 6 of 13 files reviewed at latest revision, 8 unresolved discussions.


db.go, line 295 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

You don't need to pass the opt here. vlog already has that opt.

Done.


db_test.go, line 1526 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Instead of checking the wordings, create an error variable (errors.go), and check equality of that variable.

Done.


dir_windows.go, line 62 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Use an error variable. You can define it in errors.go.

Done.


options.go, line 93 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Remove the round brackets.

Done.


value.go, line 303 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Create an error variable in errors.go, and use that error.

Done.


value.go, line 672 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

readOnly doesn't need to be passed. vlog should already know this via opt.

Done.


value_test.go, line 533 at r2 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

You can check error variable equality instead.

The errors.Wrapf() in value.go makes this difficult.


Comments from Reviewable

@manishrjain
Copy link
Contributor

:lgtm: Thanks for the PR, @allenluce. I'll merge it in.


Review status: 6 of 13 files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@manishrjain manishrjain merged commit dc0df25 into hypermodeinc:master Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants