Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent migration jobs won't timeout because the Lock method does not respect the -lock-timeout value #269

Open
Voles opened this issue Aug 16, 2019 · 4 comments

Comments

@Voles
Copy link

Voles commented Aug 16, 2019

Note: written together with @hugoboos

Describe the Bug
When starting a migration job while another migration is still running, the new migration job keeps waiting on the first job and never times out.

Steps to Reproduce

  1. Create a long running (60 seconds in this case) migration job.

Filename
0001_initialize_schema.up.sql

Contents

SELECT pg_sleep(60);
  1. Start the long running migration job
$ migrate -source file://<path-to-migration-file> -database postgres://postgres:postgres@localhost/postgres up
  1. In another shell, start the same migration job (while the first one is still running).
$ migrate -source file://<path-to-migration-file> -database postgres://postgres:postgres@localhost/postgres up

Expected Behavior
The first job finishes in 60 seconds (according to the sleep in migration file). The second job fails after 15 seconds (this is the default value for -lock-timeout).

Migrate Version
v4.5.0, installed using Brew.
Obtained by running: migrate -version

Loaded Source Drivers
e.g. file
Obtained by running: migrate -help

Loaded Database Drivers
postgres
Obtained by running: migrate -help

Go Version
go version go1.12.5 darwin/amd64
Obtained by running: go version

Stacktrace
Not applicable.

Additional context
The timeout logic inside migrate.go:

func (m *Migrate) lock() error {
should also be implemented when ensuring the version table. Eg. for the Postgres driver, that would be in postgres.go:

if err := px.ensureVersionTable(); err != nil {

We suggest to implement the timeout logic inside the Lock method of the database driver. Eg. for Postgres that would be inside postgres.go:

func (p *Postgres) Lock() error {

@dhui
Copy link
Member

dhui commented Oct 14, 2019

Weird, I'd expect the 2nd migration job to timeout since the ErrLockTimeout should be sent to errchan.

errchan <- ErrLockTimeout

@dhui
Copy link
Member

dhui commented Jan 3, 2020

Adding context to the driver interface would help. See: #14

In the meanwhile, we could add a new option/config for lock timeouts in the postgres driver

@ynori7
Copy link

ynori7 commented May 6, 2020

I think adding context won't help. I just had the issue that a query got stuck for hours. After restarting all instances of the service, I ended up with multiple SELECT pg_advisory_lock($1) stuck in the background, all waiting for a lock even though the services which initiated those queries weren't even running anymore. If the service goes offline before the context is canceled then the lock will never be released, and attempts to obtain a lock will hang forever.

Is there any particular reason why we couldn't run the migrations within a transaction and set the lock timeout within the transaction?

_, err = tx.Exec("SELECT set_config('lock_timeout', $1, true);", "60s")
if nil != err {
    return fmt.Errorf("failed set lock_timeout: %w", err)
}
_, err = tx.Exec("SELECT pg_advisory_xact_lock($1);", hash(p.nameOfLock))
if nil != err {
    return fmt.Errorf("failed selecting lock: %w", err)
}

@dhui
Copy link
Member

dhui commented May 6, 2020

Is there any particular reason why we couldn't run the migrations within a transaction

See: #196

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants