Concurrent migration jobs won't timeout because the Lock method does not respect the -lock-timeout value #269

Voles · 2019-08-16T13:52:36Z

Note: written together with @hugoboos

Describe the Bug
When starting a migration job while another migration is still running, the new migration job keeps waiting on the first job and never times out.

Steps to Reproduce

Create a long-running migration job (60 seconds in this case).

Filename
0001_initialize_schema.up.sql

Contents

SELECT pg_sleep(60);

Start the long-running migration job.

$ migrate -source file://<path-to-migration-file> -database postgres://postgres:postgres@localhost/postgres up

In another shell, start the same migration job (while the first one is still running).

$ migrate -source file://<path-to-migration-file> -database postgres://postgres:postgres@localhost/postgres up

Expected Behavior
The first job finishes in 60 seconds (the duration of the sleep in the migration file). The second job fails after 15 seconds (the default value for -lock-timeout).

Migrate Version
v4.5.0, installed using Brew.
Obtained by running: migrate -version

Loaded Source Drivers
e.g. file
Obtained by running: migrate -help

Loaded Database Drivers
postgres
Obtained by running: migrate -help

Go Version
go version go1.12.5 darwin/amd64
Obtained by running: go version

Stacktrace
Not applicable.

Additional context
The timeout logic inside migrate.go:

migrate/migrate.go

Line 885 in b071731

func (m *Migrate) lock() error {

should also be implemented when ensuring the version table. E.g., for the Postgres driver, that would be in postgres.go:

migrate/database/postgres/postgres.go

Line 101 in 6c96ef0

if err := px.ensureVersionTable(); err != nil {

We suggest implementing the timeout logic inside the Lock method of the database driver. E.g., for Postgres, that would be inside postgres.go:

migrate/database/postgres/postgres.go

Line 143 in 6c96ef0

func (p *Postgres) Lock() error {


dhui · 2019-10-14T06:42:46Z

Weird, I'd expect the 2nd migration job to time out since ErrLockTimeout should be sent to errchan.

migrate/migrate.go

Line 913 in 3dc8182

errchan <- ErrLockTimeout

dhui · 2020-01-03T09:01:23Z

Adding context to the driver interface would help. See: #14

In the meantime, we could add a new option/config for lock timeouts in the postgres driver.

ynori7 · 2020-05-06T14:43:28Z

I think adding context won't help. I just had an issue where a query got stuck for hours. After restarting all instances of the service, I ended up with multiple SELECT pg_advisory_lock($1) queries stuck in the background, all waiting for a lock even though the services that initiated those queries weren't even running anymore. If the service goes offline before the context is canceled, then the lock will never be released, and attempts to obtain a lock will hang forever.

Is there any particular reason why we couldn't run the migrations within a transaction and set the lock timeout within the transaction?

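// Set lock_timeout for this transaction only (third argument true = local).
_, err = tx.Exec("SELECT set_config('lock_timeout', $1, true);", "60s")
if err != nil {
    return fmt.Errorf("failed to set lock_timeout: %w", err)
}
// Take a transaction-level advisory lock; it is released when the transaction ends.
_, err = tx.Exec("SELECT pg_advisory_xact_lock($1);", hash(p.nameOfLock))
if err != nil {
    return fmt.Errorf("failed to acquire lock: %w", err)
}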

dhui · 2020-05-06T21:10:51Z

Is there any particular reason why we couldn't run the migrations within a transaction

See: #196
