This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

script.testing.replication.tests_simple failing #1515

Open
mbutrovich opened this issue Mar 11, 2021 · 2 comments · Fixed by #1526
Labels
bug Something isn't working (correctness). Mark issues with this.

Comments

@mbutrovich
Contributor

Seen it on PRs and on master a few times since #1472 merged.

Example from most recent master build:

http://jenkins.db.cs.cmu.edu:8080/blue/organizations/jenkins/terrier/detail/master/921/pipeline

I know @lmwnshn is aware and watching it, but just want to track it here.

@mbutrovich added the bug label Mar 11, 2021
@lmwnshn
Contributor

lmwnshn commented Mar 11, 2021

This might be a race that in theory is fixable by #1511 once that's done.

Specifically, currently in DBMain initialization:

  1. Replication is disabled
  2. Catalog bootstrap runs
  3. Replication is enabled (but it is not guaranteed that the bootstrap has completed)
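The ordering above can be sketched as a toy model. This is not NoisePage code: `DBMainSketch`, `log`, `bootstrap_catalog`, and the `release` event are hypothetical names, and the sketch assumes the harm is that bootstrap records get shipped to replicas once replication is on. It only illustrates why step 3 needs to wait for step 2 to finish.

```python
import threading


class DBMainSketch:
    """Hypothetical stand-in for DBMain startup; not the real class."""

    def __init__(self):
        self.replication_enabled = False
        self.bootstrap_done = threading.Event()
        self.replicated = []  # records that would be shipped to replicas

    def log(self, record):
        # Only records written while replication is enabled get shipped.
        if self.replication_enabled:
            self.replicated.append(record)

    def bootstrap_catalog(self, release):
        # Block on `release` to simulate bootstrap work that finishes late.
        release.wait()
        self.log("catalog bootstrap")
        self.bootstrap_done.set()

    def start(self, wait_for_bootstrap, release):
        self.replication_enabled = False  # step 1: replication is disabled
        worker = threading.Thread(target=self.bootstrap_catalog, args=(release,))
        worker.start()  # step 2: catalog bootstrap runs
        if wait_for_bootstrap:
            release.set()
            self.bootstrap_done.wait()  # guard: bootstrap must finish first
        self.replication_enabled = True  # step 3: replication is enabled
        release.set()
        worker.join()
        return self.replicated
```

With `wait_for_bootstrap=False` the bootstrap record is written after replication is switched on and ends up in `replicated`. With `wait_for_bootstrap=True` the list stays empty.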

I am currently concerned that our approach to buffer-level replication just won't work at all / needs some major overhauling around some invalid assumptions, so I'm sitting on it for a while.

@lmwnshn
Contributor

lmwnshn commented Apr 16, 2021

Oddly, this is happening again: http://jenkins.db.cs.cmu.edu:8080/blue/organizations/jenkins/terrier/detail/PR-1534/11/pipeline

This time the replicas are synced; however, it looks like we hang trying to query replica2. Whether it crashed or is stuck on connection threads is unknown right now.

Trace:

04-16-2021 17:44:03,605 [db_server.py:236] INFO : Executing SQL on primary: CREATE TABLE foo (a INTEGER);
04-16-2021 17:44:03,902 [db_server.py:236] INFO : Executing SQL on primary: INSERT INTO foo VALUES (1);
04-16-2021 17:44:04,402 [db_server.py:236] INFO : Executing SQL on primary: SELECT replication_get_last_txn_id();
04-16-2021 17:44:04,435 [db_server.py:236] INFO : Executing SQL on replica1: SELECT replication_get_last_txn_id();
04-16-2021 17:44:04,468 [utils_sql.py:100] INFO : Syncing replica: [primary@1582] [replica1@1582]
04-16-2021 17:44:04,469 [db_server.py:236] INFO : Executing SQL on replica1: SELECT a FROM foo ORDER BY a ASC;
04-16-2021 17:44:04,503 [db_server.py:236] INFO : Executing SQL on primary: INSERT INTO foo VALUES (2);
04-16-2021 17:44:04,903 [db_server.py:236] INFO : Executing SQL on primary: SELECT replication_get_last_txn_id();
04-16-2021 17:44:06,322 [db_server.py:236] INFO : Executing SQL on replica1: SELECT replication_get_last_txn_id();
04-16-2021 17:44:06,925 [utils_sql.py:100] INFO : Syncing replica: [primary@2159] [replica1@2159]
04-16-2021 17:44:06,925 [db_server.py:236] INFO : Executing SQL on replica2: SELECT replication_get_last_txn_id();
04-16-2021 17:44:07,243 [utils_sql.py:100] INFO : Syncing replica: [primary@2159] [replica2@2159]
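The trace suggests the test compares `replication_get_last_txn_id()` on the primary and on each replica, then queries the replica. The real logic in utils_sql.py is not shown here, so the following is only a sketch. `run_with_timeout` and `wait_until_synced` are hypothetical helpers, and the `>=` comparison assumes transaction ids only increase. The point is to turn a silent hang like the one on replica2 into a failure with a message.

```python
import threading
import time


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # surface query errors to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"query did not return within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def wait_until_synced(primary_txn_id, replica_txn_id, timeout_s=5.0, poll_s=0.05):
    """Poll until the replica reports the primary's last txn id, or time out."""
    target_id = run_with_timeout(primary_txn_id, timeout_s)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Assumption: txn ids are monotonically increasing.
        if run_with_timeout(replica_txn_id, timeout_s) >= target_id:
            return target_id
        time.sleep(poll_s)
    raise TimeoutError(f"replica never reached txn {target_id}")
```

In a harness, the two callables would wrap whatever executes `SELECT replication_get_last_txn_id();` against each node. A daemon thread is used so that a query that never returns cannot keep the test process alive.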

@lmwnshn lmwnshn reopened this Apr 16, 2021