Conversation
Instead of sender and receiver callback IDs, which I found confusing, use source and destination callback IDs. This helps when writing callbacks that take in a ZmqResponse message: the source is now clearly the remote host and the destination is clearly the local host. Previously, "sender" was ambiguous because a reply reverses the roles, so the original receiver becomes the host sending a message back to the original sender.
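For illustration, a hedged sketch of the renaming; the type and field names here are hypothetical, not the PR's actual definitions:

```cpp
#include <cstdint>

// Hypothetical sketch (illustrative only): with source/destination naming,
// a callback handling a reply can read both fields unambiguously, even though
// the "receiver" of the original message is now the one sending the reply.
struct ZmqMessageIdsSketch {
  uint64_t source_callback_id_;       // callback on the remote host that produced this message
  uint64_t destination_callback_id_;  // callback on the local host that should consume it
};
```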
…queue. Additionally, change replication to blocking to make it more reliable. However, this is unacceptably slow and should be undone at some point.
…hould check with someone.
More seriously, the messenger listen used to be the last message. Now it is back to the domain socket listen being the last message.
…ing all SendMessage-related callbacks, before the server loop gets a chance to operate on the message.
@jkosh44 Sorry to keep bugging you, could you look at this commit? This came up before in the ModelServerManager. I refactored it so that the Messenger exposed its callbacks and the server loop callback became responsible for invoking the send-message callback, but on further reflection, I think it is preferable for the Messenger to handle it automagically.
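A minimal sketch of the behavior I mean, with hypothetical names (the real Messenger API differs): the Messenger first invokes any pending SendMessage-related callback registered under the message's destination callback ID, and only then hands the message to the server loop's callback.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical sketch: the Messenger "automagically" runs SendMessage-related
// callbacks before the server loop's callback ever sees the message.
struct ZmqMessageSketch {
  uint64_t destination_callback_id_;
  std::string payload_;
};

class MessengerSketch {
 public:
  using CallbackFn = std::function<void(const ZmqMessageSketch &)>;

  // Registered by SendMessage when it expects a reply.
  void RegisterCallback(uint64_t callback_id, CallbackFn fn) { pending_callbacks_[callback_id] = std::move(fn); }

  void ProcessMessage(const ZmqMessageSketch &msg, const CallbackFn &server_loop_callback) {
    // 1. Invoke the SendMessage-related callback, if any, keyed by the destination ID.
    auto it = pending_callbacks_.find(msg.destination_callback_id_);
    if (it != pending_callbacks_.end()) {
      it->second(msg);
      pending_callbacks_.erase(it);
    }
    // 2. Only afterwards does the server loop get a chance to operate on the message.
    server_loop_callback(msg);
  }

 private:
  std::unordered_map<uint64_t, CallbackFn> pending_callbacks_;
};
```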
```diff
 }

 void Messenger::ListenForConnection(const ConnectionDestination &target, const std::string &identity,
                                     CallbackFn callback) {
-  std::lock_guard lock(routers_add_mutex_);
+  std::unique_lock lock(routers_add_mutex_);
```
I'm a little confused by this part. If we're taking the time to get a latch, why not just have the latch be for routers_ and insert directly into that? Is there something I'm missing here?
Added a comment; does it make sense?
…tead of destination callback ID, in line with the model server manager.
Major Decrease in Performance. STOP: this PR has a major negative performance impact.
Codecov Report
```
@@            Coverage Diff             @@
##           master    #1472      +/-   ##
==========================================
- Coverage   81.53%   81.04%    -0.50%
==========================================
  Files         681      685        +4
  Lines       48251    48640      +389
==========================================
+ Hits        39340    39418       +78
- Misses       8911     9222      +311
```

Continue to review the full report at Codecov.
src/messenger/messenger.cpp (Outdated)
```diff
 }

 void Messenger::ListenForConnection(const ConnectionDestination &target, const std::string &identity,
                                     CallbackFn callback) {
-  std::lock_guard lock(routers_add_mutex_);
+  std::unique_lock lock(routers_add_mutex_);
+  // TODO(WAN): all this copying is stupid.
```
Can you give a bit more detail on what you'd do to fix it (if you know) and what the restriction on implementing it now is? (It's okay if it's just time.)
Oh, just saw this. I think this was an old comment that wanted to avoid the indirection of:

- T1: invokes ListenForConnection to add a new connection endpoint, which enqueues the router to be added.
- Dedicated messenger thread T2: grabs the router off the queue and adds it.

In addressing Joe's comment, I added documentation that notes why this is necessary (IIRC, ZeroMQ requires that sockets are used on the thread where they were created).
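For concreteness, a minimal sketch of that pattern under my reading of the PR (names simplified, not the actual messenger.cpp code); it also shows why the lock_guard in the diff above became a unique_lock, since std::condition_variable::wait requires one:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Simplified sketch of the blocking ListenForConnection pattern.
// ZeroMQ sockets must be used on the thread that created them, so T1 cannot
// create the router itself; it hands the request to the messenger thread.
class MessengerSketch {
 public:
  // Called from an arbitrary thread (T1).
  void ListenForConnection(const std::string &endpoint) {
    std::unique_lock lock(routers_add_mutex_);  // unique_lock: the cvar wait needs it
    routers_to_add_.push(endpoint);
    // Block until the messenger thread has created and registered the router.
    routers_add_cvar_.wait(lock, [this] { return routers_to_add_.empty(); });
  }

  // Runs on the dedicated messenger thread (T2), once per server loop iteration.
  void ServerLoopAddRouters() {
    std::unique_lock lock(routers_add_mutex_);
    while (!routers_to_add_.empty()) {
      // ... create the ZeroMQ router socket here, on this thread ...
      routers_to_add_.pop();
    }
    routers_add_cvar_.notify_all();  // Wake any waiting ListenForConnection.
  }

 private:
  std::mutex routers_add_mutex_;
  std::condition_variable routers_add_cvar_;
  std::queue<std::string> routers_to_add_;
};
```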
```diff
@@ -153,7 +166,7 @@ class LogManager : public common::DedicatedThreadOwner {
   std::vector<BufferedLogWriter> buffers_;
   // The queue containing empty buffers which the serializer thread will use. We use a blocking queue because the
   // serializer thread should block when requesting a new buffer until it receives an empty buffer
-  common::ConcurrentBlockingQueue<BufferedLogWriter *> empty_buffer_queue_;
```
@mbutrovich This got pulled out to DBMain so that other components, e.g., replication manager, can return buffers to it.
What do we believe the argument for this data structure is? Performance over malloc? Back-pressure on the serializer if the disk consumer falls behind?
*was: I believe it was backpressure. This change more or less leaves things as they are.
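To make the backpressure argument concrete, a minimal sketch of a blocking queue (a plain mutex/cvar version, not noisepage's actual common::ConcurrentBlockingQueue interface): buffers are preallocated, so the hot path avoids malloc, and when the disk consumer falls behind and stops returning empty buffers, the serializer blocks in Dequeue, which throttles it.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Minimal blocking-queue sketch illustrating the backpressure argument.
template <typename T>
class BlockingQueueSketch {
 public:
  // The consumer returns an empty buffer here after flushing it to disk.
  void Enqueue(T item) {
    {
      std::lock_guard lock(mutex_);
      queue_.push(std::move(item));
    }
    cvar_.notify_one();
  }

  // The serializer blocks here until an empty buffer is available.
  T Dequeue() {
    std::unique_lock lock(mutex_);
    cvar_.wait(lock, [this] { return !queue_.empty(); });
    T item = std::move(queue_.front());
    queue_.pop();
    return item;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cvar_;
  std::queue<T> queue_;
};
```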
```cpp
 *
 * @param policy The retention policy that describes the destinations for this BufferedLogWriter.
 */
void PrepareForSerialization(transaction::RetentionPolicy policy) {
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbutrovich The current retention policy thing.
Minor Decrease in Performance. Be warned: this PR may have decreased the throughput of the system slightly.
All numbers on dev9. [Benchmark results for this PR and for the main branch were posted as attachments and are not reproduced here.]

Corrected numbers, since I forgot rate limits were a thing: this branch (no WAL) vs. main (no WAL) is within noise.
Synchronous Replication
This PR adds support for synchronous replication to noisepage. Replication depends on the messenger and recovery manager being enabled.

Summary
```
./noisepage --messenger_enable=true --replication_enable=true --port=15721 --messenger_port=9022 --replication_port=15445 --network_identity=primary
./noisepage --messenger_enable=true --replication_enable=true --port=15722 --messenger_port=9023 --replication_port=15446 --network_identity=replica1
./noisepage --messenger_enable=true --replication_enable=true --port=15723 --messenger_port=9024 --replication_port=15447 --network_identity=replica2
```
To add more primaries/replicas, see build-support/data/replication.config. Note that this file is copied to your build/bin folder as a post-build step of noisepage.

You should be able to run queries on the primary and run basic read-only operations on the replicas.
Note that there is currently no "I'm missing logs xyz" mechanism, so replicas have to have seen all the primary's logs as they were generated and shipped out.
Background
Description
Fixes "jumbotest_messenger_messenger_test timesout on jenkins" #1319. I believe this was caused by the Messenger's ListenForConnection being a non-blocking add to the list of routers. This changes it so that ListenForConnection waits on a cvar which the messenger's main loop signals whenever it grabs and adds new routers.

Adds a new builtin replication_get_last_record_id(). On the primary, this returns the last record ID that was sent out; on replicas, it returns the last record ID that was applied. This is used in testing to see whether the replicas are up to date. It is implemented by pushing a pointer to the replication manager into the execution context.

Most of the PR is in replication/replication_manager and storage/replication_log_provider.

Adds a new test, script/testing/replication/tests_simple, which will run under the simple unittests part of CI as well. No end-to-end oltpbench yet.

Adds the idea of a transaction-wide RetentionPolicy, which is however not currently able to distinguish between "serialize locally, don't replicate" and "serialize locally, also replicate"; we probably have to revisit the implementation. The idea is that at a per-transaction level, you describe whether (1) you serialize and replicate logs, (2) you only serialize logs, or (3) you don't serialize logs at all. A sketch follows.
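As a hedged sketch of that idea (the type and enumerator names here are illustrative, not necessarily what the PR uses):

```cpp
#include <cstdint>

namespace transaction {
// Illustrative sketch of the three per-transaction retention behaviors
// described above; the PR's actual enumerators may differ, and today the
// implementation cannot yet distinguish the first two cases.
enum class RetentionPolicySketch : uint8_t {
  RETENTION_LOCAL_AND_REPLICATE,  // (1) serialize logs and replicate them
  RETENTION_LOCAL_ONLY,           // (2) only serialize logs locally
  RETENTION_NONE,                 // (3) don't serialize logs at all
};
}  // namespace transaction
```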
Future work, future PR

RetentionPolicy is basically just a flag right now; we need to figure out more details on doing it properly.