
Conversation

@ivankelly (Contributor)

Which means that for the two LedgerHandle operations that mutate the
metadata, ensemble change and closing, ensure that metadata is written
to the metadata store before the client ever uses it.

Master issue: #281

@ivankelly self-assigned this Sep 4, 2018
@sijie (Member) commented Sep 4, 2018

@jvrao @dlg99 @athanatos please spend some time reviewing this PR.

@sijie (Member) commented Sep 7, 2018

Ping @jvrao @athanatos @dlg99

@ivankelly (Contributor, Author)

rerun integration tests

1 similar comment

int idx = entry.getKey();
BookieSocketAddress addr = entry.getValue();
if (LOG.isDebugEnabled()) {
    LOG.debug("[EnsembleChange-L{}] replacing bookie: {} index: {}", ledgerId, addr, idx);
Member

There used to be an ensembleChangeIdx in the logging message for debugging purposes. It was very useful when debugging ensemble change issues. It would be good if we could keep it.

@athanatos (Sep 11, 2018)

I think that's idx here if I'm interpreting it correctly. This variant allows you to specify more than one.

Contributor Author

No, the ensembleChangeIdx was something different. I'll re-add it.

Contributor

The suffix 'idx' confuses people. :)

Contributor Author

Yeah, I'll change the name in any case.

Contributor Author

Added a logContext variable that's used in the calling method and this method to tie the logs of the operation together.
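A minimal sketch of that pattern (names and message format are illustrative, not the exact PR code): build one context string per ensemble-change operation and pass it to every helper so all log lines for the operation can be correlated.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical sketch of the logContext pattern: one context string per
// ensemble-change operation, shared by the calling method and its helpers.
class EnsembleChangeLogging {
    private static final Logger LOG = LoggerFactory.getLogger(EnsembleChangeLogging.class);

    void startEnsembleChange(long ledgerId, int attempt) {
        String logContext = String.format("[EnsembleChange-L%d-%d]", ledgerId, attempt);
        replaceBookie(logContext, 2, "bookie-3:3181");
    }

    private void replaceBookie(String logContext, int index, String addr) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("{} replacing bookie: {} index: {}", logContext, addr, index);
        }
    }
}
```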

break;
}
}
}
Member

There is a logging message dropped here. It would be great not to drop log messages related to ensemble changes; they exist for a reason.


Agreed, the final summary message is pretty handy.

Contributor Author

Will re-add. It's not a final summary, however; it's the change that we wish to make. It's not final until the ZooKeeper write completes.

Contributor

Agreed, but let us not drop any debug/log messages.

Contributor Author

Re-added, but in the calling method, as this method doesn't have all the context.

// the ledger isn't closed between checking and
// updating lastAddPushed
if (getLedgerMetadata().isClosed()) {
if (getLedgerMetadata().isClosed() || closing) {
Member

getLedgerMetadata().isClosed() || closing is used in multiple places; it would be good to have a function for it.

Contributor Author

Will do

Contributor Author

See #isHandleWritable()
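A minimal sketch of that consolidation, assuming illustrative field names (not the exact PR code):

```java
// Hypothetical sketch: one predicate replaces the repeated
// "getLedgerMetadata().isClosed() || closing" checks scattered through the handle.
class WritableCheck {
    private volatile boolean closing = false;   // set when close() begins
    private boolean metadataClosed = false;     // mirrors getLedgerMetadata().isClosed()

    boolean isHandleWritable() {
        return !metadataClosed && !closing;
    }
}
```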

}

LedgerMetadataBuilder withWriteQuorumSize(int writeQuorumSize) {
checkArgument(ensembleSize >= writeQuorumSize, "Write quorum must be less or equal to ensemble size");
Member

check writeQuorumSize >= ackQuorumSize as well.

Contributor Author

ok

Contributor Author

added
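A simplified sketch of the full invariant chain (ensembleSize >= writeQuorumSize >= ackQuorumSize) being suggested; this is illustrative, not the actual LedgerMetadataBuilder:

```java
import static com.google.common.base.Preconditions.checkArgument;

// Illustrative builder enforcing ensembleSize >= writeQuorumSize >= ackQuorumSize.
class QuorumConfigBuilder {
    private int ensembleSize = 3;
    private int writeQuorumSize = 3;
    private int ackQuorumSize = 2;

    QuorumConfigBuilder withWriteQuorumSize(int writeQuorumSize) {
        checkArgument(ensembleSize >= writeQuorumSize,
                "Write quorum must be less or equal to ensemble size");
        checkArgument(writeQuorumSize >= ackQuorumSize,
                "Write quorum must be greater or equal to ack quorum");
        this.writeQuorumSize = writeQuorumSize;
        return this;
    }
}
```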


// should still be able to close as long as recovery closed the ledger
// with the same last entryId and length as in the write handle.
writeLh.close();
Member

Hmm, this changes the closing behavior. I am not sure how it would impact applications, so I would suggest keeping the original behavior: if closing a ledger hits a metadata version exception, don't attempt it. If you want to change the behavior, do a separate PR for it.


This seems like a desirable change to me. Moreover, it protects against the case where the racing update came from the same client due to ZooKeeperClient resending the update.

Contributor Author

@sijie the previous behaviour is not documented, nor is it well defined. In some cases a metadata version exception allowed a close to succeed, and in others it did not. I would not expect any application to be relying on this behaviour, and if they are, they are probably broken in many other ways.
I can revert this by throwing an exception in the Predicate part of the loop if the metadata is closed. There'd be no guarantee that the behaviour exactly matches though, because it isn't well defined currently.

Member

I understand it is not documented, but if we are changing any existing behavior, I would suggest either doing it in a separate PR, or, if that is difficult, updating the javadoc of this method to make things clear: "this method will not throw any exceptions anymore if hitting metadata version exceptions".

Contributor Author

It may still throw an exception on a metadata version exception. However, it will only throw if the length or last entry id in the conflicting write is different from what the caller of #close believed it to be. I strongly prefer making this change as part of this patch, as doing otherwise would insert arbitrary strange behaviour into the new implementation. I'll add javadoc for this.

Contributor Author

added javadoc
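A hedged sketch of the conflict rule described above (names are illustrative, not the PR's code):

```java
// Illustrative only: when close() races with a recovery close, the conflict is
// benign iff both sides agree on the final entry id and length of the ledger.
class CloseConflict {
    static boolean isBenign(long myLastEntryId, long myLength,
                            long zkLastEntryId, long zkLength) {
        return myLastEntryId == zkLastEntryId && myLength == zkLength;
    }
}
```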


while ((pendingAddOp = pendingAddOps.peek()) != null
&& blockAddCompletions.get() == 0) {
&& !changingEnsemble) {
Member

Don't we need to make changingEnsemble volatile? Or what guarantees that this value is synchronized correctly?

@athanatos (Sep 11, 2018)

Yeah, sendAddSuccessCallbacks reads it without the lock and handleBookieFailure appears to rely on readers seeing it synchronously for correctness, so either you need the lock there or changingEnsemble would have to be volatile.

Contributor Author

It doesn't need to be volatile. It is only ever accessed on the ordered executor thread for this LedgerHandle. Rather than throwing synchronized around everything, we should start asserting in methods that we are in fact running on the ordered executor thread.

The previous blockAddCompletion stuff would not have been safe if these weren't on the same thread.


Such an assert would certainly clarify matters.

Contributor Author

Adding these asserts is out of scope for this change.
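A sketch of what such an assertion could look like, assuming a handle bound to a single ordered-executor thread (names are illustrative):

```java
import static com.google.common.base.Preconditions.checkState;

// Hypothetical sketch: record the id of the ordered-executor thread that owns
// the handle, then assert on entry to methods that assume single-threaded access.
class ThreadAffinity {
    private volatile long ownerThreadId = -1;

    void bindToCurrentThread() {
        ownerThreadId = Thread.currentThread().getId();
    }

    void assertOnOwnerThread() {
        checkState(Thread.currentThread().getId() == ownerThreadId,
                "must run on the handle's ordered executor thread");
    }
}
```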

if (delayedWriteFailedBookies.isEmpty()) {
    return;
}
Map<Integer, BookieSocketAddress> toReplace = new HashMap<>();
Member

The existing code constructs the hashmap from delayedWriteFailedBookies in one line. Any reason why you split it into two lines?

Contributor Author

Will change it back. Not sure why I changed it in the first place.

Contributor Author

changed.
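For reference, a minimal sketch of the one-line form being restored (HashMap's copy constructor snapshots the delayed failures in a single expression):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.bookkeeper.net.BookieSocketAddress;

// Illustrative: the one-liner referred to above, wrapped in a class for compilation.
class DelayedFailureSnapshot {
    Map<Integer, BookieSocketAddress> snapshot(Map<Integer, BookieSocketAddress> delayed) {
        return new HashMap<>(delayed);
    }
}
```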

handleUnrecoverableErrorDuringAdd(rc);
synchronized (metadataLock) {
    if (changingEnsemble) {
        delayedWriteFailedBookies.putAll(failedBookies);
Member

This change potentially has the side effect of causing a longer pause during ensemble changes. I don't think it is a good idea to completely block waiting for the previous ensemble change to be done. All that matters is that we only update the local copy of the metadata once all ensemble changes are completed, no? Can you explain the performance difference between the current approach and the previous approach?


I could be misunderstanding, but I think the main difference in behavior is that between when changingEnsemble gets set to true and when the ensemble change completes, new writes continue to use the old ensemble whereas with the old machinery they would have optimistically used the new one. In both cases, we have to delay acks until the ensemble change is complete, but with this variant we may have to wait to resend writes for entries written since the ensemble change began which didn't get aQ responses from unchanged indexes. Am I understanding that correctly @ivankelly ?


@jvrao Something to note here is that unsetSuccessAndSendWriteRequest will resend the write request to those bookies regardless of whether the entry already has aQ copies, so it shouldn't generate additional under replicated ledgers in the common case.

Contributor Author

@sijie I think superficially it can look like it takes longer, but in practice it should only ever take less time.

The dominating latency in this operation is the write to ZooKeeper (LatWrite). In the case where we do not block other changes to the ensemble, and there are two failures, one of the updates will fail, have to reread (LatRead), and write again.

So the latency is (LatWrite + LatRead + LatWrite).

With the new code, it's just LatWrite + LatWrite. It gets better if there are more than two failures.

Contributor Author

@athanatos your understanding is correct. We could modify this to unsetSuccessAndSendWriteRequest after each successful ZK write. It would at least spread the outbound load a little.


I'm not really worried about that, and we can always add machinery for projecting the post-update ensemble for writes later.

Member

Makes sense to me.

.lastEntry().getValue().get(replacedBookieIdx);
replaced &= !Objects.equal(replacedBookieAddr, failedBookieAddr);
List<BookieSocketAddress> origEnsemble = getCurrentEnsemble();
ensembleChangeLoop(origEnsemble, toReplace);
Member

ensembleChangeLoop can potentially trigger callbacks. Can we avoid calling this method under metadataLock?


Yeah, both the initial error check branch and the unsetSuccessAndSendWriteRequest calls can call callbacks.

Contributor Author

Sure, I can move it out. There shouldn't be a problem with callbacks under metadataLock though, as it only protects a few members. Maybe deadlocks could be an issue down the line.

Member

Yes, moving the callbacks out of locks would be better in general.

Contributor Author

Moved out of synchronized block.

List<BookieSocketAddress> lastEnsemble = metadata.getLastEnsembleValue();
boolean failedBookieInEnsemble = failedBookies.entrySet().stream()
        .anyMatch((e) -> lastEnsemble.get(e.getKey()).equals(e.getValue()));
return !metadata.isClosed() && !metadata.isInRecovery() && failedBookieInEnsemble;
Member

!metadata.isClosed()
    && !metadata.isInRecovery()
    && failedBookies.entrySet().stream()
        .anyMatch((e) -> lastEnsemble.get(e.getKey()).equals(e.getValue()))

so we only compute failedBookieInEnsemble when the metadata is open.

Contributor Author

sure

Contributor Author

changed

// f) writing entry E+1 encountered LedgerFencedException which will enter ledger close procedure
// g) it would find that ledger metadata is closed, then it callbacks immediately without erroring
// out any pendings
synchronized (LedgerHandle.this) {


drainPendingAddsToErrorOut is already synchronized.

Contributor Author

The synchronization covers the other fields also, like the length.

final State prevState;
List<PendingAddOp> pendingAdds;

if (isClosed()) {


Is it possible to be isClosed() but not closing here? If closing, drainPendingAddsToErrorOut must have already happened (and no new ones can be added, due to the check in doAsyncAddEntry), so this must necessarily be a no-op. I think we should be able to check closing instead and assert that pendingAddOps is already empty.

Contributor Author

@athanatos if we're listening for metadata updates, it's possible someone else updated the metadata and closed it just before we close it. However, I think isClosed is wrong here, as it confuses two things. What we care about in this check is whether the handle is closed or not, so using an enum for the handle state would work better.


Yeah, but I think what I'm getting at is that whatever discovered that the ledger is actually closed or closed it should really already have called drainPendingAddsToErrorOut -- this code shouldn't be discovering that for the first time.

Contributor Author

It looks like the listener stuff isn't in yet (#1580). In any case, this close is only on the writing handle. So if the state of the ledger changes, that state change will be accompanied by any new writes being fenced (the protocol is that if the state is changed from open by anyone but the writer, all bookies in the last ensemble must be told to fence).

So the only way for the handle to get to closed is a call to this 'close' method. So we should check that the handle is in the 'open' state and only then drain the pending adds (i.e. the state of the pending adds and the handle open state are tied).

Contributor Author

This discussion becomes moot with how I've changed it now. We do call drain() in all cases, but I've done it this way to have only one synchronized block. In the case that the ledger is already closed, the drain just returns an empty list.

// Original intent of this change is to do a best-effort ensemble change.
// But this is not possible until the local metadata is completely immutable.
// Until the feature "Make LedgerMetadata Immutable #610" is complete, we will use
// handleBookieFailure() to handle delayed writes as regular bookie failures.


This comment probably needs to be rewritten to explain precisely why we need to block acks and what would need to change.

Contributor Author

The whole deferred handle failure needs to change, but I can add a comment to clarify why this was previously an issue.


Yep. That comment seems to imply that this refactor would fix it, so at least that bit should be updated.

Contributor

> The whole deferred handle failure needs to change,

@ivankelly can you flesh out your thoughts on this?
I agree with modifying the comments as per the discussion above.

Contributor Author

I haven't fully fleshed it out in my head yet, but if we are to update the metadata, we need to block completions while we do it; if it turns out we can't update it, we shouldn't consider that an unrecoverable failure unless we cannot get an ack quorum.

public static final long INVALID_LEDGER_ID = -0xABCDABCDL;

final AtomicInteger blockAddCompletions = new AtomicInteger(0);
final Object metadataLock = new Object();


I'm guessing that the reason for this lock is to prevent write completion events from contending on the LedgerHandle object with write initiation? Is that contention really enough of a problem for this to be a measurable win? If so, I think that the state protected by this lock (changingEnsemble, delayedWriteFailedBookies) should be moved into an actual object with descriptive methods. As it is, it's a bit tough to infer the update rules.

Contributor Author

I think it's not actually needed, because everything it protects runs on the ordered executor thread for this ledger. A lock was requested in a previous patch though, until the threading model is clarified.
#1621 (comment)


Well, for that purpose, couldn't we just use the existing LedgerHandle object until we're sure about the single thread/ledger implementation (which I'd assume would mean pervasive assertions with testing)?

Contributor Author

I'd prefer not to use the handle itself, as that's an object that's exposed to clients, so clients could try to lock on it in a callback.

The single thread/ledger thing would take a couple of forms, but the biggest pieces would be 1) making sure a ledger handle instance is only given one executor, 2) making sure all callbacks from the bookie client are run on that one executor, and 3) asserting that Thread.currentThread().getId() matches the ID of the executor given to the handle.


Well, that ship would seem to have sailed since we lock it elsewhere, right?

Contributor

Right now, all the writes, write responses, and metadata changes on write are done through the ordered executor (lid), right? Where is the need for lock serialization?

Contributor Author

> Well, that ship would seem to have sailed since we lock it elsewhere, right?

That's true, but I don't want to throw fuel on that fire.

> Right now, all the writes, write responses, and metadata changes on write are done through the ordered executor (lid), right? Where is the need for lock serialization?

There isn't (I don't think), but I would like another mechanism in place to ensure safety before removing it.

List<BookieSocketAddress> newEnsemble = getCurrentEnsemble();
Set<Integer> replaced = EnsembleUtils.diffEnsemble(origEnsemble, newEnsemble);
unsetSuccessAndSendWriteRequest(newEnsemble, replaced);
changingEnsemble = false;


I think we have to call sendAddSuccessCallbacks after calling unsetSuccessAndSendWriteRequest. It's possible that every pendingAddOp had a bookie in the write set swapped, but nevertheless all have ackQuorum satisfied. In that case, PendingAddOp.unsetSuccessAndSendWriteRequest will not call sendAddSuccessCallbacks(), but we will still have writes ready to go. In fact, I think the call in PendingAddOp.unsetSuccessAndSendWriteRequest is entirely unnecessary and should be removed in favor of a single call here. As a side effect, you can call it outside of the metadataLock, avoiding that problem as well.

Contributor Author

I agree. Will change this.

Contributor

> I agree. Will change this.

PendingAddOp.unsetSuccessAndSendWriteRequest() will do a sendWriteRequest() even if the AQ is satisfied. When that write request response comes back, irrespective of pass/fail, it will do sendAddSuccessCallbacks().

Contributor Author

@jvrao you're right. This stuff jumps far too much between LedgerHandle and PendingAddOp. I'll leave this for now.

@athanatos

@sijie @ivankelly OK, I'm done reviewing for now; I left some comments/questions.

@sijie (Member) commented Sep 11, 2018

Thank you, @athanatos.

@ivankelly (Contributor, Author) left a comment


@sijie @athanatos Thanks for the reviews.

There are a few things that need to be settled before I push a new patch (see my comments).


private LedgerMetadata metadata;
final long ledgerId;
long lastAddPushed;
boolean closing;
Contributor Author

@sijie with mutable metadata, when close happens, we update the state in the local metadata immediately. So any call to isClosed would return true if the client was currently closing.

@athanatos yes, I was thinking of adding an explicit handleState enum. Can do now.
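A minimal sketch of the enum idea (illustrative; the PR may name the states differently):

```java
// Hypothetical handle-state enum separating the handle's lifecycle from the
// metadata's closed flag, so "the writer is closing" and "the ledger is closed
// in the metadata store" are no longer conflated.
enum HandleState {
    OPEN,     // accepting adds
    CLOSING,  // close() started; no new adds, pending adds draining
    CLOSED    // close metadata written (or observed) in the metadata store
}
```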



static Set<Integer> diffEnsemble(List<BookieSocketAddress> e1,
                                 List<BookieSocketAddress> e2) {
    checkArgument(e1.size() == e2.size(), "Ensembles must be of same size");
Contributor

How is this better than an assert?

Contributor Author

Asserts are turned off by default at runtime.
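A short illustration of the difference (Guava's checkArgument versus Java's assert):

```java
import static com.google.common.base.Preconditions.checkArgument;

import java.util.List;

// Java's assert is a no-op unless the JVM runs with -ea, while Guava's
// checkArgument always throws IllegalArgumentException on violation.
class SizeCheck {
    static <T> void requireSameSize(List<T> e1, List<T> e2) {
        assert e1.size() == e2.size();                 // skipped without -ea
        checkArgument(e1.size() == e2.size(),
                "Ensembles must be of same size");     // always enforced
    }
}
```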

}
metadata.addEnsemble(newEnsembleStartEntry, newEnsemble);
void notifyWriteFailed(int index, BookieSocketAddress addr) {
synchronized (metadataLock) {
@jvrao (Contributor, Sep 15, 2018)

What are we protecting it from? From ReadOnlyLedgerHandle? In the current code, even that will go through the same ordered executor, right?

Contributor Author

From our own past and future misdeeds :)

We should replace this with checkState(Thread.currentThread().getId() == myThread), but that's a separate change. Synchronization that doesn't hit contention is cheap anyhow; it only touches a thread local, AFAIR.

LOG.info("[EnsembleChange-L{}-{}] : resolved ledger metadata conflict and writing to zookeeper,"
+ " local meta data is \n {} \n, zk meta data is \n {}.",
ledgerId, ensembleChangeIdx, metadata, newMeta);
LOG.debug("Ledger {} reaches max allowed ensemble change number {}",
@jvrao (Contributor, Sep 15, 2018)

It would be nice to have this at INFO level, as it gives a good idea of why we are failing the write.

Contributor Author

👍

LedgerMetadataBuilder builder = LedgerMetadataBuilder.from(metadata);
long newEnsembleStartEntry = getLastAddConfirmed() + 1;
checkState(lastEnsembleKey <= newEnsembleStartEntry,
        "New ensemble must either replace the last ensemble, or add a new one");
Contributor

Please print the lastEnsembleKey and newEnsembleStartEntry here.
What happens with this checkState? Does the write fail?
Also, when can this happen? If we are here, we don't have any outstanding metadata update in flight.
For ledger L1 with ensemble (B1, B2, B3), when the LAC is at X and the X+1 write failed on bookie B1, the ensemble is changed to (B4, B2, B3), and then we notice that B2 also failed. In that case, do we end up in this situation?
But that may be an OK state, right? Why are you doing checkState?


I believe that this cannot be violated if the protocol is operating correctly (the only other way for the ensemble to change would be a fence, but that's ruled out by the previous argument). The checkState call would appear to be a statement that this case is impossible? The == case happens when we got a new failure since we sent the ensemble change (which worked), so we have to replace it. @ivankelly is that right?

Contributor Author

@jvrao @athanatos's understanding is correct. The checkState is an assertion.

checkState(lastEnsembleKey <= newEnsembleStartEntry,
        "New ensemble must either replace the last ensemble, or add a new one");
if (lastEnsembleKey.equals(newEnsembleStartEntry)) {
    return builder.replaceEnsembleEntry(newEnsembleStartEntry, newEnsemble).build();
Contributor

Hmm, so should the above checkState be < instead of <=?

Contributor Author

No, <= is right: if we fail to write to an ensemble at all, it can be replaced.
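A simplified sketch of the replace-or-add rule under discussion (types are illustrative, not the PR's LedgerMetadataBuilder):

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Ensembles are keyed by their first entry id. An equal key means the last
// ensemble never took a write, so it is replaced; a larger key appends a new
// ensemble. A smaller key would violate the protocol, hence the state check.
class Ensembles {
    final NavigableMap<Long, List<String>> ensembles = new TreeMap<>();

    void addOrReplace(long newEnsembleStartEntry, List<String> newEnsemble) {
        if (!ensembles.isEmpty() && ensembles.lastKey() > newEnsembleStartEntry) {
            throw new IllegalStateException(
                    "New ensemble must either replace the last ensemble, or add a new one");
        }
        ensembles.put(newEnsembleStartEntry, newEnsemble); // replaces on equal key
    }
}
```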

@ivankelly (Contributor, Author)

rerun bookkeeper-server client tests

@ivankelly (Contributor, Author)

@dlg99 this is still failing on MDC stuff. The problem is that if the runnable is submitted from a thread without MDC, such as the ZooKeeper callback threads, it won't get the context. Not sure how to solve this without a huge amount of plumbing :/

Though now it only prints when the new ensemble has been persisted.
Also disabled the MDC test check for this log because waiting for
persistence breaks the MDC in a way that's non-trivial to fix.
@ivankelly (Contributor, Author)

@dlg99 actually, thinking about this more, it's a general problem with the MDC approach. It worked before because "New Ensemble" was being printed before actually trying to write to ZooKeeper, so the request hadn't left the ordered executor thread pools. Once the request goes to ZooKeeper, the MDC context is lost. For now, I've commented out that check.
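One known workaround pattern for this class of problem (a sketch, not what the PR does): capture the MDC when the callback is created and restore it on whichever thread eventually runs it.

```java
import java.util.Map;
import org.slf4j.MDC;

// Wrap a task so it runs with the MDC captured at wrap time, e.g. so a
// ZooKeeper callback thread inherits the submitting thread's context.
final class MdcPreserving {
    static Runnable wrap(Runnable task) {
        final Map<String, String> captured = MDC.getCopyOfContextMap();
        return () -> {
            Map<String, String> previous = MDC.getCopyOfContextMap();
            if (captured != null) {
                MDC.setContextMap(captured);
            } else {
                MDC.clear();
            }
            try {
                task.run();
            } finally {
                if (previous != null) {
                    MDC.setContextMap(previous);
                } else {
                    MDC.clear();
                }
            }
        };
    }
}
```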

@ivankelly (Contributor, Author)

rerun bookkeeper-server bookie tests

@sijie (Member) commented Oct 4, 2018

@athanatos @jvrao can you review @ivankelly 's latest change?

@ivankelly (Contributor, Author)

@sijie @athanatos @jvrao could you all take another look at this? I'm eager to get this changeset out of my queue

@ivankelly (Contributor, Author)

@sijie @athanatos @jvrao pinging again

@ivankelly (Contributor, Author)

rerun java8 tests
rerun integration tests

@ivankelly (Contributor, Author)

rerun integration tests

@ivankelly (Contributor, Author)

rerun java8 tests

@ivankelly (Contributor, Author)

rebuild java8

2 similar comments

@ivankelly (Contributor, Author)

@sijie @jvrao @athanatos weekly reminder to please take another look at this

@athanatos

@ivankelly Looks like you've addressed my comments, LGTM

@ivankelly (Contributor, Author)

@athanatos thanks. If you're happy with the change, could you mark it as approved?

@jvrao (Contributor) left a comment

Great work. LGTM

@sijie added this to the 4.9.0 milestone Oct 25, 2018
@sijie merged commit 6bf6971 into apache:master Oct 25, 2018
@ivankelly (Contributor, Author)

@sijie @jvrao @athanatos thanks for the reviews, guys. Now that this is in, there's a bunch of small cleanup patches that also need to go in to enforce the immutability. They'll be much, much smaller though.
