This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

ISSUE-2277: How to handle "not enough non-faulty Bookies" situation? #192

Open
sijie opened this issue Mar 3, 2020 · 0 comments


sijie commented Mar 3, 2020

Original Issue: apache#2277


QUESTION

We use BookKeeper extensively in our project. While BookKeeper generally provides good write performance, we noticed that under heavy load the BookKeeper client may fail with errors such as `BKNotEnoughBookiesException: Not enough non-faulty bookies available`.

As I understand it, this problem may be caused by the lack of throttling between the BookKeeper client (4.8.2) and server (4.9.2), which may lead the client to queue up too many requests and thereby overload the server. I draw this conclusion because the `BKNotEnoughBookiesException` is normally preceded by errors like `ERROR o.a.bookkeeper.client.PendingAddOp - Write of ledger entry to quorum failed: LXXX EYYY`, which appear after one of the Bookies has been "disconnected" during the high-load period (e.g., `INFO o.a.b.proto.PerChannelBookieClient - Disconnected from bookie channel` and `WARN o.a.b.c.RackawareEnsemblePlacementPolicyImpl - Failed to find 1 bookies : excludeBookies`).
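To make the throttling idea concrete, here is a minimal sketch of capping the number of in-flight asynchronous writes on the application side, so that the caller blocks instead of queueing requests without bound. `InFlightLimiter` and its methods are hypothetical names invented for this illustration; they are not part of the BookKeeper API, and the sketch assumes the writes are issued through some call that returns a `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper (not part of BookKeeper): caps the number of in-flight async writes. */
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until a slot is free, then starts the write. */
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> write) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = write.get();
        } catch (RuntimeException e) {
            // The write never started, so give the slot back immediately.
            permits.release();
            throw e;
        }
        // Free the slot when the write completes, whether it succeeded or failed.
        return future.whenComplete((result, error) -> permits.release());
    }

    /** Number of writes that could be started right now without blocking. */
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

Each write would be wrapped as `limiter.submit(() -> someAsyncWrite(...))`, where `someAsyncWrite` stands in for whatever asynchronous add call the application uses. The right value for `maxInFlight` depends on the deployment and would have to be found by load testing.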

While I understand that Bookies can be temporarily unresponsive under high load, my question is: how do we handle this situation? Apparently, the BookKeeper client tags the overloaded Bookies as "faulty" and leaves them in that state, right? Is there a way for the BookKeeper client to start using Bookies classified as "faulty" again? I ask because, after inducing high load on a 3-Bookie ensemble and seeing this issue, the Bookies can still be used afterwards (they have not permanently crashed). However, the BookKeeper client remains in a state in which some of the Bookies are tagged as "faulty".
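The behaviour being asked for, a "faulty" mark that expires so that the Bookie becomes eligible again, can be sketched as follows. This only illustrates the concept and is not BookKeeper's actual implementation: `QuarantineList`, its methods, and the injected clock are all hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical illustration of a time-bounded quarantine; not BookKeeper's code. */
final class QuarantineList {
    // Maps a bookie identifier to the time (in millis) at which it may be used again.
    private final Map<String, Long> releaseAtMillis = new ConcurrentHashMap<>();
    private final long quarantineMillis;
    private final LongSupplier clock;

    QuarantineList(long quarantineMillis, LongSupplier clock) {
        this.quarantineMillis = quarantineMillis;
        this.clock = clock;
    }

    /** Marks a bookie as faulty for a fixed period starting now. */
    void quarantine(String bookieId) {
        releaseAtMillis.put(bookieId, clock.getAsLong() + quarantineMillis);
    }

    /** Returns true if the bookie was never quarantined or its quarantine has expired. */
    boolean isUsable(String bookieId) {
        Long releaseAt = releaseAtMillis.get(bookieId);
        if (releaseAt == null) {
            return true;
        }
        if (clock.getAsLong() >= releaseAt) {
            // Expired: drop the entry so the bookie is considered for placement again.
            releaseAtMillis.remove(bookieId, releaseAt);
            return true;
        }
        return false;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`; it is injected here only so that the expiry can be exercised without sleeping.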

PS: I understand that "having more Bookies" could be a workaround, but my question is specifically about how to deal with the BookKeeper client when it quarantines a "faulty" Bookie and we want to use that Bookie again later on.
