Skip to content

PIP-212: Default reppDnsResolverClass to ZkBookieRackAffinityMapping #18012

Closed
@michaeljmarshall

Description

Motivation

The current Bookkeeper configuration defaults to using org.apache.bookkeeper.net.ScriptBasedMapping for the DNSToSwitchMapping implementation. However, this default configuration does not align with the Broker's default configuration, which is org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping. As such, the default configuration for a Pulsar cluster does not lead to ideal rack awareness when ledgers need to be recovered. The result is that a user can configure a cluster for rack awareness and the brokers will honor that configuration, but the autorecovery process will not because it does not have the correct bookkeeper cluster topology view.

I propose we configure bookkeeper to use the broker's ZkBookieRackAffinityMapping class. That way, autorecovery will honor the operator's configured rack awareness policies out of the box.

Goal

Ensure consistent rack awareness policies.

I propose this is a bug fix that requires patching all active versions of Pulsar.

API Changes

None

Implementation

See #15640.

Add default value for reppDnsResolverClass to the conf/bookkeeper.conf configuration. This change effectively switches the default from org.apache.bookkeeper.net.ScriptBasedMapping to org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping.

Alternatives

The tradeoff is that a user relying on the ScriptBasedMapping default might accidentally get switched to using the ZkBookieRackAffinityMapping implementation. Given that ScriptBasedMapping doesn't work out of the box, and that the broker's default to ZkBookieRackAffinityMapping, I think this is an acceptable tradeoff.

Anything else?

I manually verified that the ZkBookieRackAffinityMapping works by running some tests in a minikube cluster deployed with the DataStax helm chart for Apache Pulsar. I set up 3 racks, 4 bookies, and a topic with a E=2, Qw=2, and Qa=2. I then verified that the autorecovery pod correctly discovered the racks and then identified when an ensemble was not following the rack placement policy after two bookies were removed. I documented my testing a bit more here: datastax/pulsar-helm-chart#214.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions