Skip to content

OWLS-86407 - Fixed the logic to start clustered managed server pods when admin server pod not running #2093

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 11, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@

import static java.lang.System.lineSeparator;
import static oracle.kubernetes.operator.helpers.PodHelper.hasClusterNameOrNull;
import static oracle.kubernetes.operator.helpers.PodHelper.isNotAdminServer;

/**
* Operator's mapping between custom resource Domain and runtime details about that domain,
Expand Down Expand Up @@ -101,27 +102,60 @@ private static <K, V> boolean removeIfPresentAnd(
*/
public long getNumScheduledServers(String clusterName) {
return getServersInNoOtherCluster(clusterName)
.filter(PodHelper::isScheduled)
.count();
}

/**
* Counts the number of unclustered managed servers and managed servers in the specified cluster that are scheduled.
* @param clusterName cluster name of the pod server
* @param adminServerName Name of the admin server
* @return Number of scheduled managed servers
*/
public long getNumScheduledManagedServers(String clusterName, String adminServerName) {
return getManagedServersInNoOtherCluster(clusterName, adminServerName)
.filter(PodHelper::isScheduled)
.count();
}

/**
* Counts the number of unclustered servers and servers in the specified cluster that are ready.
* Counts the number of unclustered servers (including admin) and servers in the specified cluster that are ready.
* @param clusterName cluster name of the pod server
* @return Number of ready servers
*/
public long getNumReadyServers(String clusterName) {
return getServersInNoOtherCluster(clusterName)
.filter(PodHelper::hasReadyServer)
.count();
}

/**
* Counts the number of unclustered managed servers and managed servers in the specified cluster that are ready.
* @param clusterName cluster name of the pod server
* @return Number of ready servers
*/
public long getNumReadyManagedServers(String clusterName, String adminServerName) {
return getManagedServersInNoOtherCluster(clusterName, adminServerName)
.filter(PodHelper::hasReadyServer)
.count();
}

@Nonnull
private Stream<V1Pod> getServersInNoOtherCluster(String clusterName) {
return getServers().values().stream()
.map(ServerKubernetesObjects::getPod)
.map(AtomicReference::get)
.filter(this::isNotDeletingPod)
.filter(p -> hasClusterNameOrNull(p, clusterName));
}

@Nonnull
private Stream<V1Pod> getManagedServersInNoOtherCluster(String clusterName, String adminServerName) {
return getServers().values().stream()
.map(ServerKubernetesObjects::getPod)
.map(AtomicReference::get)
.filter(this::isNotDeletingPod)
.filter(p -> isNotAdminServer(p, adminServerName))
.filter(p -> hasClusterNameOrNull(p, clusterName));
}

Expand Down Expand Up @@ -755,4 +789,4 @@ public int hashCode() {
.toHashCode();
}
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@
import oracle.kubernetes.weblogic.domain.model.Shutdown;

import static oracle.kubernetes.operator.LabelConstants.CLUSTERNAME_LABEL;
import static oracle.kubernetes.operator.LabelConstants.SERVERNAME_LABEL;
import static oracle.kubernetes.operator.ProcessingConstants.SERVERS_TO_ROLL;

public class PodHelper {
Expand Down Expand Up @@ -114,6 +115,22 @@ private static String getClusterName(@Nonnull Map<String,String> labels) {
return labels.get(CLUSTERNAME_LABEL);
}

static boolean isNotAdminServer(@Nullable V1Pod pod, String adminServerName) {
return Optional.ofNullable(getServerName(pod)).map(s -> !s.equals(adminServerName)).orElse(true);
}

private static String getServerName(@Nullable V1Pod pod) {
return Optional.ofNullable(pod)
.map(V1Pod::getMetadata)
.map(V1ObjectMeta::getLabels)
.map(PodHelper::getServerName)
.orElse(null);
}

private static String getServerName(@Nonnull Map<String,String> labels) {
return labels.get(SERVERNAME_LABEL);
}

/**
* get if pod is in ready state.
* @param pod pod
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,11 +24,14 @@
import oracle.kubernetes.operator.helpers.ServiceHelper;
import oracle.kubernetes.operator.logging.LoggingFacade;
import oracle.kubernetes.operator.logging.LoggingFactory;
import oracle.kubernetes.operator.wlsconfig.WlsDomainConfig;
import oracle.kubernetes.operator.work.NextAction;
import oracle.kubernetes.operator.work.Packet;
import oracle.kubernetes.operator.work.Step;
import oracle.kubernetes.weblogic.domain.model.Domain;

import static oracle.kubernetes.operator.ProcessingConstants.DOMAIN_TOPOLOGY;

/**
* A step which will bring up the specified managed servers in parallel.
* Adds to packet:
Expand Down Expand Up @@ -168,22 +171,35 @@ public NextAction apply(Packet packet) {

if (startDetailsQueue.isEmpty()) {
return doNext(new ManagedServerUpAfterStep(getNext()), packet);
} else if (hasServerAvailableToStart(packet.getSpi(DomainPresenceInfo.class))) {
} else if (hasServerAvailableToStart(packet)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good change. There was no need to expose the implementation as it had been.

numStarted.getAndIncrement();
return doForkJoin(this, packet, Collections.singletonList(startDetailsQueue.poll()));
} else {
return doDelay(this, packet, SCHEDULING_DETECTION_DELAY, TimeUnit.MILLISECONDS);
}
}

private boolean hasServerAvailableToStart(DomainPresenceInfo info) {
return ((numStarted.get() < info.getNumScheduledServers(clusterName))
&& (canStartConcurrently(info.getNumReadyServers(clusterName))));
private boolean hasServerAvailableToStart(Packet packet) {
DomainPresenceInfo info = packet.getSpi(DomainPresenceInfo.class);
String adminServerName = ((WlsDomainConfig) packet.get(DOMAIN_TOPOLOGY)).getAdminServerName();
return ((getNumServersStarted() <= info.getNumScheduledManagedServers(clusterName, adminServerName)
&& (canStartConcurrently(info.getNumReadyManagedServers(clusterName, adminServerName)))));
}

private boolean canStartConcurrently(long numReady) {
return ((this.maxConcurrency > 0) && (numStarted.get() < (this.maxConcurrency + numReady - 1)))
|| (this.maxConcurrency == 0);
return (ignoreConcurrencyLimits() || numNotReady(numReady) < this.maxConcurrency);
}

private long numNotReady(long numReady) {
return getNumServersStarted() - numReady;
}

private int getNumServersStarted() {
return numStarted.get();
}

private boolean ignoreConcurrencyLimits() {
return this.maxConcurrency == 0;
}
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ private static String getManagedServerName(int n) {
private final Step nextStep = new TerminalStep();
private final KubernetesTestSupport testSupport = new KubernetesTestSupport();
private final List<Memento> mementos = new ArrayList<>();
private final DomainPresenceInfo domainPresenceInfo = createDomainPresenceInfoWithServers();
private DomainPresenceInfo domainPresenceInfo = createDomainPresenceInfoWithAdminServer();
private final WlsDomainConfig domainConfig = createDomainConfig();
private final Collection<ServerStartupInfo> startupInfos = new ArrayList<>();

Expand All @@ -108,10 +108,9 @@ private static WlsDomainConfig createDomainConfig() {
.withCluster(clusterConfig);
}

private DomainPresenceInfo createDomainPresenceInfoWithServers(String... serverNames) {
private DomainPresenceInfo createDomainPresenceInfoWithAdminServer() {
DomainPresenceInfo dpi = new DomainPresenceInfo(domain);
addServer(dpi, ADMIN);
Arrays.asList(serverNames).forEach(serverName -> addServer(dpi, serverName));
return dpi;
}

Expand Down Expand Up @@ -249,6 +248,23 @@ public void whenConcurrencyLimitIs1_secondClusteredServerDoesNotStartIfFirstIsNo
assertThat(getStartedManagedServers(), hasSize(1));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get in the habit of writing your unit tests before you make production code changes. It not only guides what you then implement, it also leads to writing better tests.

Start with the name. It should explain what is going on, and this doesn't, as it is simply not the case that we always want to start a managed server pod when there is no admin server. The case, based on the JIRA, is that if we scale up a cluster, we will start a managed server even if the admin server is not running, and the test should reflect that. So maybe something like 'whileAdminServerStopped_canStartManagedServer' would be better.

In the body of the test the first call is to createDomainPresenceInfoWithServers(). It is really hard to tell what that is doing here. In the current test class, it was called exactly once: passing the name of the admin server, to indicate that was already running. Remember Robert Martin's admonition on names, I would probably create a new method that does what this call does, but is named createDomainPresenceInfoWithNoAdminServer. Perhaps we need, then, another call named createDomainPresenceInfoWithAdminServer to make it clearer. Since there are no invocations that actually pass in more servers, that should suffice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the unit test method name and added new method createDomainPresenceInfoWithNoAdminServer as suggested.

@Test
public void whileAdminServerStopped_canStartManagedServer() {
createDomainPresenceInfoWithNoAdminServer();
addWlsCluster(CLUSTER1, MS1);

invokeStepWithServerStartupInfos();

assertThat(getStartedManagedServers(), hasSize(1));
}

private void createDomainPresenceInfoWithNoAdminServer() {
domainPresenceInfo = new DomainPresenceInfo(domain);
testSupport
.addToPacket(ProcessingConstants.DOMAIN_TOPOLOGY, domainConfig)
.addDomainPresenceInfo(domainPresenceInfo);
}

@Test
public void whenConcurrencyLimitIs1_secondClusteredServerStartsAfterFirstIsReady() {
configureCluster(CLUSTER1).withMaxConcurrentStartup(1);
Expand Down