Conversation

@Miciah
Contributor

@Miciah Miciah commented Sep 26, 2024

No description provided.

override:

```shell
oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"unsupportedConfigOverrides":{"dynamicConfigManager":"false"}}}'
```
Contributor

For upgrades, I assume you need to remove the unsupported override and re-enable it once the upgrade is complete. However, this raises the question: if DCM is re-enabled, does that imply the cluster may be broken from an ingress perspective until the update is complete and DCM can be disabled again? Is that going to be an acceptable stance for production clusters?
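(For reference, a minimal sketch of what removing the override would look like, assuming the same merge-patch mechanism as the command quoted above; in a JSON merge patch, setting the field to null deletes it. This is an editorial illustration, not part of the enhancement.)

```shell
# Hypothetical sketch: clear the unsupported override before upgrading
# (null deletes the field in a merge patch), then re-apply the original
# patch after the upgrade to disable DCM again.
oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge \
  --patch='{"spec":{"unsupportedConfigOverrides":null}}'
```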

Contributor Author

For upgrades, I assume you need to remove the unsupported override and re-enable it once the upgrade is complete.

We should, but currently do not, block upgrades when using unsupportedConfigOverrides (reported as https://issues.redhat.com/browse/OCPBUGS-41366).

However, this raises the question: if DCM is re-enabled, does that imply the cluster may be broken from an ingress perspective until the update is complete and DCM can be disabled again? Is that going to be an acceptable stance for production clusters?

I guess you're saying that unsupportedConfigOverrides isn't a good escape hatch after all, right? You can probably force an upgrade using `oc adm upgrade --force`, but I understand that `--force` is a last resort.

I suppose we could tell people to turn off the featuregate. Is that a reasonable escape hatch?

Contributor

I suppose we could tell people to turn off the featuregate. Is that a reasonable escape hatch?

Yes, I would say so.

Contributor Author

On further thought, turning off the featuregate would require using CustomNoUpgrade, which would prevent upgrades and could not be undone. So the featuregate is not a reasonable escape hatch; we can only offer unsupportedConfigOverrides or add an entirely new API field, which is something we really want to avoid doing.
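(For illustration only, a sketch of what disabling a single gate would entail and why it is not a reasonable escape hatch; the gate name below is a placeholder, since the exact name is not given in this thread.)

```shell
# Hypothetical sketch: disabling one feature gate requires switching the cluster
# FeatureGate to CustomNoUpgrade, which blocks upgrades and cannot be undone.
# "<DCMFeatureGateName>" is a placeholder, not the real gate name.
oc patch featuregate/cluster --type=merge \
  --patch='{"spec":{"featureSet":"CustomNoUpgrade","customNoUpgrade":{"disabled":["<DCMFeatureGateName>"]}}}'
```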

@candita
Contributor

candita commented Oct 9, 2024

/assign @alebedev87
/assign

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 7, 2024
@candita
Contributor

candita commented Nov 8, 2024

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 8, 2024
Comment on lines 28 to 35
OpenShift 4.18 enables Dynamic Config Manager with 1 pre-allocated server per
backend, without blueprint routes, and without any configuration options. The
goal is to deliver a minimum viable product on which to iterate. This MVP
provides marginal value by avoiding reloading HAProxy for a single scale-out
event or subsequent scale-in event for a route, at minimal development and
operational cost. More importantly, the MVP gives us CI signal, enables us to
work out defects in DCM, and gives us a starting point from which to enhance DCM
in subsequent OpenShift releases. In the future, we intend to extend DCM with
Contributor

Says the same thing, just reworded for clarity.

Suggested change
OpenShift 4.18 enables Dynamic Config Manager with 1 pre-allocated server per
backend, without blueprint routes, and without any configuration options. The
goal is to deliver a minimum viable product on which to iterate. This MVP
provides marginal value by avoiding reloading HAProxy for a single scale-out
event or subsequent scale-in event for a route, at minimal development and
operational cost. More importantly, the MVP gives us CI signal, enables us to
work out defects in DCM, and gives us a starting point from which to enhance DCM
in subsequent OpenShift releases. In the future, we intend to extend DCM with
OpenShift 4.18 delivers a Tech Preview version of Dynamic Config Manager as a minimum viable product that enables dynamic configuration without repeated HAProxy reloads. It will be offered with 1 pre-allocated server per backend, without blueprint routes, and without any configuration options. This MVP
provides marginal value by avoiding reloading HAProxy for a single scale-out
event or subsequent scale-in event for a route, at minimal development and
operational cost. More importantly, the MVP gives us CI signal, enables us to
work out defects in DCM, and gives us a starting point from which to enhance DCM
in subsequent OpenShift releases. In the future, we intend to extend DCM with

Contributor Author

To be clear, this isn't a special version of DCM; it's a specific configuration of DCM. However, I can add mention that we are shipping this as Tech Preview in 4.18.

Contributor Author

HAProxy process through a Unix domain socket. This means that no forking is
necessary to update configuration and effectuate the changes.

However, DCM requires some work before it can be enabled. DCM was first
Contributor

Suggested change
However, DCM requires some work before it can be enabled. DCM was first
DCM requires substantial work to develop and verify it in order to make it
viable. DCM was first

Contributor Author

We had to fix several defects in DCM itself before we were comfortable shipping it even in this MVP configuration, and even as a Tech Preview feature. I don't know whether it is considered "viable" at this point, or whether we would consider it "viable" if further testing reveals no new defects. How about this?

Suggested change
However, DCM requires some work before it can be enabled. DCM was first
However, DCM requires development work to fix several known issues as well as extensive testing in order to ensure that enabling it does not introduce regressions. DCM was first

Contributor Author

Comment on lines 69 to 71
does not take full advantage of the capabilities of newer HAProxy versions. In
sum, DCM requires substantial work to develop and verify it in order to make it
viable.
Contributor

You can remove the last sentence if you change the first sentence.

Suggested change
does not take full advantage of the capabilities of newer HAProxy versions. In
sum, DCM requires substantial work to develop and verify it in order to make it
viable.
does not take full advantage of the capabilities of newer HAProxy versions.

Contributor Author

I was trying to follow the usual topic sentence, body, conclusion. Maybe it's a little too College English 101? I can tighten the paragraph up if it's too wordy.

Comment on lines 98 to 99
The degree to which DCM mitigates this issue is dependent on the nature of the
configuration changes: Some changes still require a configuration reload, but
Contributor

Suggested change
The degree to which DCM mitigates this issue is dependent on the nature of the
configuration changes: Some changes still require a configuration reload, but
The degree to which DCM mitigates this issue is dependent on the nature of the
configuration changes. Some changes will still require a configuration reload, but

Contributor Author

Comment on lines 126 to 127
_As a project administrator, I want HAProxy to balance traffic evenly over old
and new pods when I scale my application up._
Contributor

I ❤️ the explanations in this section.

Is this also true?

Suggested change
_As a project administrator, I want HAProxy to balance traffic evenly over old
and new pods when I scale my application up._
_As a project administrator, I want HAProxy to balance traffic correctly over old
and new pods when I scale my application up or down._

Contributor Author

@Miciah Miciah Nov 22, 2024

Yeah, I would say it's still true. I prefer "evenly" because it is more evocative of the problem we are trying to avoid: an excessive share of the traffic going to the first server in the backend. However, I can write "correctly" if that is clearer. Adding "or down" does make sense. (I should probably make it "out or in" since we are talking about adding or removing pod replicas, not increasing or decreasing the resources available to each pod.)

Contributor Author

Changed to "scale out or scale in" and "to balance traffic correctly (that is, evenly)" in https://github.com/openshift/enhancements/compare/15a3db2301114b9720c9966439a550c26618dbc7..88d317b9344acde23f3d57a585757ac08787f996.


Steps 5 and 8 are stretch goals and may be done in later OpenShift releases.
The other steps are hard requirements, and we intend to complete them in
OpenShift 4.18.
Contributor

Maybe add the GA step after this paragraph?

Contributor Author

enhance DCM by using newer capabilities in HAProxy 2.y to add arbitrarily many
servers dynamically, add backends dynamically, and update certificates
dynamically.

Contributor

Isn't there anything DCM-related that we could add via the haproxy config template that might require an openshift/API change? Getting an API framework in place might be useful.

Are all the Unix socket commands listed in https://docs.haproxy.org/2.8/management.html#9.3 considered to be dynamic config options? So, not only can we add/del servers, but we could also add/del acls, maps, etc., and disable frontends, servers, healthchecks, and set maxconns/rate-limit, etc.

Contributor Author

Isn't there anything DCM-related that we could add via the haproxy config template that might require an openshift/API change? Getting an API framework in place might be useful.

Ideally not. We want DCM to be as transparent and dynamic as possible, requiring no ahead-of-time configuration.

Are all the Unix socket commands listed in https://docs.haproxy.org/2.8/management.html#9.3 considered to be dynamic config options? So, not only can we add/del servers, but we could also add/del acls, maps, etc., and disable frontends, servers, healthchecks, and set maxconns/rate-limit, etc.

Right, any configuration that the router performs using the Unix domain socket is dynamic configuration. We might indeed be able to use additional capabilities to avoid reloads in more cases.

Of course, we do need to do some cost-benefit analysis. It may be worth the engineering effort to implement dynamic configuration of, say, certificates, but not worth the effort to implement dynamic configuration of rate limiting, depending on how much effort is required and how often people actually update that particular kind of configuration.
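(To make the discussion concrete, a sketch of issuing Unix Socket commands against the router's stats socket. The socket path is an assumption, and the backend and server names are taken from the `show servers state` example later in this thread; both may differ in a real deployment.)

```shell
# Hedged sketch of the Unix Socket commands discussed above; the socket path is
# assumed and the backend/server names are illustrative only.
SOCK=/var/lib/haproxy/run/haproxy.sock

# Inspect the current server state without reloading HAProxy.
echo "show servers state" | socat stdio "$SOCK"

# Point a pre-allocated dynamic slot at a new endpoint and bring it into rotation.
echo "set server be_http:test:httpd-24/_dynamic-pod-1 addr 10.129.2.23 port 8080" | socat stdio "$SOCK"
echo "set server be_http:test:httpd-24/_dynamic-pod-1 state ready" | socat stdio "$SOCK"

# Take the server back out of rotation on scale-in.
echo "set server be_http:test:httpd-24/_dynamic-pod-1 state maint" | socat stdio "$SOCK"
```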

https://gist.github.com/frobware/2b527ce3f040797909eff482a4776e0b for an
analysis of the potential memory impact for different choices of balancing
algorithm, number of threads, number of backends, and number of pre-allocated
server slots per backend.
Contributor

It would be nice to see a brief summary that shows how fast memory grows for a typical use case.

Contributor Author

Do you mean pulling in Andy's table, or do you mean gathering more data so we can plot some curves?

Contributor

Neither - I meant summarizing the data in the table. You could keep the reference to the data set.

Contributor Author

### Graduation Criteria

We will introduce the feature as TechPreviewNoUpgrade in OpenShift 4.18, with
the goal of graduating it to GA in the same release. Further improvements to
Contributor

Is the "goal of graduating it to GA in the same release" still applicable given our 2025 priorities?

Contributor Author

Depends on how things go with the Tech Preview, and whether we discover any new major issues.


- OpenShift Enterprise 3.11 added DCM and the `ROUTER_HAPROXY_CONFIG_MANAGER` option ([release notes](https://docs.openshift.com/container-platform/3.11/release_notes/ocp_3_11_release_notes.html#ocp-311-haproxy-enhancements), [documentation](https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html#using-the-dynamic-configuration-manager)).
- OpenShift Container Platform 4.9 added the `dynamicConfigManager` config override, default off ([openshift/cluster-ingress-operator@6a8516a](https://github.com/openshift/cluster-ingress-operator/pull/628/commits/6a8516ab247b00b87a5d7b32e20d4cffcefe1c0f)).
- OpenShift Container Platform 4.18 enables DCM by default.
Contributor

Not exactly, if we use a feature gate.

Contributor Author

Contributor

@alebedev87 alebedev87 left a comment

LGTM, just some nit picks.

superseded-by:
---

# OpenShift Router Dynamic Config Manager
Contributor

I believe the full name of the feature is "Dynamic Configuration Manager".

Contributor Author

Fair enough. We use "dynamic config manager" in the source code and colloquially, but https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html#using-the-dynamic-configuration-manager has "Dynamic Configuration Manager".

Contributor Author

gigabytes per process, ultimately causing out-of-memory errors on the node host.

DCM addresses this issue by reducing the need to reload the configuration.
Instead, DCM configures HAProxy using its Unix domain control socket, which does
Contributor

The "control socket" term doesn't seem to relate to any HAProxy or router technology.

Suggested change
Instead, DCM configures HAProxy using its Unix domain control socket, which does
Instead, DCM configures HAProxy using its Runtime API, which does

Contributor Author

True, "control socket" is a term we use colloquially within the team, and it doesn't appear in the HAProxy documentation, which uses the term "stats socket". However, "stats socket" is a bit of a misnomer, I don't see "Runtime API" in https://docs.haproxy.org/2.8/management.html, and "Unix domain control socket" is descriptive of what the socket is, the capabilities of the socket, and our usage of it. Would it be all right by you if I continued using the term "control socket" but defined it up front in the EP?

Contributor

The Runtime API can be accessed via a UNIX socket or via IP:port. Ref: HAProxy Runtime API installation guide.

Contributor Author

It seems that "Runtime API" really means commands on the stats socket. However, "Runtime API" seems to be part of the nomenclature from the commercial HAProxy product documentation whereas the project does not use this nomenclature. Personally, I find the phrase "Runtime API" confusing and unnecessary.

Would you be all right with using the phrase "Unix Socket commands" from the HAProxy documentation, with a reference to the same?

Suggested change
Instead, DCM configures HAProxy using its Unix domain control socket, which does
Instead, DCM configures HAProxy using [Unix Socket commands](https://docs.haproxy.org/2.8/management.html#9.3), which does

Contributor Author

1. Manually verify that the router functions and passes E2E tests with DCM enabled.
2. Add a tech-preview featuregate in openshift/api for DCM.
3. Update cluster-ingress-operator to enable DCM if the featuregate is enabled.
4. Re-enable old E2E tests in openshift/origin for DCM.
Contributor

Can this be done if DCM will be in techpreview featureset?

Contributor Author

Yes. The openshift/origin tests create router pods directly, so the tests can enable DCM irrespective of the featuregate: https://github.com/openshift/origin/blob/203d2b124037a543d8719ba7e2a369f52f01686d/test/extended/router/config_manager.go#L328-L489
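(For context, a rough sketch of how DCM could be toggled on a router deployment directly for ad-hoc testing, using the `ROUTER_HAPROXY_CONFIG_MANAGER` variable mentioned in the 3.11 history below. Editing the operator-managed deployment by hand is unsupported, and whether this variable behaves the same way on the 4.x router is an assumption.)

```shell
# Hedged sketch for ad-hoc testing only: enable DCM on a router deployment via
# the ROUTER_HAPROXY_CONFIG_MANAGER environment variable documented for OCP 3.11.
# The cluster-ingress-operator normally owns this deployment and may revert the change.
oc -n openshift-ingress set env deployment/router-default ROUTER_HAPROXY_CONFIG_MANAGER=true
```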


Dynamic Config Manager has some known defects:

- [OCPBUGS-7466 No prometheus haproxy metrics present for route created by dynamicConfigManager](https://issues.redhat.com/browse/OCPBUGS-7466)
Contributor

Is this still a risk considering the fact that we won't enable blueprint routes?

Contributor Author

Ah, right, OCPBUGS-7466 pertains to blueprint routes. We can note this as irrelevant for our supported configuration.

Contributor Author

Dynamic Config Manager has some known defects:

- [OCPBUGS-7466 No prometheus haproxy metrics present for route created by dynamicConfigManager](https://issues.redhat.com/browse/OCPBUGS-7466)
- [OCPBUGSM-20868 Sticky cookies take long to start working if config manager is enabled](https://issues.redhat.com/browse/OCPBUGSM-20868)
Contributor

Interesting, essentially this is what I discovered too (fix PR).


#### Can we use HAProxy's control socket to add backends and certificates dynamically?

Answer: **?**.
Contributor

Backends - no. Certificates - seems like yes.

Contributor Author

Yeah, it does seem that dynamic backends are a long-discussed but never-implemented possibility. Sadface.

Contributor Author

Updated in https://github.com/openshift/enhancements/compare/15a3db2301114b9720c9966439a550c26618dbc7..88d317b9344acde23f3d57a585757ac08787f996. I split the question into two: one for certificates and one for backends.

Comment on lines 310 to 312
or for scale-out following scale-in. However, we found that DCM still requires
a reload for scale-in of a server that doesn't use a pre-allocated server slot.
(TODO: This needs to be double-checked.)
Contributor

@alebedev87 alebedev87 Nov 15, 2024

I did see stopped servers that were not pre-allocated. So, scale-ins should still be dynamic.

Contributor

@alebedev87 alebedev87 Nov 18, 2024

I tried to scale a deployment from 4 replicas (3 static slots UP + 1 dynamic slot UP) to 1. Here is the output of show servers state:

# be_id be_name srv_id srv_name srv_addr srv_op_state srv_admin_state srv_uweight srv_iweight srv_time_since_last_change srv_check_status srv_check_result srv_check_health srv_check_state srv_agent_state bk_f_forced_id srv_f_forced_id srv_fqdn srv_port srvrecord srv_use_ssl srv_check_port srv_check_addr srv_agent_addr srv_agent_port
17 be_http:test:httpd-24 1 pod:httpd-24-8554dcd85d-rrsrl:httpd-24:8080-tcp:10.128.2.15:8080 10.128.2.15 0 1 1 1 156 6 3 0 14 0 0 0 - 8080 - 0 0 - - 0
17 be_http:test:httpd-24 2 pod:httpd-24-8554dcd85d-nvnl5:httpd-24:8080-tcp:10.129.2.22:8080 10.129.2.22 0 1 1 1 9 6 3 0 14 0 0 0 - 8080 - 0 0 - - 0
17 be_http:test:httpd-24 3 pod:httpd-24-8554dcd85d-m2hbj:httpd-24:8080-tcp:10.131.0.16:8080 10.131.0.16 2 0 1 1 490 6 3 4 6 0 0 0 - 8080 - 0 0 - - 0
17 be_http:test:httpd-24 4 _dynamic-pod-1 10.129.2.23 0 5 1 1 234 6 3 0 14 0 0 0 - 8080 - 0 0 - - 0


1 pod:httpd-24-8554dcd85d-rrsrl:httpd-24:8080-tcp:10.128.2.15:8080 10.128.2.15	STOP	MAINT	1	CHECK CONFENABLEPAUSED
2 pod:httpd-24-8554dcd85d-nvnl5:httpd-24:8080-tcp:10.129.2.22:8080 10.129.2.22	STOP	MAINT	1	CHECK CONFENABLEPAUSED
3 pod:httpd-24-8554dcd85d-m2hbj:httpd-24:8080-tcp:10.131.0.16:8080 10.131.0.16	UP	UP	1	CHECK CONFENABLED
4 _dynamic-pod-1 10.129.2.23	STOP	MAINT	1	CHECK CONFENABLEPAUSED

*At the top is the raw output; at the bottom is a simplified human-readable view of the raw data.

Same behavior if 4 replicas are enabled via 4 static slots (dynamic slot is not used).

Contributor

Btw even the route deletion doesn't lead to a reload if DCM is enabled. All of the servers of the deleted backend will be disabled but the backend will stay in show backend output.

Contributor

This raises the question of whether we need to log additional operational debug information somewhere. With DCM enabled, the haproxy.config becomes stale, so relying solely on its contents during debugging could lead to misleading conclusions.

Contributor

Right. I had a user story for this but didn't have enough time to implement something for this iteration.

Contributor Author

I tried to scale a deployment from 4 replicas (3 pre-allocated slots UP + 1 dynamic slot UP) to 1. Here is the output of show servers state:

I'm confused. I've been saying "pre-allocated" to describe the slots that we pre-allocate in haproxy.config for DCM to use to add new server endpoints to a backend. What do you mean when you say "3 pre-allocated slots"? Are these 3 pre-allocated slots for DCM, or are these 3 server lines for endpoints that are defined in haproxy.config?

Are you saying that I can create a route with 3 endpoints (which get written to haproxy.config), and then scale in to 1 endpoint, and no reload is required?

If I have a route with 1 pod endpoint (written to haproxy.config), delete the pod, then recreate the pod, is a reload required?

Btw even the route deletion doesn't lead to a reload if DCM is enabled. All of the servers of the deleted backend will be disabled but the backend will stay in show backend output.

This is excellent news!

Can you reproduce these results with non-TLS, TLS passthrough, edge-terminated, and reencrypt routes? Since there are some nuances (server versus server-template), I'm wary of subtle differences in behavior for different route types.

Contributor

I'm confused. I've been saying "pre-allocated" to describe the slots that we pre-allocate in haproxy.config for DCM to use to add new server endpoints to a backend. What do you mean when you say "3 pre-allocated slots"? Are these 3 pre-allocated slots for DCM, or are these 3 server lines for endpoints that are defined in haproxy.config?

Sorry, my bad. I should have called them "static" slots. The pre-allocated ones are the dynamic slots.

Are you saying that I can create a route with 3 endpoints (which get written to haproxy.config), and then scale in to 1 endpoint, and no reload is required?

Right, if DCM is enabled.

If I have a route with 1 pod endpoint (written to haproxy.config), delete the pod, then recreate the pod, is a reload required?

No. The deletion of the pod will set the corresponding server to maintenance mode. The recreation of a pod will use the dynamic (pre-allocated) server slot to set the new IP.

Contributor Author

I updated the answer in https://github.com/openshift/enhancements/compare/15a3db2301114b9720c9966439a550c26618dbc7..88d317b9344acde23f3d57a585757ac08787f996. But we should consider now whether we should enable DCM by default with 0 slots to reduce the risk of impacting customers with the memory overhead of 1 pre-allocated server slot per backend.
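(As a point of reference, a sketch of how the pre-allocated slot count could be explored on a test router. `ROUTER_MAX_DYNAMIC_SERVERS` is the knob documented for the 3.11 router; whether it remains the mechanism for this MVP is an assumption.)

```shell
# Hedged sketch: experiment with zero pre-allocated server slots per backend on a
# test router. ROUTER_MAX_DYNAMIC_SERVERS is the variable documented for OCP 3.11;
# its applicability here is an assumption, and the operator may revert the change.
oc -n openshift-ingress set env deployment/router-default ROUTER_MAX_DYNAMIC_SERVERS=0
```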

Contributor

But we should consider now whether we should enable DCM by default with 0 slots to reduce the risk of impacting customers with the memory overhead of 1 pre-allocated server slot per backend.

Do you want to make this decision so that it's recorded in this EP? Or it's a general question for the future (GA)?

Contributor Author

If we agree about the problem statement, goals, and basic proposal, then I would like to get this PR merged. We can use follow-up PRs to resolve open questions or to update the design.

@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 7, 2025
@openshift-bot

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 15, 2025
@alebedev87
Contributor

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 21, 2025
@Miciah Miciah changed the title ingress: Add dynamic-config-manager enhancement NE-1948: ingress: Add dynamic-config-manager enhancement Jan 27, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 27, 2025
@openshift-ci-robot

openshift-ci-robot commented Jan 27, 2025

@Miciah: This pull request references NE-1948 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.19.0" version, but no target version was set.


@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2025
@candita
Contributor

candita commented Feb 25, 2025

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 25, 2025
@openshift-bot

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 25, 2025
@candita
Contributor

candita commented Mar 25, 2025

/remove-lifecycle stale
/lifecycle frozen

@openshift-ci
Contributor

openshift-ci bot commented Jul 30, 2025

@openshift-bot: Closed this PR.


In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close


@candita
Contributor

candita commented Aug 1, 2025

/reopen

@openshift-ci openshift-ci bot reopened this Aug 1, 2025
@openshift-ci
Contributor

openshift-ci bot commented Aug 1, 2025

@candita: Reopened this PR.


In response to this:

/reopen


@openshift-ci-robot

openshift-ci-robot commented Aug 1, 2025

@Miciah: This pull request references NE-1948 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.20.0" version, but no target version was set.


@candita
Contributor

candita commented Aug 1, 2025

/lifecycle frozen

@openshift-ci
Contributor

openshift-ci bot commented Aug 1, 2025

@candita: The lifecycle/frozen label cannot be applied to Pull Requests.


In response to this:

/lifecycle frozen


@openshift-bot

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Aug 8, 2025
@openshift-ci
Contributor

openshift-ci bot commented Aug 8, 2025

@openshift-bot: Closed this PR.


In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close


@Miciah
Contributor Author

Miciah commented Oct 10, 2025

/remove-lifecycle rotten
/reopen

@openshift-ci openshift-ci bot reopened this Oct 10, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 10, 2025

@Miciah: Reopened this PR.


In response to this:

/remove-lifecycle rotten
/reopen


@openshift-ci-robot

openshift-ci-robot commented Oct 10, 2025

@Miciah: This pull request references NE-1948 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.


@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 10, 2025
@Miciah Miciah force-pushed the add-dynamic-config-manager branch 4 times, most recently from 7f726e7 to f30648c on October 15, 2025 14:02
* enhancements/ingress/dynamic-config-manager.md: New file.
@Miciah Miciah force-pushed the add-dynamic-config-manager branch from f30648c to 4e609cf on October 15, 2025 18:09
@Miciah
Contributor Author

Miciah commented Oct 15, 2025

Several required sections were added, promoted, or renamed in the EP template since this PR was first posted. https://github.com/openshift/enhancements/compare/88d317b9344acde23f3d57a585757ac08787f996..4e609cf633ee329d72169c933c4e5d8056dc825a updates this EP to address failures from the markdownlint job that resulted from these template changes.

@alebedev87
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 20, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 20, 2025
@Miciah
Contributor Author

Miciah commented Oct 21, 2025

ci/prow/markdownlint failed because the CI job failed to schedule.

/test markdownlint

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2025

@Miciah: all tests passed!


@openshift-merge-bot openshift-merge-bot bot merged commit e350c01 into openshift:master Oct 21, 2025
2 checks passed