Logical optimization of provider kubernetes #920

Closed
8 tasks
zhshw opened this issue Jan 16, 2023 · 3 comments
Labels
kind/enhancement New feature or request

Comments

@zhshw

zhshw commented Jan 16, 2023

I have forked the latest code on GitHub and will use my experience to solve this problem...

Problems:

The root cause is that the SDK provided by controller-runtime is not suitable for complex K8s controller projects. Using a single Reconcile for multiple resources is terrible logic: it is bloated and difficult to handle.

Todo List:

  • Replace the single Reconcile and the dynamic client with informer event handlers
    • Use a separate work queue for each resource type
    • Split the processing logic (EDS, CDS, RDS, LDS, VHDS, HDS) to avoid repeated processing
    • Use the lister cache instead of remote requests
  • Support EDS: handle K8s Endpoints updates and resolve Services to endpoints (easy to implement)
  • Support K8s event recording
  • Support more data validation (see "Necessary data check of envoy listener" #854)
  • Use proto message equality (Equal) to skip duplicate data
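The lister-cache and "skip duplicate data" items above can be illustrated with a small sketch. It is written in Python only to stay short and dependency-free (controller-runtime and client-go are Go libraries). ClusterSpec is a hypothetical stand-in for an xDS proto message, LocalCache is a hypothetical stand-in for a lister, and dataclass equality substitutes for proto message Equal.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical stand-in for a generated xDS proto message."""
    name: str
    endpoints: tuple[str, ...] = ()
    lb_policy: str = "RoundRobin"


class LocalCache:
    """Hypothetical in-memory store playing the role of a lister:
    reads are served locally instead of via remote API requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._items: dict[str, ClusterSpec] = {}

    def get(self, name: str) -> ClusterSpec | None:
        with self._lock:
            return self._items.get(name)

    def upsert(self, obj: ClusterSpec) -> bool:
        """Store obj and return True, or return False if it equals the cached
        copy. Dataclass equality substitutes for proto message Equal."""
        with self._lock:
            if self._items.get(obj.name) == obj:
                return False  # duplicate: skip downstream processing
            self._items[obj.name] = obj
            return True


cache = LocalCache()
spec = ClusterSpec("backend", ("10.0.0.1:8080",))
print(cache.upsert(spec))  # True: first time this object is seen
print(cache.upsert(spec))  # False: unchanged, nothing to push
print(cache.upsert(ClusterSpec("backend", ("10.0.0.2:8080",))))  # True: endpoints changed
```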


@zhshw zhshw added the kind/enhancement New feature or request label Jan 16, 2023
@youngnick
Contributor

Thanks for this issue, @zhshw.

The maintainers of this project have all built different Kubernetes controllers before, so I think that you're missing some context here.

The single reconciler pattern is very important for a complex, interrelated set of resources like Gateway API, as changes in one resource can mean that other resources also need to be reprocessed. Doing this with separate reconcilers actually creates lots of individual reconcile events (it's quite common for a Route update to require a Gateway re-reconcile, which can then trigger re-reconciles for other Route objects, for example).

We actually did start out with separate reconcilers, but have ended up folding them back into one because of issues like this.
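A minimal sketch of one way a single reconciler can absorb cascading changes, assuming (hypothetically) that every watched kind enqueues the same key, so a burst of related events collapses into one full recomputation. This is not Envoy Gateway's actual code: CoalescingQueue, RECONCILE_ALL, on_event and reconcile are illustrative names, and Python is used only for brevity.

```python
from __future__ import annotations

from collections import deque


class CoalescingQueue:
    """Work queue that holds each key at most once (a simplified take on
    the key deduplication that client-go's workqueue performs)."""

    def __init__(self) -> None:
        self._order: deque[str] = deque()
        self._pending: set[str] = set()

    def add(self, key: str) -> None:
        if key not in self._pending:
            self._pending.add(key)
            self._order.append(key)

    def pop(self) -> str | None:
        if not self._order:
            return None
        key = self._order.popleft()
        self._pending.discard(key)
        return key

    def __len__(self) -> int:
        return len(self._order)


# Hypothetical single key shared by every watched resource kind.
RECONCILE_ALL = "gateway-api"


def on_event(queue: CoalescingQueue, kind: str, name: str) -> None:
    # Gateway, HTTPRoute, Service, ... changes all map to the same key,
    # so kind and name are deliberately ignored here.
    queue.add(RECONCILE_ALL)


def reconcile(gateways: set[str], routes: dict[str, str]) -> dict[str, list[str]]:
    """Recompute every Gateway's attached routes from the full resource set.
    routes maps a route name to the name of its parent Gateway."""
    attached: dict[str, list[str]] = {gw: [] for gw in sorted(gateways)}
    for route, parent in sorted(routes.items()):
        if parent in attached:
            attached[parent].append(route)
    return attached


queue = CoalescingQueue()
on_event(queue, "HTTPRoute", "r1")
on_event(queue, "Gateway", "gw")
on_event(queue, "HTTPRoute", "r2")
print(len(queue))  # 1: three events, one pending reconcile
if queue.pop() == RECONCILE_ALL:
    print(reconcile({"gw"}, {"r1": "gw", "r2": "gw", "r3": "other"}))
    # {'gw': ['r1', 'r2']}
```

Because every event maps to one key, a Route update that affects a Gateway does not start a chain of separate reconciles; the cost is that each pass recomputes the whole resource set.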

I agree that speeding up endpoint reconciliation is an important goal for operating EG at scale, but up to this point we have been concentrating on getting the basic functionality working rather than on scale testing. If you have scale-testing numbers you can share, that would be excellent, and a great place to start this conversation.

Finally, I guess you didn't intend this, but the way this issue is written implies something like "the current maintainers are all stupid, we should do this the right way". As I said earlier, all the maintainers have built controllers before and are solving problems they have had before. If you've had a different experience, that's valuable information, but an approach that starts by asking why things are the way they are might be better received in the future.

@arkodg
Contributor

arkodg commented Jan 18, 2023

Adding to what @youngnick said, here's the GH issue that introduced merging the controllers, #413, which has more info on the WHY.

@zhshw
Author

zhshw commented Jan 18, 2023

@youngnick @arkodg

There is a problem, and we should solve it instead of worrying about why it was designed this way in the past. The data structure can always be optimized, and there is always a better way to solve a problem. The starting point of everything is solving problems.

Of course, we can see that the test results are not ideal. My test was based on only a small amount of data, and even so it shows problems (a lot of repeated processing). In fact, many resource changes do not need to be propagated to the top level, such as route weight changes, endpoint changes, LB policy changes, and route timeouts...

I will test at a larger data scale in the future. Without verifying the design against large-scale data, we are building a tall platform on shifting sand.

Looking at other open-source Envoy control-plane projects (Istio, Contour, Gloo) and other Kubernetes controllers, none of them adopts a single Reconcile.

Another way:

  • Parent-child relationships can be split through bottom-up data assembly: assemble the bottom-level data in parallel through multiple queues, then push the top-level data and create a new version
  • xDS snapshot: a resource with an empty version will not push data
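The bottom-up alternative above might look roughly like the following sketch. It assumes each xDS resource type can be translated independently and that an empty version means "do not push". Snapshot, build_type, assemble and publish are hypothetical names, a thread pool stands in for the multiple queues, and Python is used only for brevity.

```python
from __future__ import annotations

import itertools
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical xDS snapshot: resources grouped by type, plus a version."""
    version: str
    resources: dict[str, list[str]] = field(default_factory=dict)


def build_type(type_name: str, names: list[str]) -> tuple[str, list[str]]:
    # Stand-in for translating a single resource type (e.g. EDS) on its own.
    return type_name, sorted(names)


_versions = itertools.count(1)


def assemble(inputs: dict[str, list[str]]) -> Snapshot:
    """Build each resource type in parallel (bottom-up), then assemble the
    top-level snapshot and bump the version once."""
    with ThreadPoolExecutor() as pool:
        parts = dict(pool.map(lambda item: build_type(*item), inputs.items()))
    return Snapshot(version=str(next(_versions)), resources=parts)


def publish(snapshot: Snapshot, sent: list[Snapshot]) -> bool:
    """Push the snapshot unless its version is empty."""
    if not snapshot.version:
        return False
    sent.append(snapshot)
    return True


sent: list[Snapshot] = []
snap = assemble({"EDS": ["b", "a"], "CDS": ["backend"]})
print(snap.version, snap.resources)  # 1 {'EDS': ['a', 'b'], 'CDS': ['backend']}
print(publish(snap, sent))  # True
print(publish(Snapshot(version=""), sent))  # False: empty version, nothing pushed
```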

My years of experience show that only simpler logic can guarantee the maintainability and performance of a project.

  • One resource, one queue
  • Independent logic, processed unit by unit

This way, there is less code and each logical unit is simpler. The new data structure can support larger-scale tests.
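The "one resource, one queue" idea can be sketched as a dispatcher with one deduplicating queue and one handler per resource kind. Dispatcher and the kind and handler names are hypothetical, and Python is used only for brevity.

```python
from __future__ import annotations

from collections import deque
from collections.abc import Callable


class Dispatcher:
    """One deduplicating queue and one handler per resource kind
    (hypothetical sketch of the 'one resource, one queue' idea)."""

    def __init__(self) -> None:
        self._queues: dict[str, deque[str]] = {}
        self._pending: dict[str, set[str]] = {}
        self._handlers: dict[str, Callable[[str], None]] = {}

    def register(self, kind: str, handler: Callable[[str], None]) -> None:
        self._queues[kind] = deque()
        self._pending[kind] = set()
        self._handlers[kind] = handler

    def enqueue(self, kind: str, key: str) -> None:
        if key not in self._pending[kind]:  # drop keys already waiting
            self._pending[kind].add(key)
            self._queues[kind].append(key)

    def drain(self) -> None:
        # Each kind is processed by its own handler, independently of the others.
        for kind, queue in self._queues.items():
            while queue:
                key = queue.popleft()
                self._pending[kind].discard(key)
                self._handlers[kind](key)


processed: list[tuple[str, str]] = []
dispatcher = Dispatcher()
dispatcher.register("Endpoints", lambda key: processed.append(("EDS", key)))
dispatcher.register("HTTPRoute", lambda key: processed.append(("RDS", key)))
dispatcher.enqueue("Endpoints", "default/backend")
dispatcher.enqueue("Endpoints", "default/backend")  # duplicate key, dropped
dispatcher.enqueue("HTTPRoute", "default/route-a")
dispatcher.drain()
print(processed)  # [('EDS', 'default/backend'), ('RDS', 'default/route-a')]
```

The trade-off @youngnick described still applies: when a change to one kind affects another (for example, a Route update that requires a Gateway re-reconcile), a handler in this design has to enqueue the dependent keys into the other queues explicitly.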

@zhshw zhshw closed this as completed Feb 10, 2023
Projects
None yet
Development

No branches or pull requests

3 participants