Add support for isolation groups to matching simulator #6243
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files: see 7 files with indirect coverage changes. Continue to review the full report in Codecov by Sentry.
Tasks []SimulationTaskConfiguration
}

SimulationPollerConfiguration struct {
nonblocking: since onebox is used for both docker/local development and simulation, what do you think about pulling this simulation-specific stuff into a simulation.go file? That would make it easier to see at a glance whether a change touches the more general onebox config or is just simulation config.
Ack, when I do further refactoring I'll move it out.
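For illustration, a minimal sketch of what such a split could look like. Only the type names SimulationTaskConfiguration and SimulationPollerConfiguration appear in the diff; the file name, package name, and field sets below are assumptions, not the actual change in this PR:

// simulation.go (hypothetical file): simulation-only configuration pulled out of onebox.go
package host

import "time"

// SimulationTaskConfiguration describes one group of task generators and
// the isolation groups they emit tasks for (fields are illustrative).
type SimulationTaskConfiguration struct {
	NumTaskGenerators int
	TasksPerSecond    int
	MaxTaskToGenerate int
	IsolationGroups   []string
}

// SimulationPollerConfiguration describes a group of pollers pinned to a
// single isolation group (fields are illustrative).
type SimulationPollerConfiguration struct {
	IsolationGroup  string
	TaskProcessTime time.Duration
	NumPollers      int
	PollTimeout     time.Duration
}

// MatchingSimulationConfig groups the simulation-only settings so the
// general onebox configuration stays uncluttered.
type MatchingSimulationConfig struct {
	Tasks   []SimulationTaskConfiguration
	Pollers []SimulationPollerConfiguration
}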
decisionTask := newDecisionTask(domainID, tasklist, scheduleID)
isolationGroup := ""
if len(taskConfig.getIsolationGroups()) > 0 {
	isolationGroup = taskConfig.getIsolationGroups()[newTasksGenerated%len(taskConfig.getIsolationGroups())]
This is a good happy-path scenario, but you'll want to control this with a config parameter so you can cover trickier 'happy-path' scenarios as well.
Expected scenarios include things like:
- there are no pollers in a zone, so we should see the isolation-group logic re-route traffic
- a poller was in a zone with a sticky tasklist and then went away
- an isolation group is being flooded due to skew
as well as pathological scenarios like:
- traffic is dispatched to an isolation group with too many partitions (e.g. partitions > pollers per isolation group)
None of the scenarios use it yet, but we can configure multiple task generators to create scenarios with skew. For example, we could do something like this to send a small amount of traffic to zones a and b while sending significantly more traffic to c, which doesn't have any pollers:
simulationconfig:
  tasklistwritepartitions: 2
  tasklistreadpartitions: 2
  forwardermaxoutstandingpolls: 1
  forwardermaxoutstandingtasks: 1
  forwardermaxratepersecond: 10
  forwardermaxchildrenpernode: 20
  localpollwaittime: 0ms
  localtaskwaittime: 0ms
  tasks:
    - numtaskgenerators: 2
      taskspersecond: 10
      maxtasktogenerate: 3000
      isolationgroups: ['a', 'b']
    - numtaskgenerators: 2
      taskspersecond: 80
      maxtasktogenerate: 3000
      isolationgroups: ['c']
  pollers:
    - isolationgroup: 'a'
      taskprocesstime: 1ms
      numpollers: 4
      polltimeout: 60s
    - isolationgroup: 'b'
      taskprocesstime: 1ms
      numpollers: 4
      polltimeout: 60s
Sticky tasklists aren't something we've looked at in detail yet, but otherwise I think all of those scenarios can be created with the new structure introduced in this PR.
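To make the mechanics concrete, here is a rough sketch of how one task-generator group could turn its configured isolation groups into per-task assignments, using the same round-robin selection shown in the diff above. The function name, parameters, and emit callback are illustrative assumptions, not the PR's actual code:

// simulation_tasks.go (hypothetical): a single task-generator loop; the real
// PR wires this through its own configuration and task types.
package host

import "time"

// runTaskGenerator cycles through the group's configured isolation groups.
// The emit callback stands in for whatever enqueues a decision task in the
// simulator.
func runTaskGenerator(
	isolationGroups []string,
	tasksPerSecond int,
	maxTasksToGenerate int,
	emit func(isolationGroup string),
) {
	interval := time.Duration(0)
	if tasksPerSecond > 0 {
		interval = time.Second / time.Duration(tasksPerSecond)
	}
	for generated := 0; generated < maxTasksToGenerate; generated++ {
		isolationGroup := ""
		if len(isolationGroups) > 0 {
			// Spread tasks evenly across the group's isolation groups: the
			// ['a', 'b'] group in the config above alternates between a and b,
			// while the ['c'] group sends everything to c, which has no pollers.
			isolationGroup = isolationGroups[generated%len(isolationGroups)]
		}
		emit(isolationGroup)
		time.Sleep(interval)
	}
}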
force-pushed from 24fbe7f to 7aab5d6
This fundamentally restructures how pollers and task generators are configured, but allows for granular control across isolation groups.
What changed?
Why?
How did you test it?
Potential risks
Release notes
Documentation Changes