nsqd: add topology region/zone aware message consumption #1301

Status: Open, 13 commits to merge into base branch master

jehiah
Member

@jehiah jehiah commented Nov 21, 2020

This updates nsqd to add --topology-region and --topology-zone config arguments and to prefer sending messages to same-zone and same-region consumers, as proposed in #1300.
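
As a rough illustration of the preference order described above (not code from this PR), the selection could be sketched as follows. The `Consumer` type, the `pickConsumer` function, and the rule that a zone match also requires a region match are all assumptions made for this sketch; the actual change may work quite differently.

```go
package main

// Consumer is a hypothetical stand-in for a connected client that has
// reported its own region and zone to nsqd.
type Consumer struct {
	ID     string
	Region string
	Zone   string
}

// pickConsumer returns the first ready consumer in the daemon's own zone,
// falling back to the first one in the same region, and then to any other
// ready consumer. ok is false when no consumer is ready.
// Assumption: a zone only counts as local when the region matches too.
func pickConsumer(region, zone string, ready []Consumer) (c Consumer, ok bool) {
	var regionLocal, other *Consumer
	for i := range ready {
		cand := &ready[i]
		switch {
		case zone != "" && cand.Zone == zone && cand.Region == region:
			// Best case: same zone, stop searching.
			return *cand, true
		case region != "" && cand.Region == region:
			if regionLocal == nil {
				regionLocal = cand
			}
		default:
			if other == nil {
				other = cand
			}
		}
	}
	if regionLocal != nil {
		return *regionLocal, true
	}
	if other != nil {
		return *other, true
	}
	return Consumer{}, false
}
```

With an empty region and zone on the daemon, every consumer falls into the last tier, so the sketch degrades to picking the first ready consumer.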

@jehiah jehiah self-assigned this Nov 21, 2020
@jehiah
Member Author

jehiah commented Nov 21, 2020

I think this is ready for a first round of review @mreiferson @ploxiln (along with its pair, nsqio/go-nsq#312).

Some items to decide on:

  • Should -mem-queue-size=0 disable zone local and region local consumption? (i.e. should it continue to effectively write to disk first?)
  • Is there an approach to changing the consumption of the disk read chan so it prefers zoneLocal and regionLocal consumers? (This would probably mean a goroutine that consumes it and does put() instead of messagePump consuming the disk backend directly.) If there is an approach, is it important to do as part of this?

Once this is squared away and we are happy with it, I'll follow up with documentation PRs and exposing region/zone in nsqadmin.
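
To make the second item concrete, here is a minimal sketch of what such a goroutine might look like. The names `zoneLocal`, `regionLocal`, and `put()` come from the question above; the `tieredQueue` type, the `forward` method, the `memory` channel, and the string return values are hypothetical and only illustrate the idea, not how nsqd is actually structured.

```go
package main

// Message is a placeholder for a queued message.
type Message struct{ Body string }

// tieredQueue holds hypothetical channels: one read by zone-local
// consumers, one by region-local consumers, and an in-memory queue
// that any consumer may read.
type tieredQueue struct {
	zoneLocal   chan *Message
	regionLocal chan *Message
	memory      chan *Message
}

// put offers m to each tier in order without blocking and reports which
// tier accepted it. "backend" means no tier could take the message and
// the caller would have to write it to disk. On an unbuffered channel a
// non-blocking send only succeeds when a consumer is already waiting.
func (q *tieredQueue) put(m *Message) string {
	select {
	case q.zoneLocal <- m:
		return "zone"
	default:
	}
	select {
	case q.regionLocal <- m:
		return "region"
	default:
	}
	select {
	case q.memory <- m:
		return "memory"
	default:
	}
	return "backend"
}

// forward drains messages read back from disk and re-offers them through
// put, so the zone/region preference applies to them as well. When no
// tier is ready it blocks until any one of them accepts the message.
// It returns once disk is closed.
func (q *tieredQueue) forward(disk <-chan *Message) {
	for m := range disk {
		if q.put(m) != "backend" {
			continue
		}
		select {
		case q.zoneLocal <- m:
		case q.regionLocal <- m:
		case q.memory <- m:
		}
	}
}
```

In this sketch `forward` would run in its own goroutine (`go q.forward(disk)`), and the message pump would read only from the three tier channels rather than from the disk backend directly.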

Member

@ploxiln ploxiln left a comment

I had some ideas for what I think are minor simplifications that leave the overall design the same. Overall looks good to me.

@@ -120,6 +120,8 @@ func nsqdFlagSet(opts *nsqd.Options) *flag.FlagSet {
flagSet.Var(&lookupdTCPAddrs, "lookupd-tcp-address", "lookupd TCP address (may be given multiple times)")
flagSet.Duration("http-client-connect-timeout", opts.HTTPClientConnectTimeout, "timeout for HTTP connect")
flagSet.Duration("http-client-request-timeout", opts.HTTPClientRequestTimeout, "timeout for HTTP request")
flagSet.String("topology-region", opts.TopologyRegion, "A region represents a larger domain, made up of one or more zones")
flagSet.String("topology-zone", opts.TopologyZone, "A zone represents a logical failure domain")
Member

should probably mention something about "for preferring closer consumer"

Resolved review threads (outdated): nsqd/protocol_v2.go, nsqd/channel.go, nsqd/protocol_v2.go
@jehiah jehiah force-pushed the topology_aware_msg_delivery_1301 branch from f390edc to 96adf24 Compare November 28, 2020 01:39
@zoemccormick zoemccormick force-pushed the topology_aware_msg_delivery_1301 branch 2 times, most recently from 91517a4 to fa8d2b4 Compare October 24, 2023 19:02
@zoemccormick zoemccormick force-pushed the topology_aware_msg_delivery_1301 branch from 2b90b29 to 42d28f5 Compare November 27, 2023 16:50
@zoemccormick zoemccormick force-pushed the topology_aware_msg_delivery_1301 branch from 88161cb to af1edde Compare January 2, 2024 17:37
@zoemccormick zoemccormick force-pushed the topology_aware_msg_delivery_1301 branch from af1edde to 4631064 Compare January 31, 2024 21:19
@zoemccormick zoemccormick force-pushed the topology_aware_msg_delivery_1301 branch 2 times, most recently from e93d2b5 to b84cfe2 Compare April 5, 2024 17:21
@zoemccormick zoemccormick force-pushed the topology_aware_msg_delivery_1301 branch 5 times, most recently from 83e5939 to d0cd516 Compare April 17, 2024 15:32

zoemccormick commented May 15, 2024

@jehiah @ploxiln @mreiferson This PR is now officially ready for review - in conjunction with nsqio/go-nsq#312 and nsqio/nsqio.github.io#89.

We have written up an experience report based on our observations running these changes for 2 months in our production environment - it can be found here. It also lays out the changes to nsqadmin.

Please let me know if you have any other questions! Thanks in advance!

Resolved review threads (outdated): apps/nsqd/options.go, go.mod, nsqadmin/static/js/views/channel.hbs, nsqd/channel.go
zoemccormick and others added 3 commits May 21, 2024 14:27
Co-authored-by: Jehiah Czebotar <jehiah@gmail.com>
Co-authored-by: Jehiah Czebotar <jehiah@gmail.com>
@mreiferson
Member

@jehiah hello! what's the plan here for actually landing this? Are y'all waiting on me to review everything 😏?

@jehiah
Member Author

jehiah commented Oct 17, 2024

@mreiferson sorry for the radio silence on this. I think we are ready to give this more attention in Q4, so if you have any comments on this PR (or the related go-nsq changes) in the next few weeks, please share them. Otherwise the topology changes are ready to land and we'll do that next month.

I think the general plan is still to land this behind the feature flag, get it into a 1.4 release, and consider removing the FF in 1.5. We have been happy with this running at Bitly for a while now, but I'm keeping an eye on kubernetes/enhancements#4747 as it will make adopting in a K8s environment easier.
