Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] server, ui: return replica-level QPS and leaseholder info; render in replica matrix #26906

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

vilterp
Copy link
Contributor

@vilterp vilterp commented Jun 21, 2018

image
image
(Video)

Example: bin/roachtest bench tpccbench/nodes=30/cpu=16 (30 node x 16 CPU cluster; ~58k ranges)

This uses a lot of memory and CPU in the browser (and on the server, probably), so needs a fair amount of perf investigation& work before merging.

Current perf concerns:

  • Server
    • New endpoint LeaseholdersAndQPS calls RaftDebug range internally, and then copies only a few things from that response into its own response. How much is to be gained by removing that intermediate call, and having LeaseholdersAndQPS iterating through stores and ranges itself?
    • RaftDebug iterates through all stores and all ranges they have. How much memory and CPU does this take? How much disk IO? (I assume those range descriptors aren't all stored in memory at all times)
    • returns about 3 megs of data for the example above (58k ranges, 30 nodes); usually takes around 3-10 seconds

TODO

  • fix sorting in UI
  • figure out why sizes (live bytes) are missing for some replicas

@vilterp vilterp requested review from a team June 21, 2018 21:49
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@vilterp vilterp changed the title server, ui: return replica-level QPS and leaseholder info; render in replica matrix [WIP] server, ui: return replica-level QPS and leaseholder info; render in replica matrix Jun 21, 2018
@vilterp vilterp added the do-not-merge bors won't merge a PR with this label. label Jun 21, 2018
@vilterp vilterp force-pushed the all-the-ranges-heatmap branch from d8ddaef to 7de7efd Compare June 21, 2018 22:10
@bdarnell
Copy link
Contributor

RaftDebug iterates through all stores and all ranges they have. How much memory and CPU does this take? How much disk IO? (I assume those range descriptors aren't all stored in memory at all times)

Range descriptors are stored in memory, so no disk IO. The CPU and memory cost can be substantial, though.


Reviewed 8 of 8 files at r2, 2 of 4 files at r3, 1 of 3 files at r5, 2 of 2 files at r6, 1 of 1 files at r7.
Review status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/server/status.go, line 1595 at r7 (raw file):

			replicaInfo.Stats = replInfo.Range.Stats
			// this seems complicated... each node has its own idea of the lease
			// they might disagree.

Leases have sequence numbers; this should use the one with the highest sequence.


pkg/server/serverpb/status.proto, line 613 at r7 (raw file):

message LeaseholdersAndQPSResponse {
  message ReplicaInfo {
    // TODO(vilterp): I am pretty confused about why this is called RangeStatistics

Writes per second is a range-scoped stat; reads per second is replica-scoped (for now it's zero on followers, so the leaseholder's replica stats tell you everything about the range, but this will change when we have follower reads).


Comments from Reviewable

@vilterp vilterp force-pushed the all-the-ranges-heatmap branch from 7de7efd to 3e396d2 Compare June 26, 2018 03:10
@vilterp vilterp force-pushed the all-the-ranges-heatmap branch from 3e396d2 to 86252e8 Compare July 13, 2018 22:10
@vilterp vilterp force-pushed the all-the-ranges-heatmap branch 2 times, most recently from 55e030c to cbf588b Compare August 15, 2018 06:02
@vilterp
Copy link
Contributor Author

vilterp commented Aug 30, 2018

Note to self: the UI should just hit each node independently to ask it for which replicas it has. That way, unhappy/slow nodes don't completely block the rendering of the page.

Pete Vilter added 12 commits November 2, 2018 16:17
Calls the RaftDebug endpoint and filters out everything but basic
information about a range: table id, leaseholder node id, and read and
write stats.

Release note: None
TODO: factor out into styl file

Release note: None
Release note: None
i.e. map values to color scale

Release note: None
featuring a very ugly sort state selector widget

Release note: None
Release note: None
@vilterp vilterp force-pushed the all-the-ranges-heatmap branch from 40bb0cd to 8ef44f5 Compare November 2, 2018 20:20
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label Jun 19, 2019
@bobvawter
Copy link
Contributor

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Pete Vilter seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge bors won't merge a PR with this label. X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants