-
Notifications
You must be signed in to change notification settings - Fork 292
Merge feature/vcpu-runnable to feature/numa8 #6540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
gangj
wants to merge
314
commits into
feature/numa8
Choose a base branch
from
feature/vcpu-runnable
base: feature/numa8
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
As described in [#6451](#6451), a xapi event could prevent update_vm from pulling the latest Xenopsd metadata, overwriting it with stale information. In case of suspend, this would make the snapshot unresumable, raising an assert in xenopsd due to incongruities in memory values. Instead pull the xenopsd metadata right before updating DB.power_state in`Xapi_vm_lifecycle.force_state_reset_keep_current_operations`, eliminating the window for the race. Closes #6451
Better progress indication, CLI tunables, and the ability to dump raw data to a directory, that can be used by `ministat`. Example usage: ``` mkdir /tmp/pool dune exec ./bench_pool_field.exe --profile=release -- -d /tmp/pool --quota 20 ... dune exec ./bench_pool_field.exe --profile=release -- -d /tmp/pool --quota 20 for i in 'Db.Pool.get_all_records' 'Rpc.t -> pool_t' 'pool_t -> Rpc.t'; do ~/git/ministat/ministat -s "/tmp/pool/${OLD}/${i}.dat" "/tmp/pool/${NEW}/$i.dat" -c 99.5; done ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Write directly to the buffer, instead of writing to a buffer, returning a string, and writing to a buffer again. Escaping shows up in performance profiles, because `trusted_on_first_use` field in the pool contains a JSON, which contains \" characters that need escaping. Reduces memory allocations from 324.4308 mnw/run to 196.3633 mnw/run. Slight change in performance: ``` Db.Pool.get_all_records: N Min Max Median Avg Stddev x 384 81858.452 513398.88 87109.044 88314.45 21963.255 + 389 78006.031 559052.12 83431.429 84780.535 24315.561 Difference at 95.0% confidence -3533.91 +/- 3267.84 -4.00151% +/- 3.63493% (Student's t, pooled s = 23176.9) ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
The non-standard S-expression serializer wraps all strings in `'`. The lexer also ignores every non-escaped character after a `'`, until it sees the next `'` or escape char. So it should be safe to avoid escaping `"`. Unescaping is unchanged: any character can be escaped with `\`, and it returns it unchanged after removing the escape char, so this preserves backwards compatibility when loading an old database. Escaping shows up in performance profiles, because `trusted_on_first_use` field in the pool contains a JSON, which contains \" characters that needed escaping. With this change it won't anymore. `ministat` confirms that there is an improvement: ``` sexpr_of_json_string: N Min Max Median Avg Stddev x 806 1438.7128 63663.127 1490.3661 1594.5095 2192.0548 + 850 911.23529 48528.173 967.25054 1037.7674 1632.6855 Difference at 95.0% confidence -556.742 +/- 185.531 -34.9162% +/- 9.28192% (Student's t, pooled s = 1925.34) str_of_sexp_json: N Min Max Median Avg Stddev x 792 1622.9135 49591.388 1719.5377 1786.3412 1702.6472 + 893 605.37329 3354.8035 626.51812 636.34457 107.24734 Difference at 95.0% confidence -1150 +/- 111.92 -64.3772% +/- 2.26583% (Student's t, pooled s = 1169.88) ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
For backwards compatibility the serialized form looks like 20250319T04:16:24Z, which had to go through several transformations before it was parsed: 2 sscanf, 2 sprintf, and then the Ptime parser. Since we're parsing the format with sscanf anyway, add a fastpath that builds a Ptime.t directly without going through reformatting the string and reparsing it. This speeds up API replies that contain dates, like pool which contains 'telemetry_next_collection' as a date. `ministat` confirms: ``` Date.of_iso8601: N Min Max Median Avg Stddev x 786 1703.462 98255.061 1796.5826 2031.7502 3858.1277 + 905 525.1954 73923.347 558.17972 711.02195 2732.7547 Difference at 95.0% confidence -1320.73 +/- 315.725 -65.0045% +/- 10.3523% (Student's t, pooled s = 3303.82) Db.Pool.get_all_records: N Min Max Median Avg Stddev x 390 76966.273 498179.5 82995.374 84536.667 21266.749 + 401 69709.657 546133 74811.568 76379.246 23782.821 Difference at 95.0% confidence -8157.42 +/- 3147.12 -9.64957% +/- 3.57009% (Student's t, pooled s = 22577.4) Rpc.t -> pool_t : N Min Max Median Avg Stddev x 554 16945.375 267477.23 17620.482 18195.16 10648.914 + 594 11432.375 251226.67 12011.373 12493.367 9824.9986 Difference at 95.0% confidence -5701.79 +/- 1184.38 -31.3369% +/- 5.53746% (Student's t, pooled s = 10230.9) str_of_sexp_json: N Min Max Median Avg Stddev x 893 605.37329 3354.8035 626.51812 636.34457 107.24734 + 911 510.23412 3298.9936 523.73684 532.25916 98.007142 Difference at 95.0% confidence -104.085 +/- 9.47756 -16.3568% +/- 1.36325% (Student's t, pooled s = 102.685) ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
We called the wrong finally here. In general we mark the backtrace in finally as important (which involves formatting it, this could be optimized separately). However Mutex.unlock doesn't raise (in well-behaved code, unless you double unlock), so we can use Fun.protect instead. Hashtbl.find in xapi_local_session.ml raises every time in the RBAC checks, this change avoids the costly backtrace formatting (which was discarded by a try/with later anyway). Signed-off-by: Edwin Török <edwin.torok@cloud.com>
We can use Hashtbl.mem instead of catching the exception from Hashtbl.find. `ministat` confirms: ``` local_session_hook : N Min Max Median Avg Stddev x 1057 117.49169 15160.913 123.5201 142.19665 464.12468 + 1166 38.851635 14365.19 41.619012 57.978805 423.21452 Difference at 95.0% confidence -84.2178 +/- 36.8873 -59.2263% +/- 19.5015% (Student's t, pooled s = 443.137) ``` Although there are also some unexplained, but reproducible slowdowns in code that isn't touched by this commit at all: ``` str_of_sexp_json : N Min Max Median Avg Stddev x 911 510.16149 3075.0192 523.44043 531.48578 97.085991 + 892 605.73462 2951.0948 632.27053 652.10806 95.96889 Difference at 95.0% confidence 120.622 +/- 8.91245 22.6953% +/- 1.88103% (Student's t, pooled s = 96.5349) ``` Perhaps this is due to code layout changes? Signed-off-by: Edwin Török <edwin.torok@cloud.com>
read_record was marshaling the maps all the time, and some of these maps can be quite big. Cache the marshaled form in the row itself. This is a trade-off between memory usage and performance (caching increases global memory usage, but avoids repeated allocations every time get_record is called, which may decrease memory usage/GC pressure under load). Eventually we may be able to drop the marshaled form, but we need to change the db_rpc remote protocol for that (it currently cannot marshal/unmarshal values on its own, because it doesn't have access to the type information). `ministat` confirms: ``` Db.Pool.get_all_records : N Min Max Median Avg Stddev x 400 71008.769 489829.75 75874.584 76957.497 20852.389 + 871 51638.867 469264.11 54400.946 55511.462 19845.992 Difference at 95.0% confidence -21446 +/- 2387.53 -27.8674% +/- 2.84131% (Student's t, pooled s = 20167.8) sexpr_of_json_string : N Min Max Median Avg Stddev x 862 819.74359 44758.053 851.53894 915.0725 1496.4717 + 1688 970.16146 51923.24 1024.1987 1097.3771 1748.4532 Difference at 95.0% confidence 182.305 +/- 136.827 19.9224% +/- 15.8188% (Student's t, pooled s = 1667.57) str_of_sexp_json : N Min Max Median Avg Stddev x 1896 605.00587 2932.3931 627.51199 640.73142 89.48524 + 1812 537.56839 3105.1991 550.25275 558.89761 93.633252 Difference at 95.0% confidence -81.8338 +/- 5.89411 -12.7719% +/- 0.864486% (Student's t, pooled s = 91.5357) ``` Slight increase in 'sexpr_of_json_string'. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
This doesn't have feature flags, they are optimizations. The pool object has gained a JSON field recently, which slowed it down significantly, because the `"` was getting escaped (so the fastpath that checks whether there are any escapable characters or not couldn't be triggerred). The date field is also very slow to handle due to various backward compatibility format transformations. Add a direct parser for the format used in memory (it'd be good to skip all these serializing/deserializing, but that is a bigger change). Handling the database requires several `finally` calls, which perform a lot of backtrace formatting, and some code keeps raising exceptions even on happy paths (e.g. Hashtbl.find). We need to fix those too, but we should also speed up `finally`. Xapi_local_session was also raising exceptions all the time, replace this with Hashtbl.mem
Refix with some review comments addressed. To join a host into a pool with cluster enabled, the host must have one and only one IP configured on the joining cluster network. If not, after the host joinied the pool, GFS2 SR cannot be plugged on the joined host because an IP is required in the cluster network. Pool join in this scenario has been blocked in XenCenter, here we will block it inside xapi. Signed-off-by: Gang Ji <gang.ji@cloud.com>
…ameter Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Used 'dune format-dune-file dune >dune && mv dune.tmp dune' No functional change. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
ocaml-lsp-server's 'Action->Infer interface' was used for this, and then removed internal/unused entries. Having an mli file is useful: * can find dead code more easily if it is clear which functions are internal to a module and which aren't * the impact of refactoring changes is more obvious (did we have to change an mli at all?) * makes it easier to understand what a module does There are some .ml files which only contain types, these are instead renamed to .mli and dune's `modules_without_implementation` feature is used. Executables get an empty .mli (recent versions of dune already do the equivalent of this, but this makes it more obvious). Drop code from .ml files that now show up as unused. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…ameter (#6459) This would eventually lead to compilation errors with newer versions of GCC, best fix it now by renaming the parameter.
ocaml-lsp-server's 'Action->Infer interface' was used for this, and then removed internal/unused entries. Having an mli file is useful: * can find dead code more easily if it is clear which functions are internal to a module and which aren't * the impact of refactoring changes is more obvious (did we have to change an mli at all?) * makes it easier to understand what a module does There are some .ml files which only contain types, these are instead renamed to .mli and dune's `modules_without_implementation` feature is used. Executables get an empty .mli (recent versions of dune already do the equivalent of this, but this makes it more obvious). Drop code from .ml files that now show up as unused. Eventually we can also add some documentation, this will be done in followup PRs for new code.
On XS9 a previiously available SM type might become unavailable. Make sure we remove it on upgrade. The previius fix in CA-408048 did not work. Signed-off-by: Christian Lindig <christian.lindig@cloud.com>
The current logic in storage_migrate.ml for mirror failure check is specific to tapdisk, hence multiplex it. `pre_deactivate_hook` also has something similar to check for mirror failure, so do something similar there. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Previously this was deleted in commit 1fe6389 as it was not multiplexed, but looks like we still need to keep it in storage_mux because sm-cli needs to make rpc calls to storage_mux when trying to list all the mirrors to make it work properly, due to the fact that the sr plugins are stored in the address space of the xapi process. There are other invocations in sm-cli such as `Storage_migrate.start` which may have similar problems. But I have left them alone as I don't any reasonable way of calling them from the cli. Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Do the same as DATA.MIRROR.list Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
No functional change Signed-off-by: Edwin Török <edwin.torok@cloud.com>
No functional change Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Use an attribute to change the name of the fields instead of serializing with one name, and then mapping to the other. `ministat` confirms an improvement on `rpc_of_event`: ``` N Min Max Median Avg Stddev x 922 440.79126 46094.975 467.83727 563.79259 1676.0843 + 1131 51.346821 33407.866 58.715393 115.98006 1105.4178 Difference at 95.0% confidence -447.813 +/- 120.966 -79.4286% +/- 13.1488% (Student's t, pooled s = 1390.95) ``` Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Atomic has lower overhead, and can be used for simple situations (incrementing an integer, setting a boolean). ``` Mutex+incr (ns): { monotonic-clock per run = 32.272091 (confidence: 32.365531 to 32.170138); r² = Some 0.999441 } Atomic.incr (ns): { monotonic-clock per run = 2.938678 (confidence: 2.944486 to 2.933688); r² = Some 0.999857 } ``` No functional change Signed-off-by: Edwin Török <edwin.torok@cloud.com>
No measurable performance change, but this makes it clearer for deadlock detection algorithms that we are waiting to acquire a lock. It'll also make it clearer for OCaml 5's thread sanitizer that values are modified with a lock held. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
No functional change Signed-off-by: Edwin Török <edwin.torok@cloud.com>
The double ref is not needed, replace it with an Atomic.t No functional change. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Instruments `process`/`Consumers` spans of message-switch service in xenopsd. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Intruments functions in `xapi_xenops.ml` that carry the traceparent/tracecontext through dbg. `Debug_info.with_dbg` now accepts setting up attributes for spans. Also, instruments `Events_from_xenopsd` to capture the event spans: `subscribe`/`settle`. It's nto straight forward to link them on the same trace. For now the only way they are connnected is having the same `message.id` attribute. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Intruments the functions in `xapi_xenops.ml` that carry the traceparent/tracecontext through `context`. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
This function is called recursively and the context span is only closed on failure. This makes it hard to read the trace. Therefore, I am resetting the context tracing each recursion step and now each recursion step will have its own trace. Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
Signed-off-by: Gabriel Buica <danutgabriel.buica@cloud.com>
events_from_xenopsd thread is critical as it sync up VM status in case of bootstorm, this thread is flood as lots of events comes from xenopsd waiting for process. During processing of the events, VM/VDI/VBD update_allowed_operations will be called to refresh the allowed operations. However, for each ops (start/suspend,etc) for the same object(VM), the object info is always the same no matter what the ops is. Thus, it is not necessary to query the object information over and over again. Disclosure is used to resovle the issue. Query once and the disclosure will just remember the query result. The performance test for starting 500 VM on 4 hosts improve around 10% performance for both XS8 and XS9
1. WLB request will raise `wlb_authentication_failed` when catching `Http_client.Http_error`. But actually only error code 401 and 403 should raise this type exception. For other error code, raise `wlb_connection_reset`. Also print the detail error code and message. 2. `message_forwarding` raises same error for `Http_request_rejected` and `Connection_reset` so we don't know which exception actually be raised. Print detailed logs for these 2 exceptions. Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
…xenops.ml` (#6527) As promised, more tracing instrumentation around message switch in xenopsd. There is also some instrumentation of `Events_from_xenops` around waiting/waking up for events. Currently these are not straight forward linkable as they have different traces, e.g. one is on the vm op and the other in the xapi event watch thread. It is now possible to search for for "`subscribe`/`settle` <queue_name>" spans that have the same attribute value for "messaging.message.id". This should help debugging.  
"data/updated" is not read or used anywhere in xenopsd or xapi: * xapi_guest_agent's last_updated field is just Unix.gettimeofday (). * xapi_xenops removes "data/updated" from the guest agent state altogether before checking if it's changed: ``` let ignored_keys = ["data/meminfo_free"; "data/updated"; "data/update_cnt"] ``` So there is no need to watch this path at all. This greatly reduces unnecessary traffic between xapi and xenopsd, since any VM with a guest agent would write to data/updated once every 60 seconds, which would generate a Dynamic.Vm event, making xapi call xenopsd's VM.stat to rescan the domain's xenstore tree and perform several hypercalls. Almost always, this would be completely unnecessary as nothing else about the VM would change, but a lot of work would be done anyhow. Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
1. WLB request will raise `wlb_authentication_failed` when catching `Http_client.Http_error`. But actually only error code 401 and 403 should raise this type exception. For other error code, raise `wlb_connection_reset`. Also print the detail error code and message. 2. `message_forwarding` raises same error for `Http_request_rejected` and `Connection_reset` so we don't know which exception actually be raised. Print detailed logs for these 2 exceptions.
Drop the first member of the (X.t * X.state) tuple coming from X.stat immediately as it's never used. This removes the need for several 'snd info', 'Option.iter (fun (_, state) ->) info' constructs. Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Rename the "u" function used across storage and observer_helpers to make its meaning more obvious and use __FUNCTION__ instead of hardcoding the function name. Signed-off-by: Steven Woods <steven.woods@cloud.com>
Some refactoring, but the most important commit is this one-line change that greatly improves things on its own: --- xenopsd: Remove `data/updated` from the list of watched paths "data/updated" is not read or used anywhere in xenopsd or xapi: * xapi_guest_agent's last_updated field is just Unix.gettimeofday (). * xapi_xenops removes "data/updated" from the guest agent state altogether before checking if it's changed: ``` let ignored_keys = ["data/meminfo_free"; "data/updated"; "data/update_cnt"] ``` So there is no need to watch this path at all. This greatly reduces unnecessary traffic between xapi and xenopsd, since any VM with a guest agent would write to data/updated once every 60 seconds, which would generate a Dynamic.Vm event, making xapi call xenopsd's VM.stat to rescan the domain's xenstore tree and perform several hypercalls. Almost always, this would be completely unnecessary as nothing else about the VM would change, but a lot of work would be done anyhow.
Rename the "u" function used across storage and observer_helpers to make its meaning more obvious and use __FUNCTION__ instead of hardcoding the function name.
The `ref` parameter within the `location` attribute of the console can refer to either the VM's ref or the console's own ref. Currently, the console's location uses the VM's ref by default. This causes an issue: when executing xs console, the requested location contains the VM's ref. If the ref points to the VM, xapi will attempt to use the RFB console (which is graphical) by default, rather than the VT100 console (which is text-based). As a result, the xs console command fails to open the console and hangs. **Solution:** Update the default ref in the console's `location` to the console's own ref. With this change, whether accessing the RFB or the VT100 console, the ref in the `location` will always point to the respective console itself. Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Add details on specifying image format for VDI and VM migration. In particular This revision explains how to choose the destination image format during VDI creation and migration, including VM migration scenarios. Also fixes minor typos in the document. Signed-off-by: Guillaume <guillaume.thouvenin@vates.tech>
When shutdown a VM, xapi will check SM API version to decide if to call `post_deactivate_hook`. But if the SR has already been unplugged, the checking will fail. Solution: Check if the dp of the SR still exists. If not, skip the SM API checking. Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
Add details on specifying image format for VDI and VM migration. In particular This revision explains how to choose the destination image format during VDI creation and migration, including VM migration scenarios. Also fixes minor typos in the document.
This reverts commit 9e6fb15 Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
On xen versions that don't support this call yet, xenctrlext will simply fail and continue to behave like before. Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
This interface is not yet available in xen, so fail before doing the hypercall. This patch is meant to be reverted on system that provide the new interface for easily test it. Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
These commits allow to minimize the differences between the versions that use a xen with the NUMA-enabled claim_pages, and the ones that are not. Now it's a single patch with 4 lines of code: - 2 lines to change the C binding for domain_claim_pages (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c) - 2 lines to remove the raising of exception to try to use the C binding with a single NUMA node (ocaml/xenopsd/xc/xenctrlext.ml)
The `ref` parameter within the `location` attribute of the console can refer to either the VM's ref or the console's own ref. Currently, the console's location uses the VM's ref by default. This causes an issue: when executing xe console, the requested location contains the VM's ref. If the ref points to the VM, xapi will attempt to use the RFB console (which is graphical) by default, rather than the VT100 console (which is text-based). As a result, the xs console command fails to open the console and hangs. **Solution:** Update the default ref in the console's `location` to the console's own ref. With this change, whether accessing the RFB or the VT100 console, the ref in the `location` will always point to the respective console itself.
Most of these have been unused for almost 10 years since c4ccc56 ("CA-217842: Replace instances of vm_lacks_feature_x with vm_lacks_feature") Drop them. Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Most of these have been unused for almost 10 years since c4ccc56 ("CA-217842: Replace instances of vm_lacks_feature_x with vm_lacks_feature") Drop them.
When shutdown a VM, xapi will check SM API version to decide if to call `post_deactivate_hook`. But if the SR has already been unplugged, the checking will fail. Solution: Check if the plugin of the SR still exists. If not, skip the SM API checking.
I think it's worth repeating again, I don't think it's worth maintaining a long-lived branch for what is effectively 4 commits. I say this because there are conflicts in the merge |
after the discussions we had in the last couple of weeks, I think the changes are simple enough for runnable_any and runnable_vcpus that they can be merged directly to master without using this feature branch feature/numa8. |
mg12
approved these changes
Jun 27, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Merge feature/vcpu-runnable as there were some little tweaks for RRD1: #6530