- 
                Notifications
    You must be signed in to change notification settings 
- Fork 292
Update feature/perf from master #6000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
          
     Merged
      
        
      
            edwintorok
  merged 491 commits into
  xapi-project:feature/perf
from
edwintorok:feature/perf
  
      
      
   
  Sep 18, 2024 
      
    
                
     Merged
            
            Update feature/perf from master #6000
                    edwintorok
  merged 491 commits into
  xapi-project:feature/perf
from
edwintorok:feature/perf
  
      
      
   
  Sep 18, 2024 
              
            Conversation
  
    
      This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
      Learn more about bidirectional Unicode characters
    
  
  
    
    Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
- Removed templates debian and debug from Makefile Signed-off-by: Ashwinh <ashwin.h@cloud.com>
…working-covert-tabs-to-spaces CP-49931 - Convert tabs to spaces and fix whitespace for scripts/xe-reset-networking:
…tadata-2to3-and-warningfixes CP-50100: `backup-sr-metadata,restore-sr-metadata`: `2to3`, fix `pytype`, `pyright`
In `XenAPIPlugin.py`'s `dispatch()` function, SystemExit does not need to be caught and raised because both other exceptions are subclasses of Exception: By design, SystemExit is a subclass of BaseException and because we are not catching BaseException and also not use a bare `except:` here, we can cleanup catching and re-raising `SystemExit()` here. Reference: https://docs.python.org/3/library/exceptions.html#SystemExit Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
….py-ditch-superflous-raise-SystemExit
…c directory - Modified python3/Makefile to include this change. - Removed backup-sr-metadata.py from scripts/Makefile Signed-off-by: Ashwinh <ashwin.h@cloud.com>
…ec directory - Modified python3 Makefile to include this change - Removed restore-sr-metadata.py from scripts/Makefile - Fixed bare-except exception pylint issue Signed-off-by: Ashwinh <ashwin.h@cloud.com>
…sions Original (code supplied) by Bernhard Kaindl: - Declare missing methods to python3/stubs/XenAPI.py for pyright - Initialize variables to fix pyright:reportPossiblyUnboundVariable - Applied isort Signed-off-by: Ashwinh <ashwin.h@cloud.com> Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
…P-49919 CP-49919: mv scripts/extensions/pool_update.precheck to python3/extensions
…P-50100 mv restore-sr-metadata.py & backup-sr-metadata.py to python3/
- Modified python3/Makefile to include this change - Removed static-vdis from scripts/Makefile - Modified test_static_vdis.py to include new location of the static-vdis Signed-off-by: Ashwinh <ashwin.h@cloud.com>
…ix-byte-logging-found-by-mypy
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
These example scripts were imported in 2009 and are obsoleted by samples like: https://github.com/xapi-project/xen-api-sdk/blob/master/python/samples/powercycle.py - The sample xen-api-sdk/python/samples/powercycle.py is much better. - They are not installed and not otherwise mentioned in the repository. Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Fix pyright:
    - Rework unclean exception handling using a contextmanager.
    - Declare address and master in case of an Exception as well.
    - Fix warning on unsupported escape sequence using a raw string.
    - Removed python2 script check from pyproject.toml
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Ashwinh <ashwin.h@cloud.com>
    …P-49931 CP-49931: mv `scripts/xe-reset-networking` to `python3/bin`
…ete-ocaml/idl/ocaml_backend/python CP-47869: Remove ocaml/idl/ocaml_backend/python: remove obsolete example scripts: - Could not find any use of installation of them
…P-49900 CP-49900: Removed templates folder from python3/
…est_mail_alarm-py3 Py3: Cleanup `test_mail-alarm.py` to use python3 test helpers and move it
…generate-iscsi-iqn `scripts/generate-iscsi-iqn`: Update the inline Python script for Python3
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Signed-off-by: Bernhard Kaindl <bernhard.kaindl@cloud.com>
Tracing shouldn't use locks, because we use it to find hot spots in the application. And introducing a single global lock might introduce lock contention that wouldn't otherwise be present. Use atomic operations on immutable data structures instead. Although a Map is O(log n), and not O(1) like a hashtable, it doesn't require holding any locks to traverse it, or update it. We only need to do an atomic compare-and-set operation once we've finished updating it, and if we raced with anyone (unlikely on OCaml4, unless you got interrupted by the tick thread), then try again with a backoff. We shouldn't hand roll atomic data structures like this, but instead use Saturn (Skip lists), or Kcas (which has a generic [update] function that implements the above method and also works on OCaml 5). before: 3827.092437 ns/run (confidence: 4275.705106 to 3550.511099); after: 2727.247326 ns/run (confidence: 3019.854167 to 2582.316754); Note: when benchmarking ensure that /sys/devices/system/clocksource/clocksource0/current_clocksource is set to TSC. If set to Xen, then reading timestamps is very slow. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
A small improvement, but could be more on deep span trees. before:2727.247326 ns/run (confidence: 3019.854167 to 2582.316754) after: 2590.000000 ns/run(confidence: 3362.115942 to 2068.393072); Signed-off-by: Edwin Török <edwin.torok@cloud.com>
And print the failed ID. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
The following environment variables can now be used by tests: * QCHECK_LONG_FACTOR (default 10): multiplies the default iteration count of 100 * QCHECK_SEED: an integer seed (the tests print their current seed on startup otherwise, can be used to reproduce a failure) These environment variables are already supported by the QCheck framework, we just need to use them. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Required for merge queues, so we have some tests that run in the merge queue: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Required for merge queues, so we have some tests that run in the merge queue: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue#triggering-merge-group-checks-with-github-actions
On behalf of Gang Ji <gang.ji@citrix.com>, I, Pau Ruiz Safont <pau.ruizsafont@cloud.com>, hereby add my Signed-off-by to this commit: e40fe60 Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
There's a commit on master that makes the DCO check to fail on all master commits. Since the issue seems to be an email mismatch, we can fix this easily.
The following environment variables can now be used by tests: * QCHECK_LONG_FACTOR (default 10): multiplies the default iteration count of 100 * QCHECK_SEED: an integer seed (the tests print their current seed on startup otherwise, can be used to reproduce a failure) These environment variables are already supported by the QCheck framework, we just need to use them.
) Last year, an argument to determine the group id of the certificates created was added. This broke test clients, and there's no need no make it compulsory since there's a default value: -1. Modify the binary so it can accept only 3 arguments like before, and change the help output to mention the group id. This makes quite a few internal certificate tests to pass compare the previous suite run 204647 to the new 204840
This allows to maintain sub-second precision, and only convert to datetimes when converting to strings. Now timezone conversion is more explicit. Datetimes without timezone are assumed to be UTC. While rare in practice, this is not safe in general, so print a warning to stdout whenever this happens. Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Datetimes received as call parameters can lack timezone. In that case the host assumed the datetime is in UTC. This is not always safe to do, but returning an error will break clients. Instead keep doing the assumption and enforce proper documentation of the parameters in the datamodel. Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
Previously the dates without timezone were assumed to use UTC when being converted to unix time, but the datatype maintained the lack of timezone when being printed. This means that the problematic timezoneless timestamps were kept and could be produced by the hosts. Since the dates are assumed to be UTC, add the timezone when they are parsed for the first time. Now their timezone is displayed at all times. The output of Localtime is kept without a timezone because clients are not prepared to accept arbitrary timezones in dates. (Both the SDK and XO(Lite)) Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
OCamlformat will correctly format new features only when it's aware the OCaml version it's outputting provides them. This allows to improve formatting for let-punning, punned labelled arguments, and more: https://github.com/ocaml-ppx/ocamlformat/blob/main/CHANGES.md#added-8 Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com>
…ion (xapi-project#5950) The Date module has a very convoluted way to deal with timezones because of its historic clients. While we can't remove all the issues, we can remove most of them. - Dates (timestamps, really) without timezones do not contain a frame of reference, and hence placing them in the UTC timeline can't be done accurately in the general case. This means that comparing timestamps gives incorrect results. - XMLRPC enforces dates to be encoded in ISO8601format. This means timestamps can lack a timezone. - Xapi uses xmlrpc's dates, adding this unfortunate limitation. - While Date allows these timestamps, it assumes that these are in the UTC timezone. On top of that, it refuses to process any timestamps that are in any timezone other than UTC (`Z`) - The Date module tries really hard to hide the timezone of timestamps that lack it when printing them. This means that timezoneless timestamps can persist in the database, for no good reason, as they are treated as UTC timestamps. - Host.localtime is the only call that returns timezoneless timestamps, all the other calls correctly return timestamps in the UTC timezone. Because the call on purpose does not want to return the current time in UTC, changing this might break clients not ready to process any timezone, mainly SDK-built ones xapi-project#5802 - Dates are stored as a tuple of date, hour, minutes, seconds. This has very limited precision, which might be unexpected. This PR does the following mitigation / fixes: - Document all calls with Datetimes as parameters that the timestamps will be interpreted as UTC ones if they miss the timezone. - Remove the limitation to process any valid timezone - Timezoneless timstamps are immediately processed as UTC timestamps, as refusing these timestamps can break clients. Issues that the PR does not fix - Host.localtime produces timestamps without timezones, this is needed as adding non-zero timestamps breaks the SDK clients. - The server does not reject timezoneless timestamp, this might break migrations, RPUs, or normal clients, so I've held back on this change. - Printed timestamp do _not_ retain sub-second precision Drafting as tests are ongoing
OCamlformat will correctly format new features only when it's aware the OCaml version it's outputting provides them. This allows to improve formatting for let-punning, punned labelled arguments, and more: https://github.com/ocaml-ppx/ocamlformat/blob/main/CHANGES.md#added-8
max-spans=1000 is small for a VM.start, now that we have more instrumentation. With the other optimizations in this series of commits it should now be safe to increase the number of spans, we no longer perform O(n^2) operations on them. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
clock now uses xapi-log, so it needs to have the dependency declared in opam Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
clock now uses xapi-log, so it needs to have the dependency declared in opam
We notice that required CI checks cancel themselves even when attempting to merge a single PR at a time. That is probably because there are CI jobs run on both 'push' and 'merge_group'. Try to make the concurrency group more unique by adding the github event name to the group key. Signed-off-by: Edwin Török <edwin.torok@cloud.com>
We notice that required CI checks cancel themselves even when attempting to merge a single PR at a time. That is probably because there are CI jobs run on both 'push' and 'merge_group'. Try to make the concurrency group more unique by adding the github event name to the group key.
We've noticed some delays when tracing is enabled, and suspected it is
due to lock contention or GC activity created by the tracing module
itself.
I've added some benchmarks to measure the overhead of tracing, and made
some improvements (more improvements are possible).
The benchmarks test both a situation with no workload, and a situation
with 2 other threads: one running the same workload as the function
under test, and another that runs a very GC intensive workload.
Improved:
* when max spans limit is hit: don't spam syslog. Time goes down from
202 *milli*seconds to 14 *micro*seconds per span
* reduce lock contention: eliminate one global lock, replace with atomic
ops
* reduce allocation
We are now at ~9µs/nested span on this particular machine, but further
improvements are possible by delaying work to the exporter thread and
finishing really quickly otherwise (e.g. upstream ocaml-trace claims
~60ns/span).
When tracing is disabled the overhead is very small, just like before
(on the order of 20ns without a workload).
Before:
```
Running benchmarks (no workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │             0.0000 mjw/run│          1187.0000 mnw/run│       96214066.4652 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            14.0000 mnw/run│             19.0061 ns/run│
│  tracing/overhead(on, create span)  │             2.9250 mjw/run│           496.3888 mnw/run│          25431.5210 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            14.0000 mnw/run│             18.8063 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯
tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 18.806282 (confidence: 20.597860 to 16.986898);
   r² = Some 0.63699 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 96214066.465201 (confidence: 125336685.666667 to 70136316.380165);
   r² = Some 0.518256 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 19.006133 (confidence: 20.619355 to 17.390492);
   r² = Some 0.667542 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 25431.521030 (confidence: 37607.254799 to 15151.227932);
   r² = Some 0.0329005 }
Running benchmarks (workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │         80934.7363 mjw/run│      10580588.4725 mnw/run│      202271848.5714 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            16.5479 mnw/run│            212.0958 ns/run│
│  tracing/overhead(on, create span)  │             0.0000 mjw/run│           503.4400 mnw/run│          15633.2400 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            18.3256 mnw/run│            326.9568 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯
tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 326.956811; r² = Some -9.40099 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 202271848.571429; r² = Some 0.065489 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 212.095829; r² = Some -4.46794 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 15633.240000; r² = Some 0.805726 }
```
After:
```
Running benchmarks (no workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │             0.0964 mjw/run│           379.0196 mnw/run│          14479.0371 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            14.0000 mnw/run│             18.2174 ns/run│
│  tracing/overhead(on, create span)  │             0.0000 mjw/run│           397.3440 mnw/run│          14712.1307 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            14.0000 mnw/run│             18.1771 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯
tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 18.177093 (confidence: 19.990180 to 16.374104);
   r² = Some 0.600249 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 14479.037101 (confidence: 15625.053708 to 13293.667720);
   r² = Some 0.635766 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 18.217390 (confidence: 19.845841 to 16.442527);
   r² = Some 0.642373 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 14712.130670 (confidence: 23416.444172 to 7426.891454);
   r² = Some 0.0476263 }
Running benchmarks (workloads)
╭─────────────────────────────────────┬───────────────────────────┬───────────────────────────┬───────────────────────────╮
│name                                 │  major-allocated          │  minor-allocated          │  monotonic-clock          │
├─────────────────────────────────────┼───────────────────────────┼───────────────────────────┼───────────────────────────┤
│  tracing/max span overflow          │             0.0000 mjw/run│           380.4429 mnw/run│           7658.2207 ns/run│
│  tracing/overhead(off)              │             0.0000 mjw/run│            15.3007 mnw/run│            102.2479 ns/run│
│  tracing/overhead(on, create span)  │             0.0000 mjw/run│           400.5094 mnw/run│           8657.5585 ns/run│
│  tracing/overhead(on, no span)      │             0.0000 mjw/run│            18.1333 mnw/run│            373.9794 ns/run│
╰─────────────────────────────────────┴───────────────────────────┴───────────────────────────┴───────────────────────────╯
tracing/overhead(on, no span) (ns):
 { monotonic-clock per run = 373.979447 (confidence: 435.400802 to 336.056569);
   r² = Some -1.84338 }
tracing/max span overflow (ns):
 { monotonic-clock per run = 7658.220695 (confidence: 7952.878804 to 7396.117711);
   r² = Some 0.950954 }
tracing/overhead(off) (ns):
 { monotonic-clock per run = 102.247932 (confidence: 119.364768 to 89.830417);
   r² = Some -0.607146 }
tracing/overhead(on, create span) (ns):
 { monotonic-clock per run = 8657.558458 (confidence: 8904.596470 to 8435.216348);
   r² = Some 0.956299 }
```
Draft PR, this was only unit tested so far.
    
            
                  lindig
  
            
            approved these changes
            
                
                  Sep 18, 2024 
                
            
            
          
          
            
                  mg12
  
            
            approved these changes
            
                
                  Sep 18, 2024 
                
            
            
          
          
  
    Sign up for free
    to join this conversation on GitHub.
    Already have an account?
    Sign in to comment
  
      
  Add this suggestion to a batch that can be applied as a single commit.
  This suggestion is invalid because no changes were made to the code.
  Suggestions cannot be applied while the pull request is closed.
  Suggestions cannot be applied while viewing a subset of changes.
  Only one suggestion per line can be applied in a batch.
  Add this suggestion to a batch that can be applied as a single commit.
  Applying suggestions on deleted lines is not supported.
  You must change the existing code in this line in order to create a valid suggestion.
  Outdated suggestions cannot be applied.
  This suggestion has been applied or marked resolved.
  Suggestions cannot be applied from pending reviews.
  Suggestions cannot be applied on multi-line comments.
  Suggestions cannot be applied while the pull request is queued to merge.
  Suggestion cannot be applied right now. Please check back later.
  
    
  
    
There is a conflict in 'dune' files, have to make PR from fork to fix, and I cannot edit the origin repo on the other PR.