Skip to content

Conversation

@ydirson
Copy link
Contributor

@ydirson ydirson commented Feb 9, 2024

No description provided.

@codecov
Copy link

codecov bot commented Feb 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e17089c) 49.07% compared to head (0defb73) 49.07%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5447   +/-   ##
=======================================
  Coverage   49.07%   49.07%           
=======================================
  Files          18       18           
  Lines        2319     2319           
=======================================
  Hits         1138     1138           
  Misses       1181     1181           
Flag Coverage Δ
python2.7 53.38% <ø> (ø)
python3.11 55.85% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

vbd-xl-script=${LIBEXECDIR}/block
qemu-vif-script=${LIBEXECDIR}/qemu-vif-script
setup-vif-rules=${LIBEXECDIR}/setup-vif-rules
vif-script=${XENOPSD_LIBEXECDIR}/vif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather stop using the custom XENOPSD_LIBEXECDIR and use the standard LIBEXEC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do that. Maybe @robhoes could give some insight on why that was done initially?

Copy link
Contributor Author

@ydirson ydirson Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note, there is a /usr/libexec/xapi/ hardcoded install location in the Makefile too, with that path reflected in caml code and conf files. Might warrant a separate PR too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we first fix the discrepancy, and then go for a separate PR to switch to LIBEXEC?

Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
@ydirson ydirson force-pushed the xenops-libexec-conf-fix branch from 36ba63d to 0defb73 Compare February 9, 2024 16:20
@ydirson ydirson requested a review from psafont February 12, 2024 10:38
@psafont
Copy link
Member

psafont commented Mar 4, 2024

Checked the script, it's chock-full of outdated facts :/ let's begin by merging this

@psafont psafont merged commit acf72f2 into xapi-project:master Mar 4, 2024
@psafont psafont deleted the xenops-libexec-conf-fix branch March 4, 2024 12:46
@github-actions
Copy link

github-actions bot commented Mar 4, 2024

pytype_reporter extracted 50 problem reports from pytype output

pytype_reporter: Only "Revert" commits on this PR.

Checking the revert diff:

gh pr checkout https://github.com/xapi-project/xen-api/pull/<PR-number>
REF=$(git rev-list -n 1 --before='4 weeks ago' HEAD)
git diff $REF 

diff:

diff --git a/.codecov.yml b/.codecov.yml
new file mode 100644
index 000000000..8380434a2
--- /dev/null
+++ b/.codecov.yml
@@ -0,0 +1,244 @@
+# For more configuration details:
+# https://docs.codecov.io/docs/codecov-yaml
+
+# After making edits, check if this file is valid by running:
+# curl -X POST --data-binary @.codecov.yml https://codecov.io/validate
+
+#
+# Coverage configuration
+# ----------------------
+#
+codecov:
+  #
+  # Show the Codecov status without waiting for other status to pass:
+  #
+  require_ci_to_pass: no
+  notify:
+    wait_for_ci: no
+
+github_checks:
+  #
+  # Disable adding coverage annotations to the code in the GitHub
+  # Code Review for now:
+  #
+  # - The annotations consume a lot of space in the PR code review,
+  #   and can make it hard to review files that are not covered yet.
+  #
+  # - The coverage can be visited using the Codecov link at all times.
+  #   https://app.codecov.io/gh/xapi-project/xen-api/pulls
+  #
+  # - The annotations can be hidden in GitHub PR code review by
+  #   pressing the "a" key or by deselecting the "Show comments"
+  #   checkbox but they are shown by default.
+  #
+  # - The Codecov Chrome and Firefox extension is a much nicer
+  #   way to indicate coverage:
+  #
+  #   Link: https://github.com/codecov/codecov-browser-extension
+  #
+  #   - How to enable: You need to log in to Codecov using Github.
+  #     For Firefox, enable the needed permissions:
+  #     https://github.com/codecov/codecov-browser-extension/issues/50
+  #
+  # Reference:
+  # http://docs.codecov.com/docs/common-recipe-list#disable-github-check-run-annotations
+  #
+  annotations: false
+
+
+#
+# Pull request comments:
+# ----------------------
+# This feature adds the code coverage summary as a comment on each PR.
+# See https://docs.codecov.io/docs/pull-request-comments
+# This same information is available from the Codecov checks in the PR's
+# "Checks" tab in GitHub even when this feature is disabled.
+#
+comment:
+  #
+  # Legend:
+  # "diff" is the Coverage Diff of the pull request.
+  # "files" are the files impacted by the pull request
+  # "flags" are the coverage status of the pull request
+  #
+  # For an even shorter layout, this may be used:
+  # layout: "condensed_header, diff, files, flags"
+  #
+  layout: "header, diff, files, flags"
+
+  #
+  # Only add the Codecov comment to the PR when coverage changes
+  #
+  require_changes: true
+  #
+  # The overall project coverage is secondary to the individual coverage
+  # and it is always shown in the repository at:
+  # - https://app.codecov.io/gh/xapi-project/xen-api
+  #
+  hide_project_coverage: true
+
+
+#
+# Coverage limits and display details:
+# ------------------------------------
+#
+coverage:
+
+  #
+  # Number of precision digits when showing coverage percentage e.g. 82.1%:
+  #
+  precision: 1
+
+  #
+  # Commit status checks and display:
+  # ---------------------------------
+  # https://docs.codecov.io/docs/commit-status
+  #
+  # target: Fail the PR if coverage is below that
+  # threshold: Allow reducing coverage by this amount
+  #
+  # - The values added are a very generous, friendly limit to not block most PRs
+  #
+  # - XAPI maintainers may tighten these screws more to require better tests
+  #
+  status: # global coverage status and limits
+
+    #
+    # Patch limits
+    # ------------
+    # These checks look at only the diff of the PR as basis for them.
+    #
+    patch:
+      scripts:
+
+        #
+        # The scripts limit applies to:
+        # -----------------------------
+        #
+        # - scripts/**
+        # - excluding: **/test_*.py
+        #
+        paths: ["scripts/**", "!**/test_*.py"]
+
+        #
+        # For scripts/** (excluding tests):
+        #
+        # For scripts, coverage should not be reduced compared to its base:
+        #
+        target: auto
+
+        #
+        # Exception: the threshold value given is allowed
+        #
+        # Allows for not covering 20% if the changed lines of the PR:
+        #
+        threshold: 20%
+
+      ocaml:
+        #
+        # The ocaml limit applies to:
+        # -----------------------------
+        #
+        # - ocaml/**
+        # - excluding: **/test_*.py
+        #
+        paths: ["ocaml/**", "!**/test_*.py"]
+
+        #
+        # For scripts/** (excluding tests):
+        #
+        # For scripts, coverage should not be reduced compared to its base:
+        #
+        target: auto
+
+        #
+        # Exception: the threshold value given is allowed
+        #
+        # Allows for not covering 20% if the changed lines of the PR:
+        #
+        threshold: 20%
+
+      # Checks each Python version separately:
+      python-3.11:
+        flags: ["python3.11"]
+      python-2.7:
+        flags: ["python2.7"]
+
+    #
+    # Project limits
+    # --------------
+    # These checks are relative to all code, not the changes (not the diff of the PR)
+    #
+    project:
+
+      #
+      # Python modules and scripts below scripts/ (excluding tests)
+      #
+      scripts:
+        target: 48%
+        threshold: 2%
+        paths: ["scripts/**", "!**/test_*.py"]
+
+      #
+      # Python modules and scripts below ocaml/
+      #
+      ocaml:
+        paths: ["ocaml/**", "!**/test_*.py"]
+        target: 51%
+        threshold: 3%
+
+      #
+      # Test files
+      #
+      tests:
+        # Ensure that all tests are executed (tests themselves must be 100% covered)
+        target: 98%
+        paths: ["**/test_*.py"]
+
+
+#
+# Components:
+# -----------
+# Components can be selected in the Codecov Web interface then looking at one PR:
+# https://app.codecov.io/gh/xapi-project/xen-api/pulls
+#
+component_management:
+
+  default_rules:  # default rules that will be inherited by all components
+    statuses:
+
+      - type: project
+        # `auto` will use the coverage from the base commit (pull request base
+        # or parent commit) coverage to compare against.
+        target: auto
+        threshold: 2%
+
+      - type: patch
+        target: auto
+        threshold: 10%
+
+  individual_components:
+
+    - component_id: scripts  # this is an identifier that should not be changed
+      name: scripts  # this is a display name, and can be changed freely
+      # The list of paths that should be in- and excluded in this component:
+      paths: ["scripts/**", "!scripts/examples/**", "!**/test_*.py"]
+
+    - component_id: scripts/examples
+      name: scripts/examples
+      paths: ["scripts/examples/**", "!scripts/**/test_*.py"]
+
+    - component_id: ocaml
+      name: ocaml
+      paths: ["ocaml/**", "!**/test_*.py"]
+
+    - component_id: ocaml/xapi-storage
+      name: ocaml/xapi-storage
+      paths:
+        - "ocaml/xapi-storage/**"
+        - "ocaml/xapi-storage-script/**"
+        - "!**/test_*.py"
+
+    - component_id: test_cases
+      name: test_cases
+      paths: ["**/test_*.py"]
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index cb1e69e78..7f52439cd 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -12,6 +12,28 @@ concurrency:  # On new push, cancel old workflows from the same PR, branch or ta
   cancel-in-progress: true
 
 jobs:
+  # https://www.shellcheck.net/wiki/GitHub-Actions
+  # https://github.com/redhat-plumbers-in-action/differential-shellcheck?tab=readme-ov-file#usage
+  shell-test:
+    name: Differential ShellCheck
+    runs-on: ubuntu-latest
+
+    permissions:
+      security-events: write
+    
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+            fetch-depth: 0
+
+# If needed severity levels can be controlled here
+#            severity: warning
+      - name: Differential ShellCheck
+        uses: redhat-plumbers-in-action/differential-shellcheck@v5
+        with:
+            token: ${{ secrets.GITHUB_TOKEN }}
+    
   python-test:
     name: Python tests
     runs-on: ubuntu-22.04
@@ -54,11 +76,13 @@ jobs:
         run: pip install pandas pytype toml
 
       - name: Install common dependencies for Python ${{matrix.python-version}}
-        run: pip install mock pytest-coverage pytest-mock
+        run: pip install future mock pytest-coverage pytest-mock
 
-      - name: Run Pytest tests for Python ${{matrix.python-version}}
+      - name: Run Pytest and get code coverage for Codecov
         run: >
-          pytest --cov scripts scripts/ -vv -rA
+          pytest
+          --cov=scripts --cov=ocaml/xcp-rrdd
+          scripts/ ocaml/xcp-rrdd -vv -rA
           --junitxml=.git/pytest${{matrix.python-version}}.xml
           --cov-report term-missing
           --cov-report xml:.git/coverage${{matrix.python-version}}.xml
@@ -89,6 +113,8 @@ jobs:
         run: ./pytype_reporter.py
         env:
           PR_NUMBER: ${{ github.event.number }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          PYTYPE_REPORTER_DEBUG: True
 
       # Try to add pytype_report.py's summary file as a comment to the PR:
       # Documentation: https://github.com/marketplace/actions/add-pr-comment
@@ -201,9 +227,6 @@ jobs:
       - name: quality-gate
         run: make quality-gate
 
-      - uses: reviewdog/action-actionlint@v1
-        name: GitHub Action linter from https://github.com/reviewdog/action-actionlint
-
       - name: pyflakes
         uses: reviewdog/action-pyflakes@master
         with:
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 9ad3b5059..0ca5ef37f 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -41,6 +41,7 @@ repos:
     -   id: pytype
         name: pytype
         entry: python3 pytype_reporter.py
+        pass_filenames: false
         types: [python]
         stages: [push]
         verbose: true
diff --git a/Makefile b/Makefile
index f30a39513..bcfc5b9eb 100644
--- a/Makefile
+++ b/Makefile
@@ -62,7 +62,9 @@ test:
 	 trap "kill $${PSTREE_SLEEP_PID}" SIGINT SIGTERM EXIT; \
 	 timeout --foreground $(TEST_TIMEOUT2) \
 		 dune runtest --profile=$(PROFILE) --error-reporting=twice -j $(JOBS)
+ifneq ($(PY_TEST), NO)
 	dune build @runtest-python --profile=$(PROFILE)
+endif
 
 stresstest:
 	dune build @stresstest --profile=$(PROFILE) --no-buffer -j $(JOBS)
@@ -115,6 +117,10 @@ sdksanity: sdk
 	sed -i 's/FriendlyErrorNames.ResourceManager/null/g' ./_build/install/default/xapi/sdk/csharp/src/Failure.cs
 	cd _build/install/default/xapi/sdk/csharp/src && dotnet add package Newtonsoft.Json && dotnet build -f netstandard2.0
 
+.PHONY: sdk-build-java
+
+sdk-build-java: sdk
+	cd _build/install/default/xapi/sdk/java && mvn -f xen-api/pom.xml -B clean package install -Drevision=0.0
 
 python:
 	$(MAKE) -C scripts/examples/python build
diff --git a/doc/content/toolstack/high-level/daemons.md b/doc/content/toolstack/high-level/daemons.md
index e00c2fc28..103798bb0 100644
--- a/doc/content/toolstack/high-level/daemons.md
+++ b/doc/content/toolstack/high-level/daemons.md
@@ -34,6 +34,9 @@ xapi-storage-script
 message-switch
 : exchanges messages between the daemons on a host
 
+xapi-guard
+: forwards uefi and vtpm persistence calls from domains to xapi
+
 v6d
 : controls which features are enabled.
 
diff --git a/doc/content/xapi-guard/_index.md b/doc/content/xapi-guard/_index.md
new file mode 100644
index 000000000..433f92f9d
--- /dev/null
+++ b/doc/content/xapi-guard/_index.md
@@ -0,0 +1,98 @@
++++
+title = "Xapi-guard"
+weight = 50
++++
+
+The `xapi-guard` daemon is the component in the xapi toolstack that is responsible for handling persistence requests from VMs (domains).
+Currently these are UEFI vars and vTPM updates.
+
+The code is in `ocaml/xapi-guard`.
+When the daemon managed only with UEFI updates it was called `varstored-guard`.
+Some files and package names still use the previous name.
+
+Principles
+----------
+1. Calls from domains must be limited in privilege to do certain API calls, and
+   to read and write from their corresponding VM in xapi's database only.
+2. Xenopsd is able to control xapi-guard through message switch, this access is
+   not limited.
+3. Listening to domain socket is restored whenever the daemon restarts to minimize disruption of running domains.
+4. Disruptions to requests when xapi is unavailable is minimized.
+   The startup procedure is not blocked by the availability of xapi, and write requests from domains must not fail because xapi is unavailable.
+
+
+Overview
+--------
+
+Xapi-guard forwards calls from domains to xapi to persist UEFI variables, and update vTPMs.
+To do this, it listens to 1 socket per service (varstored, or swtpm) per domain.
+To create these sockets before the domains are running, it listens to a message-switch socket.
+This socket listens to calls from xenopsd, which orchestrates the domain creation.
+
+To protect the domains from xapi being unavailable transiently, xapi-guard provides an on-disk cache for vTPM writes.
+This cache acts as a buffer and stores the requests temporarily until xapi can be contacted again.
+This situation usually happens when xapi is being restarted as part of an update.
+SWTPM, the vTPM daemon, reads the contents of the TPM from xapi-guard on startup, suspend, and resume.
+During normal operation SWTPM does not send read requests from xapi-guard.
+
+Structure
+---------
+
+The cache module consists of two Lwt threads, one that writes to disk, and another one that reads from disk.
+The writer is triggered when a VM writes to the vTPM.
+It never blocks if xapi is unreachable, but responds as soon as the data has been stored either by xapi or on the local disk, such that the VM receives a timely response to the write request.
+Both try to send the requests to xapi, depending on the state, to attempt write all the cached data back to xapi, and stop using the cache.
+The threads communicate through a bounded queue, this is done to limit the amount of memory used.
+This queue is a performance optimisation, where the writer informs the reader precisely which are the names of the cache files, such that the reader does not need to list the cache directory.
+And a full queue does not mean data loss, just a loss of performance; vTPM writes are still cached.
+
+This means that the cache operates in three modes:
+- Direct: during normal operation the disk is not used at all
+- Engaged: both threads use the queue to order events
+- Disengaged: A thread dumps request to disk while the other reads the cache
+  until it's empty
+
+```mermaid
+---
+title: Cache State
+---
+stateDiagram-v2
+    Disengaged
+    note right of Disengaged
+        Writer doesn't add requests to queue
+        Reader reads from cache and tries to push to xapi
+    end note
+    Direct
+    note left of Direct
+        Writer bypasses cache, send to xapi
+        Reader waits
+    end note
+    Engaged
+    note right of Engaged
+        Writer writes to cache and adds requests to queue
+        Reader reads from queue and tries to push to xapi
+    end note
+
+    [*] --> Disengaged
+
+    Disengaged --> Disengaged : Reader pushed pending TPMs to xapi, in the meantime TPMs appeared in the cache
+    Disengaged --> Direct : Reader pushed pending TPMs to xapi, cache is empty
+
+    Direct --> Direct : Writer receives TPM, sent to xapi
+    Direct --> Engaged : Writer receives TPM, error when sent to xapi
+
+    Engaged --> Direct : Reader sent TPM to xapi, finds an empty queue
+    Engaged --> Engaged : Writer receives TPM, queue is not full
+    Engaged --> Disengaged : Writer receives TPM, queue is full
+```
+
+Startup
+------
+
+At startup, there's a dedicated routine to transform the existing contents of the cache.
+This is currently done because the timestamp reference change on each boot.
+This means that the existing contents might have timestamps considered more recent than timestamps of writes coming from running events, leading to missing content updates.
+This must be avoided and instead the updates with offending timestamps are renamed to a timestamp taken from the current timestamp, ensuring a consistent
+ordering.
+The routine is also used to keep a minimal file tree: unrecognised files are deleted, temporary files created to ensure atomic writes are left untouched, and empty directories are deleted.
+This mechanism can be changed in the future to migrate to other formats.
diff --git a/ocaml/forkexecd/lib/forkhelpers.ml b/ocaml/forkexecd/lib/forkhelpers.ml
index 0477a93c9..d55901c3c 100644
--- a/ocaml/forkexecd/lib/forkhelpers.ml
+++ b/ocaml/forkexecd/lib/forkhelpers.ml
@@ -184,7 +184,7 @@ let safe_close_and_exec ?env stdin stdout stderr
         List.fold_left maybe_add_id_to_fd_map dest_named_fds predefined_fds
       in
 
-      let env = match env with Some e -> e | None -> default_path_env_pair in
+      let env = Option.value ~default:default_path_env_pair env in
       let syslog_stdout =
         match syslog_stdout with
         | NoSyslogging ->
diff --git a/ocaml/idl/datamodel_cluster_host.ml b/ocaml/idl/datamodel_cluster_host.ml
index af10edd59..5a4522804 100644
--- a/ocaml/idl/datamodel_cluster_host.ml
+++ b/ocaml/idl/datamodel_cluster_host.ml
@@ -31,7 +31,10 @@ let create =
     ()
 
 let destroy =
-  call ~name:"destroy" ~doc:"Remove a host from an existing cluster."
+  call ~name:"destroy"
+    ~doc:
+      "Remove the host from an existing cluster. This operation is allowed \
+       even if a cluster host is not enabled."
     ~params:
       [
         ( Ref _cluster_host
@@ -117,10 +120,8 @@ let t =
            ~default_value:(Some (VBool true))
            "Whether the cluster host has joined the cluster. Contrary to \
             enabled, a host that is not joined is not considered a member of \
-            the cluster, and hence no operations (e.g. enable/disable) can be \
-            performed on this host. This field can be altered by calling leave \
-            or destroy on a cluster host. It can also be set automatically if \
-            cluster stack believes that this node is not part of the cluster. "
+            the cluster, and hence enable and disable operations cannot be \
+            performed on this host."
        ; field ~qualifier:DynamicRO ~lifecycle:[] ~ty:Bool "live"
            ~default_value:(Some (VBool false))
            "Whether the underlying cluster stack thinks we are live. This \
diff --git a/ocaml/idl/datamodel_common.ml b/ocaml/idl/datamodel_common.ml
index 00392141d..709cb5eb0 100644
--- a/ocaml/idl/datamodel_common.ml
+++ b/ocaml/idl/datamodel_common.ml
@@ -10,7 +10,7 @@ open Datamodel_roles
               to leave a gap for potential hotfixes needing to increment the schema version.*)
 let schema_major_vsn = 5
 
-let schema_minor_vsn = 774
+let schema_minor_vsn = 775
 
 (* Historical schema versions just in case this is useful later *)
 let rio_schema_major_vsn = 5
@@ -134,6 +134,10 @@ let yangtze_release_schema_major_vsn = 5
 
 let yangtze_release_schema_minor_vsn = 602
 
+let nile_release_schema_major_vsn = 5
+
+let nile_release_schema_minor_vsn = 775
+
 (* List of tech-preview releases. Fields in these releases are not guaranteed to be retained when
  * upgrading to a full release. *)
 let tech_preview_releases =
diff --git a/ocaml/idl/datamodel_lifecycle.ml b/ocaml/idl/datamodel_lifecycle.ml
index 11abd8f9c..28ff931ec 100644
--- a/ocaml/idl/datamodel_lifecycle.ml
+++ b/ocaml/idl/datamodel_lifecycle.ml
@@ -39,6 +39,12 @@ let prototyped_of_field = function
       Some "22.26.0"
   | "VTPM", "persistence_backend" ->
       Some "22.26.0"
+  | "host", "last_update_hash" ->
+      Some "24.10.0"
+  | "host", "pending_guidances_full" ->
+      Some "24.10.0"
+  | "host", "pending_guidances_recommended" ->
+      Some "24.10.0"
   | "host", "numa_affinity_policy" ->
       Some "24.0.0"
   | "host", "latest_synced_updates_applied" ->
@@ -49,6 +55,10 @@ let prototyped_of_field = function
       Some "22.27.0"
   | "host", "last_software_update" ->
       Some "22.20.0"
+  | "VM", "pending_guidances_full" ->
+      Some "24.10.0"
+  | "VM", "pending_guidances_recommended" ->
+      Some "24.10.0"
   | "VM", "recommended_guidances" ->
       Some "23.18.0"
   | "VM", "actions__after_softreboot" ->
@@ -109,6 +119,8 @@ let prototyped_of_message = function
       Some "22.26.0"
   | "VTPM", "create" ->
       Some "22.26.0"
+  | "host", "emergency_clear_mandatory_guidance" ->
+      Some "24.10.0"
   | "host", "apply_recommended_guidances" ->
       Some "23.18.0"
   | "host", "set_https_only" ->
diff --git a/ocaml/idl/datamodel_types.ml b/ocaml/idl/datamodel_types.ml
index 346e5bd41..364eafb42 100644
--- a/ocaml/idl/datamodel_types.ml
+++ b/ocaml/idl/datamodel_types.ml
@@ -97,6 +97,10 @@ let rel_stockholm = "stockholm"
 
 let rel_stockholm_psr = "stockholm_psr"
 
+let rel_nile_preview = "nile-preview"
+
+let rel_nile = "nile"
+
 type api_release = {
     code_name: string option
   ; version_major: int
@@ -334,6 +338,20 @@ let release_order_full =
     ; branding= "Citrix Hypervisor 8.2 Hotfix 2"
     ; release_date= Some "November 2020"
     }
+  ; {
+      code_name= Some rel_nile_preview
+    ; version_major= 2
+    ; version_minor= 20
+    ; branding= "XenServer 8 Preview"
+    ; release_date= Some "August 2023"
+    }
+  ; {
+      code_name= Some rel_nile
+    ; version_major= 2
+    ; version_minor= 21
+    ; branding= "XenServer 8"
+    ; release_date= None
+    }
   ]
 (* When you add a new release, use the version number of the latest release, "Unreleased"
    for the branding, and Some "" for the release date, until the actual values are finalised. *)
diff --git a/ocaml/idl/datamodel_vm.ml b/ocaml/idl/datamodel_vm.ml
index 826d7fb46..c1a6b9a7d 100644
--- a/ocaml/idl/datamodel_vm.ml
+++ b/ocaml/idl/datamodel_vm.ml
@@ -1687,8 +1687,9 @@ let set_NVRAM_EFI_variables =
     ~hide_from_docs:true ~allowed_roles:_R_LOCAL_ROOT_ONLY ()
 
 let restart_device_models =
-  call ~flags:[`Session] ~name:"restart_device_models" ~lifecycle:[]
+  call ~name:"restart_device_models" ~lifecycle:[]
     ~params:[(Ref _vm, "self", "The VM")]
+    ~doc:"Restart device models of the VM"
     ~errs:
       [
         Api_errors.vm_bad_power_state
diff --git a/ocaml/idl/schematest.ml b/ocaml/idl/schematest.ml
index b3f86ab2d..c8e5972c9 100644
--- a/ocaml/idl/schematest.ml
+++ b/ocaml/idl/schematest.ml
@@ -3,7 +3,7 @@ let hash x = Digest.string x |> Digest.to_hex
 (* BEWARE: if this changes, check that schema has been bumped accordingly in
    ocaml/idl/datamodel_common.ml, usually schema_minor_vsn *)
 
-let last_known_schema_hash = "5b597309a69b2c7b9053446fa338c701"
+let last_known_schema_hash = "186131ad48f40dff30246e8e0c0dbf0a"
 
 let current_schema_hash : string =
   let open Datamodel_types in
diff --git a/ocaml/libs/xapi-rrd/lib/rrd_updates.ml b/ocaml/libs/xapi-rrd/lib/rrd_updates.ml
index e1e3a98f8..d9de5b045 100644
--- a/ocaml/libs/xapi-rrd/lib/rrd_updates.ml
+++ b/ocaml/libs/xapi-rrd/lib/rrd_updates.ml
@@ -194,9 +194,9 @@ let json_of_t t =
             ; ("rows", int (Array.length t.data))
             ; ("columns", int (Array.length t.legend))
             ; ("legend", array (map_to_list string t.legend))
-            ; ("data", array (map_to_list data_record t.data))
             ]
         )
+      ; ("data", array (map_to_list data_record t.data))
       ]
   in
   Yojson.to_string meta
diff --git a/ocaml/networkd/bin/network_server.ml b/ocaml/networkd/bin/network_server.ml
index 2aa2ac94f..d31d256ef 100644
--- a/ocaml/networkd/bin/network_server.ml
+++ b/ocaml/networkd/bin/network_server.ml
@@ -103,7 +103,7 @@ let set_dns_interface _dbg name =
  * constitutes adding a VLAN0 Linux device to strip those headers again.
  *)
 let need_enic_workaround () =
-  !backend_kind = Bridge && List.mem "enic" (Sysfs.list_drivers ())
+  !backend_kind = Bridge && List.mem "enic" (Sysfs.list_pci_drivers ())
 
 module Sriov = struct
   open S.Sriov
diff --git a/ocaml/networkd/lib/network_utils.ml b/ocaml/networkd/lib/network_utils.ml
index 9fe4be994..1c7479e83 100644
--- a/ocaml/networkd/lib/network_utils.ml
+++ b/ocaml/networkd/lib/network_utils.ml
@@ -143,10 +143,10 @@ let fork_script ?on_error ?log script args =
   check_n_run ?on_error ?log fork_script_internal script args
 
 module Sysfs = struct
-  let list_drivers () =
+  let list_pci_drivers () =
     try Array.to_list (Sys.readdir "/sys/bus/pci/drivers")
     with _ ->
-      warn "Failed to obtain list of drivers from sysfs" ;
+      warn "Failed to obtain list of PCI drivers from sysfs" ;
       []
 
   let getpath dev attr = Printf.sprintf "/sys/class/net/%s/%s" dev attr
@@ -225,15 +225,15 @@ module Sysfs = struct
 
   let get_pci_ids name =
     let read_id_from path =
-      try
-        let l = read_one_line path in
+      let l = path |> Unixext.string_of_file |> String.trim in
       (* trim 0x *)
       String.sub l 2 (String.length l - 2)
-      with _ -> ""
     in
+    try
       ( read_id_from (getpath name "device/vendor")
       , read_id_from (getpath name "device/device")
       )
+    with _ -> ("", "")
 
   (** Returns the name of the driver for network device [dev] *)
   let get_driver_name dev =
@@ -1278,25 +1278,29 @@ module Ovs = struct
 
     let get_bond_link_status name =
       try
+        (* Note: bond links are called "members" by the OVS. In old OVS
+           versions, the term "slaves" was used, which is also used by the
+           Linux kernel and in xapi. The terms bond link/slave/member are
+           used interchangably. *)
         let raw = appctl ~log:false ["bond/show"; name] in
         let lines = Astring.String.cuts ~empty:false ~sep:"\n" raw in
         List.fold_left
-          (fun (slaves, active_slave) line ->
-            let slaves =
+          (fun (members, active_member) line ->
+            let members =
               try
-                Scanf.sscanf line "slave %s@: %s" (fun slave state ->
-                    (slave, state = "enabled") :: slaves
+                Scanf.sscanf line "member %s@: %s" (fun member state ->
+                    (member, state = "enabled") :: members
                 )
-              with _ -> slaves
+              with _ -> members
             in
-            let active_slave =
+            let active_member =
               try
-                Scanf.sscanf line "active slave %s@(%s@)" (fun _ slave ->
-                    Some slave
+                Scanf.sscanf line "active member %s@(%s@)" (fun _ member ->
+                    Some member
                 )
-              with _ -> active_slave
+              with _ -> active_member
             in
-            (slaves, active_slave)
+            (members, active_member)
           )
           ([], None) lines
       with _ -> ([], None)
diff --git a/ocaml/sdk-gen/csharp/autogen/src/JsonRpc.cs b/ocaml/sdk-gen/csharp/autogen/src/JsonRpc.cs
index 4b3641227..71c9ea81f 100644
--- a/ocaml/sdk-gen/csharp/autogen/src/JsonRpc.cs
+++ b/ocaml/sdk-gen/csharp/autogen/src/JsonRpc.cs
@@ -181,6 +181,8 @@ namespace XenAPI
         public bool PreAuthenticate { get; set; }
         public CookieContainer Cookies { get; set; }
         public RemoteCertificateValidationCallback ServerCertificateValidationCallback { get; set; }
+        public Dictionary<string, string> RequestHeaders { get; set; }
+        public Dictionary<string, string> ResponseHeaders { get; set; }
 
         public string Url { get; private set; }
 
@@ -285,14 +287,32 @@ namespace XenAPI
             webRequest.CookieContainer = Cookies ?? webRequest.CookieContainer ?? new CookieContainer();
             webRequest.ServerCertificateValidationCallback = ServerCertificateValidationCallback ?? ServicePointManager.ServerCertificateValidationCallback;
 
+            if (RequestHeaders != null)
+            {
+                foreach (var header in RequestHeaders)
+                    webRequest.Headers.Add(header.Key, header.Value);
+            }
+
             using (var str = webRequest.GetRequestStream())
             {
                 postStream.CopyTo(str);
                 str.Flush();
             }
 
-            using (var webResponse = (HttpWebResponse)webRequest.GetResponse())
+            HttpWebResponse webResponse = null;
+            try
+            {
+                webResponse = (HttpWebResponse)webRequest.GetResponse();
+
+                ResponseHeaders = new Dictionary<string, string>();
+
+                if (webResponse.Headers != null)
                 {
+                    var keys = webResponse.Headers.AllKeys;
+                    foreach (var key in keys)
+                        ResponseHeaders.Add(key, string.Join(",", webResponse.Headers.Get(key)));
+                }
+
                 if (webResponse.StatusCode != HttpStatusCode.OK)
                     throw new WebException(webResponse.StatusCode.ToString());
 
@@ -305,6 +325,11 @@ namespace XenAPI
                     responseStream.Flush();
                 }
             }
+            finally
+            {
+                RequestHeaders = null;
+                webResponse?.Dispose();
+            }
         }
 
         private JsonSerializerSettings CreateSettings(IList<JsonConverter> converters)
diff --git a/ocaml/sdk-gen/csharp/autogen/src/Session.cs b/ocaml/sdk-gen/csharp/autogen/src/Session.cs
index d78915a90..a2aef1d67 100644
--- a/ocaml/sdk-gen/csharp/autogen/src/Session.cs
+++ b/ocaml/sdk-gen/csharp/autogen/src/Session.cs
@@ -279,6 +279,25 @@ namespace XenAPI
 
         public ICredentials Credentials => JsonRpcClient?.WebProxy?.Credentials;
 
+        /// <summary>
+        /// Optional headers in name-value pairs to be passed in the HttpWebRequests. The
+        /// default value is null. This property can be set by the implementing code before
+        /// each request. It is automatically reset to null once the request has been sent.
+        /// </summary>
+        public Dictionary<string, string> RequestHeaders
+        {
+            set => JsonRpcClient.RequestHeaders = value;
+            get => JsonRpcClient.RequestHeaders;
+        }
+
+        /// <summary>
+        /// Exposes the headers returned in the HttpWebResponses in name-value pairs.
+        /// This property is set once a response is received. The values are comma
+        /// separated strings of header values stored in a header.
+        /// It returns an empty dictionary if no headers are found.
+        /// </summary>
+        public Dictionary<string, string> ResponseHeaders => JsonRpcClient.ResponseHeaders;
+
         /// <summary>
         /// Always true before API version 1.6.
         /// Filled in after successful session_login_with_password for 1.6 or newer connections
diff --git a/ocaml/sdk-gen/csharp/templates/Message2.mustache b/ocaml/sdk-gen/csharp/templates/Message2.mustache
index 0cf943f17..4661d8151 100644
--- a/ocaml/sdk-gen/csharp/templates/Message2.mustache
+++ b/ocaml/sdk-gen/csharp/templates/Message2.mustache
@@ -38,6 +38,7 @@ namespace XenAPI
             UPDATES_FEATURE_EXPIRING_WARNING,
             UPDATES_FEATURE_EXPIRING_MAJOR,
             UPDATES_FEATURE_EXPIRING_CRITICAL,
+            GFS2_CAPACITY,
             LEAF_COALESCE_START_MESSAGE,
             LEAF_COALESCE_COMPLETED,
             LEAF_COALESCE_FAILED,
@@ -62,6 +63,8 @@ namespace XenAPI
                         return MessageType.UPDATES_FEATURE_EXPIRING_MAJOR;
                     case "UPDATES_FEATURE_EXPIRING_CRITICAL":
                         return MessageType.UPDATES_FEATURE_EXPIRING_CRITICAL;
+                    case "GFS2_CAPACITY":
+                        return MessageType.GFS2_CAPACITY;
                     case "LEAF_COALESCE_START_MESSAGE":
                         return MessageType.LEAF_COALESCE_START_MESSAGE;
                     case "LEAF_COALESCE_COMPLETED":
diff --git a/ocaml/tests/common/test_common.ml b/ocaml/tests/common/test_common.ml
index 786a6eaca..d1433868f 100644
--- a/ocaml/tests/common/test_common.ml
+++ b/ocaml/tests/common/test_common.ml
@@ -42,6 +42,7 @@ let default_cpu_info =
      * localhost to be counted *)
     ("cpu_count", "0")
   ; ("socket_count", "0")
+  ; ("threads_per_core", "0")
   ; ("vendor", "Abacus")
   ; ("speed", "")
   ; ("modelname", "")
@@ -77,6 +78,7 @@ let make_localhost ~__context ?(features = Features.all_features) () =
           {
             cpu_count= 1
           ; socket_count= 1
+          ; threads_per_core= 1
           ; vendor= ""
           ; speed= ""
           ; modelname= ""
diff --git a/ocaml/xapi-cli-protocol/cli_protocol.ml b/ocaml/xapi-cli-protocol/cli_protocol.ml
index 261bc11b1..bf58263ea 100644
--- a/ocaml/xapi-cli-protocol/cli_protocol.ml
+++ b/ocaml/xapi-cli-protocol/cli_protocol.ml
@@ -26,6 +26,13 @@ let minor = 2
     a totally different kind of server (eg a standard HTTP server) *)
 let prefix = "XenSource thin CLI protocol"
 
+(* Be careful to add/remove/modify the commands. The CLI interface is expected
+   to be extreme stable. Please keep following in mind when doing the changes:
+     1. backwards compatibility support. E.g. a very old CLI client may need to
+        be supported still,
+     2. a new command should be one for general purpose only rather than for a
+        specific usage. *)
+
 (** Command sent by the server to the client.
     If the command is "Save" then the server waits for "OK" from the client
     and then streams a list of data chunks to the client. *)
@@ -34,8 +41,6 @@ type command =
   | Debug of string (* debug message to optionally display *)
   | Load of string (* filename *)
   | HttpGet of string * string (* filename * path *)
-  | PrintHttpGetJson of string (* path *)
-  | PrintUpdateGuidance of string (* path *)
   | HttpPut of string * string (* filename * path *)
   | HttpConnect of string (* path *)
   | Prompt (* request the user enter some text *)
@@ -68,10 +73,6 @@ let string_of_command = function
       "Load " ^ x
   | HttpGet (filename, path) ->
       "HttpGet " ^ path ^ " -> " ^ filename
-  | PrintHttpGetJson path ->
-      "PrintHttpGetJson " ^ path ^ " -> stdout"
-  | PrintUpdateGuidance path ->
-      "PrintUpdateGuidance " ^ path ^ " -> stdout"
   | HttpPut (filename, path) ->
       "HttpPut " ^ path ^ " -> " ^ filename
   | HttpConnect path ->
@@ -161,7 +162,7 @@ let unmarshal_list pos f =
 (*****************************************************************************)
 (* Marshal/Unmarshal higher-level messages                                   *)
 
-(* Highest command id: 19 *)
+(* Highest command id: 17 *)
 
 let marshal_command = function
   | Print x ->
@@ -172,10 +173,6 @@ let marshal_command = function
       marshal_int 1 ^ marshal_string x
   | HttpGet (a, b) ->
       marshal_int 12 ^ marshal_string a ^ marshal_string b
-  | PrintHttpGetJson a ->
-      marshal_int 18 ^ marshal_string a
-  | PrintUpdateGuidance a ->
-      marshal_int 19 ^ marshal_string a
   | HttpPut (a, b) ->
       marshal_int 13 ^ marshal_string a ^ marshal_string b
   | HttpConnect a ->
@@ -226,12 +223,6 @@ let unmarshal_command pos =
   | 16 ->
       let body, pos = unmarshal_string pos in
       (PrintStderr body, pos)
-  | 18 ->
-      let a, pos = unmarshal_string pos in
-      (PrintHttpGetJson a, pos)
-  | 19 ->
-      let a, pos = unmarshal_string pos in
-      (PrintUpdateGuidance a, pos)
   | n ->
       raise (Unknown_tag ("command", n))
 
diff --git a/ocaml/xapi-cli-server/cli_frontend.ml b/ocaml/xapi-cli-server/cli_frontend.ml
index 55d884db9..f8aa043eb 100644
--- a/ocaml/xapi-cli-server/cli_frontend.ml
+++ b/ocaml/xapi-cli-server/cli_frontend.ml
@@ -1032,7 +1032,7 @@ let rec cmdtable_data : (string * cmd_spec) list =
         reqd= ["hash"]
       ; optn= []
       ; help= "Apply updates from enabled repository on specified host."
-      ; implementation= With_fd Cli_operations.host_apply_updates
+      ; implementation= No_fd Cli_operations.host_apply_updates
       ; flags= [Host_selectors]
       }
     )
@@ -1047,15 +1047,6 @@ let rec cmdtable_data : (string * cmd_spec) list =
       ; flags= [Neverforward]
       }
     )
-  ; ( "host-updates-show-available"
-    , {
-        reqd= []
-      ; optn= []
-      ; help= "Show available updates for a specified host."
-      ; implementation= With_fd Cli_operations.host_updates_show_available
-      ; flags= [Host_selectors]
-      }
-    )
   ; ( "patch-upload"
     , {
         reqd= ["file-name"]
diff --git a/ocaml/xapi-cli-server/cli_operations.ml b/ocaml/xapi-cli-server/cli_operations.ml
index 6766a5161..bc0d9ea30 100644
--- a/ocaml/xapi-cli-server/cli_operations.ml
+++ b/ocaml/xapi-cli-server/cli_operations.ml
@@ -5462,20 +5462,40 @@ let wait_for_task_complete rpc session_id task_id =
     Thread.delay 1.0
   done
 
-let check_task_status ?(quiet_on_success = false) ~rpc ~session_id ~task ~fd
-    ~label ~ok () =
-  (* if the client thinks it's ok, check that the server does too *)
+let download_file rpc session_id task fd filename uri label =
+  marshal fd (Command (HttpGet (filename, uri))) ;
+  let response = ref (Response Wait) in
+  while !response = Response Wait do
+    response := unmarshal fd
+  done ;
+  let ok =
+    match !response with
+    | Response OK ->
+        true
+    | Response Failed ->
+        (* Need to check whether the thin cli managed to contact the server
+           				   or not. If not, we need to mark the task as failed *)
+        if Client.Task.get_progress ~rpc ~session_id ~self:task < 0.0 then
+          Client.Task.set_status ~rpc ~session_id ~self:task ~value:`failure ;
+        false
+    | _ ->
+        false
+  in
+  wait_for_task_complete rpc session_id task ;
+  (* Check the server status -- even if the client thinks it's ok, we need
+     	   to check that the server does too. *)
   match Client.Task.get_status ~rpc ~session_id ~self:task with
-  | `success when ok && not quiet_on_success ->
-      marshal fd (Command (Print (Printf.sprintf "%s succeeded" label)))
-  | `success when ok && quiet_on_success ->
-      ()
   | `success ->
+      if ok then (
+        if filename <> "" then
+          marshal fd (Command (Print (Printf.sprintf "%s succeeded" label)))
+      ) else (
         marshal fd
           (Command
              (PrintStderr (Printf.sprintf "%s failed, unknown error.\n" label))
           ) ;
         raise (ExitWithError 1)
+      )
   | `failure ->
       let result = Client.Task.get_error_info ~rpc ~session_id ~self:task in
       if result = [] then
@@ -5493,31 +5513,6 @@ let check_task_status ?(quiet_on_success = false) ~rpc ~session_id ~task ~fd
       (* should never happen *)
       raise (ExitWithError 1)
 
-let download_file rpc session_id task fd filename uri label =
-  marshal fd (Command (HttpGet (filename, uri))) ;
-  let response = ref (Response Wait) in
-  while !response = Response Wait do
-    response := unmarshal fd
-  done ;
-  let ok =
-    match !response with
-    | Response OK ->
-        true
-    | Response Failed ->
-        (* Need to check whether the thin cli managed to contact the server
-           				   or not. If not, we need to mark the task as failed *)
-        if Client.Task.get_progress ~rpc ~session_id ~self:task < 0.0 then
-          Client.Task.set_status ~rpc ~session_id ~self:task ~value:`failure ;
-        false
-    | _ ->
-        false
-  in
-  wait_for_task_complete rpc session_id task ;
-  (* Check the server status -- even if the client thinks it's ok, we need
-     	   to check that the server does too. *)
-  let quiet_on_success = if filename = "" then true else false in
-  check_task_status ~rpc ~session_id ~task ~fd ~label ~ok ~quiet_on_success ()
-
 let download_file_with_task fd rpc session_id filename uri query label task_name
     =
   let task =
@@ -5680,17 +5675,24 @@ let vm_import fd _printer rpc session_id params =
     in
     marshal fd (Command (Print (String.concat "," uuids)))
 
-let command_in_task ~rpc ~session_id ~fd ~obj ~label ~quiet_on_success f =
-  let task =
+let blob_get fd _printer rpc session_id params =
+  let blob_uuid = List.assoc "uuid" params in
+  let blob_ref = Client.Blob.get_by_uuid ~rpc ~session_id ~uuid:blob_uuid in
+  let filename = List.assoc "filename" params in
+  let blobtask =
     Client.Task.create ~rpc ~session_id
-      ~label:(Printf.sprintf "%s (ref=%s)" label (Ref.string_of obj))
+      ~label:(Printf.sprintf "Obtaining blob, ref=%s" (Ref.string_of blob_ref))
       ~description:""
   in
-  Client.Task.set_progress ~rpc ~session_id ~self:task ~value:(-1.0) ;
-  let command = f session_id task obj in
+  Client.Task.set_progress ~rpc ~session_id ~self:blobtask ~value:(-1.0) ;
+  let bloburi =
+    Printf.sprintf "%s?session_id=%s&task_id=%s&ref=%s" Constants.blob_uri
+      (Ref.string_of session_id) (Ref.string_of blobtask)
+      (Ref.string_of blob_ref)
+  in
   finally
     (fun () ->
-      marshal fd (Command command) ;
+      marshal fd (Command (HttpGet (filename, bloburi))) ;
       let response = ref (Response Wait) in
       while !response = Response Wait do
         response := unmarshal fd
@@ -5700,48 +5702,106 @@ let command_in_task ~rpc ~session_id ~fd ~obj ~label ~quiet_on_success f =
         | Response OK ->
             true
         | Response Failed ->
-            (* Need to check whether the thin cli managed to contact the server
-             * or not. If not, we need to mark the task as failed.
-             *)
-            if Client.Task.get_progress ~rpc ~session_id ~self:task < 0.0 then
-              Client.Task.set_status ~rpc ~session_id ~self:task ~value:`failure ;
+            if Client.Task.get_progress ~rpc ~session_id ~self:blobtask < 0.0
+            then
+              Client.Task.set_status ~rpc ~session_id ~self:blobtask
+                ~value:`failure ;
             false
         | _ ->
             false
       in
-      wait_for_task_complete rpc session_id task ;
-      check_task_status ~rpc ~session_id ~task ~fd ~label ~ok ~quiet_on_success
-        ()
+      wait_for_task_complete rpc session_id blobtask ;
+      (* if the client thinks it's ok, check that the server does too *)
+      match Client.Task.get_status ~rpc ~session_id ~self:blobtask with
+      | `success ->
+          if ok then
+            marshal fd (Command (Print "Blob get succeeded"))
+          else (
+            marshal fd
+              (Command (PrintStderr "Blob get failed, unknown error.\n")) ;
+            raise (ExitWithError 1)
           )
-    (fun () -> Client.Task.destroy ~rpc ~session_id ~self:task)
-
-let blob_uri ~session_id ~task ~blob =
-  let query =
-    [
-      ("session_id", [Ref.string_of session_id])
-    ; ("task_id", [Ref.string_of task])
-    ; ("ref", [Ref.string_of blob])
-    ]
+      | `failure ->
+          let result =
+            Client.Task.get_error_info ~rpc ~session_id ~self:blobtask
           in
-  Uri.make ~path:Constants.blob_uri ~query () |> Uri.to_string
-
-let blob_get fd _printer rpc session_id params =
-  let blob_uuid = List.assoc "uuid" params in
-  let blob_ref = Client.Blob.get_by_uuid ~rpc ~session_id ~uuid:blob_uuid in
-  let filename = List.assoc "filename" params in
-  command_in_task ~rpc ~session_id ~fd ~obj:blob_ref ~label:"GET blob"
-    ~quiet_on_success:false (fun session_id task blob ->
-      HttpGet (filename, blob_uri ~session_id ~task ~blob)
+          if result = [] then
+            marshal fd (Command (PrintStderr "Blob get failed, unknown error\n"))
+          else
+            raise (Api_errors.Server_error (List.hd result, List.tl result))
+      | `cancelled ->
+          marshal fd (Command (PrintStderr "Blob get cancelled\n")) ;
+          raise (ExitWithError 1)
+      | _ ->
+          marshal fd (Command (PrintStderr "Internal error\n")) ;
+          (* should never happen *)
+          raise (ExitWithError 1)
     )
+    (fun () -> Client.Task.destroy ~rpc ~session_id ~self:blobtask)
 
 let blob_put fd _printer rpc session_id params =
   let blob_uuid = List.assoc "uuid" params in
   let blob_ref = Client.Blob.get_by_uuid ~rpc ~session_id ~uuid:blob_uuid in
   let filename = List.assoc "filename" params in
-  command_in_task ~rpc ~session_id ~fd ~obj:blob_ref ~label:"PUT blob"
-    ~quiet_on_success:false (fun session_id task blob ->
-      HttpPut (filename, blob_uri ~session_id ~task ~blob)
+  let blobtask =
+    Client.Task.create ~rpc ~session_id
+      ~label:(Printf.sprintf "Blob PUT, ref=%s" (Ref.string_of blob_ref))
+      ~description:""
+  in
+  Client.Task.set_progress ~rpc ~session_id ~self:blobtask ~value:(-1.0) ;
+  let bloburi =
+    Printf.sprintf "%s?session_id=%s&task_id=%s&ref=%s" Constants.blob_uri
+      (Ref.string_of session_id) (Ref.string_of blobtask)
+      (Ref.string_of blob_ref)
+  in
+  finally
+    (fun () ->
+      marshal fd (Command (HttpPut (filename, bloburi))) ;
+      let response = ref (Response Wait) in
+      while !response = Response Wait do
+        response := unmarshal fd
+      done ;
+      let ok =
+        match !response with
+        | Response OK ->
+            true
+        | Response Failed ->
+            if Client.Task.get_progress ~rpc ~session_id ~self:blobtask < 0.0
+            then
+              Client.Task.set_status ~rpc ~session_id ~self:blobtask
+                ~value:`failure ;
+            false
+        | _ ->
+            false
+      in
+      wait_for_task_complete rpc session_id blobtask ;
+      (* if the client thinks it's ok, check that the server does too *)
+      match Client.Task.get_status ~rpc ~session_id ~self:blobtask with
+      | `success ->
+          if ok then
+            marshal fd (Command (Print "Blob put succeeded"))
+          else (
+            marshal fd
+              (Command (PrintStderr "Blob put failed, unknown error.\n")) ;
+            raise (ExitWithError 1)
+          )
+      | `failure ->
+          let result =
+            Client.Task.get_error_info ~rpc ~session_id ~self:blobtask
+          in
+          if result = [] then
+            marshal fd (Command (PrintStderr "Blob put failed, unknown error\n"))
+          else
+            raise (Api_errors.Server_error (List.hd result, List.tl result))
+      | `cancelled ->
+          marshal fd (Command (PrintStderr "Blob put cancelled\n")) ;
+          raise (ExitWithError 1)
+      | _ ->
+          marshal fd (Command (PrintStderr "Internal error\n")) ;
+          (* should never happen *)
+          raise (ExitWithError 1)
     )
+    (fun () -> Client.Task.destroy ~rpc ~session_id ~self:blobtask)
 
 let blob_create printer rpc session_id params =
   let name = List.assoc "name" params in
@@ -7624,58 +7684,19 @@ let update_resync_host _printer rpc session_id params =
   let host = Client.Host.get_by_uuid ~rpc ~session_id ~uuid in
   Client.Pool_update.resync_host ~rpc ~session_id ~host
 
-let get_avail_updates_uri ~session_id ~task ~host =
-  let query =
-    [
-      ("session_id", [Ref.string_of session_id])
-    ; ("task_id", [Ref.string_of task])
-    ; ("host_refs", [Ref.string_of host])
-    ]
-  in
-  Uri.make ~path:Constants.get_updates_uri ~query () |> Uri.to_string
-
-let print_avail_updates ~rpc ~session_id ~fd ~host =
-  command_in_task ~rpc ~session_id ~fd ~obj:host
-    ~label:"Print available updates for host" ~quiet_on_success:true
-    (fun session_id task host ->
-      PrintHttpGetJson (get_avail_updates_uri ~session_id ~task ~host)
-  )
-
-let print_update_guidance ~rpc ~session_id ~fd ~host =
-  command_in_task ~rpc ~session_id ~fd ~obj:host
-    ~label:"Print update guidance for host" ~quiet_on_success:true
-    (fun session_id task host ->
-      PrintUpdateGuidance (get_avail_updates_uri ~session_id ~task ~host)
-  )
-
-let host_apply_updates fd printer rpc session_id params =
+let host_apply_updates _printer rpc session_id params =
   let hash = List.assoc "hash" params in
-  do_host_op rpc session_id ~multiple:false
+  ignore
+    (do_host_op rpc session_id ~multiple:false
        (fun _ host ->
          let host = host.getref () in
-      printer (Cli_printer.PMsg "Guidance of updates:") ;
-      print_update_guidance ~rpc ~session_id ~fd ~host ;
-      printer (Cli_printer.PMsg "Applying updates ...") ;
-      match Client.Host.apply_updates ~rpc ~session_id ~self:host ~hash with
-      | [] ->
-          printer (Cli_printer.PMsg "Updated.")
-      | warnings ->
-          printer (Cli_printer.PMsg "Updated with warnings:") ;
-          List.iter
-            (fun l -> printer (Cli_printer.PMsg (String.concat "; " l)))
-            warnings
+         Client.Host.apply_updates ~rpc ~session_id ~self:host ~hash
+         |> List.iter (fun l ->
+                _printer (Cli_printer.PMsg (String.concat "; " l))
+            )
        )
        params ["hash"]
-  |> ignore
-
-let host_updates_show_available fd _printer rpc session_id params =
-  do_host_op rpc session_id ~multiple:false
-    (fun _ host ->
-      let host = host.getref () in
-      print_avail_updates ~rpc ~session_id ~fd ~host
     )
-    params []
-  |> ignore
 
 module SDN_controller = struct
   let introduce printer rpc session_id params =
diff --git a/ocaml/xapi-consts/api_messages.ml b/ocaml/xapi-consts/api_messages.ml
index 250099d2c..bb63facfe 100644
--- a/ocaml/xapi-consts/api_messages.ml
+++ b/ocaml/xapi-consts/api_messages.ml
@@ -299,11 +299,18 @@ let pool_cpu_features_down = addMessage "POOL_CPU_FEATURES_DOWN" 5L
 let pool_cpu_features_up = addMessage "POOL_CPU_FEATURES_UP" 5L
 
 (* Cluster messages *)
+let cluster_quorum_approaching_lost =
+  addMessage "CLUSTER_QUORUM_APPROACHING_LOST" 2L
+
 let cluster_host_enable_failed = addMessage "CLUSTER_HOST_ENABLE_FAILED" 3L
 
 (* raised by external script in clustering daemon, do not delete this: it is not dead code *)
 let cluster_host_fencing = addMessage "CLUSTER_HOST_FENCING" 2L
 
+let cluster_host_leaving = addMessage "CLUSTER_HOST_LEAVING" 3L
+
+let cluster_host_joining = addMessage "CLUSTER_HOST_JOINING" 4L
+
 (* Certificate expiration messages *)
 let host_server_certificate_expiring = "HOST_SERVER_CERTIFICATE_EXPIRING"
 
diff --git a/ocaml/xapi-consts/constants.ml b/ocaml/xapi-consts/constants.ml
index 5a0eea13a..e0ae61b00 100644
--- a/ocaml/xapi-consts/constants.ml
+++ b/ocaml/xapi-consts/constants.ml
@@ -398,3 +398,19 @@ let good_ciphersuites =
     ["ECDHE-RSA-AES256-GCM-SHA384"; "ECDHE-RSA-AES128-GCM-SHA256"]
 
 let verify_certificates_path = "/var/xapi/verify-certificates"
+
+let observer_component_xapi = "xapi"
+
+let observer_component_xenopsd = "xenopsd"
+
+let observer_component_xapi_clusterd = "xapi-clusterd"
+
+let observer_component_smapi = "smapi"
+
+let observer_components_all =
+  [
+    observer_component_xapi
+  ; observer_component_xenopsd
+  ; observer_component_xapi_clusterd
+  ; observer_component_smapi
+  ]
diff --git a/ocaml/xapi-guard/lib/disk_cache.ml b/ocaml/xapi-guard/lib/disk_cache.ml
new file mode 100644
index 000000000..0f0a6e2c2
--- /dev/null
+++ b/ocaml/xapi-guard/lib/disk_cache.ml
@@ -0,0 +1,604 @@
+(* Copyright (C) Cloud Software Group, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU Lesser General Public License as published
+   by the Free Software Foundation; version 2.1 only. with the special
+   exception on linking described in file LICENSE.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU Lesser General Public License for more details.
+*)
+
+module D = Debug.Make (struct let name = __MODULE__ end)
+
+let ( // ) = Filename.concat
+
+let runtime_data = "/var/lib" // "xapi-guard"
+
+let ( let* ) = Lwt.bind
+
+let ( let@ ) f x = f x
+
+let with_lock = Lwt_mutex.with_lock
+
+type t = Uuidm.t * Mtime.t * Types.Tpm.key
+
+let cache_of service = runtime_data // Types.Service.to_string service
+
+let fistpoint () =
+  let name = "/tmp/fist_disable_xapi_guard_cache" in
+  Lwt.catch
+    (fun () ->
+      let* () = Lwt_unix.access name [Unix.F_OK] in
+      Lwt.return true
+    )
+    (fun _ -> Lwt.return false)
+
+let files_in dir ~otherwise =
+  Lwt.catch
+    (fun () ->
+      let* listing = Lwt_unix.files_of_directory dir |> Lwt_stream.to_list in
+      List.filter_map
+        (function "." | ".." -> None | name -> Some (dir // name))
+        listing
+      |> Lwt.return
+    )
+    otherwise
+
+let unlink_safe file =
+  let __FUN = __FUNCTION__ in
+  Lwt.catch
+    (fun () -> Lwt_unix.unlink file)
+    (function
+      | Unix.(Unix_error (ENOENT, _, _)) ->
+          Lwt.pause ()
+      | e ->
+          D.info "%s: error %s when deleting %s, ignoring" __FUN
+            (Printexc.to_string e) file ;
+          Lwt.pause ()
+      )
+
+type valid_file = t * string
+
+type file =
+  | Latest of valid_file
+  | Outdated of valid_file
+  | Temporary of string
+  | Invalid of string
+
+let path_of_key root (uuid, timestamp, key) =
+  root
+  // Uuidm.to_string uuid
+  // Types.Tpm.(serialize_key key |> string_of_int)
+  // Mtime.(to_uint64_ns timestamp |> Int64.to_string)
+
+let key_of_path path =
+  let ( let* ) = Option.bind in
+  let key_dir = Filename.(dirname path) in
+  let* uuid = Filename.(basename (dirname key_dir)) |> Uuidm.of_string in
+  let* key =
+    Filename.basename key_dir
+    |> int_of_string_opt
+    |> Option.map Types.Tpm.deserialize_key
+  in
+  let* timestamp =
+    Filename.basename path
+    |> Int64.of_string_opt
+    |> Option.map Mtime.of_uint64_ns
+  in
+  Some ((uuid, timestamp, key), path)
+
+let path_is_temp path =
+  let pathlen = String.length path in
+  String.ends_with ~suffix:".pre" path
+  && key_of_path (String.sub path 0 (pathlen - 4)) |> Option.is_some
+
+let temp_of_path path = path ^ ".pre"
+
+let sort_updates contents =
+  let classify elem =
+    match key_of_path elem with
+    | None ->
+        let file =
+          if path_is_temp elem then
+            Temporary elem
+          else
+            Invalid elem
+        in
+        Either.Right file
+    | Some valid_file ->
+        Either.Left valid_file
+  in
+  let valid_files, invalid = List.partition_map classify contents in
+
+  let valid =
+    let ordered =
+      List.fast_sort
+        (fun ((_, x, _), _) ((_, y, _), _) -> Mtime.compare y x)
+        valid_files
+    in
+    match ordered with
+    | [] ->
+        []
+    | latest :: outdated ->
+        Latest latest :: List.map (fun outdated -> Outdated outdated) outdated
+  in
+  List.concat [valid; invalid]
+
+let get_all_contents root =
+  let empty = Fun.const (Lwt.return []) in
+  let contents_of_key key =
+    let* contents = files_in key ~otherwise:empty in
+    Lwt.return (sort_updates contents)
+  in
+  let* tpms = files_in root ~otherwise:empty in
+  let* files =
+    Lwt_list.map_p
+      (fun tpm ->
+        let* keys = files_in tpm ~otherwise:empty in
+        Lwt_list.map_p contents_of_key keys
+      )
+      tpms
+  in
+  Lwt.return List.(concat (concat files))
+
+(** Warning, may raise Unix.Unix_error *)
+let read_from ~filename =
+  let flags = Unix.[O_RDONLY] in
+  let perm = 0o000 in
+  Lwt_io.with_file ~flags ~perm ~mode:Input filename Lwt_io.read
+
+let persist_to ~filename:f_path ~contents =
+  let atomic_write_to_file ~perm f =
+    let tmp_path = temp_of_path f_path in
+    let dirname = Filename.dirname f_path in
+    let flags = Unix.[O_WRONLY; O_CREAT; O_SYNC] in
+    let* fd_tmp = Lwt_unix.openfile tmp_path flags perm in
+    let* () =
+      Lwt.finalize
+        (fun () ->
+          (* do not close fd when closing the channel, avoids double-closing the fd *)
+          let close () = Lwt.return_unit in
+          let chan = Lwt_io.of_fd ~mode:Output ~close fd_tmp in
+          let* () =
+            Lwt.finalize (fun () -> f chan) (fun () -> Lwt_io.close chan)
+          in
+          Lwt_unix.fsync fd_tmp
+        )
+        (fun () -> Lwt_unix.close fd_tmp)
+    in
+    let* () = Lwt_unix.rename tmp_path f_path in
+    let* fd_dir = Lwt_unix.openfile dirname [O_RDONLY] 0 in
+    Lwt.finalize
+      (fun () -> Lwt_unix.fsync fd_dir)
+      (fun () -> Lwt_unix.close fd_dir)
+  in
+  let write out_chan = Lwt_io.write out_chan contents in
+  atomic_write_to_file ~perm:0o600 write
+
+(** - Direct: request doesn't pass through the cache
+    - Engaged: both side coordinate through the queue, writer ends the mode
+      when the queue has been filled.
+    - Disengaged: writer ignores the queue, reader empties it and the cache;
+      then it changes the mode to engaged.
+*)
+type state = Direct | Engaged | Disengaged
+
+type channel = {
+    queue: t Lwt_bounded_stream.t
+  ; push: t option -> unit option
+  ; lock: Lwt_mutex.t (* lock for the states *)
+  ; mutable state: state
+}
+
+(*
+   Notes:
+   - uses Mtime.t to force usage of monotonic time
+   - This means that between runs (and reboots) cached stated is lost if not
+     persisted first.
+   IDEA: carryover: read contents of cache and "convert it" to the current run
+
+   TODO:
+     - Exponential backoff on xapi push error
+     - Limit error logging on xapi push error: once per downtime is enough
+ *)
+
+module Writer : sig
+  val with_cache :
+       direct:
+         (t -> (string, exn) Lwt_result.t)
+         * (t -> string -> (unit, exn) Lwt_result.t)
+    -> Types.Service.t
+    -> channel
+    -> ((t -> string Lwt.t) * (t -> string -> unit Lwt.t) -> 'a Lwt.t)
+    -> 'a Lwt.t
+  (** [with_cache ~direct typ queue context] creates a cache for content of
+      type [typ]. The cache is readable and writable through the function
+      [context], which is provided a reading and writing functions [direct].
+      It uses [channel] to push events to
+
+      Example:
+        Xapi_guard.Disk_cache.(Writer.with_cache ~direct:(read, upload) Tpm channel)
+        @@ fun read_tpm, write_tpm -> write_tpm (uuid, time, key) contents
+    *)
+end = struct
+  let mkdir_p ?(perm = 0o755) path =
+    let rec loop acc path =
+      let create_dir () = Lwt_unix.mkdir path perm in
+      let create_subdirs () = Lwt_list.iter_s (fun (_, f) -> f ()) acc in
+      Lwt.try_bind create_dir create_subdirs (function
+        | Unix.(Unix_error (EEXIST, _, _)) ->
+            (* create directories, parents first *)
+            create_subdirs ()
+        | Unix.(Unix_error (ENOENT, _, _)) ->
+            let parent = Filename.dirname path in
+            loop ((path, create_dir) :: acc) parent
+        | exn ->
+            let msg =
+              Printf.sprintf {|Could not create directory "%s" because: %s|}
+                path (Printexc.to_string exn)
+            in
+            Lwt.fail (Failure msg)
+        )
+    in
+    loop [] path
+
+  let files_in_existing dir =
+    let create_dir = function
+      | Unix.(Unix_error (ENOENT, _, _)) ->
+          let* () = mkdir_p dir ~perm:0o700 in
+          Lwt.return []
+      | e ->
+          raise e
+    in
+    files_in dir ~otherwise:create_dir
+
+  let fail exn =
+    Debug.log_backtrace exn (Backtrace.get exn) ;
+    Lwt_result.fail exn
+
+  let read_contents ~direct root (uuid, now, key) =
+    let read_remote () =
+      let read, _ = direct in
+      let* result =
+        Lwt.try_bind
+          (fun () -> read (uuid, now, key))
+          (function
+            | Ok contents -> Lwt_result.return contents | Error exn -> fail exn
+            )
+          fail
+      in
+      match result with
+      | Ok contents ->
+          Lwt.return contents
+      | Error exn ->
+          raise exn
+    in
+
+    let key_str = Types.Tpm.(serialize_key key |> string_of_int) in
+    let key_dir = root // Uuidm.(to_string uuid) // key_str in
+
+    (* 1. Get updates *)
+    let* contents = files_in key_dir ~otherwise:(fun _ -> Lwt.return []) in
+    let updates = sort_updates contents in
+
+    (* 2. Pick latest *)
+    let only_latest = function
+      | Latest (_, p) ->
+          Either.Left p
+      | Temporary p | Outdated (_, p) | Invalid p ->
+          Right p
+    in
+    let latest, _ = List.partition_map only_latest updates in
+
+    (* 3. fall back to remote read if needed *)
+    let get_contents path =
+      Lwt.catch (fun () -> read_from ~filename:path) (fun _ -> read_remote ())
+    in
+
+    match latest with path :: _ -> get_contents path | [] -> read_remote ()
+
+  let write_contents ~direct root queue (uuid, now, key) contents =
+    let __FUN = __FUNCTION__ in
+
+    let _, direct = direct in
+    let key_str = Types.Tpm.(serialize_key key |> string_of_int) in
+    let key_dir = root // Uuidm.(to_string uuid) // key_str in
+    (* 1. Record existing requests in cache *)
+    let* outdated_contents = files_in_existing key_dir in
+
+    let filename = key_dir // (Mtime.to_uint64_ns now |> Int64.to_string) in
+
+    (* 2. Try to push the changes, if possible. If it's not possible because of
+       the mode or a failure, write new timestamped content to cache,
+       atomically; and finally notify the other side if needed *)
+    (* Note that all queue operations must use while holding its mutex *)
+    let persist () = persist_to ~filename ~contents in
+    let persist_and_push () =
+      let push () =
+        match queue.push (Some (uuid, now, key)) with
+        | Some () ->
+            Lwt.return_unit
+        | None ->
+            (* Queue is full, change mode to ignore queue *)
+            queue.state <- Disengaged ;
+            Lwt.return_unit
+      in
+      let* () = persist () in
+      push ()
+    in
+    let engage_and_persist exn =
+      queue.state <- Engaged ;
+      D.info "%s: Error on push. Reason: %s" __FUN (Printexc.to_string exn) ;
+      let* () = persist_and_push () in
+      Lwt_result.return ()
+    in
+    let read_state_and_push on_exception () =
+      match queue.state with
+      | Direct ->
+          let* result =
+            Lwt.try_bind
+              (fun () -> direct (uuid, now, key) contents)
+              (function
+                | Ok () -> Lwt_result.return () | Error exn -> on_exception exn
+                )
+              on_exception
+          in
+          Lwt.return result
+      | Engaged ->
+          let* () = persist_and_push () in
+          Lwt_result.return ()
+      | Disengaged ->
+          let* () = persist () in
+          Lwt_result.return ()
+    in
+    let* cache_disabled = fistpoint () in
+    let on_exception = if cache_disabled then fail else engage_and_persist in
+
+    let* result = with_lock queue.lock (read_state_and_push on_exception) in
+    let* () =
+      match result with Ok () -> Lwt.return_unit | Error exn -> raise exn
+    in
+
+    (* 4. Delete previous requests from filesystem *)
+    let* _ = Lwt_list.map_p unlink_safe outdated_contents in
+    Lwt.return_unit
+
+  let with_cache ~direct typ queue f =
+    let root = cache_of typ in
+    let* () = mkdir_p root ~perm:0o700 in
+    f (read_contents ~direct root, write_contents ~direct root queue)
+end
+
+module Watcher : sig
+  val watch :
+       direct:(t -> string -> (unit, exn) Lwt_result.t)
+    -> Types.Service.t
+    -> channel
+    -> unit
+    -> unit Lwt.t
+end = struct
+  type push_cache = File of valid_file | Update_all | Wait
+
+  (* Outdated and invalid files can be deleted, keep temporary files just in case
+     they need to be recovered *)
+  let discarder = function
+    | Latest _ as f ->
+        Either.Left f
+    | Temporary _ as f ->
+        Left f
+    | Outdated (_, p) ->
+        Right p
+    | Invalid p ->
+        Right p
+
+  let get_latest_and_delete_rest root =
+    let* files = get_all_contents root in
+    let keep, to_delete = List.partition_map discarder files in
+    let* () = Lwt_list.iter_p unlink_safe to_delete in
+    (* Ignore temporaty files *)
+    let latest =
+      List.filter_map (function Latest f -> Some f | _ -> None) keep
+    in
+    Lwt.return latest
+
+  let retry_push push (uuid, timestamp, key) contents =
+    let __FUN = __FUNCTION__ in
+    let push' () = push (uuid, timestamp, key) contents in
+    let rec retry k =
+      let on_error e =
+        D.info "%s: Error on push, attempt %i. Reason: %s" __FUN k
+          (Printexc.to_string e) ;
+        let* () = Lwt_unix.sleep 0.1 in
+        retry (k + 1)
+      in
+      Lwt.try_bind push'
+        (function Ok () -> Lwt.return_unit | Error e -> on_error e)
+        on_error
+    in
+    retry 1
+
+  let push_file push (key, path) =
+    let __FUN = __FUNCTION__ in
+    let on_error = function
+      | Unix.(Unix_error (ENOENT, _, _)) ->
+          Lwt.return_unit
+      | exn ->
+          D.info "%s: error when reading '%s': %s" __FUN path
+            Printexc.(to_string exn) ;
+          Lwt.return_unit
+    in
+
+    Lwt.try_bind
+      (fun () -> read_from ~filename:path)
+      (fun contents ->
+        let* () = retry_push push key contents in
+        unlink_safe path
+      )
+      on_error
+
+  let push_files push files = Lwt_list.iter_s (push_file push) (List.rev files)
+
+  let update_all queue push root =
+    let __FUN = __FUNCTION__ in
+    let* contents = get_latest_and_delete_rest root in
+    let* () = push_files push contents in
+    let@ () = with_lock queue.lock in
+    let* contents = get_latest_and_delete_rest root in
+    let* () =
+      match contents with
+      | [] ->
+          queue.state <- Direct ;
+          D.debug "%s: Cache clean; Going direct" __FUN ;
+          Lwt.return_unit
+      | _ ->
+          Lwt.return_unit
+    in
+    Lwt.return_unit
+
+  let resolve queue push root = function
+    | File file -> (
+        let* () = push_file push file in
+        let@ () = with_lock queue.lock in
+        match queue.state with
+        | Direct | Disengaged ->
+            Lwt.return_unit
+        | Engaged ->
+            let () =
+              if Lwt_bounded_stream.size queue.queue = 0 then
+                queue.state <- Direct
+            in
+            Lwt.return_unit
+      )
+    | Update_all ->
+        update_all queue push root
+    | Wait ->
+        (* Do not busy loop when the system can cope with the requests *)
+        Lwt_unix.sleep 0.2
+
+  let watch ~direct typ queue =
+    let root = cache_of typ in
+    let __FUN = __FUNCTION__ in
+    let rec loop () =
+      (* When the pushing side is disengaged it doesn't push events to the
+         queue, this means that trying to drain it completely would leave the
+         pulling side locked waiting when the queue is empty.
+           - Read the number of elements in the queue while draining it and
+             then switch to read the contents from the cache; or
+           - Switch immediately to reading the contents from cache and ignore
+             the contents of the queue by calling an specialized method in the
+             queue module to drain it.
+      *)
+      let get_action () =
+        let@ () = with_lock queue.lock in
+        match queue.state with
+        | Disengaged when Lwt_bounded_stream.size queue.queue < 1 ->
+            let* () = Lwt.pause () in
+            Lwt.return Update_all
+        | Direct ->
+            let* () = Lwt.pause () in
+            Lwt.return Wait
+        | _ -> (
+            let* elem = Lwt_bounded_stream.get queue.queue in
+            match elem with
+            | None ->
+                raise (Failure "Other side closed channel, cannot continue")
+            | Some elem ->
+                Lwt.return (File (elem, path_of_key root elem))
+          )
+      in
+      let* action = get_action () in
+      let* () = resolve queue direct root action in
+      loop ()
+    in
+    loop
+end
+
+(** Module use to change the cache contents before the reader and writer start
+    running *)
+module Setup : sig
+  val retime_cache_contents : Types.Service.t -> unit Lwt.t
+end = struct
+  type file_action =
+    | Keep of file
+    | Delete of string
+    | Move of {from: string; into: string}
+
+  let get_fs_action root now = function
+    | Latest ((uuid, timestamp, key), from) as latest ->
+        if Mtime.is_later ~than:now timestamp then
+          let timestamp = now in
+          let into = path_of_key root (uuid, timestamp, key) in
+          Move {from; into}
+        else
+          Keep latest
+    | Temporary _ as temp ->
+        Keep temp
+    | Invalid p | Outdated (_, p) ->
+        Delete p
+
+  let commit __FUN = function
+    | Keep (Temporary p) ->
+        D.warn "%s: Found temporary file, ignoring '%s'" __FUN p ;
+        Lwt.return_unit
+    | Keep _ ->
+        Lwt.return_unit
+    | Delete p ->
+        D.info "%s: Deleting '%s'" __FUN p ;
+        Lwt_unix.unlink p
+    | Move {from; into} ->
+        D.info "%s: Moving '%s' to '%s'" __FUN from into ;
+        Lwt_unix.rename from into
+
+  let rec delete_empty_dirs ~delete_root root =
+    (* Delete subdirectories, then *)
+    let* files = files_in root ~otherwise:(fun _ -> Lwt.return []) in
+    let* () =
+      Lwt_list.iter_p
+        (fun path ->
+          let* {st_kind; _} = Lwt_unix.stat path in
+          match st_kind with
+          | S_DIR ->
+              delete_empty_dirs ~delete_root:true path
+          | _ ->
+              Lwt.return_unit
+        )
+        files
+    in
+    if not delete_root then
+      Lwt.return_unit
+    else
+      let* files = files_in root ~otherwise:(fun _ -> Lwt.return []) in
+      Lwt.catch
+        (fun () ->
+          if files = [] then
+            Lwt_unix.rmdir root
+          else
+            Lwt.return_unit
+        )
+        (fun _ -> Lwt.return_unit)
+
+  (* The code assumes it's the only with access to the disk cache while running *)
+  let retime_cache_contents typ =
+    let now = Mtime_clock.now () in
+    let root = cache_of typ in
+    let* contents = get_all_contents root in
+    let* () =
+      contents
+      |> List.map (get_fs_action root now)
+      |> Lwt_list.iter_p (commit __FUNCTION__)
+    in
+    delete_empty_dirs ~delete_root:false root
+end
+
+let setup typ read write =
+  let* () = Setup.retime_cache_contents typ in
+  let queue, push = Lwt_bounded_stream.create 2 in
+  let lock = Lwt_mutex.create () in
+  let q = {queue; push; lock; state= Disengaged} in
+  Lwt.return
+    ( Writer.with_cache ~direct:(read, write) typ q
+    , Watcher.watch ~direct:write typ q
+    )
diff --git a/ocaml/xapi-guard/lib/disk_cache.mli b/ocaml/xapi-guard/lib/disk_cache.mli
new file mode 100644
index 000000000..08c345615
--- /dev/null
+++ b/ocaml/xapi-guard/lib/disk_cache.mli
@@ -0,0 +1,31 @@
+(* Copyright (C) Cloud Software Group, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU Lesser General Public License as published
+   by the Free Software Foundation; version 2.1 only. with the special
+   exception on linking described in file LICENSE.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU Lesser General Public License for more details.
+*)
+
+(** [t] t is the minimal type to recognise elements in a cache. This does not
+    contain the contents of the elements being contained, only the metadata *)
+type t = Uuidm.t * Mtime.t * Types.Tpm.key
+
+val setup :
+     Types.Service.t
+  -> (t -> (string, exn) Lwt_result.t)
+  -> (t -> string -> (unit, exn) Lwt_result.t)
+  -> ( (   ((t -> string Lwt.t) * (t -> string -> unit Lwt.t) -> 'a Lwt.t)
+        -> 'a Lwt.t
+       )
+     * (unit -> unit Lwt.t)
+     )
+     Lwt.t
+(** [setup service read_callback push_callback] Returns a local disk buffer for
+    [service] which will use [push_callback] to push the elements to their
+    final destination and [read_callback] to read elements if they are not in
+    the buffer. *)
diff --git a/ocaml/xapi-guard/lib/dorpc.ml b/ocaml/xapi-guard/lib/dorpc.ml
index 2074f35e5..bbb1f11bb 100644
--- a/ocaml/xapi-guard/lib/dorpc.ml
+++ b/ocaml/xapi-guard/lib/dorpc.ml
@@ -13,7 +13,7 @@
  *)
 open Idl
 
-module D = Debug.Make (struct let name = "varstored-guard rpc" end)
+module D = Debug.Make (struct let name = "xapi-guard rpc" end)
 
 let wrap_rpc error f =
   let on_error e =
diff --git a/ocaml/xapi-guard/lib/dune b/ocaml/xapi-guard/lib/dune
index bfb4841ab..052810ead 100644
--- a/ocaml/xapi-guard/lib/dune
+++ b/ocaml/xapi-guard/lib/dune
@@ -20,10 +20,13 @@
 )
 (library
  (name xapi_guard)
- (modules dorpc)
+ (modules dorpc types disk_cache lwt_bounded_stream)
  (libraries
   rpclib.core
+  inotify
+  inotify.lwt
   lwt
+  lwt.unix
   uri
   xapi-backtrace
   xapi-consts
diff --git a/ocaml/xapi-guard/lib/lwt_bounded_stream.ml b/ocaml/xapi-guard/lib/lwt_bounded_stream.ml
new file mode 100644
index 000000000..90efe8375
--- /dev/null
+++ b/ocaml/xapi-guard/lib/lwt_bounded_stream.ml
@@ -0,0 +1,48 @@
+(*
+ * Copyright (c) 2012 Citrix Systems
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *)
+
+let ( let* ) = Lwt.bind
+
+type 'a t = {stream: 'a Lwt_stream.t; capacity: int; size: int ref}
+
+let create capacity =
+  let stream, stream_push = Lwt_stream.create () in
+  let t = {stream; capacity; size= ref 0} in
+  let push = function
+    | Some _ when !(t.size) > t.capacity ->
+        None
+    | None ->
+        stream_push None ; Some ()
+    | elem ->
+        stream_push elem ; incr t.size ; Some ()
+  in
+  (t, push)
+
+let size {size; _} = !size
+
+let get_available t =
+  let all = Lwt_stream.get_available t.stream in
+  t.size := !(t.size) - List.length all ;
+  all
+
+let get t =
+  let* elem = Lwt_stream.get t.stream in
+  decr t.size ; Lwt.return elem
+
+let nget n t =
+  let* all = Lwt_stream.nget n t.stream in
+  t.size := !(t.size) - List.length all ;
+  Lwt.return all
diff --git a/ocaml/xapi-guard/lib/lwt_bounded_stream.mli b/ocaml/xapi-guard/lib/lwt_bounded_stream.mli
new file mode 100644
index 000000000..b2b310f77
--- /dev/null
+++ b/ocaml/xapi-guard/lib/lwt_bounded_stream.mli
@@ -0,0 +1,34 @@
+(*
+ * Copyright (c) 2012 Citrix Systems
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *)
+
+(** Similar to Lwt_stream.bounded_push except threads never block in push() *)
+type 'a t
+
+val create : int -> 'a t * ('a option -> unit option)
+(** [create capacity] creates a stream which can contain at most
+    [capacity] elements *)
+
+val get_available : 'a t -> 'a list
+(** [get_available t] returns all available elements from [t] without blocking *)
+
+val get : 'a t -> 'a option Lwt.t
+(** [get t] returns an element from [t] *)
+
+val nget : int -> 'a t -> 'a list Lwt.t
+(** [nget n t] returns [n] elements from [t] *)
+
+val size : 'a t -> int
+(** [size t] return the number of enqueued elements *)
diff --git a/ocaml/xapi-guard/lib/server_interface.ml b/ocaml/xapi-guard/lib/server_interface.ml
index 5b2acd7c2..0884c2bf1 100644
--- a/ocaml/xapi-guard/lib/server_interface.ml
+++ b/ocaml/xapi-guard/lib/server_interface.ml
@@ -15,9 +15,10 @@
 open Rpc
 open Lwt.Syntax
 
-module D = Debug.Make (struct let name = "varstored_interface" end)
+module D = Debug.Make (struct let name = __MODULE__ end)
 
 open D
+module Tpm = Xapi_guard.Types.Tpm
 
 type rpc_t = Rpc.t
 
@@ -25,7 +26,7 @@ let err = Xenops_interface.err
 
 type nvram = (string * string) list [@@deriving rpcty]
 
-let originator = "varstored-guard"
+let originator = "xapi-guard"
 
 type session = [`session] Ref.t
 
@@ -50,8 +51,8 @@ let () =
    * this is only needed for syscalls that would otherwise block *)
   Lwt_unix.set_pool_size 16
 
-let with_xapi ~cache f =
-  Lwt_unix.with_timeout 120. (fun () -> SessionCache.with_session cache f)
+let with_xapi ~cache ?(timeout = 120.) f =
+  Lwt_unix.with_timeout timeout (fun () -> SessionCache.with_session cache f)
 
 let serve_forever_lwt path callback =
   let conn_closed _ = () in
@@ -99,89 +100,72 @@ let serve_forever_lwt_callback rpc_fn path _ req body =
       in
       Cohttp_lwt_unix.Server.respond_string ~status:`Method_not_allowed ~body ()
 
-(* The TPM has 3 kinds of states *)
-type state = {
-    permall: string  (** permanent storage *)
-  ; savestate: string  (** for ACPI S3 *)
-  ; volatilestate: string  (** for snapshot/migration/etc. *)
-}
-
-let split_char = ' '
-
-let join_string = String.make 1 split_char
-
-let deserialize t =
-  match String.split_on_char split_char t with
-  | [permall] ->
-      (* backwards compat with reading tpm2-00.permall *)
-      {permall; savestate= ""; volatilestate= ""}
-  | [permall; savestate; volatilestate] ->
-      {permall; savestate; volatilestate}
-  | splits ->
-      Fmt.failwith "Invalid state: too many splits %d" (List.length splits)
-
-let serialize t =
-  (* it is assumed that swtpm has already base64 encoded this *)
-  String.concat join_string [t.permall; t.savestate; t.volatilestate]
-
-let lookup_key key t =
-  match key with
-  | "/tpm2-00.permall" ->
-      t.permall
-  | "/tpm2-00.savestate" ->
-      t.savestate
-  | "/tpm2-00.volatilestate" ->
-      t.volatilestate
-  | s ->
-      Fmt.invalid_arg "Unknown TPM state key: %s" s
-
-let update_key key state t =
-  if String.contains state split_char then
-    Fmt.invalid_arg
-      "State to be stored (%d bytes) contained forbidden separator: %c"
-      (String.length state) split_char ;
-  match key with
-  | "/tpm2-00.permall" ->
-      {t with permall= state}
-  | "/tpm2-00.savestate" ->
-      {t with savestate= state}
-  | "/tpm2-00.volatilestate" ->
-      {t with volatilestate= state}
-  | s ->
-      Fmt.invalid_arg "Unknown TPM state key: %s" s
-
-let empty = ""
-
-let serve_forever_lwt_callback_vtpm ~cache mutex vtpm _path _ req body =
+let with_xapi_vtpm ~cache vm_uuid =
+  let vm_uuid_str = Uuidm.to_string vm_uuid in
+  let* vm =
+    with_xapi ~cache @@ Xen_api_lwt_unix.VM.get_by_uuid ~uuid:vm_uuid_str
+  in
+  let* vTPMs = with_xapi ~cache @@ Xen_api_lwt_unix.VM.get_VTPMs ~self:vm in
+  match vTPMs with
+  | [] ->
+      D.warn
+        "%s: received a request from a VM that has no VTPM associated, \
+         ignoring request"
+        __FUNCTION__ ;
+      let msg =
+        Printf.sprintf "No VTPM associated with VM %s, nothing to do"
+          vm_uuid_str
+      in
+      raise (Failure msg)
+  | self :: _ ->
+      Lwt.return self
+
+let push_vtpm ~cache (vm_uuid, _timestamp, key) contents =
+  let* self = with_xapi_vtpm ~cache vm_uuid in
+  let* old_contents = with_xapi ~cache @@ VTPM.get_contents ~self in
+  let contents =
+    Tpm.(old_contents |> deserialize |> update key contents |> serialize)
+  in
+  let* () = with_xapi ~cache @@ VTPM.set_contents ~self ~contents in
+  Lwt_result.return ()
+
+let read_vtpm ~cache (vm_uuid, _timestamp, key) =
+  let* self = with_xapi_vtpm ~cache vm_uuid in
+  let* contents = with_xapi ~cache @@ VTPM.get_contents ~self in
+  let body = Tpm.(contents |> deserialize |> lookup ~key) in
+  Lwt_result.return body
+
+let serve_forever_lwt_callback_vtpm ~cache mutex (read, persist) vm_uuid _ req
+    body =
   let uri = Cohttp.Request.uri req in
+  let timestamp = Mtime_clock.now () in
   (* in case the connection is interrupted/etc. we may still have pending operations,
      so use a per vTPM mutex to ensure we really only have 1 pending operation at a time for a vTPM
   *)
   Lwt_mutex.with_lock mutex @@ fun () ->
   (* TODO: some logging *)
   match (Cohttp.Request.meth req, Uri.path uri) with
-  | `GET, key when key <> "/" ->
-      let* contents = with_xapi ~cache @@ VTPM.get_contents ~self:vtpm in
-      let body = contents |> deserialize |> lookup_key key in
+  | `GET, path when path <> "/" ->
+      let key = Tpm.key_of_swtpm path in
+      let* body = read (vm_uuid, timestamp, key) in
       let headers =
         Cohttp.Header.of_list [("Content-Type", "application/octet-stream")]
       in
       Cohttp_lwt_unix.Server.respond_string ~headers ~status:`OK ~body ()
-  | `PUT, key when key <> "/" ->
+  | `PUT, path when path <> "/" ->
       let* body = Cohttp_lwt.Body.to_string body in
-      let* contents = with_xapi ~cache @@ VTPM.get_contents ~self:vtpm in
-      let contents =
-        contents |> deserialize |> update_key key body |> serialize
-      in
-      let* () = with_xapi ~cache @@ VTPM.set_contents ~self:vtpm ~contents in
+      let key = Tpm.key_of_swtpm path in
+      let* () = persist (vm_uuid, timestamp, key) body in
       Cohttp_lwt_unix.Server.respond ~status:`No_content
         ~body:Cohttp_lwt.Body.empty ()
-  | `DELETE, key when key <> "/" ->
-      let* contents = with_xapi ~cache @@ VTPM.get_contents ~self:vtpm in
+  | `DELETE, path when path <> "/" ->
+      let* self = with_xapi_vtpm ~cache vm_uuid in
+      let* contents = with_xapi ~cache @@ VTPM.get_contents ~self in
+      let key = Tpm.key_of_swtpm path in
       let contents =
-        contents |> deserialize |> update_key key empty |> serialize
+        Tpm.(contents |> deserialize |> update key empty_state |> serialize)
       in
-      let* () = with_xapi ~cache @@ VTPM.set_contents ~self:vtpm ~contents in
+      let* () = with_xapi ~cache @@ VTPM.set_contents ~self ~contents in
       Cohttp_lwt_unix.Server.respond ~status:`No_content
         ~body:Cohttp_lwt.Body.empty ()
   | _, _ ->
@@ -189,24 +173,33 @@ let serve_forever_lwt_callback_vtpm ~cache mutex vtpm _path _ req body =
       Cohttp_lwt_unix.Server.respond_string ~status:`Method_not_allowed ~body ()
 
 (* Create a restricted RPC function and socket for a specific VM *)
-let make_server_varstored ~cache path vm_uuid =
+let make_server_varstored _persist ~cache path vm_uuid =
+  let vm_uuid_str = Uuidm.to_string vm_uuid in
   let module Server =
     Xapi_idl_guard_varstored.Interface.RPC_API (Rpc_lwt.GenServer ()) in
-  let* vm = with_xapi ~cache @@ VM.get_by_uuid ~uuid:vm_uuid in
+  let get_vm_ref () = with_xapi ~cache @@ VM.get_by_uuid ~uuid:vm_uuid_str in
   let ret v =
     (* TODO: maybe map XAPI exceptions *)
     Lwt.bind v Lwt.return_ok |> Rpc_lwt.T.put
   in
-  let get_nvram _ _ = ret @@ with_xapi ~cache @@ VM.get_NVRAM ~self:vm in
+  let get_nvram _ _ =
+    (let* self = get_vm_ref () in
+     with_xapi ~cache @@ VM.get_NVRAM ~self
+    )
+    |> ret
+  in
   let set_nvram _ _ nvram =
-    ret @@ with_xapi ~cache @@ VM.set_NVRAM_EFI_variables ~self:vm ~value:nvram
+    (let* self = get_vm_ref () in
+     with_xapi ~cache @@ VM.set_NVRAM_EFI_variables ~self ~value:nvram
+    )
+    |> ret
   in
   let message_create _ _name priority _cls _uuid body =
     ret
       (let* (_ : _ Ref.t) =
          with_xapi ~cache
          @@ Message.create ~name:"VM_SECURE_BOOT_FAILED" ~priority ~cls:`VM
-              ~obj_uuid:vm_uuid ~body
+              ~obj_uuid:vm_uuid_str ~body
        in
        Lwt.return_unit
       )
@@ -223,21 +216,9 @@ let make_server_varstored ~cache path vm_uuid =
   serve_forever_lwt_callback (Rpc_lwt.server Server.implementation) path
   |> serve_forever_lwt path
 
-let make_server_vtpm_rest ~cache path vm_uuid =
-  let vtpm_server uuid =
-    let* vtpm = with_xapi ~cache @@ VTPM.get_by_uuid ~uuid in
+let make_server_vtpm_rest read_write ~cache path vm_uuid =
   let mutex = Lwt_mutex.create () in
-    serve_forever_lwt_callback_vtpm ~cache mutex vtpm path
-    |> serve_forever_lwt path
+  let callback =
+    serve_forever_lwt_callback_vtpm ~cache mutex read_write vm_uuid
   in
-  let* vm = with_xapi ~cache @@ Xen_api_lwt_unix.VM.get_by_uuid ~uuid:vm_uuid in
-  let* vTPMs = with_xapi ~cache @@ Xen_api_lwt_unix.VM.get_VTPMs ~self:vm in
-  match vTPMs with
-  | [] ->
-      D.warn
-        "%s: asked to start swtpm server in socket, but no vtpms associated!"
-        __FUNCTION__ ;
-      Lwt.return Lwt.return
-  | self :: _ ->
-      let* uuid = with_xapi ~cache @@ Xen_api_lwt_unix.VTPM.get_uuid ~self in
-      vtpm_server uuid
+  serve_forever_lwt path callback
diff --git a/ocaml/xapi-guard/lib/types.ml b/ocaml/xapi-guard/lib/types.ml
new file mode 100644
index 000000000..3f2b41c76
--- /dev/null
+++ b/ocaml/xapi-guard/lib/types.ml
@@ -0,0 +1,82 @@
+module Service = struct
+  type t = Varstored | Swtpm [@@deriving rpcty]
+
+  let to_string = function Varstored -> "Varstored" | Swtpm -> "Swtpm"
+end
+
+module Tpm = struct
+  (* The TPM has 3 kinds of states *)
+  type t = {
+      permall: string  (** permanent storage *)
+    ; savestate: string  (** for ACPI S3 *)
+    ; volatilestate: string  (** for snapshot/migration/etc. *)
+  }
+
+  type key = Perm | Save | Volatile
+
+  let key_of_swtpm = function
+    | "/tpm2-00.permall" ->
+        Perm
+    | "/tpm2-00.savestate" ->
+        Save
+    | "/tpm2-00.volatilestate" ->
+        Volatile
+    | s ->
+        Fmt.invalid_arg "Unknown TPM state key: %s" s
+
+  let serialize_key = function Perm -> 0 | Save -> 1 | Volatile -> 2
+
+  let deserialize_key = function
+    | 0 ->
+        Perm
+    | 1 ->
+        Save
+    | 2 ->
+        Volatile
+    | s ->
+        Fmt.invalid_arg "Unknown TPM state key: %i" s
+
+  let empty_state = ""
+
+  let empty = {permall= ""; savestate= ""; volatilestate= ""}
+
+  let split_char = ' '
+
+  let join_string = String.make 1 split_char
+
+  let deserialize t =
+    match String.split_on_char split_char t with
+    | [permall] ->
+        (* backwards compat with reading tpm2-00.permall *)
+        {permall; savestate= ""; volatilestate= ""}
+    | [permall; savestate; volatilestate] ->
+        {permall; savestate; volatilestate}
+    | splits ->
+        Fmt.failwith "Invalid state: too many splits %d" (List.length splits)
+
+  let serialize t =
+    (* it is assumed that swtpm has already base64 encoded this *)
+    String.concat join_string [t.permall; t.savestate; t.volatilestate]
+
+  let lookup ~key t =
+    match key with
+    | Perm ->
+        t.permall
+    | Save ->
+        t.savestate
+    | Volatile ->
+        t.volatilestate
+
+  let update key state t =
+    if String.contains state split_char then
+      Fmt.invalid_arg
+        "State to be stored (%d bytes) contained forbidden separator: %c"
+        (String.length state) split_char ;
+    match key with
+    | Perm ->
+        {t with permall= state}
+    | Save ->
+        {t with savestate= state}
+    | Volatile ->
+        {t with volatilestate= state}
+end
diff --git a/ocaml/xapi-guard/lib/types.mli b/ocaml/xapi-guard/lib/types.mli
new file mode 100644
index 000000000..f210ea8c9
--- /dev/null
+++ b/ocaml/xapi-guard/lib/types.mli
@@ -0,0 +1,36 @@
+module Service : sig
+  type t = Varstored | Swtpm
+
+  val typ_of : t Rpc.Types.typ
+
+  val to_string : t -> string
+end
+
+module Tpm : sig
+  (** TPMs have 3 kind of states *)
+  type t
+
+  (** key to access a single state *)
+  type key
+
+  val key_of_swtpm : string -> key
+  (** [key_of_swtpm path] returns a state key represented by [path]. These paths
+     are parts of the requests generated by SWTPM and may contain slashes *)
+
+  val deserialize_key : int -> key
+
+  val serialize_key : key -> int
+  (** [serialize key] returns the state key represented by [key]. *)
+
+  val empty : t
+
+  val empty_state : string
+
+  val deserialize : string -> t
+
+  val serialize : t -> string
+
+  val update : key -> string -> t -> t
+
+  val lookup : key:key -> t -> string
+end
diff --git a/ocaml/xapi-guard/src/dune b/ocaml/xapi-guard/src/dune
index dfbf8e9a9..baac1d241 100644
--- a/ocaml/xapi-guard/src/dune
+++ b/ocaml/xapi-guard/src/dune
@@ -1,5 +1,6 @@
 (executable
  (name main)
+ (modules main)
  (libraries
    astring
    cmdliner
diff --git a/ocaml/xapi-guard/src/main.ml b/ocaml/xapi-guard/src/main.ml
index 551a60372..9fb40aa03 100644
--- a/ocaml/xapi-guard/src/main.ml
+++ b/ocaml/xapi-guard/src/main.ml
@@ -13,18 +13,18 @@
  * GNU Lesser General Public License for more details.
  *)
 
-open Xapi_guard
 open Lwt.Syntax
 open Xapi_guard_server
+module Types = Xapi_guard.Types
 module SessionCache = Xen_api_lwt_unix.SessionCache
 
-module D = Debug.Make (struct let name = "varstored-guard" end)
+let ( let@ ) f x = f x
 
-let ret v = Lwt.bind v Lwt.return_ok |> Rpc_lwt.T.put
+let daemon_name = "xapi-guard"
 
-type ty = Varstored | Swtpm [@@deriving rpcty]
+module D = Debug.Make (struct let name = daemon_name end)
 
-let ty_to_string = function Varstored -> "Varstored" | Swtpm -> "Swtpm"
+let ret v = Lwt.bind v Lwt.return_ok |> Rpc_lwt.T.put
 
 let log_fds () =
   let count stream = Lwt_stream.fold (fun _ n -> n + 1) stream 0 in
@@ -41,7 +41,7 @@ module Persistent = struct
       vm_uuid: Xapi_idl_guard_privileged.Interface.Uuidm.t
     ; path: string
     ; gid: int
-    ; typ: ty
+    ; typ: Types.Service.t
   }
   [@@deriving rpcty]
 
@@ -99,7 +99,7 @@ let () =
       Xen_api_lwt_unix.SessionCache.destroy cache
   )
 
-let listen_for_vm {Persistent.vm_uuid; path; gid; typ} =
+let listen_for_vm read_write {Persistent.vm_uuid; path; gid; typ} =
   let make_server =
     match typ with
     | Varstored ->
@@ -109,21 +109,28 @@ let listen_for_vm {Persistent.vm_uuid; path; gid; typ} =
   in
   let vm_uuid_str = Uuidm.to_string vm_uuid in
   D.debug "%s: listening for %s on socket %s for VM %s" __FUNCTION__
-    (ty_to_string typ) path vm_uuid_str ;
+    (Types.Service.to_string typ)
+    path vm_uuid_str ;
   let* () = safe_unlink path in
-  let* stop_server = make_server ~cache path vm_uuid_str in
+  let* stop_server = make_server read_write ~cache path vm_uuid in
   let* () = log_fds () in
   Hashtbl.add sockets path (stop_server, (vm_uuid, gid, typ)) ;
   let* () = Lwt_unix.chmod path 0o660 in
   Lwt_unix.chown path 0 gid
 
-let resume () =
+let resume ~vtpm_read_write ~uefi_read_write () =
   let* vms = Persistent.loadfrom recover_path in
-  let+ () = Lwt_list.iter_p listen_for_vm vms in
+  let listen_to_vm = function
+    | Persistent.{typ= Varstored; _} as vm ->
+        listen_for_vm uefi_read_write vm
+    | Persistent.{typ= Swtpm; _} as vm ->
+        listen_for_vm vtpm_read_write vm
+  in
+  let+ () = Lwt_list.iter_p listen_to_vm vms in
   D.debug "%s: completed" __FUNCTION__
 
 (* caller here is trusted (xenopsd through message-switch) *)
-let depriv_varstored_create dbg vm_uuid gid path =
+let depriv_varstored_create write_push dbg vm_uuid gid path =
   if Hashtbl.mem sockets path then
     Lwt.return_error
       (Xapi_idl_guard_privileged.Interface.InternalError
@@ -135,7 +142,9 @@ let depriv_varstored_create dbg vm_uuid gid path =
     @@
     ( D.debug "[%s] creating deprivileged socket at %s, owned by group %d" dbg
         path gid ;
-      let* () = listen_for_vm {Persistent.path; vm_uuid; gid; typ= Varstored} in
+      let* () =
+        listen_for_vm write_push {Persistent.path; vm_uuid; gid; typ= Varstored}
+      in
       store_args sockets
     )
 
@@ -157,7 +166,7 @@ let depriv_varstored_destroy dbg gid path =
       D.debug "[%s] stopped server for gid %d and removed socket" dbg gid ;
       Lwt.return_unit
 
-let depriv_swtpm_create dbg vm_uuid gid path =
+let depriv_swtpm_create read_write dbg vm_uuid gid path =
   if Hashtbl.mem sockets path then
     Lwt.return_error
       (Xapi_idl_guard_privileged.Interface.InternalError
@@ -169,7 +178,9 @@ let depriv_swtpm_create dbg vm_uuid gid path =
     @@
     ( D.debug "[%s] creating deprivileged socket at %s, owned by group %d" dbg
         path gid ;
-      let* () = listen_for_vm {Persistent.path; vm_uuid; gid; typ= Swtpm} in
+      let* () =
+        listen_for_vm read_write {Persistent.path; vm_uuid; gid; typ= Swtpm}
+      in
       store_args sockets
     )
 
@@ -198,6 +209,9 @@ let depriv_swtpm_destroy dbg gid path =
       Lwt.return_unit
 
 (* TODO: these 2 APIs need to be updated to go through the generic interface *)
+(* These 2 functions are only reachable from message-switch. They are part of
+   the control plane and be called when xapi controls the lifecycle of a VM, so
+   it's OK to assume it's available. *)
 
 let vtpm_set_contents dbg vtpm_uuid contents =
   let open Xen_api_lwt_unix in
@@ -216,33 +230,64 @@ let vtpm_get_contents _dbg vtpm_uuid =
   @@ let* self = Server_interface.with_xapi ~cache @@ VTPM.get_by_uuid ~uuid in
      Server_interface.with_xapi ~cache @@ VTPM.get_contents ~self
 
-let rpc_fn =
+let rpc_fn ~vtpm_read_write ~uefi_read_write =
   let module Server =
     Xapi_idl_guard_privileged.Interface.RPC_API (Rpc_lwt.GenServer ()) in
   (* bind APIs *)
-  Server.varstore_create depriv_varstored_create ;
+  Server.varstore_create (depriv_varstored_create uefi_read_write) ;
   Server.varstore_destroy depriv_varstored_destroy ;
-  Server.vtpm_create depriv_swtpm_create ;
+  Server.vtpm_create (depriv_swtpm_create vtpm_read_write) ;
   Server.vtpm_destroy depriv_swtpm_destroy ;
   Server.vtpm_set_contents vtpm_set_contents ;
   Server.vtpm_get_contents vtpm_get_contents ;
   Rpc_lwt.server Server.implementation
 
-let process body =
+let process ~vtpm_read_write ~uefi_read_write body =
   let+ response =
-    Dorpc.wrap_rpc Xapi_idl_guard_privileged.Interface.E.error (fun () ->
+    Xapi_guard.Dorpc.wrap_rpc Xapi_idl_guard_privileged.Interface.E.error
+      (fun () ->
         let call = Jsonrpc.call_of_string body in
         D.debug "Received request from message-switch, method %s" call.Rpc.name ;
-        rpc_fn call
+        rpc_fn ~vtpm_read_write ~uefi_read_write call
     )
   in
   Jsonrpc.string_of_response response
 
+let retry_forever fname f =
+  let rec loop () =
+    let* () =
+      Lwt.catch f (function exn ->
+          D.info "%s failed with %s, retrying..." fname (Printexc.to_string exn) ;
+          Lwt_unix.sleep 0.5
+          )
+    in
+    (loop [@tailcall]) ()
+  in
+  loop ()
+
+let cache_reader with_watcher = retry_forever "cache watcher" with_watcher
+
 let make_message_switch_server () =
+  let* with_swtpm_cache, with_watch =
+    Xapi_guard.Disk_cache.(
+      setup Swtpm
+        Server_interface.(read_vtpm ~cache)
+        Server_interface.(push_vtpm ~cache)
+    )
+  in
   let open Message_switch_lwt.Protocol_lwt in
   let wait_server, server_stopped = Lwt.task () in
+  let@ vtpm_read_write = with_swtpm_cache in
+  let uefi_read_write =
+    (* This is unused for the time being, added to be consistent with both
+       interfaces *)
+    ((fun _ -> Lwt.return ""), fun _ _ -> Lwt.return_unit)
+  in
+  let server =
     let* result =
-    Server.listen ~process ~switch:!Xcp_client.switch_path
+      Server.listen
+        ~process:(process ~vtpm_read_write ~uefi_read_write)
+        ~switch:!Xcp_client.switch_path
         ~queue:Xapi_idl_guard_privileged.Interface.queue_name ()
     in
     match Server.error_to_msg result with
@@ -254,7 +299,7 @@ let make_message_switch_server () =
         ) ;
         (* best effort resume *)
         let* () =
-        Lwt.catch resume (fun e ->
+          Lwt.catch (resume ~vtpm_read_write ~uefi_read_write) (fun e ->
               D.log_backtrace () ;
               D.warn "Resume failed: %s" (Printexc.to_string e) ;
               Lwt.return_unit
@@ -264,6 +309,10 @@ let make_message_switch_server () =
     | Error (`Msg m) ->
         Lwt.fail_with
           (Printf.sprintf "Failed to listen on message-switch queue: %s" m)
+  in
+  let reader = cache_reader with_watch in
+  let* _ = Lwt.all [server; reader] in
+  Lwt.return_unit
 
 let main log_level =
   Debug.set_level log_level ;
@@ -278,12 +327,12 @@ let main log_level =
        old_hook exn
   ) ;
   let () = Lwt_main.run @@ make_message_switch_server () in
-  D.debug "Exiting varstored-guard"
+  D.debug "Exiting %s" daemon_name
 
 open! Cmdliner
 
 let cmd =
-  let info = Cmd.info "varstored-guard" in
+  let info = Cmd.info daemon_name in
   let log_level =
     let doc = "Syslog level. E.g. debug, info etc." in
     let level_conv =
diff --git a/ocaml/xapi-guard/test/cache_test.ml b/ocaml/xapi-guard/test/cache_test.ml
new file mode 100644
index 000000000..97b144839
--- /dev/null
+++ b/ocaml/xapi-guard/test/cache_test.ml
@@ -0,0 +1,204 @@
+let ( let@ ) f x = f x
+
+let ( let* ) = Lwt.bind
+
+module Tpm = Xapi_guard.Types.Tpm
+
+module TPMs = struct
+  let writes_created = Atomic.make 1
+
+  let reads_created = Atomic.make 1
+
+  let request_persist uuid write =
+    let __FUN = __FUNCTION__ in
+
+    let key = Tpm.deserialize_key (Random.int 3) in
+
+    let time = Mtime_clock.now () in
+    let serial_n = Atomic.fetch_and_add writes_created 1 in
+    let contents =
+      Printf.sprintf "contents %s" (Mtime.to_uint64_ns time |> Int64.to_string)
+    in
+    let* () =
+      Logs_lwt.app (fun m ->
+          m "%s: Write № %i requested: %a/%i/%a" __FUN serial_n Uuidm.pp uuid
+            Tpm.(serialize_key key)
+            Mtime.pp time
+      )
+    in
+    write (uuid, time, key) contents
+
+  let request_read uuid read =
+    let __FUN = __FUNCTION__ in
+
+    let key = Tpm.deserialize_key (Random.int 3) in
+
+    let time = Mtime_clock.now () in
+    let serial_n = Atomic.fetch_and_add reads_created 1 in
+    let* () =
+      Logs_lwt.app (fun m ->
+          m "%s: Read № %i requested: %a/%i/%a" __FUN serial_n Uuidm.pp uuid
+            Tpm.(serialize_key key)
+            Mtime.pp time
+      )
+    in
+    let* () = Lwt_unix.sleep 0.05 in
+    read (uuid, time, key)
+end
+
+let lwt_reporter () =
+  let buf_fmt ~like =
+    let b = Buffer.create 512 in
+    ( Fmt.with_buffer ~like b
+    , fun () ->
+        let m = Buffer.contents b in
+        Buffer.reset b ; m
+    )
+  in
+  let app, app_flush = buf_fmt ~like:Fmt.stdout in
+  let dst, dst_flush = buf_fmt ~like:Fmt.stderr in
+  let reporter = Logs_fmt.reporter ~app ~dst () in
+  let report src level ~over k msgf =
+    let k () =
+      let write () =
+        match level with
+        | Logs.App ->
+            Lwt_io.write Lwt_io.stdout (app_flush ())
+        | _ ->
+            Lwt_io.write Lwt_io.stderr (dst_flush ())
+      in
+      let unblock () = over () ; Lwt.return_unit in
+      Lwt.finalize write unblock |> Lwt.ignore_result ;
+      k ()
+    in
+    reporter.Logs.report src level ~over:(fun () -> ()) k msgf
+  in
+  {Logs.report}
+
+let setup_log level =
+  Logs.set_level level ;
+  Logs.set_reporter (lwt_reporter ()) ;
+  ()
+
+let ok = Lwt_result.ok
+
+let retry_forever fname f =
+  let rec loop () =
+    let* () =
+      Lwt.catch f (function exn ->
+          let* () =
+            Logs_lwt.app (fun m ->
+                m "%s failed with %s, retrying..." fname (Printexc.to_string exn)
+            )
+          in
+          Lwt_unix.sleep 0.5
+          )
+    in
+    (loop [@tailcall]) ()
+  in
+  loop ()
+
+let max_writes = 128
+
+let max_reads = 500_000
+
+let received_writes = ref 0
+
+let received_reads = ref 0
+
+let throttled_reads = Mtime.Span.(200 * ms)
+
+let failing_writes_period = Mtime.Span.(500 * ms)
+
+let epoch = Mtime_clock.now ()
+
+let should_fail () : bool =
+  let rec polarity elapsed =
+    if Mtime.Span.compare elapsed failing_writes_period < 0 then
+      true
+    else
+      not (polarity Mtime.Span.(abs_diff elapsed failing_writes_period))
+  in
+  let elapsed = Mtime.span epoch (Mtime_clock.now ()) in
+  polarity elapsed
+
+let log_write (uuid, timestamp, key) content =
+  let __FUN = __FUNCTION__ in
+  let ( let* ) = Lwt_result.bind in
+  let maybe_fail () =
+    if should_fail () then
+      Lwt_result.fail
+        (failwith (Printf.sprintf {|oops, could not write '%s'|} content))
+    else
+      Lwt_result.return ()
+  in
+  let* () = maybe_fail () in
+  received_writes := !received_writes + 1 ;
+  Logs_lwt.app (fun m ->
+      m "%s Write № %i detected: %a/%i/%a" __FUN !received_writes Uuidm.pp uuid
+        Tpm.(serialize_key key)
+        Mtime.pp timestamp
+  )
+  |> ok
+
+let log_read (uuid, timestamp, key) =
+  let __FUN = __FUNCTION__ in
+  let ( let* ) = Lwt_result.bind in
+  received_reads := !received_reads + 1 ;
+  let* () =
+    Logs_lwt.app (fun m ->
+        m "%s Read to source № %i detected: %a/%i/%a" __FUN !received_reads
+          Uuidm.pp uuid
+          Tpm.(serialize_key key)
+          Mtime.pp timestamp
+    )
+    |> ok
+  in
+  Lwt_result.return "yes"
+
+let to_cache with_read_writes =
+  let __FUN = __FUNCTION__ in
+  let elapsed = Mtime_clock.counter () in
+  let persist uuid (_, write_tpm) = TPMs.request_persist uuid write_tpm in
+  let read uuid (read_tpm, _) =
+    let* contents = TPMs.request_read uuid read_tpm in
+    Logs_lwt.app (fun m -> m "%s Read received: '%s'" __FUN contents)
+  in
+  let rec loop_and_stop f name uuid max sent =
+    let sent = sent + 1 in
+    let@ read_write = with_read_writes in
+    let* () = f uuid read_write in
+    if sent >= max then
+      Logs_lwt.app (fun m ->
+          m "%s: Stopping requests after %i %ss" __FUN sent name
+      )
+    else if Mtime.Span.compare (Mtime_clock.count elapsed) throttled_reads > 0
+    then
+      let* () = Lwt_unix.sleep 0.1 in
+      loop_and_stop f name uuid max sent
+    else
+      let* () = Lwt.pause () in
+      loop_and_stop f name uuid max sent
+  in
+  let vms = List.init 4 (fun _ -> Uuidm.(v `V4)) in
+
+  List.concat
+    [
+      List.map (fun uuid -> loop_and_stop persist "write" uuid max_writes 0) vms
+    ; List.map (fun uuid -> loop_and_stop read "read" uuid max_reads 0) vms
+    ]
+
+let from_cache with_watcher = retry_forever "watcher" with_watcher
+
+let main () =
+  let* with_read_writes, with_watcher =
+    Xapi_guard.Disk_cache.(setup Swtpm log_read log_write)
+  in
+  let reader = from_cache with_watcher in
+  let writers = to_cache with_read_writes in
+  let* _ = Lwt.all (reader :: writers) in
+  Lwt.return_unit
+
+let () =
+  setup_log @@ Some Logs.Debug ;
+  Lwt_main.run (main ())
diff --git a/ocaml/xapi-guard/test/cache_test.mli b/ocaml/xapi-guard/test/cache_test.mli
new file mode 100644
index 000000000..e69de29bb
diff --git a/ocaml/xapi-guard/test/dune b/ocaml/xapi-guard/test/dune
index 934256a9f..e082a47a6 100644
--- a/ocaml/xapi-guard/test/dune
+++ b/ocaml/xapi-guard/test/dune
@@ -1,6 +1,7 @@
 (test
  (name xapi_guard_test)
  (modes exe)
+ (modules (:standard \ cache_test))
  (libraries
   alcotest
   alcotest-lwt
@@ -17,3 +18,18 @@
   xen-api-client-lwt)
  (package varstored-guard)
  )
+
+(executable
+ (name cache_test)
+ (modules cache_test)
+ (libraries
+   logs
+   logs.fmt
+   logs.lwt
+   lwt
+   lwt.unix
+   mtime
+   mtime.clock.os
+   uuidm
+   xapi_guard)
+ (preprocess (pps ppx_deriving_rpc)))
diff --git a/ocaml/xapi-guard/test/xapi_guard_test.ml b/ocaml/xapi-guard/test/xapi_guard_test.ml
index 41ce8f6e3..86efb713d 100644
--- a/ocaml/xapi-guard/test/xapi_guard_test.ml
+++ b/ocaml/xapi-guard/test/xapi_guard_test.ml
@@ -60,7 +60,9 @@ let xapi_rpc call =
   | _ ->
       Fmt.failwith "XAPI RPC call %s not expected in test" call.Rpc.name
 
-let vm_uuid = Uuidx.(to_string (make ()))
+let vm_uuid = Uuidm.v `V4
+
+let vm_uuid_str = Uuidm.to_string vm_uuid
 
 let () =
   let old_hook = !Lwt.async_exception_hook in
@@ -78,9 +80,10 @@ let with_rpc f switch () =
   in
   (Lwt_switch.add_hook (Some switch) @@ fun () -> SessionCache.destroy cache) ;
   let path = Filename.concat tmp "socket" in
+  let push_nothing _ = Lwt_result.return () in
   (* Create an internal server on 'path', the socket that varstored would connect to *)
   let* stop_server =
-    Server_interface.make_server_varstored ~cache path vm_uuid
+    Server_interface.make_server_varstored push_nothing ~cache path vm_uuid
   in
   (* rpc simulates what varstored would do *)
   let uri = Uri.make ~scheme:"file" ~path () |> Uri.to_string in
@@ -101,7 +104,7 @@ let with_rpc f switch () =
 let dict = Alcotest.(list @@ pair string string)
 
 let test_change_nvram ~rpc ~session_id () =
-  let* self = VM.get_by_uuid ~rpc ~session_id ~uuid:vm_uuid in
+  let* self = VM.get_by_uuid ~rpc ~session_id ~uuid:vm_uuid_str in
   let* nvram0 = VM.get_NVRAM ~rpc ~session_id ~self in
   Alcotest.(check' dict) ~msg:"nvram initial" ~expected:[] ~actual:nvram0 ;
   let contents = "nvramnew" in
@@ -131,7 +134,7 @@ let test_bad_set_nvram ~rpc ~session_id () =
   let* () =
     VM.set_NVRAM_EFI_variables ~rpc ~session_id ~self:vm_bad ~value:"bad"
   in
-  let* vm_ref = VM.get_by_uuid ~rpc ~session_id ~uuid:vm_uuid in
+  let* vm_ref = VM.get_by_uuid ~rpc ~session_id ~uuid:vm_uuid_str in
   let* nvram = VM.get_NVRAM ~rpc ~session_id ~self:vm_ref in
   Alcotest.(check' dict)
     ~msg:"only managed to change own nvram" ~actual:nvram
diff --git a/ocaml/xapi-idl/lib/debuginfo.ml b/ocaml/xapi-idl/lib/debuginfo.ml
index c29a74c14..599537ff5 100644
--- a/ocaml/xapi-idl/lib/debuginfo.ml
+++ b/ocaml/xapi-idl/lib/debuginfo.ml
@@ -66,10 +66,9 @@ let with_dbg ?(with_thread = false) ~module_name ~name ~dbg f =
   | false ->
       f_with_trace ()
 
-let span_context_of_di di =
-  Option.map (fun span -> Tracing.Span.get_context span) di.tracing
-
 let traceparent_of_dbg dbg =
-  of_string dbg
-  |> span_context_of_di
-  |> Option.map Tracing.SpanContext.trace_id_of_span_context
+  match String.split_on_char separator dbg with
+  | [_; traceparent] ->
+      Some traceparent
+  | _ ->
+      None
diff --git a/ocaml/xapi-idl/xen/xenops_interface.ml b/ocaml/xapi-idl/xen/xenops_interface.ml
index ab240a53c..083c345f1 100644
--- a/ocaml/xapi-idl/xen/xenops_interface.ml
+++ b/ocaml/xapi-idl/xen/xenops_interface.ml
@@ -461,6 +461,7 @@ module Host = struct
   type cpu_info = {
       cpu_count: int
     ; socket_count: int
+    ; threads_per_core: int
     ; vendor: string
     ; speed: string
     ; modelname: string
diff --git a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/plugin.py b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/plugin.py
index 40e3a0091..08fb78407 100755
--- a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/plugin.py
+++ b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/plugin.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python2
 
 """
  Copyright (C) Citrix Systems, Inc.
diff --git a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/sr.py b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/sr.py
index 82c77d891..3cd7a211c 100755
--- a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/sr.py
+++ b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/sr.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python2
 
 """
  Copyright (C) Citrix Systems, Inc.
@@ -6,7 +6,7 @@
 
 import os
 import sys
-import urllib.parse
+import urlparse
 import xapi.storage.api.volume
 
 import plugin
@@ -21,11 +21,11 @@ class Implementation(xapi.storage.api.volume.SR_skeleton):
         return
 
     def detach(self, dbg, sr):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         return
 
     def ls(self, dbg, sr):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         qr = plugin.Implementation().query(dbg)
         return [{
             "name": qr['name'],
@@ -40,7 +40,7 @@ class Implementation(xapi.storage.api.volume.SR_skeleton):
             }]
 
     def stat(self, dbg, sr):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         qr = plugin.Implementation().query(dbg)
         return {
             "sr": sr,
diff --git a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/volume.py b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/volume.py
index 848c13bfd..448ee6dcb 100755
--- a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/volume.py
+++ b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummy/volume.py
@@ -1,11 +1,11 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python2
 
 """
  Copyright (C) Citrix Systems, Inc.
 """
 
 import uuid
-import urllib.parse
+import urlparse
 import os
 import sys
 import xapi.storage.api.volume
@@ -17,7 +17,7 @@ import plugin
 class Implementation(xapi.storage.api.volume.Volume_skeleton):
 
     def create(self, dbg, sr, name, description, size):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         voluuid = str(uuid.uuid4())
         return {
             "name": name,
@@ -32,11 +32,11 @@ class Implementation(xapi.storage.api.volume.Volume_skeleton):
         }
 
     def destroy(self, dbg, sr, key):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         return
 
     def stat(self, dbg, sr, key):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         qr = plugin.Implementation().query(dbg)
         return {
                 "name": qr['name'],
diff --git a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/plugin.py b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/plugin.py
index e9ef122ca..5816f0dd2 100755
--- a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/plugin.py
+++ b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/plugin.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python2
 
 """
  Copyright (C) Citrix Systems, Inc.
diff --git a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/sr.py b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/sr.py
index 3c649423d..6100407e9 100755
--- a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/sr.py
+++ b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/sr.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python2
 
 """
  Copyright (C) Citrix Systems, Inc.
@@ -6,7 +6,7 @@
 
 import os
 import sys
-import urllib.parse
+import urlparse
 import xapi.storage.api.v5.volume
 
 import plugin
@@ -22,11 +22,11 @@ class Implementation(xapi.storage.api.v5.volume.SR_skeleton):
         return configuration
 
     def detach(self, dbg, sr):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         return
 
     def ls(self, dbg, sr):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         qr = plugin.Implementation().query(dbg)
         return [{
             "name": qr['name'],
@@ -42,7 +42,7 @@ class Implementation(xapi.storage.api.v5.volume.SR_skeleton):
             }]
 
     def stat(self, dbg, sr):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         qr = plugin.Implementation().query(dbg)
         return {
             "sr": sr,
diff --git a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/volume.py b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/volume.py
index fcf52ce38..20822dd8d 100755
--- a/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/volume.py
+++ b/ocaml/xapi-storage-script/test/volume/org.xen.xapi.storage.dummyv5/volume.py
@@ -1,11 +1,11 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python2
 
 """
  Copyright (C) Citrix Systems, Inc.
 """
 
 import uuid
-import urllib.parse
+import urlparse
 import os
 import sys
 import xapi.storage.api.v5.volume
@@ -17,7 +17,7 @@ import plugin
 class Implementation(xapi.storage.api.v5.volume.Volume_skeleton):
 
     def create(self, dbg, sr, name, description, size, sharable):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         voluuid = str(uuid.uuid4())
         return {
             "name": name,
@@ -33,11 +33,11 @@ class Implementation(xapi.storage.api.v5.volume.Volume_skeleton):
         }
 
     def destroy(self, dbg, sr, key):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         return
 
     def stat(self, dbg, sr, key):
-        urllib.parse.urlparse(sr)
+        urlparse.urlparse(sr)
         qr = plugin.Implementation().query(dbg)
         return {
                 "name": qr['name'],
diff --git a/ocaml/xapi-storage/python/xapi/__init__.py b/ocaml/xapi-storage/python/xapi/__init__.py
index 1f8b178af..0027af213 100644
--- a/ocaml/xapi-storage/python/xapi/__init__.py
+++ b/ocaml/xapi-storage/python/xapi/__init__.py
@@ -31,13 +31,6 @@ import traceback
 import json
 import argparse
 
-# pylint: disable=invalid-name,redefined-builtin,undefined-variable
-# pyright: reportUndefinedVariable=false
-if sys.version_info[0] > 2:
-    long = int
-    unicode = str
-    str = bytes
-
 
 def success(result):
     return {"Status": "Success", "Value": result}
diff --git a/ocaml/xapi-storage/python/xapi/storage/api/datapath.py b/ocaml/xapi-storage/python/xapi/storage/api/datapath.py
index 78981f46c..1d5b43b0d 100644
--- a/ocaml/xapi-storage/python/xapi/storage/api/datapath.py
+++ b/ocaml/xapi-storage/python/xapi/storage/api/datapath.py
@@ -6,14 +6,6 @@ import json
 import argparse
 import traceback
 import logging
-
-# pylint: disable=invalid-name,redefined-builtin,undefined-variable
-# pyright: reportUndefinedVariable=false
-if sys.version_info[0] > 2:
-    long = int
-    unicode = str
-    str = bytes
-
 class Unimplemented(Rpc_light_failure):
     def __init__(self, arg_0):
         Rpc_light_failure.__init__(self, "Unimplemented", [ arg_0 ])
diff --git a/ocaml/xapi-storage/python/xapi/storage/api/plugin.py b/ocaml/xapi-storage/python/xapi/storage/api/plugin.py
index 1032725c7..0185d9001 100644
--- a/ocaml/xapi-storage/python/xapi/storage/api/plugin.py
+++ b/ocaml/xapi-storage/python/xapi/storage/api/plugin.py
@@ -6,14 +6,6 @@ import json
 import argparse
 import traceback
 import logging
-
-# pylint: disable=invalid-name,redefined-builtin,undefined-variable
-# pyright: reportUndefinedVariable=false
-if sys.version_info[0] > 2:
-    long = int
-    unicode = str
-    str = bytes
-
 class Unimplemented(Rpc_light_failure):
     def __init__(self, arg_0):
         Rpc_light_failure.__init__(self, "Unimplemented", [ arg_0 ])
diff --git a/ocaml/xapi/context.ml b/ocaml/xapi/context.ml
index 33fb5b0db..4179cf7d9 100644
--- a/ocaml/xapi/context.ml
+++ b/ocaml/xapi/context.ml
@@ -50,17 +50,8 @@ type t = {
   ; mutable test_clusterd_rpc: (Rpc.call -> Rpc.response) option
 }
 
-let complete_tracing __context =
-  ( match Tracing.Tracer.finish __context.tracing with
-  | Ok _ ->
-      ()
-  | Error e ->
-      R.warn "Failed to complete tracing: %s" (Printexc.to_string e)
-  ) ;
-  __context.tracing <- None
-
-let complete_tracing_with_exn __context error =
-  ( match Tracing.Tracer.finish ~error __context.tracing with
+let complete_tracing ?error __context =
+  ( match Tracing.Tracer.finish ?error __context.tracing with
   | Ok _ ->
       ()
   | Error e ->
@@ -187,6 +178,9 @@ let destroy __context =
   if not __context.forwarded_task then
     !__destroy_task ~__context __context.task_id
 
+let hash_of_session_id session_id =
+  session_id |> Ref.string_of |> Digest.string |> Digest.to_hex
+
 (* CP-982: create tracking id in log files to link username to actions *)
 let trackid_of_session ?(with_brackets = false) ?(prefix = "") session_id =
   match session_id with
@@ -195,8 +189,7 @@ let trackid_of_session ?(with_brackets = false) ?(prefix = "") session_id =
   | Some session_id ->
       (* a hash is used instead of printing the sensitive session_id value *)
       let trackid =
-        Printf.sprintf "trackid=%s"
-          (Digest.to_hex (Digest.string (Ref.string_of session_id)))
+        Printf.sprintf "trackid=%s" (hash_of_session_id session_id)
       in
       if with_brackets then Printf.sprintf "%s(%s)" prefix trackid else trackid
 
@@ -234,7 +227,36 @@ let parent_of_origin (origin : origin) span_name =
   | _ ->
       None
 
-let start_tracing_helper parent_fn task_name =
+let make_attributes ?task_name ?task_id ?task_uuid ?session_id ?origin () =
+  let attribute_helper_fn f v = Option.fold ~none:[] ~some:f v in
+  [
+    attribute_helper_fn
+      (fun task_name -> [("xs.xapi.task.name", task_name)])
+      task_name
+  ; attribute_helper_fn
+      (fun task_id -> [("xs.xapi.task.id", Ref.really_pretty_and_small task_id)])
+      task_id
+  ; attribute_helper_fn
+      (fun task_uuid -> [("xs.xapi.task.uuid", Uuidx.to_string task_uuid)])
+      task_uuid
+  ; attribute_helper_fn
+      (fun session_id ->
+        [("xs.xapi.session.track.id", hash_of_session_id session_id)]
+      )
+      session_id
+  ; attribute_helper_fn
+      (fun origin ->
+        match origin with
+        | Internal ->
+            [("xs.xapi.task.origin", "internal")]
+        | Http _ ->
+            [("xs.xapi.task.origin", "http")]
+      )
+      origin
+  ]
+  |> List.concat
+
+let start_tracing_helper ?(span_attributes = []) parent_fn task_name =
   let open Tracing in
   let span_details_from_task_name task_name =
     let uuid_length = 36 in
@@ -242,9 +264,11 @@ let start_tracing_helper parent_fn task_name =
     let open String in
     if starts_with ~prefix:dispatch_system_is_alive task_name then
       let uuid = sub task_name (length dispatch_system_is_alive) uuid_length in
-      ("dispatch:system.isAlive", [("xs.span.arg.vm.uuid", uuid)])
+      ( "dispatch:system.isAlive"
+      , ("xs.span.arg.vm.uuid", uuid) :: span_attributes
+      )
     else
-      (task_name, [])
+      (task_name, span_attributes)
   in
   let span_name, span_attributes = span_details_from_task_name task_name in
   let parent = parent_fn span_name in
@@ -276,7 +300,12 @@ let from_forwarded_task ?(http_other_config = []) ?session_id
   let dbg = make_dbg http_other_config task_name task_id in
   info "task %s forwarded%s" dbg
     (trackid_of_session ~with_brackets:true ~prefix:" " session_id) ;
-  let tracing = start_tracing_helper (parent_of_origin origin) task_name in
+  let span_attributes =
+    make_attributes ~task_id ~task_name ?session_id ~origin ()
+  in
+  let tracing =
+    start_tracing_helper ~span_attributes (parent_of_origin origin) task_name
+  in
   {
     session_id
   ; task_id
@@ -323,7 +352,12 @@ let make ?(http_other_config = []) ?(quiet = false) ?subtask_of ?session_id
             " by task " ^ make_dbg [] "" subtask_of
         )
   ) ;
-  let tracing = start_tracing_helper (parent_of_origin origin) task_name in
+  let span_attributes =
+    make_attributes ~task_id ~task_name ~origin ?session_id ~task_uuid ()
+  in
+  let tracing =
+    start_tracing_helper ~span_attributes (parent_of_origin origin) task_name
+  in
   {
     session_id
   ; database
@@ -344,7 +378,8 @@ let make_subcontext ~__context ?task_in_database task_name =
   let tracing =
     Option.bind __context.tracing (fun parent ->
         let parent = Some parent in
-        start_tracing_helper (fun _ -> parent) task_name
+        let span_attributes = make_attributes ?session_id () in
+        start_tracing_helper ~span_attributes (fun _ -> parent) task_name
     )
   in
   {subcontext with client= __context.client; tracing}
diff --git a/ocaml/xapi/context.mli b/ocaml/xapi/context.mli
index 3f2b2c5dc..7b2ece18c 100644
--- a/ocaml/xapi/context.mli
+++ b/ocaml/xapi/context.mli
@@ -142,9 +142,7 @@ val get_client_ip : t -> string option
 
 val get_user_agent : t -> string option
 
-val complete_tracing : t -> unit
-
-val complete_tracing_with_exn : t -> exn * string -> unit
+val complete_tracing : ?error:exn * string -> t -> unit
 
 val tracing_of : t -> Tracing.Span.t option
 
diff --git a/ocaml/xapi/cpuid_helpers.ml b/ocaml/xapi/cpuid_helpers.ml
index db346fcef..571a7f073 100644
--- a/ocaml/xapi/cpuid_helpers.ml
+++ b/ocaml/xapi/cpuid_helpers.ml
@@ -43,6 +43,8 @@ let cpu_count = Map_check.(field "cpu_count" int)
 
 let socket_count = Map_check.(field "socket_count" int)
 
+let threads_per_core = Map_check.(field "threads_per_core" int)
+
 let vendor = Map_check.(field "vendor" string)
 
 let get_flags_for_vm ~__context vm cpu_info =
diff --git a/ocaml/xapi/cpuid_helpers.mli b/ocaml/xapi/cpuid_helpers.mli
index ee1fbeb0d..4d5f091d7 100644
--- a/ocaml/xapi/cpuid_helpers.mli
+++ b/ocaml/xapi/cpuid_helpers.mli
@@ -28,6 +28,8 @@ val cpu_count : int Map_check.field
 
 val socket_count : int Map_check.field
 
+val threads_per_core : int Map_check.field
+
 val features : [`vm] Xenops_interface.CPU_policy.t Map_check.field
 
 val features_pv : [`host] Xenops_interface.CPU_policy.t Map_check.field
diff --git a/ocaml/xapi/create_misc.ml b/ocaml/xapi/create_misc.ml
index a0d14ae94..7a2630ea5 100644
--- a/ocaml/xapi/create_misc.ml
+++ b/ocaml/xapi/create_misc.ml
@@ -567,6 +567,7 @@ let create_host_cpu ~__context host_info =
         [
           ("cpu_count", string_of_int cpu_info.cpu_count)
         ; ("socket_count", string_of_int cpu_info.socket_count)
+        ; ("threads_per_core", string_of_int cpu_info.threads_per_core)
         ; ("vendor", cpu_info.vendor)
         ; ("speed", cpu_info.speed)
         ; ("modelname", cpu_info.modelname)
@@ -592,10 +593,11 @@ let create_host_cpu ~__context host_info =
       let old_cpu_info = Db.Host.get_cpu_info ~__context ~self:host in
       debug
         "create_host_cpuinfo: setting host cpuinfo: socket_count=%d, \
-         cpu_count=%d, features_hvm=%s, features_pv=%s, features_hvm_host=%s, \
-         features_pv_host=%s"
+         cpu_count=%d, threads_per_core=%d, features_hvm=%s, features_pv=%s, \
+         features_hvm_host=%s, features_pv_host=%s"
         (Map_check.getf socket_count cpu)
         (Map_check.getf cpu_count cpu)
+        (Map_check.getf threads_per_core cpu)
         (Map_check.getf features_hvm cpu |> CPU_policy.to_string)
         (Map_check.getf features_pv cpu |> CPU_policy.to_string)
         (Map_check.getf features_hvm_host cpu |> CPU_policy.to_string)
diff --git a/ocaml/xapi/sm_exec.ml b/ocaml/xapi/sm_exec.ml
index 0cec2e885..a6d0d231e 100644
--- a/ocaml/xapi/sm_exec.ml
+++ b/ocaml/xapi/sm_exec.ml
@@ -336,20 +336,19 @@ let exec_xmlrpc ~dbg ?context:_ ?(needs_session = true) (driver : string)
           (* Logging call.cmd is safe, but call.args could contain a password. *)
           try
             E.debug "smapiv2=>smapiv1 [label=\"%s\"];" call.cmd ;
+            let args = [Xml.to_string xml] in
             let output, stderr =
-              let env =
+              let env, exe, args =
                 match Xapi_observer_components.is_smapi_enabled () with
                 | false ->
-                    None
+                    (None, exe, args)
                 | true ->
                     let traceparent = Debuginfo.traceparent_of_dbg dbg in
-                    Some
-                      (Xapi_observer_components.env_vars_of_component
+                    Xapi_observer_components.env_exe_args_of
                       ~component:Xapi_observer_components.SMApi ~traceparent
-                      )
+                      ~exe ~args
               in
-              Forkhelpers.execute_command_get_output ?env exe
-                [Xml.to_string xml]
+              Forkhelpers.execute_command_get_output ?env exe args
             in
             try (Xml.parse_string output, stderr)
             with e ->
diff --git a/ocaml/xapi/taskHelper.ml b/ocaml/xapi/taskHelper.ml
index ee896269e..abe7f4b45 100644
--- a/ocaml/xapi/taskHelper.ml
+++ b/ocaml/xapi/taskHelper.ml
@@ -244,7 +244,7 @@ let cancel ~__context =
 
 let failed ~__context exn =
   let backtrace = Printexc.get_backtrace () in
-  Context.complete_tracing_with_exn __context (exn, backtrace) ;
+  Context.complete_tracing __context ~error:(exn, backtrace) ;
   let code, params = ExnHelper.error_of_exn exn in
   operate_on_db_task ~__context (fun self ->
       let status = Db_actions.DB_Action.Task.get_status ~__context ~self in
diff --git a/ocaml/xapi/xapi_cluster_helpers.ml b/ocaml/xapi/xapi_cluster_helpers.ml
index bbb51278a..210ad5068 100644
--- a/ocaml/xapi/xapi_cluster_helpers.ml
+++ b/ocaml/xapi/xapi_cluster_helpers.ml
@@ -103,3 +103,62 @@ let with_cluster_operation ~__context ~(self : [`Cluster] API.Ref.t) ~doc ~op
           (Datamodel_common._cluster, Ref.string_of self)
       with _ -> ()
   )
+
+let cluster_health_enabled ~__context =
+  let pool = Helpers.get_pool ~__context in
+  let restrictions = Db.Pool.get_restrictions ~__context ~self:pool in
+  List.assoc_opt "restrict_cluster_health" restrictions = Some "false"
+
+let maybe_generate_alert ~__context ~num_hosts ~missing_hosts ~new_hosts ~quorum
+    =
+  let generate_alert join cluster_host =
+    let host = Db.Cluster_host.get_host ~__context ~self:cluster_host in
+    let host_uuid = Db.Host.get_uuid ~__context ~self:host in
+    let host_name = Db.Host.get_name_label ~__context ~self:host in
+    let body, name, priority =
+      match join with
+      | true ->
+          let body =
+            Printf.sprintf
+              "Host %s has joined the cluster, there are now %d host(s) in \
+               cluster and %d hosts are required to form a quorum"
+              host_name num_hosts quorum
+          in
+          let name, priority = Api_messages.cluster_host_joining in
+          (body, name, priority)
+      | false ->
+          let body =
+            Printf.sprintf
+              "Host %s has left the cluster, there are now %d host(s) in \
+               cluster and %d hosts are required to form a quorum"
+              host_name num_hosts quorum
+          in
+          let name, priority = Api_messages.cluster_host_leaving in
+          (body, name, priority)
+    in
+    Helpers.call_api_functions ~__context (fun rpc session_id ->
+        ignore
+        @@ Client.Client.Message.create ~rpc ~session_id ~name ~priority
+             ~cls:`Host ~obj_uuid:host_uuid ~body
+    )
+  in
+  if cluster_health_enabled ~__context then (
+    List.iter (generate_alert false) missing_hosts ;
+    List.iter (generate_alert true) new_hosts ;
+    (* only generate this alert when the number of hosts is decreasing *)
+    if missing_hosts <> [] && num_hosts <= quorum then
+      let pool = Helpers.get_pool ~__context in
+      let pool_uuid = Db.Pool.get_uuid ~__context ~self:pool in
+      let name, priority = Api_messages.cluster_quorum_approaching_lost in
+      let body =
+        Printf.sprintf
+          "The cluster is losing quorum: current %d hosts, need %d hosts for a \
+           quorum"
+          num_hosts quorum
+      in
+      Helpers.call_api_functions ~__context (fun rpc session_id ->
+          ignore
+          @@ Client.Client.Message.create ~rpc ~session_id ~name ~priority
+               ~cls:`Pool ~obj_uuid:pool_uuid ~body
+      )
+  )
diff --git a/ocaml/xapi/xapi_cluster_host.ml b/ocaml/xapi/xapi_cluster_host.ml
index 0d623c0cd..05cc43961 100644
--- a/ocaml/xapi/xapi_cluster_host.ml
+++ b/ocaml/xapi/xapi_cluster_host.ml
@@ -13,6 +13,7 @@
  *)
 
 open Xapi_clustering
+open Xapi_cluster_helpers
 
 module D = Debug.Make (struct let name = "xapi_cluster_host" end)
 
@@ -53,6 +54,20 @@ let call_api_function_with_alert ~__context ~msg ~cls ~obj_uuid ~body
         raise err
   )
 
+let alert_for_cluster_host ~__context ~cluster_host ~missing_hosts ~new_hosts =
+  let num_hosts = Db.Cluster_host.get_all ~__context |> List.length in
+  let cluster = Db.Cluster_host.get_cluster ~__context ~self:cluster_host in
+  let quorum = Db.Cluster.get_quorum ~__context ~self:cluster |> Int64.to_int in
+  maybe_generate_alert ~__context ~missing_hosts ~new_hosts ~num_hosts ~quorum
+
+let alert_for_cluster_host_leave ~__context ~cluster_host =
+  alert_for_cluster_host ~__context ~cluster_host ~missing_hosts:[cluster_host]
+    ~new_hosts:[]
+
+let alert_for_cluster_host_join ~__context ~cluster_host =
+  alert_for_cluster_host ~__context ~cluster_host ~missing_hosts:[]
+    ~new_hosts:[cluster_host]
+
 (* Create xapi db object for cluster_host, resync_host calls clusterd *)
 let create_internal ~__context ~cluster ~host ~pIF : API.ref_Cluster_host =
   with_clustering_lock __LOC__ (fun () ->
@@ -65,6 +80,7 @@ let create_internal ~__context ~cluster ~host ~pIF : API.ref_Cluster_host =
         ~enabled:false ~current_operations:[] ~allowed_operations:[]
         ~other_config:[] ~joined:false ~live:false
         ~last_update_live:API.Date.epoch ;
+      alert_for_cluster_host_join ~__context ~cluster_host:ref ;
       ref
   )
 
@@ -226,12 +242,14 @@ let destroy_op ~__context ~self ~force =
       let result = local_fn (rpc ~__context) dbg in
       match Idl.IdM.run @@ Cluster_client.IDL.T.get result with
       | Ok () ->
+          alert_for_cluster_host_leave ~__context ~cluster_host:self ;
           Db.Cluster_host.destroy ~__context ~self ;
           debug "Cluster_host.%s was successful" fn_str ;
           Xapi_clustering.Daemon.disable ~__context
       | Error error ->
           warn "Error occurred during Cluster_host.%s" fn_str ;
           if force then (
+            alert_for_cluster_host_leave ~__context ~cluster_host:self ;
             let ref_str = Ref.string_of self in
             Db.Cluster_host.destroy ~__context ~self ;
             debug "Cluster_host %s force destroyed." ref_str
@@ -279,6 +297,7 @@ let forget ~__context ~self =
           Db.Cluster.set_pending_forget ~__context ~self:cluster ~value:[] ;
           (* must not disable the daemon here, because we declared another unreachable node dead,
            * not the current one *)
+          alert_for_cluster_host_leave ~__context ~cluster_host:self ;
           Db.Cluster_host.destroy ~__context ~self ;
           debug "Cluster_host.forget was successful"
       | Error error ->
diff --git a/ocaml/xapi/xapi_clustering.ml b/ocaml/xapi/xapi_clustering.ml
index 7bb881b02..699aa9342 100644
--- a/ocaml/xapi/xapi_clustering.ml
+++ b/ocaml/xapi/xapi_clustering.ml
@@ -13,6 +13,7 @@
  *)
 
 open Cluster_interface
+open Xapi_cluster_helpers
 
 module D = Debug.Make (struct let name = "xapi_clustering" end)
 
@@ -457,20 +458,27 @@ let on_corosync_update ~__context ~cluster updates =
               (fun h -> not (List.mem h quorum_hosts))
               all_cluster_hosts
           in
+          let new_hosts =
+            List.filter
+              (fun h -> not (Db.Cluster_host.get_live ~__context ~self:h))
+              quorum_hosts
+          in
           List.iter
             (fun self ->
               Db.Cluster_host.set_live ~__context ~self ~value:true ;
               Db.Cluster_host.set_last_update_live ~__context ~self
                 ~value:current_time
             )
-            quorum_hosts ;
+            new_hosts ;
           List.iter
             (fun self ->
               Db.Cluster_host.set_live ~__context ~self ~value:false ;
               Db.Cluster_host.set_last_update_live ~__context ~self
                 ~value:current_time
             )
-            missing_hosts
+            missing_hosts ;
+          maybe_generate_alert ~__context ~missing_hosts ~new_hosts
+            ~num_hosts:(List.length quorum_hosts) ~quorum:diag.quorum
       ) ;
       Db.Cluster.set_quorum ~__context ~self:cluster
         ~value:(Int64.of_int diag.quorum) ;
@@ -515,12 +523,7 @@ let create_cluster_watcher_on_master ~__context ~host =
             Thread.delay 3.
       done
     in
-    let feature_enabled =
-      let pool = Helpers.get_pool ~__context in
-      let restrictions = Db.Pool.get_restrictions ~__context ~self:pool in
-      List.assoc_opt "restrict_cluster_health" restrictions = Some "false"
-    in
-    if feature_enabled then (
+    if Xapi_cluster_helpers.cluster_health_enabled ~__context then (
       debug "%s: create watcher for corosync-notifyd on master" __FUNCTION__ ;
       ignore @@ Thread.create watch ()
     ) else
diff --git a/ocaml/xapi/xapi_db_upgrade.ml b/ocaml/xapi/xapi_db_upgrade.ml
index cf8bc3f4c..05f20f083 100644
--- a/ocaml/xapi/xapi_db_upgrade.ml
+++ b/ocaml/xapi/xapi_db_upgrade.ml
@@ -859,14 +859,25 @@ let empty_pool_uefi_certificates =
       )
   }
 
-let update_livepatch_guidance =
+(* 1. Replace reboot_host_on_livepatch_failure in host.pending_guidances \
+ *    with reboot_host_on_kernel_livepatch_failure and \
+ *    reboot_host_on_xen_livepatch_failure in \
+ *    host.pending_guidances_recommended.
+ * 2. Move the rest guidances in \
+ *    host.pending_guidances into host.pending_guidances_recommended *)
+let upgrade_update_guidance =
   {
     description=
-      "Replace reboot_host_on_livepatch_failure in host.pending_guidances with \
-       reboot_host_on_kernel_livepatch_failure and \
-       reboot_host_on_xen_livepatch_failure in \
-       host.pending_guidances_recommended"
-  ; version= (fun _ -> true)
+      "Upgrade pending update gudiances"
+      (* TODO: update below schema version to which the feature branch got merged with *)
+  ; version=
+      (fun x ->
+        x
+        < ( Datamodel_common.nile_release_schema_major_vsn
+          , Datamodel_common.nile_release_schema_minor_vsn
+          )
+      )
+      (* the version where update guidance improvement is made *)
   ; fn=
       (fun ~__context ->
         Db.Host.get_all ~__context
@@ -881,7 +892,14 @@ let update_livepatch_guidance =
                    ~value:`reboot_host_on_xen_livepatch_failure ;
                  Db.Host.remove_pending_guidances ~__context ~self
                    ~value:`reboot_host_on_livepatch_failure
+               ) ;
+               List.iter
+                 (fun g ->
+                   Db.Host.add_pending_guidances_recommended ~__context ~self
+                     ~value:g
                  )
+                 (Db.Host.get_pending_guidances ~__context ~self) ;
+               Db.Host.set_pending_guidances ~__context ~self ~value:[]
            )
       )
   }
@@ -914,7 +932,7 @@ let rules =
   ; upgrade_secrets
   ; remove_legacy_ssl_support
   ; empty_pool_uefi_certificates
-  ; update_livepatch_guidance
+  ; upgrade_update_guidance
   ]
 
 (* Maybe upgrade most recent db *)
diff --git a/ocaml/xapi/xapi_fist.ml b/ocaml/xapi/xapi_fist.ml
index 7798713e4..4f211185a 100644
--- a/ocaml/xapi/xapi_fist.ml
+++ b/ocaml/xapi/xapi_fist.ml
@@ -160,3 +160,5 @@ let int_seed name : int option =
 let exchange_certificates_in_pool () : int option =
   let name = "exchange_certificates_in_pool" in
   int_seed name
+
+let disable_xapi_guard_cache () = fistpoint "disable_xapi_guard_cache"
diff --git a/ocaml/xapi/xapi_globs.ml b/ocaml/xapi/xapi_globs.ml
index d7f48ea08..37e9f5615 100644
--- a/ocaml/xapi/xapi_globs.ml
+++ b/ocaml/xapi/xapi_globs.ml
@@ -16,6 +16,7 @@
 
 module String_plain = String (* For when we don't want the Xstringext version *)
 open Xapi_stdext_std.Xstringext
+module StringSet = Set.Make (String)
 
 module D = Debug.Make (struct let name = "xapi_globs" end)
 
@@ -1017,6 +1018,11 @@ let observer_endpoint_http_enabled = ref false
 
 let observer_endpoint_https_enabled = ref false
 
+let python3_path = ref "/usr/bin/python3"
+
+let observer_experimental_components =
+  ref (StringSet.singleton Constants.observer_component_smapi)
+
 let xapi_globs_spec =
   [
     ( "master_connection_reset_timeout"
@@ -1513,6 +1519,29 @@ let other_options =
     , (fun () -> string_of_bool !observer_endpoint_https_enabled)
     , "Enable https endpoints to be used by observers"
     )
+  ; ( "observer-experimental-components"
+    , Arg.String
+        (fun s ->
+          observer_experimental_components :=
+            match s with
+            | "" ->
+                StringSet.empty
+            | s ->
+                let input_set =
+                  s |> String.split_on_char ',' |> StringSet.of_list
+                in
+                let valid_set =
+                  Constants.observer_components_all |> StringSet.of_list
+                in
+                StringSet.inter input_set valid_set
+        )
+    , (fun () ->
+        !observer_experimental_components
+        |> StringSet.elements
+        |> String.concat ","
+      )
+    , "Comma-separated list of experimental observer components"
+    )
   ]
 
 (* The options can be set with the variable xapiflags in /etc/sysconfig/xapi.
diff --git a/ocaml/xapi/xapi_observer_components.ml b/ocaml/xapi/xapi_observer_components.ml
index cdd278403..e91fd143e 100644
--- a/ocaml/xapi/xapi_observer_components.ml
+++ b/ocaml/xapi/xapi_observer_components.ml
@@ -12,34 +12,36 @@
  * GNU Lesser General Public License for more details.
  *)
 
+module D = Debug.Make (struct let name = "xapi_observer_components" end)
+
 type t = Xapi | Xenopsd | Xapi_clusterd | SMApi [@@deriving ord]
 
 exception Unsupported_Component of string
 
-let all = [Xapi; Xenopsd; Xapi_clusterd; SMApi]
-
 let to_string = function
   | Xapi ->
-      "xapi"
+      Constants.observer_component_xapi
   | Xenopsd ->
-      "xenopsd"
+      Constants.observer_component_xenopsd
   | Xapi_clusterd ->
-      "xapi-clusterd"
+      Constants.observer_component_xapi_clusterd
   | SMApi ->
-      "smapi"
+      Constants.observer_component_smapi
 
 let of_string = function
-  | "xapi" ->
+  | str when String.equal str Constants.observer_component_xapi ->
       Xapi
-  | "xenopsd" ->
+  | str when String.equal str Constants.observer_component_xenopsd ->
       Xenopsd
-  | "xapi-clusterd" ->
+  | str when String.equal str Constants.observer_component_xapi_clusterd ->
       Xapi_clusterd
-  | "smapi" ->
+  | str when String.equal str Constants.observer_component_smapi ->
       SMApi
   | c ->
       raise (Unsupported_Component c)
 
+let all = List.map of_string Constants.observer_components_all
+
 (* We start up the observer for clusterd only if clusterd has been enabled
    otherwise we initialise clusterd separately in cluster_host so that
    there is no need to restart xapi in order for clusterd to be observed.
@@ -54,8 +56,21 @@ let assert_valid_components components =
   with Unsupported_Component component ->
     raise Api_errors.(Server_error (invalid_value, ["component"; component]))
 
+let filter_out_exp_components components =
+  let open Xapi_globs in
+  let component_set = components |> List.map to_string |> StringSet.of_list in
+  StringSet.diff component_set !observer_experimental_components
+  |> StringSet.elements
+  |> List.map of_string
+
 let observed_components_of components =
-  match components with [] -> startup_components () | components -> components
+  ( match components with
+  | [] ->
+      startup_components ()
+  | components ->
+      components
+  )
+  |> filter_out_exp_components
 
 let is_component_enabled ~component =
   try
@@ -84,13 +99,17 @@ let ( // ) = Filename.concat
 let dir_name_of_component component =
   Xapi_globs.observer_config_dir // to_string component // "enabled"
 
-let env_vars_of_component ~component ~traceparent =
+let env_exe_args_of ~component ~traceparent ~exe ~args =
   let dir_name_value = Filename.quote (dir_name_of_component component) in
+  let env_vars =
     Array.concat
       [
         Forkhelpers.default_path_env_pair
       ; Env_record.to_string_array
-        ([Env_record.pair ("OBSERVER_CONFIG_DIR", dir_name_value)]
+          ([
+             Env_record.pair ("OBSERVER_CONFIG_DIR", dir_name_value)
+           ; Env_record.pair ("PYTHONPATH", Filename.dirname exe)
+           ]
           @
           match traceparent with
           | None ->
@@ -99,3 +118,7 @@ let env_vars_of_component ~component ~traceparent =
               [Env_record.pair ("TRACEPARENT", traceparent)]
           )
       ]
+  in
+  let args = "-m" :: "observer" :: exe :: args in
+  let new_exe = !Xapi_globs.python3_path in
+  (Some env_vars, new_exe, args)
diff --git a/ocaml/xapi/xapi_observer_components.mli b/ocaml/xapi/xapi_observer_components.mli
index 23b2fea80..9ee531bfe 100644
--- a/ocaml/xapi/xapi_observer_components.mli
+++ b/ocaml/xapi/xapi_observer_components.mli
@@ -62,8 +62,12 @@ val dir_name_of_component : t -> string
   * is enabled.
   *)
 
-val env_vars_of_component :
-  component:t -> traceparent:string option -> string array
-(** Returns an array of environment viariables used by python scripts to 
-  *  configure the python observers.  
+val env_exe_args_of :
+     component:t
+  -> traceparent:string option
+  -> exe:string
+  -> args:string list
+  -> string array option * string * string list
+(** Returns an array option of environment variables and the modified exe and
+  *  args used by python scripts to configure the python observers.
   *)
diff --git a/ocaml/xcp-rrdd/scripts/rrdd/rrdd.py b/ocaml/xcp-rrdd/scripts/rrdd/rrdd.py
index 909e9ab20..a5dadf326 100644
--- a/ocaml/xcp-rrdd/scripts/rrdd/rrdd.py
+++ b/ocaml/xcp-rrdd/scripts/rrdd/rrdd.py
@@ -100,7 +100,7 @@ class Proxy(xmlrpc.client.ServerProxy):
     def __init__(self, uri, transport=None, encoding=None, verbose=0,
                  allow_none=1):
         xmlrpc.client.ServerProxy.__init__(self, uri, transport, encoding,
-                                       verbose, allow_none)
+                                           bool(verbose), bool(allow_none))
         self.transport = transport
 
     def request(self, methodname, params):
@@ -248,11 +248,11 @@ class API(object):
 
     def get_header(self):
         """Get the 'static' first line of the expected output format."""
-        return self.header
+        return self.header  # pytype: disable=attribute-error
 
     def get_path(self):
         """Get the path of the file in which to write the results to."""
-        return self.path
+        return self.path  # pytype: disable=attribute-error
 
     def register(self):
         """Register plugin if not already registered, and return next_reading."""
@@ -288,7 +288,7 @@ class API(object):
                 return
             except socket.error:
                 msg = "Failed to contact xcp-rrdd. Sleeping for 5 seconds .."
-                print >> sys.stderr, msg
+                print(msg, file=sys.stderr)
                 time.sleep(5)
 
     def update(self):
diff --git a/ocaml/xcp-rrdd/scripts/rrdd/test_api_wait_until_next_reading.py b/ocaml/xcp-rrdd/scripts/rrdd/test_api_wait_until_next_reading.py
new file mode 100644
index 000000000..5ca9b897f
--- /dev/null
+++ b/ocaml/xcp-rrdd/scripts/rrdd/test_api_wait_until_next_reading.py
@@ -0,0 +1,79 @@
+# Test: pytest -v -s ocaml/xcp-rrdd/scripts/rrdd/test_api_wait_until_next_reading.py
+"""Parametrized test exercising all conditions in rrdd.API.wait_until_next_reading()"""
+import socket
+from warnings import catch_warnings as import_without_warnings, simplefilter
+
+# Dependencies:
+# pip install pytest-mock
+import pytest
+
+# Handle DeprecationWarning from importing imp (it was removed with Python 3.12)
+with import_without_warnings():
+    simplefilter(action="ignore", category=DeprecationWarning)
+    import rrdd
+
+
+# pylint:disable=no-member,redefined-outer-name  # pytest fixture, see below
+
+
+@pytest.fixture
+def api(mocker):
+    """Pytest fixture for creating a rrdd.API() instance"""
+    instance = rrdd.API("plugin_id")
+    instance.deregister = mocker.Mock()
+    return instance
+
+
+# pylint:disable=too-many-arguments  # pytest parametrized test, see below
+@pytest.mark.parametrize(
+    "neg_shift, interval, reading, sleep",
+    [
+        # Happy path tests with various realistic test values
+        (None, 5, (6,), 5),  # Test the default value of neg_shift
+        (1, 5, (6,), 5),  # to call in the same sleep as neg_shift=1
+        (2.25, 5, (6,), 3.75),  # Test neg_shift as float to get sleep as float
+        (0.5, 30, (30.5,), 30),  # Also as a fraction of a second
+        (2, 120, (122,), 120),  # Test large interval and reading
+        # Edge cases
+        (11, 5, (1,), 0),  # large neg_shift results in no sleep
+        (1, 10, (1,), 0),  # neg_shift equals reading from xcp-rrdd
+        (1, 9, (10,), 9),  # wait_time is exactly one cycle
+        (1, 10, (9,), 8),  # wait_time is negative, should wrap around
+        # Error case
+        (1, 7, (socket.error, 6), 5),  # first register raises socket.error
+    ],
+)
+def test_params(api, mocker, neg_shift, interval, reading, sleep, capsys):
+    """Test that wait_until_reading_from_xcp_rrd() works with various test values"""
+    # Arrange
+    api.frequency_in_seconds = interval
+    api.lazy_complete_init = mocker.Mock()
+    api.register = mocker.Mock(side_effect=reading)
+    api.deregister = mocker.Mock()
+
+    # Act
+    mock_sleep = mocker.patch("time.sleep")
+    if neg_shift is None:
+        rrdd.API.wait_until_next_reading(api)
+    else:
+        rrdd.API.wait_until_next_reading(api, neg_shift)
+
+    # Assert
+    mock_sleep.assert_called_with(sleep)
+
+    with capsys.disabled():
+        stderr = capsys.readouterr().err
+        stdout = capsys.readouterr().out
+        if reading[0] is socket.error:
+            assert stderr == "Failed to contact xcp-rrdd. Sleeping for 5 seconds ..\n"
+        else:
+            assert stderr == ""
+        assert stdout == ""
+
+
+def test_api_getter_functions(api):
+    """Test that the API getter functions work (and cover the code)"""
+    api.header = "header"
+    api.path = "path"
+    assert api.get_header() == "header"
+    assert api.get_path() == "path"
diff --git a/ocaml/xe-cli/newcli.ml b/ocaml/xe-cli/newcli.ml
index 7c07c1995..d197b849a 100644
--- a/ocaml/xe-cli/newcli.ml
+++ b/ocaml/xe-cli/newcli.ml
@@ -435,38 +435,6 @@ let assert_filename_permitted ?(permit_cwd = false) permitted_filenames filename
   | _ ->
       ()
 
-let do_http_get ofd url exit_code f =
-  try
-    let rec doit url =
-      let server, path = parse_url url in
-      debug "Opening connection to server '%s' path '%s'\n%!" server path ;
-      with_open_tcp server @@ fun (ic, oc) ->
-      Printf.fprintf oc "GET %s HTTP/1.0\r\n\r\n" path ;
-      flush oc ;
-      (* Get the result header immediately *)
-      let resultline = input_line ic in
-      debug "Got %s\n%!" resultline ;
-      match http_response_code resultline with
-      | 200 ->
-          f ic ; marshal ofd (Response OK)
-      | 302 ->
-          let headers = read_rest_of_headers ic in
-          let newloc = List.assoc "location" headers in
-          (* see above about Unixfd.with_connection *)
-          close_in_noerr ic ; close_out_noerr oc ; doit newloc
-      | _ ->
-          failwith "Unhandled response code"
-    in
-    doit url
-  with
-  | ClientSideError msg ->
-      marshal ofd (Response Failed) ;
-      Printf.fprintf stderr "Operation failed. Error: %s\n" msg ;
-      exit_code := Some 1
-  | e ->
-      debug "HTTP GET failure: %s\n%!" (Printexc.to_string e) ;
-      marshal ofd (Response Failed)
-
 let main_loop ifd ofd permitted_filenames =
   (* Intially exchange version information *)
   let major', minor' =
@@ -741,8 +709,19 @@ let main_loop ifd ofd permitted_filenames =
                the normal communication channel *)
             marshal ofd (Response Failed)
       )
-    | Command (HttpGet (filename, url)) ->
-        do_http_get ofd url exit_code (fun ic ->
+    | Command (HttpGet (filename, url)) -> (
+      try
+        let rec doit url =
+          let server, path = parse_url url in
+          debug "Opening connection to server '%s' path '%s'\n%!" server path ;
+          with_open_tcp server @@ fun (ic, oc) ->
+          Printf.fprintf oc "GET %s HTTP/1.0\r\n\r\n" path ;
+          flush oc ;
+          (* Get the result header immediately *)
+          let resultline = input_line ic in
+          debug "Got %s\n%!" resultline ;
+          match http_response_code resultline with
+          | 200 ->
               let file_ch =
                 if filename = "" then
                   Unix.out_channel_of_descr (Unix.dup Unix.stdout)
@@ -760,35 +739,33 @@ let main_loop ifd ofd permitted_filenames =
                 ()
               done ;
               Pervasiveext.finally
-              (fun () -> copy_with_heartbeat ic file_ch heartbeat_fun)
-              (fun () -> try close_out file_ch with _ -> ())
-        )
-    | Command (PrintHttpGetJson url) ->
-        do_http_get ofd url exit_code (fun ic ->
-            while input_line ic <> "\r" do
-              ()
-            done ;
-            Yojson.Basic.from_channel ic
-            |> Yojson.Basic.pretty_to_string
-            |> print_endline ;
-            flush stdout
+                (fun () ->
+                  copy_with_heartbeat ic file_ch heartbeat_fun ;
+                  marshal ofd (Response OK)
                 )
-    | Command (PrintUpdateGuidance url) ->
-        do_http_get ofd url exit_code (fun ic ->
-            while input_line ic <> "\r" do
-              ()
-            done ;
-            Yojson.Basic.from_channel ic |> Yojson.Basic.Util.member "hosts"
-            |> function
-            | `List [] ->
-                raise (ClientSideError "No host data returned")
-            | `List (host :: _) ->
-                Yojson.Basic.Util.member "guidance" host
-                |> Yojson.Basic.pretty_to_string
-                |> print_endline ;
-                flush stdout
+                (fun () -> try close_out file_ch with _ -> ())
+          | 302 ->
+              let headers = read_rest_of_headers ic in
+              let newloc = List.assoc "location" headers in
+              (* see above about Unixfd.with_connection *)
+              close_in_noerr ic ; close_out_noerr oc ; doit newloc
           | _ ->
-                raise (ClientSideError "Unknown data format")
+              failwith "Unhandled response code"
+        in
+        doit url
+      with
+      | ClientSideError msg ->
+          marshal ofd (Response Failed) ;
+          Printf.fprintf stderr "Operation failed. Error: %s\n" msg ;
+          exit_code := Some 1
+      | e -> (
+        match e with
+        | Filename_not_permitted _ ->
+            raise e
+        | _ ->
+            debug "HttpGet failure: %s\n%!" (Printexc.to_string e) ;
+            marshal ofd (Response Failed)
+      )
     )
     | Command Prompt ->
         let data = input_line stdin in
diff --git a/ocaml/xenopsd/lib/xenops_server_skeleton.ml b/ocaml/xenopsd/lib/xenops_server_skeleton.ml
index c688aa792..dc1b826f8 100644
--- a/ocaml/xenopsd/lib/xenops_server_skeleton.ml
+++ b/ocaml/xenopsd/lib/xenops_server_skeleton.ml
@@ -26,6 +26,7 @@ module HOST = struct
         {
           Host.cpu_count= 0
         ; socket_count= 0
+        ; threads_per_core= 0
         ; vendor= "unknown"
         ; speed= ""
         ; modelname= ""
diff --git a/ocaml/xenopsd/scripts/make-custom-xenopsd.conf b/ocaml/xenopsd/scripts/make-custom-xenopsd.conf
index 09fe45592..b49610f0e 100755
--- a/ocaml/xenopsd/scripts/make-custom-xenopsd.conf
+++ b/ocaml/xenopsd/scripts/make-custom-xenopsd.conf
@@ -37,12 +37,12 @@ supported-vbd-backend-kinds=vbd,qdisk,9pfs
 xenguest=${LIBEXECDIR}/xenguest
 
 network-conf=${ETCDIR}/xapi/network.conf
-vif-script=${LIBEXECDIR}/vif
-vif-xl-script=${LIBEXECDIR}/vif
-vbd-script=${LIBEXECDIR}/block
-vbd-xl-script=${LIBEXECDIR}/block
-qemu-vif-script=${LIBEXECDIR}/qemu-vif-script
-setup-vif-rules=${LIBEXECDIR}/setup-vif-rules
+vif-script=${XENOPSD_LIBEXECDIR}/vif
+vif-xl-script=${XENOPSD_LIBEXECDIR}/vif
+vbd-script=${XENOPSD_LIBEXECDIR}/block
+vbd-xl-script=${XENOPSD_LIBEXECDIR}/block
+qemu-vif-script=${XENOPSD_LIBEXECDIR}/qemu-vif-script
+setup-vif-rules=${XENOPSD_LIBEXECDIR}/setup-vif-rules
 sockets-group=$group
 qemu-wrapper=${QEMU_WRAPPER_DIR}/qemu-wrapper
 swtpm-wrapper=${QEMU_WRAPPER_DIR}/qemu-wrapper
diff --git a/ocaml/xenopsd/xc/xenops_server_xen.ml b/ocaml/xenopsd/xc/xenops_server_xen.ml
index a04dabe98..4de4bbc35 100644
--- a/ocaml/xenopsd/xc/xenops_server_xen.ml
+++ b/ocaml/xenopsd/xc/xenops_server_xen.ml
@@ -985,6 +985,7 @@ module HOST = struct
         let socket_count =
           p.nr_cpus / (p.threads_per_core * p.cores_per_socket)
         in
+        let threads_per_core = p.threads_per_core in
         let features = get_cpu_featureset xc Featureset_host in
         (* this is Default policy in Xen's terminology, used on boot for new VMs *)
         let features_pv_host = get_cpu_featureset xc Featureset_pv in
@@ -1012,6 +1013,7 @@ module HOST = struct
             {
               Host.cpu_count
             ; socket_count
+            ; threads_per_core
             ; vendor
             ; speed
             ; modelname
diff --git a/pyproject.toml b/pyproject.toml
index f155e2a06..b65a36bb0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -124,6 +124,9 @@ disable = [
     "import-error",  # xenfsimage, xcp.bootloader. xcp.cmd
     "ignored-abstractmethod",
     "ignored-metaclass",
+    # https://github.com/google/pytype/issues/1130,
+    # https://github.com/google/pytype/issues/1485:
+    "pyi-error",
 ]
 platform = "linux"
 pythonpath = "scripts/examples/python:.:scripts:scripts/plugins:scripts/examples"
diff --git a/pytype_reporter.py b/pytype_reporter.py
index 43e81169a..8207053d7 100755
--- a/pytype_reporter.py
+++ b/pytype_reporter.py
@@ -1,5 +1,6 @@
 #!/usr/bin/env python
 """GitHub action workflow Runner for pytype which works also locally without GitHub"""
+import json
 import re
 import selectors
 import shlex
@@ -9,6 +10,7 @@ from os.path import basename
 from subprocess import check_output, PIPE, Popen  # nosec:B404
 from sys import argv, exit as sys_exit, stderr, stdout
 from typing import Dict, List, TextIO, Tuple, TYPE_CHECKING
+from urllib import request
 from warnings import catch_warnings, simplefilter
 
 import toml
@@ -200,7 +202,9 @@ def report_on(config: Config, log: TextIO, command: List[str], results: Info) ->
         command, stdout=PIPE, stderr=PIPE, universal_newlines=True
     ) as popen:
         assert popen.stdout and popen.stderr  # nosec:B101
-        return (popen.returncode or 0), parse_annotations(config, popen, log, results)
+        output = parse_annotations(config, popen, log, results)
+        print("Ran: " + " ".join(shlex.quote(arg) for arg in command), file=log)
+        return (popen.returncode or 0), output
 
 
 def readline(fileobj):
@@ -369,6 +373,11 @@ The string "expected_to_fail" in {config["section"]}"""
         print("::endgroup::")
     # When the the regular non-xfail run does not pass without errors bail with them:
     if err or len(results):
+        print("::warning::pytype exited with errors on regular run that should pass")
+        print(
+            "::error::"
+            "Add the file that caused the error to expected_to_fail in pyproject.toml"
+        )
         return (err, results) if err > 0 else (len(results), results)
     # Else continue with running pytype for the files marked xfail, record its results:
     for xfail in xfail_files:
@@ -419,6 +428,7 @@ def generate_markdown(config: Config, out: TextIO, returncode: int, results: Inf
     else:
         summary = f"#### {runner_link} reports no errors from {pytype_link} output."
         out.write(summary + "\n")
+    summary += "\n\n" + config.get("msg", "")
     if "PR_NUMBER" in env:
         write_summary_file(summary, anchor_text)
 
@@ -494,28 +504,84 @@ def load_config(config_file: str, script_basename: str) -> Config:
         repository_url = f"{github_server_url}/{github_repository}"
         branch = env.get("GITHUB_HEAD_REF") or env.get("GITHUB_REF_NAME")
 
+    config["repo_url"] = repository_url
     config["tree_url"] = f"{repository_url}/blob/{branch}"
     config["section"] = f"{config_file}[tool.{script_name}]"
     return config
 
 
-def get_changed_xfail_files(config: Config) -> list[str]:
-    """Get the list of changed files compared to the default branch.
-
-    :param config: The configuration dictionary.
-    :return: The list of changed files.
-    """
+def git_diff(*args) -> str:
+    """Run git diff with the given arguments and return the output as a string."""
     return check_output(  # nosec:B603
         args=[
             "git",
             "diff",
             "--ignore-space-change",
-            "--name-only",
-            "origin/" + config["default_branch"],
-            *config["expected_to_fail"],
+            *args,
         ],
         universal_newlines=True,
-    ).splitlines()
+    )
+
+
+def find_branch_point(config):
+    """Get the commit hash of the default where the current branch was created from"""
+    cmd = ["git", "merge-base", "origin/" + config["default_branch"], "HEAD"]
+    return check_output(cmd, universal_newlines=True).strip()  # nosec:B603
+
+
+def github_get_pr_commit_messages(repo, pr_number):
+    """Get the commit messages of the PR from the GitHub API"""
+    empty = []
+    github_token = env.get("GITHUB_TOKEN")
+    debug("PR_NUMBER: %s", pr_number)
+    if not github_token or not pr_number:
+        return empty
+
+    headers = {
+        "Authorization": f"Bearer {github_token}",
+        "Accept": "application/vnd.github.v3+json",
+    }
+    url = env.get("GITHUB_API_URL") + f"/repos/{repo}/pulls/{pr_number}/commits"
+    debug(url)
+    req = request.Request(url, headers=headers)
+
+    with request.urlopen(req) as response:
+        if response.getcode() == 200:
+            commits = json.loads(response.read())
+            return [commit['commit']['message'].split("\n")[0] for commit in commits]
+        print(f"Failed to fetch PR commits. Status code: {response.getcode()}")
+        print(response.read())
+    return empty
+
+
+def check_only_reverts_from_branch_point(config: Config, changed_files: List[str]):
+    """Check if the branch only contains revert commits since the branch point."""
+
+    repo = env.get("GITHUB_REPOSITORY", "")
+    pr_number = env.get("PR_NUMBER", "")
+    msgs = github_get_pr_commit_messages(repo, pr_number)
+    if not msgs:
+        old_cmd = ["git", "log", "--pretty=%s", find_branch_point(config) + "..HEAD"]
+        msgs = check_output(old_cmd, universal_newlines=True).split("\n")  # nosec:B607 B603
+
+    for commit_message in msgs:  # Check if each commit is a revert
+        print("#> " + commit_message)
+        if commit_message and "Revert" not in commit_message:
+            return False
+
+    # diff the xfail files in the PR with their state 4 weeks ago and show the check
+    # on stdout and the GitHub PR comment added by saving a file in this script:
+    pr_url = f"{config['repo_url']}/pull/{pr_number or '<PR-number>'}"
+    old_cmd = ["git", "rev-list", "-n1", "--before=4 weeks ago", "HEAD" ]
+    old_ref = check_output(old_cmd, universal_newlines=True).strip()
+    config["msg"] = f'\n## {config["script_name"]}: Only "Revert" commits on this PR.\n'
+    config["msg"] += "Checking the revert diff:\n```sh\ngh pr checkout " + pr_url + "\n"
+    config["msg"] += "REF=$(git rev-list -n 1 --before='4 weeks ago' HEAD)\n"
+    config["msg"] += "git diff $REF " + " ".join(changed_files) + "\n```\ndiff:\n"
+    config["msg"] += "```py\n" + git_diff(old_ref, *changed_files) + "```\n"
+    config["msg"] += "An empty `git diff` means the changes in these files are reverted"
+    print(config["msg"])
+    return True
 
 
 def main():
@@ -534,7 +600,16 @@ def main():
     config = load_config(config_file, basename(__file__))
     config.setdefault("expected_to_fail", [])
     debug("Expected to fail: %s", ", ".join(config["expected_to_fail"]))
-    changed_but_in_expected_to_fail = get_changed_xfail_files(config)
+
+    changed_but_in_expected_to_fail = git_diff(
+        "--name-only",
+        find_branch_point(config),
+        *config["expected_to_fail"],
+    ).splitlines()
+
+    if check_only_reverts_from_branch_point(config, changed_but_in_expected_to_fail):
+        return run_pytype_and_generate_summary(config)
+
     for changed_xfail in changed_but_in_expected_to_fail:
         annotate(
             kind="error",
diff --git a/quality-gate.sh b/quality-gate.sh
index d33edacff..224e852aa 100755
--- a/quality-gate.sh
+++ b/quality-gate.sh
@@ -3,7 +3,7 @@
 set -e
 
 list-hd () {
-  N=316
+  N=318
   LIST_HD=$(git grep -r --count 'List.hd' -- **/*.ml | cut -d ':' -f 2 | paste -sd+ - | bc)
   if [ "$LIST_HD" -eq "$N" ]; then
     echo "OK counted $LIST_HD List.hd usages"
diff --git a/scripts/hfx_filename b/scripts/hfx_filename
index e454eb299..cea0f8082 100755
--- a/scripts/hfx_filename
+++ b/scripts/hfx_filename
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python
 
 # Copyright (c) 2015 Citrix, Inc.
 #
@@ -14,8 +14,8 @@
 # ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 # OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 
-
-import sys, socket, urllib.request, XenAPI
+from __future__ import print_function
+import sys, os, socket, urllib2, urlparse, XenAPI, traceback, xmlrpclib
 
 db_url = "/remote_db_access"
 
@@ -76,6 +76,7 @@ def read_field(session_id, table, fld, rf):
     return response
 
 if __name__ == "__main__":
+    import XenAPI
     xapi = XenAPI.xapi_local()
     xapi.xenapi.login_with_password('root', '')
     session_id = xapi._session
diff --git a/scripts/perfmon b/scripts/perfmon
index 1af3ce057..a84c8eb5d 100644
--- a/scripts/perfmon
+++ b/scripts/perfmon
@@ -1,4 +1,4 @@
-#!/usr/bin/env python3
+#!/usr/bin/env python
 #
 # perfmon - a daemon for monitoring performance of the host on which it is run 
 # and of all the local VMs, and for generating events based on configurable 
@@ -29,13 +29,13 @@
 #
 # The "cf" CGI param specfies the row.  (All rows are returned if it's missing.)
 
-
+from __future__ import print_function
 import sys
 import os
 import getopt
 import traceback
 import XenAPI
-import urllib.request
+import urllib
 from xml import sax            # used to parse rrd_updates because this may be large and sax is more efficient
 from xml.dom import minidom    # used to parse other-config:perfmon.  Efficiency is less important than reliability here
 from xml.parsers.expat import ExpatError
@@ -46,7 +46,7 @@ import syslog
 import socket
 import gc
 import signal
-import subprocess
+import commands
 
 def print_debug(string):
     if debug: 
@@ -118,7 +118,7 @@ class ObjectReport:
     def get_uuid(self):
         return self.uuid
     def get_var_names(self):
-        return list(self.vars.keys())
+        return self.vars.keys()
     def get_value(self, var_name, row):
         try:
             return (self.vars[var_name])[row]
@@ -312,7 +312,9 @@ class RRDUpdates:
         paramstr = "&".join(["%s=%s"  % (k,params[k]) for k in params])
         print_debug("Calling http://localhost/rrd_updates?%s" % paramstr)
 
-        sock = urllib.request.urlopen("http://localhost/rrd_updates?%s" % paramstr)
+        # this is better than urllib.urlopen() as it raises an Exception on http 401 'Unauthorised' error 
+        # rather than drop into interactive mode
+        sock = urllib.URLopener().open("http://localhost/rrd_updates?%s" % paramstr)
         xmlsource = sock.read()
         sock.close()
 
@@ -352,8 +354,8 @@ def average(mylist):
 
 def get_percent_log_fs_usage(ignored):
     "Get the percent usage of the host filesystem for logs partition. Input list is ignored and should be empty"
-    fs_output = subprocess.getoutput('df /etc/passwd')
-    log_fs_output = subprocess.getoutput('df /var/log')
+    fs_output = commands.getoutput('df /etc/passwd')
+    log_fs_output = commands.getoutput('df /var/log')
     fs_output = ' '.join(fs_output.splitlines()[1:])
     log_fs_output = ' '.join(log_fs_output.splitlines()[1:])
     # Get the percent usage only when there is a separate logs partition
@@ -367,7 +369,7 @@ def get_percent_log_fs_usage(ignored):
 def get_percent_fs_usage(ignored):
     "Get the percent usage of the host filesystem.  Input list is ignored and should be empty"
     # this file is on the filesystem of interest in both OEM and Retail
-    output = subprocess.getoutput('df /etc/passwd')
+    output = commands.getoutput('df /etc/passwd')
     output = ' '.join(output.splitlines()[1:]) # remove header line and rewrap on single line
     percentage = output.split()[4]
     # remove % character and convert to float
@@ -637,10 +639,10 @@ class ObjectMonitor:
         
         for var in self.get_active_variables():
             # find the subset of the params returned for this object that we need to consolidate into var
-            params_to_consolidate = list(filter(var.rrd_regex.match, params_in_obj_report))
+            params_to_consolidate = filter(var.rrd_regex.match, params_in_obj_report)
             for row in range(num_rows):
                 # Get the values to consolidate
-                values_to_consolidate = [obj_report.get_value(param, row) for param in params_to_consolidate]
+                values_to_consolidate = map(lambda param: obj_report.get_value(param, row), params_to_consolidate)
                 # Consolidate them
                 value = var.consolidation_fn(values_to_consolidate)
                 # Pass result on to the variable object - this may result in an alarm being generated
@@ -939,7 +941,7 @@ def update_all_xmlconfigs(session):
     for recs in (all_host_recs, all_vm_recs, all_sr_recs):
         all_otherconfigs.update([
                 (recs[ref]['uuid'], recs[ref]['other_config'])
-                for ref in recs
+                for ref in recs.keys()
                 ])
 
     # rebuild dictionary mapping uuids to xmlconfigs
@@ -1067,13 +1069,13 @@ def main():
             vm_uuid_list = rrd_updates.get_uuid_list_by_objtype('vm')
 
             # Remove any monitors for VMs no longer listed in rrd_updates page
-            for uuid in vm_mon_lookup:
+            for uuid in vm_mon_lookup.keys():
                 if uuid not in vm_uuid_list:
                     vm_mon_lookup.pop(uuid)
                 
             # Create monitors for VMs that have just appeared in rrd_updates page
             for uuid in vm_uuid_list:
-                if uuid not in vm_mon_lookup:
+                if uuid not in vm_mon_lookup.keys():
                     vm_mon_lookup[uuid] = VMMonitor(uuid)
                 else:
                     # check if the config has changed, e.g. by XenCenter
@@ -1103,12 +1105,12 @@ def main():
             print_debug("sr_uuid_list = %s" % sr_uuid_list)
 
             # Remove monitors for SRs no longer listed in the rrd_updates page
-            for uuid in sr_mon_lookup:
+            for uuid in sr_mon_lookup.keys():
                 if uuid not in sr_uuid_list:
                     sr_mon_lookup.pop(uuid)
             # Create monitors for SRs that have just appeared in rrd_updates page
             for uuid in sr_uuid_list:
-                if uuid not in sr_mon_lookup:
+                if uuid not in sr_mon_lookup.keys():
                     sr_mon_lookup[uuid] = SRMonitor(uuid)
                 else:
                     sr_mon_lookup[uuid].refresh_config()
diff --git a/scripts/test_perfmon.py b/scripts/test_perfmon.py
deleted file mode 100644
index d0eba5c67..000000000
--- a/scripts/test_perfmon.py
+++ /dev/null
@@ -1,132 +0,0 @@
-#!/usr/bin/env python3
-#
-# unittest for perfmon
-
-import unittest
-from mock import MagicMock, patch
-import sys
-import os
-import subprocess
-import math
-
-# mock modules to avoid dependencies
-sys.modules["XenAPI"] = MagicMock()
-
-def import_from_file(module_name, file_path):
-    """Import a file as a module"""
-    if sys.version_info.major == 2:
-        return None
-    else:
-        from importlib import machinery, util
-        loader = machinery.SourceFileLoader(module_name, file_path)
-        spec = util.spec_from_loader(module_name, loader)
-        assert spec
-        assert spec.loader
-        module = util.module_from_spec(spec)
-        # Probably a good idea to add manually imported module stored in sys.modules
-        sys.modules[module_name] = module
-        spec.loader.exec_module(module)
-        return module
-
-def get_module():
-    """Import the perfmon script as a module for executing unit tests on functions"""
-    testdir = os.path.dirname(__file__)
-    return import_from_file("perfmon", testdir + "/perfmon")
-
-perfmon = get_module()
-@unittest.skipIf(sys.version_info < (3, 0), reason="requires python3")
-@patch("subprocess.getoutput")
-class TestGetPercentage(unittest.TestCase):
-    def mock_subprocess_getoutput(self, cmd):
-        df_etc_passwd = r"""Filesystem     1K-blocks    Used Available Use% Mounted on
-        /dev/sda1       18402132 2244748  15213668  13% /
-        """
-        df_var_log = r"""Filesystem     1K-blocks  Used Available Use% Mounted on
-        /dev/sda5        4054752 59820   3785220   2% /var/log
-        """
-        if cmd == "df /etc/passwd":
-            return df_etc_passwd
-        if cmd == "df /var/log":
-            return df_var_log
-        
-    def mock_subprocess_getoutput_same_file_system(self, cmd):
-        df_etc_passwd = r"""Filesystem     1K-blocks    Used Available Use% Mounted on
-        /dev/sda5       18402132 2244748  15213668  13% /
-        """
-        df_var_log = r"""Filesystem     1K-blocks  Used Available Use% Mounted on
-        /dev/sda5        4054752 59820   3785220   2% /var/log
-        """
-        if cmd == "df /etc/passwd":
-            return df_etc_passwd
-        if cmd == "df /var/log":
-            return df_var_log
-
-    def test_get_percent_log_fs_usage(self, mock_getoutput):
-        """Assert that get_percent_log_fs_usage returns as expected"""
-        mock_getoutput.side_effect = self.mock_subprocess_getoutput
-
-        expected_percentage = 0.02
-        test_percentage = perfmon.get_percent_log_fs_usage(None)
-        self.assertAlmostEqual(test_percentage, expected_percentage, 7)
-        
-    def test_get_percent_log_fs_usage_same_file_system(self, mock_getoutput):
-        """Test where /etc/passwd and /var/log are in the same filesystem"""
-        mock_getoutput.side_effect = self.mock_subprocess_getoutput_same_file_system
-
-        test_percentage = perfmon.get_percent_log_fs_usage(None)
-        self.assertTrue(math.isnan(test_percentage))
-        
-    def test_get_percent_fs_usage(self, mock_getoutput):
-        """Assert that get_percent_fs_usage returns as expected"""
-        mock_getoutput.side_effect = self.mock_subprocess_getoutput
-
-        expected_percentage = 0.13
-        test_percentage = perfmon.get_percent_fs_usage(None)
-        self.assertAlmostEqual(test_percentage, expected_percentage, 7)
-@unittest.skipIf(sys.version_info < (3, 0), reason="requires python3")
-@patch("perfmon.Variable.update")
-@patch("perfmon.get_percent_fs_usage")
-@patch("perfmon.get_percent_log_fs_usage")
-@patch("perfmon.get_percent_mem_usage")
-class TestVMMonitor(unittest.TestCase):   
-    def test_process_rrd_updates(self, mock_get_percent_fs_usage, mock_get_percent_log_fs_usage, 
-                                 mock_get_percent_mem_usage, mock_update):
-        uuid = 'e1ae3f5d-4c8b-4575-bbb8-2af7e8a2c31e'
-        perfmon.all_xmlconfigs = {'e1ae3f5d-4c8b-4575-bbb8-2af7e8a2c31e': '''<config><variable><name value="fs_usage"/>
-                                                                            <alarm_trigger_level value="0.9"/>
-                                                                            <alarm_trigger_period value="60"/>
-                                                                            <alarm_auto_inhibit_period value="3600"/>
-                                                                            </variable><variable><name value="mem_usage"/>
-                                                                            <alarm_trigger_level value="0.95"/>
-                                                                            <alarm_trigger_period value="60"/>
-                                                                            <alarm_auto_inhibit_period value="3600"/>
-                                                                            </variable><variable><name value="log_fs_usage"/>
-                                                                            <alarm_trigger_level value="0.9"/>
-                                                                            <alarm_trigger_period value="60"/>
-                                                                            <alarm_auto_inhibit_period value="3600"/>
-                                                                            </variable></config>'''}
-        monitor = perfmon.VMMonitor(uuid)
-        rrd_updates = perfmon.RRDUpdates()
-        obj_report = perfmon.ObjectReport("vm", uuid)
-        obj_report.vars = {'cpu0': [0.0063071, 0.0048038, 0.0045862, 0.0048865, 0.0048923],
-                           'cpu1': [0.0067629, 0.0055811, 0.0058988, 0.0058809, 0.0053645], 
-                           'cpu2': [0.0088599, 0.0078701, 0.0058573, 0.0063993, 0.0056833], 
-                           'cpu3': [0.0085826, 0.0056874, 0.005697, 0.0061155, 0.0048769], 
-                           'cpu4': [0.0051265, 0.0045452, 0.0046137, 0.0066399, 0.0050993], 
-                           'cpu5': [0.0062369, 0.0053982, 0.0056624, 0.00606, 0.0062017], 
-                           'cpu6': [0.006235, 0.0041764, 0.0048101, 0.0053798, 0.0050934], 
-                           'cpu7': [0.0050709, 0.005482, 0.0058926, 0.0052934, 0.0049544], 
-                           'memory': [2785000000.0, 2785000000.0, 2785000000.0, 2785000000.0, 2785000000.0]}
-        rrd_updates.report.obj_reports[uuid] = obj_report
-        rrd_updates.report.rows = 5  
-        session = perfmon.XapiSession()
-        
-        mock_get_percent_fs_usage.return_value = 0.12
-        mock_get_percent_log_fs_usage.return_value = float("NaN")
-        mock_get_percent_mem_usage.return_value = 0.17380
-        monitor.process_rrd_updates(rrd_updates, session)
-        mock_get_percent_fs_usage.assert_called()
-        mock_get_percent_log_fs_usage.assert_called()
-        mock_get_percent_mem_usage.assert_called()
-        mock_update.assert_called()
-       
diff --git a/scripts/xapi.service b/scripts/xapi.service
index e08cbff10..58923c0a9 100644
--- a/scripts/xapi.service
+++ b/scripts/xapi.service
@@ -1,6 +1,8 @@
 [Unit]
 Description=XenAPI server (XAPI)
 
+Requires=systemd-tmpfiles-setup.service
+After=systemd-tmpfiles-setup.service
 After=attach-static-vdis.service
 After=forkexecd.service
 After=message-switch.service

An empty git diff means the changes in these files are reverted.

You can check the results of the job here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants