xapi_xenops: Avoid a race during suspend #6454


Merged
merged 1 commit into from
May 7, 2025

Conversation

last-genius
Contributor

@last-genius last-genius commented May 6, 2025

As described in #6451, a xapi event could prevent update_vm from pulling the latest Xenopsd metadata, overwriting it with stale information. In the case of suspend, this would make the snapshot unresumable, raising an assert in xenopsd due to incongruities in memory values.

Instead, pull the xenopsd metadata right before updating DB.power_state in Xapi_vm_lifecycle.force_state_reset_keep_current_operations, eliminating the window for the race.

Closes #6451
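
To make the ordering concrete, here is a minimal sketch of the race and of the fix. The helper names (pull_xenopsd_metadata, set_power_state) are hypothetical stand-ins for the real xapi call sites, not functions that exist in the tree.

  (* Illustrative sketch only: hypothetical stand-ins, not the actual xapi functions. *)
  let pull_xenopsd_metadata ~vm =
    (* stand-in for Xenopsd_metadata.pull: copy the live xenopsd state into
       the VM's last_booted_record *)
    Printf.printf "pull live metadata for %s\n" vm

  let set_power_state ~vm state =
    (* stand-in for the DB.power_state update done in
       Xapi_vm_lifecycle.force_state_reset_keep_current_operations *)
    Printf.printf "set %s power_state to %s\n" vm state

  (* Before the fix: power_state became Suspended before the live metadata was
     pulled, leaving a window in which a concurrent xapi event handler could
     push a stale last_booted_record back to xenopsd. *)
  let handle_suspend_old vm =
    set_power_state ~vm "Suspended" ;
    (* <-- race window here *)
    pull_xenopsd_metadata ~vm

  (* After the fix: pull first, then flip power_state, so by the time other
     threads observe Suspended the record is already up to date. *)
  let handle_suspend_new vm =
    pull_xenopsd_metadata ~vm ;
    set_power_state ~vm "Suspended"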

@last-genius
Contributor Author

I've manually tested this fix and could no longer reproduce the issue where the xenopsd assert made a snapshot unresumable.

@@ -1714,7 +1717,8 @@ module Xenopsd_metadata = struct
in
delete_nolock ~__context id ;
md
)
)
(fun () -> try Mutex.unlock metadata_m with Sys_error _ -> ())
Contributor

Is this correct even if the lock was acquired not in this function but higher up?

Contributor Author

Not strictly - you're right. But there are no other users of the function, and in this case we no longer need to hold the lock once pull is done, so it's alright.

Contributor

You might want to leave a comment that releasing the lock is OK. Altogether it would be better if we didn't have to check dynamically whether the lock is already held.

Contributor Author

Added a comment

Contributor

@Vincent-lau Vincent-lau May 6, 2025

On line 2094 there is a with_lock call which will do an unlock at the end of that call. Here, if the lock is already held higher up and Mutex.unlock is called, metadata_m will be unlocked, and later on, when it reaches the with_lock on line 2094, it will be unlocked again. Is this true? If so, it is not safe to unlock a mutex twice in OCaml, see https://ocaml.org/manual/5.1/api/Mutex.html

Contributor Author

I thought so too - but didn't see any exceptions in testing. (The exception would anyhow be caught and logged by finally.)
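
For context, a self-contained sketch of the behaviour under discussion: the 5.1 manual linked above documents that Mutex.unlock raises Sys_error when the mutex is not held by the current thread, which is exactly what the try ... with Sys_error handler in the diff is there to absorb.

  (* Mutex lives in the threads library on OCaml 4.x and in the stdlib on 5.x. *)
  let () =
    let m = Mutex.create () in
    Mutex.lock m ;
    Mutex.unlock m ;
    (* A second unlock raises Sys_error because the mutex is no longer held;
       swallow it, mirroring the handler in the diff above. *)
    (try Mutex.unlock m
     with Sys_error msg -> Printf.printf "second unlock rejected: %s\n" msg)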

@robhoes
Member

robhoes commented May 6, 2025

So is the problem that this code in Xapi_xenops.create_metadata

  let domains =
    (* For suspended VMs, the last_booted_record contains the "live" xenopsd state. *)
    if vm.API.vM_power_state = `Suspended then
      Some vm.API.vM_last_booted_record
    else
      None

grabs a stale VM.last_booted_record during the window where – in another thread that updates the DB based on an event from xenopsd – the power_state was already set to Suspended but the "real" live xenopsd state was not yet written to VM.last_booted_record?

@last-genius
Contributor Author

last-genius commented May 6, 2025

So is the problem that this code in Xapi_xenops.create_metadata

  let domains =
    (* For suspended VMs, the last_booted_record contains the "live" xenopsd state. *)
    if vm.API.vM_power_state = `Suspended then
      Some vm.API.vM_last_booted_record
    else
      None

grabs a stale VM.last_booted_record during the window where – in another thread that updates the DB based on an event from xenopsd – the power_state was already set to Suspended but the "real" live xenopsd state was not yet written to VM.last_booted_record?

Yes! Exactly.

Perhaps I should describe it better in the commit message/code comment?

@last-genius last-genius force-pushed the asv/xenops-race-fix branch from 5ed3c3e to ee19704 Compare May 6, 2025 15:05
@@ -1687,7 +1687,10 @@ module Xenopsd_metadata = struct

(* Unregisters a VM with xenopsd, and cleans up metadata and caches *)
let pull ~__context id =
with_lock metadata_m (fun () ->
(* Continue even if mutex is already locked one level above *)
Contributor

Is it better to use a reentrant lock here?
I think Colin has done something before to make the db lock reentrant (fae9b55), could we borrow that or use a library?
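
For reference, a reentrant lock in OCaml could be sketched roughly as below. This is only an illustration built on a Thread.id-based owner check; it is not the approach from fae9b55, and real code would also need to consider the memory model for the unsynchronised owner read.

  (* Rough sketch of a reentrant lock, for illustration only. *)
  type reentrant = {
    m : Mutex.t ;
    mutable owner : int option ;  (* Thread.id of the current holder *)
    mutable depth : int ;         (* nesting level for the holder *)
  }

  let create () = { m = Mutex.create () ; owner = None ; depth = 0 }

  let with_reentrant_lock l f =
    let me = Thread.id (Thread.self ()) in
    if l.owner = Some me then (
      (* Already held by this thread: only track the nesting level. *)
      l.depth <- l.depth + 1 ;
      Fun.protect ~finally:(fun () -> l.depth <- l.depth - 1) f
    ) else (
      Mutex.lock l.m ;
      l.owner <- Some me ;
      l.depth <- 1 ;
      Fun.protect
        ~finally:(fun () -> l.depth <- 0 ; l.owner <- None ; Mutex.unlock l.m)
        f
    )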

@@ -1714,7 +1717,11 @@ module Xenopsd_metadata = struct
in
delete_nolock ~__context id ;
md
)
)
(* NOTE: This function has a single user and releasing the upper-level
Member

If it always has a single user, how about making it pull_no_lock so that it relies solely on its caller to acquire the lock?

Member

Ah, I see. The outer lock will be acquired only when "power_state = `Suspended".

Contributor Author

It won't be called unless power_state = Suspended, so I could indeed just make it pull_no_lock and leave the reentrant-lock refactoring for the future, in case someone else starts using this function.
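
For illustration, the *_no_lock split mentioned here would look roughly like the sketch below (placeholder bodies and hypothetical signatures, not the actual Xapi_xenops code).

  (* Sketch only: placeholder bodies, hypothetical signatures. *)
  let metadata_m = Mutex.create ()

  let with_lock m f =
    Mutex.lock m ;
    Fun.protect ~finally:(fun () -> Mutex.unlock m) f

  (* Does the real work and assumes metadata_m is already held by the caller,
     so it never touches the mutex itself. *)
  let pull_no_lock ~id =
    Printf.printf "pulling metadata for %s (lock held by caller)\n" id

  (* Entry point for any caller that does not already hold the lock. *)
  let pull ~id = with_lock metadata_m (fun () -> pull_no_lock ~id)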

Member

Yeah, locking/unlocking twice seems a bit strange to me.

@minglumlu
Member

minglumlu commented May 7, 2025

So is the problem that this code in Xapi_xenops.create_metadata

  let domains =
    (* For suspended VMs, the last_booted_record contains the "live" xenopsd state. *)
    if vm.API.vM_power_state = `Suspended then
      Some vm.API.vM_last_booted_record
    else
      None

grabs a stale VM.last_booted_record during the window where – in another thread that updates the DB based on an event from xenopsd – the power_state was already set to Suspended but the "real" live xenopsd state was not yet written to VM.last_booted_record?

So the xapi event handler pushed the wrong metadata to xenopsd, and then update_vm pulled the wrong metadata back from xenopsd?
If so, should xenopsd reject the push, since the VM was already in the suspended state?

@last-genius
Contributor Author

So is the problem that this code in Xapi_xenops.create_metadata

  let domains =
    (* For suspended VMs, the last_booted_record contains the "live" xenopsd state. *)
    if vm.API.vM_power_state = `Suspended then
      Some vm.API.vM_last_booted_record
    else
      None

grabs a stale VM.last_booted_record during the window where – in another thread that updates the DB based on an event from xenopsd – the power_state was already set to Suspended but the "real" live xenopsd state was not yet written to VM.last_booted_record?

So the xapi event handler pushed the wrong metadata to xenopsd, and then update_vm pulled the wrong metadata back from xenopsd? If so, should xenopsd reject the push, since the VM was already in the suspended state?

This could be another way of doing this - though I am not 100% sure we wouldn't prevent some desired behaviour this way.

@minglumlu
Member

This could be another way of doing this - though I am not 100% sure we wouldn't prevent some desired behaviour this way.

I just thought the locking scope would be bigger and the lock is shared by all operations. But it may be fine, as the bigger locking scope only covers the suspended case.

Member

@robhoes robhoes left a comment

Sorry, pressed the Approve button by mistake 😅

@last-genius last-genius force-pushed the asv/xenops-race-fix branch from ee19704 to c8eb4c8 Compare May 7, 2025 14:35
@robhoes
Member

robhoes commented May 7, 2025

Instead pull the xenopsd metadata as soon as the power_state is updated,
shrinking the window for the race.

It's not just shrinking the window, but eliminating it: if the VM.power_state field in the DB is not Suspended, then the xenopsd metadata for the VM will not be overwritten at all. And Xapi_vm_lifecycle.force_state_reset_keep_current_operations is what sets VM.power_state in the DB.

@last-genius last-genius force-pushed the asv/xenops-race-fix branch from c8eb4c8 to 1e67e7d Compare May 7, 2025 14:45
As described in [xapi-project#6451](xapi-project#6451),
a xapi event could prevent update_vm from pulling the latest Xenopsd metadata,
overwriting it with stale information. In case of suspend, this would make the
snapshot unresumable, raising an assert in xenopsd due to incongruities in
memory values.

Instead pull the xenopsd metadata right before updating DB.power_state in
Xapi_vm_lifecycle.force_state_reset_keep_current_operations, eliminating the
window for the race.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
@last-genius last-genius force-pushed the asv/xenops-race-fix branch from 1e67e7d to d875cfa Compare May 7, 2025 14:46
@last-genius last-genius changed the title xapi_xenops: Try to avoid a race during suspend xapi_xenops: Avoid a race during suspend May 7, 2025
@last-genius last-genius enabled auto-merge May 7, 2025 14:46
@last-genius last-genius added this pull request to the merge queue May 7, 2025
Merged via the queue into xapi-project:master with commit 21128b3 May 7, 2025
17 checks passed
Successfully merging this pull request may close these issues.

Race condition in xapi_xenops overwriting VM data on suspend makes it unresumable