Skip to content

Conversation

@zli
Copy link
Contributor

@zli zli commented May 23, 2016

This pull req does best-effort DP destroy and other cleanup actions in case of mirror/copy task failures. Along the way, we also made small amount of refactoring to make the code cleaner and more concise. See the summary of each commit for details.

Signed-off-by: Zheng Li dev@zheng.li

@jonludlam
Copy link
Contributor

@zli Please make a better title and description, as these get into the commit history.

@jonludlam
Copy link
Contributor

This is certainly a big improvement over the original. However, there's still a window for problems, I think - if an exception happens here: https://github.com/zli/xen-api/blob/a24e501f867f0f05d2b7bec8bb53d06ac6451baf/ocaml/xapi/xapi_vm_migrate.ml#L541-L568
we'll still leak, no?

@zli
Copy link
Contributor Author

zli commented May 23, 2016

However, there's still a window for problems...

Indeed. I though that part of code is pretty safe as the main activity (copy/mirror) must have been completed by then. But yes, there is still chance the remote query might fail. I can rework a little bit to have the try..with cover that section.

... aggregates two similar branching logics, no semantics change.

Signed-off-by: Zheng Li <dev@zheng.li>
let len = (Int64.to_float vconf.size) /. (Int64.to_float total_size) in
fun x -> start +. x *. len
in

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion to make this code clearer. The name so_far suggests that we have something done and need to do more. I also think it is clearer to expose the argument x early. I'd write:

let mapfn x =
  let total     = Int64.to_float total_size in
  let done_     = Int64.to_float !so_far ./ total in
  let remaining = Int64.to_float vconf.size /. total in
    done_ +. x *. remaining

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integrated as a separate commit, thanks!

@zli zli changed the title CA-199734 CA-199734: best-effort DP destroy and other cleanups for mirror/copy task failures May 24, 2016
@zli
Copy link
Contributor Author

zli commented May 24, 2016

@jonludlam , it's now updated.

  • Add pull req title & description
  • Changed the exception handling to cover larger range, also did some surgery with the code organization
  • More code refactoring as suggested by @lindig

let dest_vdi_ref = XenAPI.VDI.get_by_uuid remote.rpc remote.session vdi_uuid in
vconf.location,dest_vdi_ref,None,None
remote_vdi := vconf.location;
remote_vdi_reference := dest_vdi_ref
Copy link
Contributor

@jonludlam jonludlam Jun 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause us to try to destroy a remote VDI that wasn't made by us in the exception handler, wont it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well spot! The exception handling logic should be aware if the remote disk wasn't created by mirror/copy and action accordingly. Code is updated.

zli added 2 commits June 7, 2016 14:42
…task failure

... previously we didn't do this which caused issue like CA-199734.

A bit of code re-organization and refactoring is done as part of this. Mainly,
we use reference cells to record configuration variables. These ref cells are
gradually filled up with actual configurations along the process, so that the
initialization status of these cells also reflects the overall progress, i.e.
which steps have already been done and which haven't. With this information,
the error handling logic can decide what to do or undo in case of failures. The
benefits are, there is no need for multi layers of exceptions handling, also a
single entry point for a unified cleanup function.

Signed-off-by: Zheng Li <dev@zheng.li>
The code change was suggested by Christian Lindig <christian.lindig@citrix.com>
for better naming and style.

Signed-off-by: Zheng Li <dev@zheng.li>
@zli zli removed the needs updating label Jun 7, 2016
@jonludlam
Copy link
Contributor

Hi @zli. I've been having some philosophical discussions with @simonjbeaumont and @johnelse about this PR. There has been a fair bit of refactoring recently to move to a continuation-passing style implementation in this module. Although, as you say, the logic of the cleanup seems quite reasonable and clear, we've been wondering how it would look when it's refactored into a 'with_dp', 'with_remote_vdi' or some similar formulation. Essentially having smaller functions that do more specific things with their own smaller exception handlers all chained together. I'd really like to see this in comparison with your current more imperative implementation. We can keep this pull request open and then make a decision about which one is clearer, as, although I think it likely that the CPS style will end up being simpler, I'm honestly not sure at this point whether it will be better or not.

I'm quite keen to hear other peoples views on this too.

@jonludlam
Copy link
Contributor

I've marked this as blocked until we can compare with the CPS style fix.

@jonludlam
Copy link
Contributor

@zli - please prepare an alternative PR as described above. Thanks!

@zli
Copy link
Contributor Author

zli commented Jun 17, 2016

@jonludlam: sure, will create a separate one.

@jonludlam
Copy link
Contributor

Closed in favour of the updated PR

@jonludlam jonludlam closed this Jun 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants