CA-199734: best-effort DP destroy and other cleanups for mirror/copy task failures #2664

zli · 2016-05-23T09:15:31Z

This pull req does best-effort DP destroy and other cleanup actions in case of mirror/copy task failures. Along the way, we also made small amount of refactoring to make the code cleaner and more concise. See the summary of each commit for details.

Signed-off-by: Zheng Li dev@zheng.li

jonludlam · 2016-05-23T13:23:18Z

@zli Please make a better title and description, as these get into the commit history.

jonludlam · 2016-05-23T13:26:02Z

This is certainly a big improvement over the original. However, there's still a window for problems, I think - if an exception happens here: https://github.com/zli/xen-api/blob/a24e501f867f0f05d2b7bec8bb53d06ac6451baf/ocaml/xapi/xapi_vm_migrate.ml#L541-L568
we'll still leak, no?

zli · 2016-05-23T14:47:45Z

However, there's still a window for problems...

Indeed. I though that part of code is pretty safe as the main activity (copy/mirror) must have been completed by then. But yes, there is still chance the remote query might fail. I can rework a little bit to have the try..with cover that section.

... aggregates two similar branching logics, no semantics change. Signed-off-by: Zheng Li <dev@zheng.li>

lindig · 2016-05-24T09:50:38Z

ocaml/xapi/xapi_vm_migrate.ml

+        let len = (Int64.to_float vconf.size) /. (Int64.to_float total_size) in
+        fun x -> start +. x *. len
+      in
+


Minor suggestion to make this code clearer. The name so_far suggests that we have something done and need to do more. I also think it is clearer to expose the argument x early. I'd write:

let mapfn x = let total = Int64.to_float total_size in let done_ = Int64.to_float !so_far ./ total in let remaining = Int64.to_float vconf.size /. total in done_ +. x *. remaining

Integrated as a separate commit, thanks!

zli · 2016-05-24T13:05:26Z

@jonludlam , it's now updated.

Add pull req title & description
Changed the exception handling to cover larger range, also did some surgery with the code organization
More code refactoring as suggested by @lindig

jonludlam · 2016-06-07T12:39:50Z

ocaml/xapi/xapi_vm_migrate.ml

      let dest_vdi_ref = XenAPI.VDI.get_by_uuid remote.rpc remote.session vdi_uuid in
-      vconf.location,dest_vdi_ref,None,None
+      remote_vdi := vconf.location;
+      remote_vdi_reference := dest_vdi_ref


This will cause us to try to destroy a remote VDI that wasn't made by us in the exception handler, wont it?

Well spot! The exception handling logic should be aware if the remote disk wasn't created by mirror/copy and action accordingly. Code is updated.

…task failure ... previously we didn't do this which caused issue like CA-199734. A bit of code re-organization and refactoring is done as part of this. Mainly, we use reference cells to record configuration variables. These ref cells are gradually filled up with actual configurations along the process, so that the initialization status of these cells also reflects the overall progress, i.e. which steps have already been done and which haven't. With this information, the error handling logic can decide what to do or undo in case of failures. The benefits are, there is no need for multi layers of exceptions handling, also a single entry point for a unified cleanup function. Signed-off-by: Zheng Li <dev@zheng.li>

The code change was suggested by Christian Lindig <christian.lindig@citrix.com> for better naming and style. Signed-off-by: Zheng Li <dev@zheng.li>

jonludlam · 2016-06-08T10:02:09Z

Hi @zli. I've been having some philosophical discussions with @simonjbeaumont and @johnelse about this PR. There has been a fair bit of refactoring recently to move to a continuation-passing style implementation in this module. Although, as you say, the logic of the cleanup seems quite reasonable and clear, we've been wondering how it would look when it's refactored into a 'with_dp', 'with_remote_vdi' or some similar formulation. Essentially having smaller functions that do more specific things with their own smaller exception handlers all chained together. I'd really like to see this in comparison with your current more imperative implementation. We can keep this pull request open and then make a decision about which one is clearer, as, although I think it likely that the CPS style will end up being simpler, I'm honestly not sure at this point whether it will be better or not.

I'm quite keen to hear other peoples views on this too.

jonludlam · 2016-06-08T10:02:48Z

I've marked this as blocked until we can compare with the CPS style fix.

jonludlam · 2016-06-15T09:11:36Z

@zli - please prepare an alternative PR as described above. Thanks!

zli · 2016-06-17T08:22:37Z

@jonludlam: sure, will create a separate one.

jonludlam · 2016-06-21T22:27:05Z

Closed in favour of the updated PR

CA-199734: Small refactoring

bf19530

... aggregates two similar branching logics, no semantics change. Signed-off-by: Zheng Li <dev@zheng.li>

lindig reviewed May 24, 2016
View reviewed changes

zli changed the title ~~CA-199734~~ CA-199734: best-effort DP destroy and other cleanups for mirror/copy task failures May 24, 2016

simonjbeaumont assigned jonludlam May 31, 2016

jonludlam reviewed Jun 7, 2016
View reviewed changes

jonludlam added the needs updating label Jun 7, 2016

zli added 2 commits June 7, 2016 14:42

CA-199734: code improvement on mapfn

7018afc

The code change was suggested by Christian Lindig <christian.lindig@citrix.com> for better naming and style. Signed-off-by: Zheng Li <dev@zheng.li>

zli removed the needs updating label Jun 7, 2016

jonludlam added the blocked label Jun 8, 2016

zli mentioned this pull request Jun 20, 2016

CA-199734: best-effort DP destroy and other cleanups for mirror/copy task failures #2678

Merged

jonludlam closed this Jun 21, 2016

CA-199734: best-effort DP destroy and other cleanups for mirror/copy task failures #2664

CA-199734: best-effort DP destroy and other cleanups for mirror/copy task failures #2664

Uh oh!

Conversation

zli commented May 23, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jonludlam commented May 23, 2016

Uh oh!

jonludlam commented May 23, 2016

Uh oh!

zli commented May 23, 2016

Uh oh!

lindig May 24, 2016

Choose a reason for hiding this comment

Uh oh!

zli May 24, 2016

Choose a reason for hiding this comment

Uh oh!

zli commented May 24, 2016

Uh oh!

jonludlam Jun 7, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zli Jun 7, 2016

Choose a reason for hiding this comment

Uh oh!

jonludlam commented Jun 8, 2016

Uh oh!

jonludlam commented Jun 8, 2016

Uh oh!

jonludlam commented Jun 15, 2016

Uh oh!

zli commented Jun 17, 2016

Uh oh!

jonludlam commented Jun 21, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

zli commented May 23, 2016 •

edited

Loading

jonludlam Jun 7, 2016 •

edited

Loading