- 
                Notifications
    You must be signed in to change notification settings 
- Fork 293
CA-199734: best-effort DP destroy and other cleanups for mirror/copy task failures #2664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @zli Please make a better title and description, as these get into the commit history. | 
| This is certainly a big improvement over the original. However, there's still a window for problems, I think - if an exception happens here: https://github.com/zli/xen-api/blob/a24e501f867f0f05d2b7bec8bb53d06ac6451baf/ocaml/xapi/xapi_vm_migrate.ml#L541-L568 | 
| 
 Indeed. I though that part of code is pretty safe as the main activity (copy/mirror) must have been completed by then. But yes, there is still chance the remote query might fail. I can rework a little bit to have the try..with cover that section. | 
... aggregates two similar branching logics, no semantics change. Signed-off-by: Zheng Li <dev@zheng.li>
| let len = (Int64.to_float vconf.size) /. (Int64.to_float total_size) in | ||
| fun x -> start +. x *. len | ||
| in | ||
|  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion to make this code clearer. The name so_far suggests that we have something done and need to do more. I also think it is clearer to expose the argument x early. I'd write:
let mapfn x =
  let total     = Int64.to_float total_size in
  let done_     = Int64.to_float !so_far ./ total in
  let remaining = Int64.to_float vconf.size /. total in
    done_ +. x *. remaining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integrated as a separate commit, thanks!
| @jonludlam , it's now updated. 
 | 
| let dest_vdi_ref = XenAPI.VDI.get_by_uuid remote.rpc remote.session vdi_uuid in | ||
| vconf.location,dest_vdi_ref,None,None | ||
| remote_vdi := vconf.location; | ||
| remote_vdi_reference := dest_vdi_ref | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will cause us to try to destroy a remote VDI that wasn't made by us in the exception handler, wont it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well spot! The exception handling logic should be aware if the remote disk wasn't created by mirror/copy and action accordingly. Code is updated.
…task failure ... previously we didn't do this which caused issue like CA-199734. A bit of code re-organization and refactoring is done as part of this. Mainly, we use reference cells to record configuration variables. These ref cells are gradually filled up with actual configurations along the process, so that the initialization status of these cells also reflects the overall progress, i.e. which steps have already been done and which haven't. With this information, the error handling logic can decide what to do or undo in case of failures. The benefits are, there is no need for multi layers of exceptions handling, also a single entry point for a unified cleanup function. Signed-off-by: Zheng Li <dev@zheng.li>
The code change was suggested by Christian Lindig <christian.lindig@citrix.com> for better naming and style. Signed-off-by: Zheng Li <dev@zheng.li>
| Hi @zli. I've been having some philosophical discussions with @simonjbeaumont and @johnelse about this PR. There has been a fair bit of refactoring recently to move to a continuation-passing style implementation in this module. Although, as you say, the logic of the cleanup seems quite reasonable and clear, we've been wondering how it would look when it's refactored into a 'with_dp', 'with_remote_vdi' or some similar formulation. Essentially having smaller functions that do more specific things with their own smaller exception handlers all chained together. I'd really like to see this in comparison with your current more imperative implementation. We can keep this pull request open and then make a decision about which one is clearer, as, although I think it likely that the CPS style will end up being simpler, I'm honestly not sure at this point whether it will be better or not. I'm quite keen to hear other peoples views on this too. | 
| I've marked this as blocked until we can compare with the CPS style fix. | 
| @zli - please prepare an alternative PR as described above. Thanks! | 
| @jonludlam: sure, will create a separate one. | 
| Closed in favour of the updated PR | 
This pull req does best-effort DP destroy and other cleanup actions in case of mirror/copy task failures. Along the way, we also made small amount of refactoring to make the code cleaner and more concise. See the summary of each commit for details.
Signed-off-by: Zheng Li dev@zheng.li