AWS disk modification wait method updated to return faster
Summary:
We saw that increasing disk size on AWS can take up to a couple of hours. The main reason is the optimization that AWS performs on the disk behind the scenes. [[ This page | https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-volume-modifications.html ]] says this operation can take up to 24 hours.

In this diff, the AWS disk modification wait method now returns as soon as the volume modification state is "optimizing" instead of waiting for "completed".

Note that the added disk space is usable while the volume is in the "optimizing" state, but the volume may not deliver its optimized performance until the state is "completed".
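
The state check described above can be sketched as a small predicate. This is only an illustration and not the code in this diff: the helper name `is_resize_usable` is hypothetical, and it raises a plain `RuntimeError` where the real code uses `YBOpsRuntimeError`. The state strings are the values EC2 reports in the `ModificationState` field of `describe_volumes_modifications`.

```python
# States in which the added capacity is already available, per the summary above.
USABLE_STATES = {"optimizing", "completed"}


def is_resize_usable(entry):
    """Return True once the resized volume can be used (hypothetical helper).

    `entry` is one item of response["VolumesModifications"].
    Raises RuntimeError if the modification failed.
    """
    state = entry["ModificationState"]
    if state == "failed":
        raise RuntimeError(
            "Modification of disk {} failed.".format(entry["VolumeId"]))
    # "modifying" (or any other state) means the caller should keep waiting.
    return state in USABLE_STATES
```

A volume still in the "modifying" state returns False, so a caller would keep polling until every volume reports one of the usable states.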

Test Plan:
The resizeNode operation, which uses the method changed in this diff, was called with different values
while the sample app was running.

Reviewers: arnav, sanketh

Reviewed By: sanketh

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D12414
shahrooz1997 committed Jul 28, 2021
1 parent 652d3cd commit d1f8fc0
Showing 1 changed file with 19 additions and 14 deletions.
33 changes: 19 additions & 14 deletions managed/devops/opscli/ybops/cloud/aws/utils.py
@@ -1077,27 +1077,32 @@ def _update_dns_record_set(hosted_zone_id, domain_name_prefix, ip_list, action):


 def _wait_for_disk_modifications(ec2_client, vol_ids):
-    num_vols_completed = 0
+    # This function returns as soon as the volume state is optimizing, not completed.
     num_vols_to_modify = len(vol_ids)
-    # It should retry for a 6 hour limit
-    retry_num = int((6 * 3600) / AbstractCloud.SSH_WAIT_SECONDS) + 1
-    # Loop till the progress is at 100 or the limit is reached
-    while retry_num != 0:
+    # It should retry for a 1 hour time limit.
+    retry_num = int((1 * 3600) / AbstractCloud.SSH_WAIT_SECONDS) + 1
+    # Loop till all volumes are modified or the limit is reached.
+    while retry_num > 0:
+        num_vols_modified = 0
         response = ec2_client.describe_volumes_modifications(VolumeIds=vol_ids)
         # The response format can be found here:
         # https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.describe_volumes_modifications
-        for entry in response['VolumesModifications']:
-            if entry['Progress'] == 100:
-                if entry['ModificationState'] != 'completed':
-                    raise YBOpsRuntimeError(("Disk {} could not be modified.").format(
-                        entry['VolumeId']))
-                else:
-                    num_vols_completed += 1
+        for entry in response["VolumesModifications"]:
+            if entry["ModificationState"] == "failed":
+                raise YBOpsRuntimeError(("Modification of disk {} failed.").format(
+                    entry['VolumeId']))
+
+            if entry["ModificationState"] == "optimizing" or \
+                    entry["ModificationState"] == "completed":
+                # Modifying completed.
+                num_vols_modified += 1
+
         # This means all volumes have completed modification.
-        if num_vols_completed == num_vols_to_modify:
+        if num_vols_modified == num_vols_to_modify:
             break

         time.sleep(AbstractCloud.SSH_WAIT_SECONDS)
         retry_num -= 1

-    if retry_num == 0:
+    if retry_num <= 0:
         raise YBOpsRuntimeError("wait_for_disk_modifications failed. Retry limit reached.")
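
To make the new polling behaviour easier to try outside the repository, here is a self-contained sketch of the same loop with a stub client. It is not the committed code. `WAIT_SECONDS = 10` is an assumed value standing in for `AbstractCloud.SSH_WAIT_SECONDS` (its real value is not shown in this diff), `DiskModificationError` stands in for `YBOpsRuntimeError`, and `FakeEc2Client` and the injectable `sleep` parameter are hypothetical additions for testing.

```python
import time

WAIT_SECONDS = 10               # assumption: stand-in for AbstractCloud.SSH_WAIT_SECONDS
TIME_LIMIT_SECONDS = 1 * 3600   # the 1 hour limit used by the new code


class DiskModificationError(RuntimeError):
    """Stand-in for YBOpsRuntimeError."""


def wait_for_disk_modifications(ec2_client, vol_ids, sleep=time.sleep):
    """Return once every volume is 'optimizing' or 'completed'."""
    retries_left = int(TIME_LIMIT_SECONDS / WAIT_SECONDS) + 1
    while retries_left > 0:
        response = ec2_client.describe_volumes_modifications(VolumeIds=vol_ids)
        # Recount on every poll so earlier iterations do not accumulate.
        num_modified = 0
        for entry in response["VolumesModifications"]:
            state = entry["ModificationState"]
            if state == "failed":
                raise DiskModificationError(
                    "Modification of disk {} failed.".format(entry["VolumeId"]))
            if state in ("optimizing", "completed"):
                num_modified += 1
        if num_modified == len(vol_ids):
            return
        sleep(WAIT_SECONDS)
        retries_left -= 1
    raise DiskModificationError(
        "wait_for_disk_modifications failed. Retry limit reached.")


class FakeEc2Client:
    """Hypothetical test double: replays one scripted state per poll."""

    def __init__(self, states):
        self._states = list(states)
        self.calls = 0

    def describe_volumes_modifications(self, VolumeIds):
        # Repeat the last scripted state once the script is exhausted.
        state = self._states[min(self.calls, len(self._states) - 1)]
        self.calls += 1
        return {"VolumesModifications": [
            {"VolumeId": vol_id, "ModificationState": state}
            for vol_id in VolumeIds]}
```

Passing `sleep=some_list.append` with a `FakeEc2Client(["modifying", "optimizing"])` exercises the loop without real delays or AWS calls. A real boto3 client created with `boto3.client("ec2")` exposes the same `describe_volumes_modifications(VolumeIds=...)` method.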
