Skip to content

Conversation

@aschroed
Copy link
Member

@aschroed aschroed commented Mar 7, 2022

Adds some generally useful aws functions (currently all s3 operations) - haven't had time to complete unit tests for all functions though several functions are covered.

Perhaps similar functions may have been added to other utils packages by others?

I'd like to be able to use these functions for open data operations as I refactor some of that code. so would love to get this merged in

@aschroed aschroed requested a review from netsettler March 7, 2022 21:14
logger = logging.getLogger(__name__)


def s3_bucket_head(*, bucket_name, s3=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reorder arguments here so s3 comes first, and rename it s3_client

:return: dict: head response or None
"""
try:
s3 = s3 or s3Utils().s3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not do this this way as it will implicitly pull in a bunch of other things when calling s3Utils() - I would just do boto3.client('s3'). This comment applies to many functions throughout here.

Comment on lines 36 to 46
def s3_bucket_exists(*, bucket_name, s3=None):
""" Does a bucket exist?

:param bucket_name: name of the bucket - string
:param s3: AWS s3 client
:return: boolean - True if exists, False if not
"""
return bool(s3_bucket_head(bucket_name=bucket_name, s3=s3))


def s3_bucket_object_count(bucket_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using the s3 convention inconsistently - I would recommend having it everywhere and writing a small helper that does the client initialization, if necessary.

"""
bucket = boto3.resource('s3').Bucket(bucket_name)
# get only head of objects so we can count them
return sum(1 for _ in bucket.objects.all())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work recursively or will it only count folders at top level? Thinking about how it would count things in our files/wfoutput buckets for example which all have a folder based hierarchy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something s3 does not store objects with a hierarchy (folders are just a convenience in console display?) so this would count all objects in the given bucket regardless of folder. I expect if empty folders were created then these would be included in the count so maybe checking that 0 size objects are not included may be warranted? If I am correct about this then what may indeed be useful is to support passing in a prefix and only return a count of those objects that share said prefix - for example in the dcic account in the '4dn-dcic-open-data-transfer-logs' bucket if a parameter '2022-08-01' was provided then a count of the 32 object in that folder/with that shared prefix would be returned.

return bool(s3_object_head(object_key=object_key, bucket_name=bucket_name, s3=s3))


def s3_put_object(*, object_key, obj, bucket_name, acl=None, s3=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should handle the kms_key_id case here as well, despite it not being used in Fourfront

@coveralls
Copy link

coveralls commented Oct 17, 2022

Pull Request Test Coverage Report for Build 3275617373

  • 49 of 134 (36.57%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.5%) to 72.461%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dcicutils/bucket_utils.py 36 121 29.75%
Totals Coverage Status
Change from base Build 3198667051: -0.5%
Covered Lines: 5736
Relevant Lines: 7916

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants