AWS S3 - Simple Storage Service

AWS Object Storage Service.

Main features:

  • Object-based (you can't edit one chunk of a file; S3 treats the file as a whole).
  • Provides REST and SOAP (deprecated) interfaces.

Definitions:

  • Objects:
    • Entities stored in S3 (images, audio, files...).
    • An object consists of its data and its metadata.
    • An object created in a region never leaves that region unless you explicitly transfer or replicate it (CRR).
  • Bucket:
    • A container that stores objects.
    • Buckets are created in a region but can be accessed globally.
  • Endpoint:
    • URL entry point for a web service. Most AWS services offer regional endpoints.
  • Key:
    • An object's unique identifier within a bucket. The combination of bucket, key and (optional) version ID identifies an object.

S3 vs EFS vs EBS

  • S3 is object storage designed for multiple purposes, not limited to EC2.
  • EFS provides scalable network file storage for EC2 service users.
  • EBS is block storage for EC2 instances (like a hard drive).

ARN

The following common Amazon Resource Name (ARN) format identifies resources in AWS:

arn:{partition}:s3:{region}:{namespace}:{relative-id}

The following ARN identifies the /dev/info.doc object in the lorem bucket:

arn:aws:s3:::lorem/dev/info.doc

The following ARN identifies all objects under the prefix 05/hr for the access point test in account 123456789012 in the us-east-2 region:

arn:aws:s3:us-east-2:123456789012:accesspoint/test/object/05/hr/*
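
As a rough illustration of the format above, an ARN can be split into its fields with plain string handling. The `S3Arn` type and `parse_arn` helper are hypothetical names for this sketch, not part of any AWS SDK.

```python
from typing import NamedTuple


class S3Arn(NamedTuple):
    partition: str
    service: str
    region: str
    namespace: str
    relative_id: str


def parse_arn(arn: str) -> S3Arn:
    """Split an ARN into the fields of the common format shown above."""
    # maxsplit=5 keeps any ':' inside the relative-id intact.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    _, partition, service, region, namespace, relative_id = parts
    return S3Arn(partition, service, region, namespace, relative_id)


obj = parse_arn("arn:aws:s3:::lorem/dev/info.doc")
# For a plain object ARN the relative-id is "<bucket>/<key>".
bucket, _, key = obj.relative_id.partition("/")
print(bucket, key)  # lorem dev/info.doc
```

Note that region and namespace are empty for bucket and object ARNs, while the access point ARN carries both.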

Storage classes

| | Standard | Intelligent-Tiering | Standard-IA | One Zone-IA | Glacier | Glacier Deep Archive |
|---|---|---|---|---|---|---|
| Purpose | General | Cost optimization | Infrequent access | Infrequent access | Long-term low-cost | Long-term low-cost |
| Replication (AZ) | >= 3 | >= 3 | >= 3 | 1 | >= 3 | >= 3 |
| Availability | 99.99% | 99.9% | 99.9% | 99.5% | Minutes (retrieval) | 12 hours (retrieval) |
| Min. charge (size / duration) | N/A | N/A / 30 days | 128KB / 30 days | 128KB / 30 days | 40KB / 90 days | 40KB / 180 days |
| Costs | Retrieval fee | *Monthly monitoring | Lower cost than Standard | 20% less than Standard-IA | Lower | Lowest |
| Retrieval fee | N/A | N/A | Per GB | Per GB | Per GB | Per GB |
| Life Cycle Policies | Yes | Yes | Yes | Yes | - | - |

Special characteristics

  • S3 Standard
    • Low latency and high throughput.
  • S3 Intelligent-Tiering
    • Automatically moves objects that have not been accessed to lower-cost tiers.
    • Charges a monthly monitoring and automation fee.
  • S3 Standard-IA (Standard Infrequent Access)
    • Good for less frequently accessed objects.
  • S3 One Zone-IA
    • Good option for secondary backups.
  • S3 Glacier
    • Good option for rarely accessed data.
  • S3 Glacier Deep Archive
    • Lower cost but longer retrieval time than Glacier.

Object Lifecycle Management

Manages the storage of objects and its associated costs. It is used to change the storage class of an object depending on its usage (Transition) or to delete it when it reaches its expiration time (Expiration).
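
A minimal sketch of a rule that combines both actions, written as the dictionary shape boto3's `put_bucket_lifecycle_configuration` accepts. The helper name `lifecycle_rule`, the `logs/` prefix, the bucket name and the day counts are assumptions chosen for illustration.

```python
def lifecycle_rule(prefix: str, ia_days: int, glacier_days: int, expire_days: int) -> dict:
    """Build one lifecycle rule: two transitions followed by an expiration."""
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},  # only objects under this prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
    }


config = {"Rules": [lifecycle_rule("logs/", 30, 90, 365)]}
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=config)
```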

Limitations

  • 5,500 GET requests per second per partitioned prefix in a bucket.
  • 3,500 PUT/COPY/POST/DELETE requests per second per partitioned prefix in a bucket
  • There are no limits to the number of prefixes in a bucket.
  • You can increase your read or write performance by using parallelization.
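
Because the limits above apply per prefix, spreading keys over several prefixes and reading them in parallel raises the aggregate ceiling. The arithmetic can be sketched as follows; `aggregate_limits` is a hypothetical helper.

```python
GET_PER_PREFIX = 5_500    # GET requests per second per prefix
WRITE_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second per prefix


def aggregate_limits(prefix_count: int) -> dict:
    """Upper bound on request rates when load is spread evenly over prefixes."""
    if prefix_count < 1:
        raise ValueError("prefix_count must be at least 1")
    return {
        "get_per_second": GET_PER_PREFIX * prefix_count,
        "write_per_second": WRITE_PER_PREFIX * prefix_count,
    }


print(aggregate_limits(10))  # {'get_per_second': 55000, 'write_per_second': 35000}
```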

Data consistency model

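Definitions:

  • Read-after-write consistent: a read issued after a successful write returns the data that was written.
  • Eventually read-after-write consistent: S3 returns the final data once the object has been fully created.
  • Eventually consistent: S3 may take some time to return the final data.

How S3 handled data consistency before December 2020:

  • New object: read-after-write consistent.
  • Read while creating: eventually read-after-write consistent.
  • Update object: eventually consistent.
  • Delete object: eventually consistent.

Since December 2020, S3 provides strong read-after-write consistency for all PUT and DELETE operations on objects, so the eventually consistent cases above no longer apply.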

Versioning

Allows you to maintain multiple versions of an object in the same bucket.

Deletion can be configured to require MFA (MFA Delete).

States:

  • Un-versioned. Default. Objects have a null version ID.
  • Versioning-enabled. Deleted objects get a "delete marker".
  • Versioning-suspended.
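
The behavior of a versioning-enabled bucket can be pictured with a small in-memory model. This is a toy illustration only, not the S3 API: `ToyVersionedBucket` and its version IDs are made up for the sketch.

```python
import itertools


class ToyVersionedBucket:
    """In-memory illustration of versioning; not the S3 API."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body or None)
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        # A plain delete only adds a delete marker (body None);
        # the older versions are kept.
        return self.put(key, None)

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # The latest version wins; a delete marker hides the object.
            if not versions or versions[-1][1] is None:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))


bucket = ToyVersionedBucket()
first = bucket.put("a.txt", b"one")
bucket.put("a.txt", b"two")
bucket.delete("a.txt")
print(bucket.get("a.txt", first))  # b'one'
```

After the delete, a plain `get("a.txt")` fails in this model, while the earlier versions remain reachable by version ID.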

Object retrieval

Typical time for data retrieval:

[Image: typical time for data retrieval]

Select from objects using Queries

Extract records from a stored CSV or JSON file using SQL expressions. GZIP-compressed files are also supported.
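
A sketch of what such a query could look like for a GZIP-compressed CSV file, expressed as the keyword arguments for boto3's `select_object_content`. The helper name, bucket, key and column names are assumptions for illustration.

```python
def select_csv_kwargs(bucket: str, key: str, sql: str, gzip: bool = False) -> dict:
    """Build the arguments for an SQL query over a CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        "InputSerialization": {
            # "USE" treats the first line as a header so columns can be named.
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP" if gzip else "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


kwargs = select_csv_kwargs(
    "my-bucket",
    "data/users.csv.gz",
    "SELECT s.name FROM S3Object s WHERE s.country = 'ES'",
    gzip=True,
)
# With boto3 installed and credentials configured:
#   response = boto3.client("s3").select_object_content(**kwargs)
```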

Object upload

Transfer Acceleration

Enables fast file transfers over long distances. Useful when:

  • Customers need to upload files from around the world to a centralized bucket.
  • You transfer large amounts of data across continents on a regular basis.
  • The available upload bandwidth is underutilized.

Multipart Upload

Enables the upload of large objects in parts; recommended for objects larger than 100MB.

It is required for objects larger than 5GB, and the maximum object size is 5TB.
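
To get a feel for the numbers, the sketch below plans how an object would be split into parts. It assumes the documented multipart limits of 5 MiB to 5 GiB per part (the last part may be smaller) and at most 10,000 parts; `plan_parts` and the 100 MiB default part size are choices made for this example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB         # minimum size of every part except the last
MAX_PART = 5 * GIB         # maximum size of a single part
MAX_PARTS = 10_000         # maximum number of parts per upload
MAX_OBJECT = 5 * 1024 * GIB  # maximum object size


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> tuple:
    """Return (number_of_parts, size_of_last_part) for a multipart upload."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object_size out of range")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part_size out of range")
    count = (object_size + part_size - 1) // part_size  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("part_size too small for this object")
    last = object_size - (count - 1) * part_size
    return count, last


print(plan_parts(1 * GIB))  # (11, 25165824)
```

A 1 GiB object with 100 MiB parts needs ten full parts plus a final 24 MiB part; a 5 TiB object needs parts larger than 500 MiB to stay under the part-count limit.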

Events

S3 can send notifications when some event occurs in your bucket.

Events can be sent to SNS topics, SQS queues, Lambda functions, or Amazon EventBridge.

Encryption

Can be server-side (SSE) or client-side (CSE). Server-side uses AES-256.

To server-side encrypt an object at rest, the request should include the header x-amz-server-side-encryption. A bucket policy can be created to deny all other requests.

Since January 5, 2023, all new object uploads to S3 are automatically encrypted with SSE-S3 at no additional cost.
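
The bucket policy mentioned above could look roughly like the following sketch, which denies any `PutObject` request that lacks the `x-amz-server-side-encryption` header. The helper name and the `lorem` bucket are illustrative.

```python
import json


def require_sse_policy(bucket: str) -> dict:
    """Bucket policy that rejects uploads without the SSE request header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


policy_json = json.dumps(require_sse_policy("lorem"), indent=2)
print(policy_json)
# With boto3 installed and credentials configured:
#   boto3.client("s3").put_bucket_policy(Bucket="lorem", Policy=policy_json)
```

With default SSE-S3 encryption in place, such a policy is mainly useful when clients must state the encryption mode explicitly.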

Encryption types

  • SSE (server-side).
    • SSE-S3.
      • Using S3 Managed Keys (Default since 2023-01-05).
      • Amazon handles key management and key protection.
    • SSE-C.
      • Using Customer-Provided Keys.
      • Encryption keys are managed by the customer.
    • SSE-KMS.
      • Using KMS Managed Keys.
      • Allows separate permissions for the use of the envelope key and provides an audit trail of its use.
  • CSE (client side).
    • AWS-KMS-managed key.
      • Using KMS Managed Keys (CSE-KMS).
      • The client first requests a new symmetric encryption key from KMS, and KMS returns two versions of a randomly generated data key:
        • A plaintext data key to encrypt the data.
        • A cipher blob of the same data key to upload to S3 as object metadata.
    • Client-side Master Key.
      • Using a key stored within your application.

Access Control List (ACL)

Manages access to buckets and objects. AWS now advises disabling ACLs except in special cases.

Access Analyzer

Can be enabled to get a list of all buckets with public access.

It provides the following information:

  • Bucket name
  • Discovered by Access analyzer
  • Shared through
  • Status
  • Access Level

Cross-Region Replication (CRR)

Automatic asynchronous copy of objects between regions.

Multiple rules can be added based upon the subset of objects to be matched. Can be based on:

  • Object tags.
  • Object key prefixes.
  • Any combination of both.

Notes:

  • It replicates objects, metadata and tags.
  • It does not replicate replicas.
  • It only applies to objects created after the replication configuration is set.
  • Both source and destination buckets must have versioning enabled.
  • Permissions & Security:
    • S3 must have permissions to replicate objects from the source to the destination bucket on your behalf (a rule must be created). For cross-account scenarios, create an IAM role.
    • It only replicates objects for which the bucket owner has permissions to read the object and its Access Control List.
    • It works with SSE-S3 or SSE-KMS encryption.
  • Deletion:
    • DELETE without a version ID: S3 adds a delete marker and replicates it to the destination.
    • DELETE with a version ID: S3 deletes that object version but does not replicate the deletion to the destination.
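
A minimal sketch of a prefix-based replication rule, in the dictionary shape boto3's `put_bucket_replication` accepts. The helper name, role ARN, bucket names and prefix are placeholders, and delete marker replication is switched off here as an assumption of the example.

```python
def replication_config(role_arn: str, dest_bucket: str, prefix: str) -> dict:
    """Replicate objects under one key prefix to a destination bucket."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate on your behalf
        "Rules": [
            {
                "ID": f"replicate-{prefix.strip('/') or 'all'}",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{dest_bucket}"},
            }
        ],
    }


config = replication_config(
    "arn:aws:iam::123456789012:role/hypothetical-replication-role",
    "lorem-replica",
    "dev/",
)
# With boto3 installed, versioning enabled on both buckets, and credentials configured:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="lorem", ReplicationConfiguration=config)
```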

Static Website Hosting

In bucket properties we can set a bucket as an HTML static page with an index and an error page.

Remember to enable CORS if it will be accessed from a different domain!
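
The same settings can also be expressed programmatically. The sketch below builds the dictionaries that boto3's `put_bucket_website` and `put_bucket_cors` accept; the helper names, file names and origin are assumptions for illustration.

```python
def website_config(index: str = "index.html", error: str = "error.html") -> dict:
    """Static website settings: index document and error page."""
    return {
        "IndexDocument": {"Suffix": index},
        "ErrorDocument": {"Key": error},
    }


def cors_config(allowed_origin: str) -> dict:
    """Allow read-only cross-origin requests from a single origin."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": [allowed_origin],
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


site = website_config()
cors = cors_config("https://www.example.com")
# With boto3 installed and credentials configured:
#   s3 = boto3.client("s3")
#   s3.put_bucket_website(Bucket="lorem", WebsiteConfiguration=site)
#   s3.put_bucket_cors(Bucket="lorem", CORSConfiguration=cors)
```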

Logging

Server access logging

  • Provides records of the requests made to a bucket.
  • Disabled by default.
  • Source and target buckets must be in the same region.
  • S3 Log Delivery must have permission to write on the target bucket.
  • Delivered on a "Best Effort Basis" (most within a few hours).

Object level logging

  • Provides records of the requests made to objects.
  • It's enabled on bucket level.
  • Requires CloudTrail service setup.
  • Leverages CloudTrail service offerings.

Object lock

Object locks apply to a specific version of an object in a versioned bucket.

Prevents accidental or inappropriate deletion of data.

Inventories

Multiple inventory lists can be created for a bucket, and the inventory files can be saved in encrypted format.

Batch Operations

To create a job request, the following parameters need to be specified:

  • Operation
  • Manifest
  • Priority
  • RoleArn
  • Report
  • Tags (optional)
  • Description (optional)
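
As a rough sketch, the parameters listed above map onto the keyword arguments of boto3's `s3control` `create_job` call. The helper name, the tagging operation, the ARNs, the ETag and the report prefix are all placeholders chosen for this example.

```python
def batch_job_kwargs(account_id: str, role_arn: str, manifest_arn: str,
                     manifest_etag: str, report_bucket_arn: str) -> dict:
    """Arguments for a job that tags every object listed in a CSV manifest."""
    return {
        "AccountId": account_id,
        "ConfirmationRequired": True,
        # Operation: what to do with each object in the manifest.
        "Operation": {
            "S3PutObjectTagging": {"TagSet": [{"Key": "reviewed", "Value": "true"}]}
        },
        # Manifest: the list of objects, here a CSV with bucket and key columns.
        "Manifest": {
            "Spec": {"Format": "S3BatchOperations_CSV_20180820",
                     "Fields": ["Bucket", "Key"]},
            "Location": {"ObjectArn": manifest_arn, "ETag": manifest_etag},
        },
        "Priority": 10,
        "RoleArn": role_arn,
        # Report: where the completion report is written.
        "Report": {
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "batch-reports",
            "ReportScope": "AllTasks",
        },
        "Description": "Tag reviewed objects",
        "Tags": [{"Key": "team", "Value": "storage"}],
    }


kwargs = batch_job_kwargs(
    "123456789012",
    "arn:aws:iam::123456789012:role/hypothetical-batch-role",
    "arn:aws:s3:::lorem/manifests/objects.csv",
    "hypothetical-etag",
    "arn:aws:s3:::lorem-reports",
)
# With boto3 installed and credentials configured:
#   boto3.client("s3control").create_job(**kwargs)
```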

Best practices

  • Security:
    • Disable access control lists (ACLs).
    • Use policies and disable public access.
    • Implement least privilege access.
    • Consider encryption of data at rest.
    • Enforce encryption of data in transit (HTTPS (TLS)).
      • Use the aws:SecureTransport condition on bucket policies.
    • Consider S3 Object Lock
  • Performance:
    • Use CloudFront or ElastiCache for caching.
    • Use Transfer Acceleration for data transport over long distances.
    • Watch the KMS request limits if you are using server-side encryption with KMS (SSE-KMS).
    • Retry requests for latency-sensitive applications.
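
The aws:SecureTransport condition from the security list above could be used in a bucket policy along these lines. This is a sketch; the helper name and the `lorem` bucket are illustrative.

```python
import json


def deny_insecure_transport(bucket: str) -> dict:
    """Bucket policy that rejects every request not sent over HTTPS (TLS)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and the objects inside it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


tls_policy_json = json.dumps(deny_insecure_transport("lorem"), indent=2)
print(tls_policy_json)
```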