Lecture-14 ( Object Storage and CDN: S3, Glacier and CloudFront )
S3-101:
- S3 provides secure, highly scalable object storage. Data is spread across multiple devices and facilities, and is designed to sustain the loss of 2 facilities concurrently.
- File size limit is up to 5 TB.
- Object-based storage (images, videos etc. are each considered one object). For installing an OS or running a DB, we need block-based storage instead.
- S3 buckets are managed GLOBALLY.
- Unlimited storage.
- Files are stored in Buckets (Folder).
- S3 has universal namespace (names must be unique globally)
- Creating a bucket actually creates a DNS address like this:
https://s3-eu-west-1.amazonaws.com/acloudguru
Here eu-west-1 is the region name and
acloudguru is the bucket name. The rest of the URL is fixed.
So, the URL format will always be:
https://s3-RegionName.amazonaws.com/BucketName
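The URL format above can be sketched as a simple string template (region and bucket names come from the lecture's example):

```python
def bucket_url(region: str, bucket: str) -> str:
    """Build the path-style S3 URL described above."""
    return f"https://s3-{region}.amazonaws.com/{bucket}"

print(bucket_url("eu-west-1", "acloudguru"))
# -> https://s3-eu-west-1.amazonaws.com/acloudguru
```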
- After every successful file upload we get an HTTP 200 response.
- Read-after-write consistency for PUTs of new objects: we can read a file immediately after writing it.
- Eventual consistency for overwrite PUTs and DELETEs: updates and deletes may take some time to propagate.
- Updates of files in S3 are atomic. That means we will get either NEW or OLD version of file. (not a partial or corrupted file)
- Key-value storage. (name of the file – data of file)
- Files have a Version ID, Metadata (file creation/update date, author etc.) and Subresources (which include Access Control Lists and Torrent).
- S3 supports the BitTorrent protocol.
- S3 keys are stored lexicographically (in alphabetical order). Log files named by date and time have very similar names, so they end up in the same partition, which becomes a performance bottleneck. Adding a random SALT at the start of each file name spreads the objects evenly across S3.
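One common way to add such a salt is to prefix each key with a few characters of a hash of the name; this is a minimal sketch, with the log file names as illustrative examples:

```python
import hashlib

def salted_key(key: str, salt_len: int = 4) -> str:
    """Prefix a key with a short hex hash so that similar names
    spread lexicographically across S3 partitions."""
    salt = hashlib.md5(key.encode()).hexdigest()[:salt_len]
    return f"{salt}-{key}"

# Sequential log names now get well-spread, non-sequential prefixes:
print(salted_key("2017-01-01-12-00-00.log"))
print(salted_key("2017-01-01-12-00-01.log"))
```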
- Amazon guarantees 99.99% availability in the SLA (Service Level Agreement).
- Amazon guarantees 99.999999999% durability for S3 data. We call it 11 x 9's.
- Tiered storage available.
- Lifecycle Management: S3 has various storage TIERs, and we can automatically transition a file to a different tier after a certain number of days. For example, after 30 days we can schedule a file to move to another tier, and after 90 days archive it (if we want).
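The 30-day/90-day schedule above can be sketched as the rule payload boto3's `put_bucket_lifecycle_configuration` would take (the rule ID and `logs/` prefix are made-up examples):

```python
# Illustrative lifecycle configuration: transition to Infrequent Access
# after 30 days, archive to Glacier after 90 days.
lifecycle = {
    "Rules": [{
        "ID": "archive-old-logs",          # assumed rule name
        "Filter": {"Prefix": "logs/"},     # assumed key prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
    }]
}
# e.g. s3.put_bucket_lifecycle_configuration(
#          Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```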
- S3-IA(Infrequent Access) for data that is accessed less but needs rapid access. Lower fee than S3 but there is a retrieval fee.
- Reduced Redundancy Storage: durability 99.99% & availability 99.99%. It is cheaper than S3. Use case: data we can easily regenerate. For example, we can store images in S3 but their thumbnails in RRS.
- Glacier: Very cheap, but used for archival only. It takes 3-5 hours to restore from Glacier.
| | Standard S3 | Standard Infrequent Access | Reduced Redundancy Storage |
| --- | --- | --- | --- |
| Durability | 99.999999999% | 99.999999999% | 99.99% |
| Availability | 99.99% | 99.9% | 99.99% |
| Failure tolerance | 2 facilities | 2 facilities | 1 facility |
| SSL support | Yes | Yes | Yes |
| First byte latency | Milliseconds | Milliseconds | Milliseconds |
| Lifecycle Management policies | Yes | Yes | Yes |
- Standard S3 vs. Standard IA vs. Glacier:
| | Standard S3 | Standard Infrequent Access | Glacier |
| --- | --- | --- | --- |
| Designed for durability | 99.999999999% | 99.999999999% | 99.999999999% |
| Designed for availability | 99.99% | 99.9% | N/A |
| Availability SLA | 99.9% | 99% | No SLA for Glacier |
| Min. object size | N/A | 128 KB | N/A |
| Min. storage duration | N/A | 30 days | 90 days |
| First byte latency | Milliseconds | Milliseconds | Minutes to hours |
| Storage class | Object | Object | Object |
- S3 Charge Schedule:
Storage: How much data we are storing.
Requests: Number of requests made to S3 objects.
Storage Management Pricing: We can tag files with HR, ADMIN etc. The charge is on a per-tag basis. Tags on a bucket are not inherited by the objects/files inside it.
Data Transfer Pricing: Data coming into S3 is free, but there is a charge for moving data around within S3.
Transfer Acceleration Fee: Fast data transfer between users and S3 buckets over long distances, done via Amazon CloudFront's globally distributed edge locations.
LECTURE-17 ( Cross Region Replication )
- Versioning must be enabled to set a replication policy.
- Existing objects will not be automatically replicated. Only new uploads and updated files will be replicated.
- To copy the existing objects we need to execute this command in the CLI:
aws s3 cp --recursive s3://sajib s3://sajibusbucket
This only replicates the files, not the permissions.
- If we delete a file in the source bucket, the file is also deleted in the 2nd bucket: delete markers are replicated. But if we delete the DELETE MARKER (or any individual version) in the 1st bucket, it will not delete the DELETE MARKER in the 2nd bucket.
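A replication rule like the one tested below boils down to a payload for boto3's `put_bucket_replication`; this is a sketch only, and the account ID and role name in the ARN are placeholders (the bucket names follow the lecture's example):

```python
# Illustrative cross-region replication configuration. Versioning must
# already be enabled on both buckets for this call to succeed.
replication = {
    "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder ARN
    "Rules": [{
        "ID": "replicate-all",      # assumed rule name
        "Status": "Enabled",
        "Prefix": "",               # empty prefix = replicate every new object
        "Destination": {"Bucket": "arn:aws:s3:::sajibusbucket"},
    }]
}
# e.g. s3.put_bucket_replication(
#          Bucket="sajib", ReplicationConfiguration=replication)
```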
Testing different scenarios after creating a replication rule:
New bucket: Empty. Old files didn't move. Then I modified the content of an existing file (uploading a new version): still nothing appeared in the 2nd bucket. Modified the permissions of a file: still nothing in the 2nd bucket.
Copying files through the AWS CLI: Only the latest versions of the files were copied. No permissions were copied.
After updating a file: Yes, the file was automatically copied to the 2nd bucket.
After making a file public in the original bucket: Yes, the change in permission was also copied to the 2nd bucket.
After deleting an old version of a file: It didn't take effect in the 2nd bucket.
After deleting the latest version of a file: Yes, the file was also deleted from the 2nd bucket, and a delete marker was added. But deleting the delete marker in the original bucket will not delete the delete marker in the 2nd bucket.
LECTURE-18 ( Lifecycle )
LECTURE-19 ( CDN )
The origin of the files that the CDN will distribute can be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or even Route 53.
TTL: Time To Live
LECTURE-20 ( Cloudfront Lab )
Two types of distribution: Web and RTMP (for video streaming)
We can have multiple origins in the same distribution.
LECTURE-21 ( Security and Encryption )
Two ways to restrict access to our buckets and objects.
- By Bucket Policy: applies to all of the objects inside the bucket.
- By Access Control List: applied to individual objects.
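As a sketch of the bucket-policy option, this is roughly what a policy granting public read on every object would look like (the bucket name reuses the lecture's example and the statement ID is made up):

```python
import json

# Hypothetical bucket policy: allow anyone to GET any object in "sajib".
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicRead",            # assumed statement id
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::sajib/*",   # /* = all objects in bucket
    }],
}
print(json.dumps(policy, indent=2))
```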
S3 buckets can be configured to create access logs which log all requests made to the S3 bucket.
Two types of encryption:
In Transit: SSL/TLS (HTTPS)
At Rest: 4 different methods.
Three (3) Server-side Encryption:
- S3 managed keys (SSE-S3): each object is encrypted with a unique key; additionally, Amazon encrypts that key with a master key and regularly rotates the master key.
- AWS Key Management Service managed keys (SSE-KMS): incurs an additional charge. It gives an audit trail, letting us know where the key was used and by whom.
- Server-side encryption with customer-provided keys (SSE-C).
One (1) client-side:
- Client-side encryption.
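The three server-side options differ only in the parameters sent with an upload; this sketch shows them as the `ExtraArgs` a boto3 `upload_file` call would take (the KMS alias and customer key are placeholders):

```python
# SSE-S3: S3 manages the keys; just ask for AES-256.
sse_s3 = {"ServerSideEncryption": "AES256"}

# SSE-KMS: key lives in KMS (the alias here is an assumed example).
sse_kms = {"ServerSideEncryption": "aws:kms",
           "SSEKMSKeyId": "alias/my-key"}

# SSE-C: we supply our own key with every request (placeholder value).
sse_c = {"SSECustomerAlgorithm": "AES256",
         "SSECustomerKey": "<32-byte key>"}

# e.g. s3.upload_file("f.txt", "my-bucket", "f.txt", ExtraArgs=sse_kms)
```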
LECTURE-22 ( Storage Gateway )
AWS Storage Gateway is an on-premises virtual appliance that can be used to cache S3 locally at a customer site. Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure. The service enables you to securely store data in the AWS cloud for scalable and cost-effective storage.
The AWS Storage Gateway software appliance is available for download as a virtual machine (VM) image that we install on a host in our datacenter. Storage Gateway supports either VMware ESXi or Microsoft Hyper-V.
Four types of storage gateway:
File Gateway (NFS): Stores flat files like docs, pdfs, images into S3.
Volume Gateway (iSCSI): This is block-based storage. The volume interface presents your applications with disk volumes using the iSCSI block protocol. We can install an OS & databases on it, so it's like an HDD. Volume gateways take virtual hard disks that are on premises and back them up to virtual hard disks that exist within AWS. Snapshots are incremental.
- Stored Volumes (Gateway-stored volumes): we store an entire copy of our data on site and asynchronously back that data up to AWS.
- Cached Volumes (Gateway-cached volumes): we store only the most recently accessed data on site and the rest of the data is backed up into Amazon. Cached volumes are 1 GB-32 TB in size.
Tape Gateway (Gateway Virtual Tape Library): a backup and archiving solution. It allows us to create virtual tapes and send them to S3, and we can use a lifecycle policy to send them to Glacier. Works with popular backup applications like NetBackup, Backup Exec, Veeam etc.
LECTURE-23 ( Snowball )
Before Snowball there was AWS Import/Export Disk, which accelerates moving large amounts of data into and out of the AWS cloud using portable storage devices for transport.
To transfer a large amount of data, we could send a physical storage device containing the data to Amazon, and they would transfer the data to S3 over their high-speed internal network, bypassing the internet connection entirely. The problem was dealing with lots of different types of devices and connections. So, at re:Invent 2015, Amazon introduced three Snowball variants:
Snowball: 50TB. Physical storage device.
Snowball Edge: 100 TB. Has compute capacity. It's more or less an AWS datacenter we can bring on premises; an AWS datacenter in a box.
Snowmobile: Petabytes to exabytes of data. Up to 100 PB per Snowmobile, a 45 ft long shipping container. We can do both import and export through Snowball.
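To see why shipping a device beats the wire for bulk data, a rough back-of-the-envelope calculation helps (the 100 Mbps link speed is an assumed example, and overhead is ignored):

```python
def transfer_days(size_tb: float, mbps: float) -> float:
    """Days needed to push size_tb terabytes over an mbps link.
    1 TB ~= 8,000,000 megabits; 86,400 seconds per day."""
    megabits = size_tb * 8_000_000
    return megabits / mbps / 86_400

# Moving one 50 TB Snowball's worth of data over a 100 Mbps line:
print(round(transfer_days(50, 100)))  # -> 46 days
```

At roughly a month and a half of saturated bandwidth, mailing the appliance is clearly faster.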
LECTURE-24 ( LAB )
We need to download the Snowball command line tool (client), download the manifest, and copy the client unlock code.
./snowball start -i IPAddress -m ManifestFileName -u UnlockCode
./snowball cp FileToBeCopied s3://BucketName
LECTURE-25 ( Transfer Acceleration )
It utilizes the CloudFront edge network to accelerate uploads to S3. Instead of uploading to S3 directly, we can use a distinct URL to upload to an edge location, which will then transfer the file to S3.
Distinct URL sample: acloudguru.s3-accelerate.amazonaws.com
LECTURE-26 ( Static Website )
The format of the URL of a static site would be:
http://BucketName.s3-website.ca-central-1.amazonaws.com
http://sajibwebsite.s3-website.ca-central-1.amazonaws.com
LECTURE-27 ( S3 Summary )
- S3 is object based storage.
- File size is 0 Bytes – 5 TB. Unlimited storage.
- Files are stored in buckets. Bucket names are universal.
- Read after write consistency for new objects, eventual consistency for update/delete an object.
- Storage tiers: S3 Standard, S3-IA, Reduced Redundancy Storage, Glacier.
- Edge locations are not just READ only, we can write to them too (i.e. put an object on them). Then it will replicate up to the origin.
- Objects are cached for the life of the TTL (Time to Live).
- We can clear cached objects, but we will be charged for it. Use case: I have updated my object and I don't want my users to wait the default 24-hour TTL to get the updated object.
- S3 buckets can be configured to create access logs, which log all requests made to the bucket. We can store the logs in another bucket.
- SSE-S3, SSE-KMS, SSE-C, client side encryption.
- We can load files to S3 much faster by enabling multipart upload.
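Multipart upload works by splitting a file into parts that upload in parallel; as a sketch of the arithmetic involved, here is how many parts a file splits into at a given part size (8 MB is boto3's default chunk size; 5 MB is the S3 minimum for all but the last part):

```python
import math

def part_count(size_bytes: int, part_size: int = 8 * 1024 * 1024) -> int:
    """Number of parts a multipart upload splits a file into."""
    return max(1, math.ceil(size_bytes / part_size))

# A 100 MB file uploads as 13 parallel 8 MB parts (the last one partial):
print(part_count(100 * 1024 * 1024))  # -> 13
```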