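挨踢小茶 recently passed the AWS Certified Developer – Associate exam and is sharing the notes compiled while preparing for it, so that others who need them can study from them too.
Apart from the DynamoDB content, most of this material is also covered in the AWS Solutions Architect Associate exam notes, but it is more detailed here.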
- AWS Certified Developer - Associate Exam Guide
- Elastic Beanstalk - YouTube
- AWS FAQS
- Notes from other students: https://bitbucket.org/carlocarbone/aws-certification-study-notes
- Stored on SSD storage
- Spread across 3 geographically distinct data centers
- Individual attributes have no explicit size limit, but the total value of an item (including all attribute names and values) cannot exceed 400KB
Eventually Consistent Reads (Default)
Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. (Best Read Performance)
Strongly Consistent Reads
A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.
- Items (think of a row of data in table)
- Attributes (think of a column of data in a table)
- Single Attribute (think Unique ID) - Partition Key (Hash Key) composed of one attribute
- Composite (think unique ID and a date range) - Partition Key & Sort Key (Hash & Range) composed of two attributes
- Partition Key
- Two items cannot have the same partition key
- Partition Key and Sort Key (Composite)
- Two items can have the same partition key, but they must have a different sort key
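The two primary key types can be illustrated with a minimal in-memory sketch. It assumes a plain Python dict stands in for a table; `ToyKeyedTable` and the attribute names are hypothetical and are not DynamoDB APIs. A real PutItem call replaces an existing item that has the same key, whereas this sketch raises an error so the uniqueness rule is visible.

```python
class ToyKeyedTable:
    """In-memory stand-in for a table; not a real DynamoDB client."""

    def __init__(self, partition_key, sort_key=None):
        self.partition_key = partition_key
        self.sort_key = sort_key  # None means a single-attribute primary key
        self.items = {}

    def _primary_key(self, item):
        # Single-attribute key: partition key only.
        if self.sort_key is None:
            return (item[self.partition_key],)
        # Composite key: partition key plus sort key.
        return (item[self.partition_key], item[self.sort_key])

    def put_new(self, item):
        """Store an item, rejecting a duplicate primary key."""
        key = self._primary_key(item)
        if key in self.items:
            raise KeyError(f"duplicate primary key: {key}")
        self.items[key] = item


# Composite key: same partition key is fine when the sort key differs.
posts = ToyKeyedTable("user_id", "created_at")
posts.put_new({"user_id": "u1", "created_at": "2017-01-01"})
posts.put_new({"user_id": "u1", "created_at": "2017-01-02"})
```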
- Local Secondary Index
- Has the SAME Partition key, different sort key
- Can ONLY be created when creating a table. They cannot be removed or modified later.
- Global Secondary Index
- Has DIFFERENT Partition key and different sort key
- Can be created at table creation or added LATER
- Used to capture any kind of modification of the DynamoDB tables
- DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table in the last 24 hours. You can access a stream with a simple API call and use it to keep other data stores up-to-date with the latest changes to DynamoDB or to take actions based on the changes made to your table.
- DynamoDB Triggers is a feature which allows you to execute custom actions based on item-level updates on a DynamoDB table. You can specify the custom action in code.
- The custom logic for a DynamoDB trigger is stored in an AWS Lambda function as code. To create a trigger for a given table, you can associate an AWS Lambda function to the stream (via DynamoDB Streams) on a DynamoDB table. When the table is updated, the updates are published to DynamoDB Streams. In turn, AWS Lambda reads the updates from the associated stream and executes the code in the function.
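As a rough sketch of such a trigger, the Lambda function below walks the records that a DynamoDB stream delivers and summarizes each change. The function name `handler` and the returned summary format are assumptions made for illustration; `Records`, `eventName`, `dynamodb`, `Keys` and `NewImage` are the fields of the stream event.

```python
def handler(event, context):
    """Summarize item-level changes delivered by a DynamoDB stream."""
    changes = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        changes.append({
            "event": record["eventName"],         # INSERT, MODIFY or REMOVE
            "keys": change["Keys"],
            # NewImage is absent for REMOVE events and keys-only streams.
            "new_image": change.get("NewImage"),
        })
    return changes
```

A real function would replace the summary with its custom action, for example writing the change to another data store.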
Query VS Scan
- A Query operation finds items in a table using only primary key attribute values. You must provide a partition key attribute name and a distinct value to search for.
- A Scan operation examines every item in the table. By default, a Scan returns all of the data attributes for every item; however, you can use the ProjectionExpression parameter so that the Scan only returns some of the attributes, rather than all of them.
- Query results are always sorted by the sort key in ascending order. Set ScanIndexForward parameter to false to reverse it.
- Try to use a query operation over a Scan operation as it's more efficient.
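The efficiency difference can be sketched with a toy in-memory model. This is not how DynamoDB is implemented; `ToyPartitionedTable` and the attribute names `pk` and `sk` are hypothetical. The point is only that a query touches one partition while a scan examines every item.

```python
from collections import defaultdict


class ToyPartitionedTable:
    """Toy model: items grouped by partition key."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> list of items

    def put(self, item):
        self.partitions[item["pk"]].append(item)

    def query(self, pk, scan_index_forward=True):
        """Read one partition only, sorted by sort key."""
        items = sorted(
            self.partitions.get(pk, []),
            key=lambda item: item["sk"],
            reverse=not scan_index_forward,
        )
        return items, len(items)  # (results, items examined)

    def scan(self, predicate):
        """Examine every item in every partition, then filter."""
        examined = 0
        results = []
        for items in self.partitions.values():
            for item in items:
                examined += 1
                if predicate(item):
                    results.append(item)
        return results, examined


table = ToyPartitionedTable()
for pk, sk in [("a", 3), ("a", 1), ("a", 2), ("b", 1), ("b", 2)]:
    table.put({"pk": pk, "sk": sk})
```

Here `table.query("a")` examines 3 items, while a scan filtering for the same partition examines all 5.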
- Unit of Read Provisioned Throughput
- All reads are rounded up to increments of 4 KB
- One read unit provides 2 eventually consistent reads (default) per second
- One read unit provides 1 strongly consistent read per second
- Unit of Write Provisioned Throughput
- All writes are rounded up to increments of 1 KB
- One write unit provides 1 write per second
- Writes have no consistency mode (eventually consistent or strongly consistent), so the cost is the same either way
Example question 1:
You have an application that needs to read 5 items of 10 KB each per second using eventual consistency. What should you set the read throughput to?
10 KB rounded up to the nearest increment of 4 KB is 12 KB
12 KB / 4 KB = 3 read units per item
3 x 5 read items = 15
Using eventual consistency we get 15 / 2 = 7.5
Throughput must be an integer, so 8 units of read throughput are needed
Example question 2:
You have an application that needs to read 5 items of 10 KB each per second using strong consistency. What should you set the read throughput to?
10 KB rounded up to the nearest increment of 4 KB is 12 KB
12 KB/4 KB = 3 read units per item
3 x 5 read items = 15
Using strong consistency we DON'T divide by 2
15 units of read throughput is needed
Example question 3:
You have an application that needs to write 5 items per second, with each item being 10 KB in size. What should you set the write throughput to?
Each 10 KB write needs 10 write units (1 KB per unit), so 5 x 10 = 50 write units
Write throughput of 50 units
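The arithmetic in the three example questions can be captured in a short sketch. The function names are hypothetical helpers, not part of any AWS SDK, and the rounding rules are the ones stated in these notes.

```python
import math


def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=False):
    """Read units needed, following the 4 KB rounding rule."""
    units_per_item = math.ceil(item_size_kb / 4)  # round up to 4 KB increments
    total = units_per_item * reads_per_second
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)  # throughput must be an integer


def write_capacity_units(item_size_kb, writes_per_second):
    """Write units needed, following the 1 KB rounding rule."""
    return math.ceil(item_size_kb) * writes_per_second


print(read_capacity_units(10, 5))                            # question 1
print(read_capacity_units(10, 5, strongly_consistent=True))  # question 2
print(write_capacity_units(10, 5))                           # question 3
```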
400 HTTP Status Code - ProvisionedThroughputExceededException
You exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes.
Steps taken for Web Identity Providers
- User Authenticates with ID provider (Facebook, Google)
- They are passed a Token by their ID provider
- Your code calls the AssumeRoleWithWebIdentity API, provides the provider's token and specifies the ARN of the IAM role
- The app can now access DynamoDB for between 15 minutes and 1 hour (1 hour is the default)
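The third step can be sketched as a helper that assembles the request parameters. The helper name, the session name and the example ARN are hypothetical; `RoleArn`, `RoleSessionName`, `WebIdentityToken` and `DurationSeconds` are parameters of AssumeRoleWithWebIdentity. The duration check uses the 15 minute to 1 hour window given in these notes.

```python
def build_web_identity_request(role_arn, token, session_name, duration_seconds=3600):
    """Assemble parameters for an AssumeRoleWithWebIdentity call."""
    # 900 s = 15 minutes, 3600 s = 1 hour (the default in these notes).
    if not 900 <= duration_seconds <= 3600:
        raise ValueError("duration must be between 900 and 3600 seconds")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,  # token issued by Facebook, Google, etc.
        "DurationSeconds": duration_seconds,
    }


params = build_web_identity_request(
    "arn:aws:iam::123456789012:role/example-role",  # placeholder ARN
    "token-from-id-provider",                       # placeholder token
    "mobile-app-session",
)
```

With boto3 installed, these parameters would be passed as `boto3.client("sts").assume_role_with_web_identity(**params)`.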
Atomic VS Idempotent (conditional update)
- Idempotent: use a conditional update when the data must be exactly right, for example in banking or voting systems.
- Atomic: use an atomic update when the data can tolerate small inaccuracies, such as a visitor counter.
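A toy sketch of why the distinction matters when a request is retried. `ToyItem` is a hypothetical in-memory stand-in, not a DynamoDB API. With a real table, the conditional form corresponds to an update that carries a condition on the current value.

```python
class ToyItem:
    """In-memory stand-in for a single numeric attribute."""

    def __init__(self, value=0):
        self.value = value

    def atomic_add(self, amount):
        """Atomic counter: always applies, so a retried request counts twice."""
        self.value += amount
        return self.value

    def conditional_set(self, expected, new_value):
        """Conditional update: applies only if the current value matches."""
        if self.value != expected:
            return False
        self.value = new_value
        return True


visits = ToyItem(10)
visits.atomic_add(1)
visits.atomic_add(1)  # a retry of the same request is counted again

balance = ToyItem(10)
first = balance.conditional_set(expected=10, new_value=11)
retry = balance.conditional_set(expected=10, new_value=11)  # rejected, value unchanged
```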
If your application needs to read multiple items, you can use the BatchGetItem API. A single request can retrieve up to 1 MB of data, which can contain as many as 100 items. In addition, a single request can retrieve items from multiple tables.
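Because one request holds at most 100 items, a larger key list has to be split into batches first. The helper below is a hypothetical sketch of that split only. It does not model the 1 MB limit, and a real BatchGetItem response can also return unprocessed keys that need to be requested again.

```python
def chunk_keys(keys, batch_size=100):
    """Split a list of keys into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


keys = [{"id": n} for n in range(250)]  # hypothetical key list
batches = chunk_keys(keys)
```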
Q: What are Projections?
The set of attributes that is copied into a local secondary index is called a projection. The projection determines the attributes that you will be able to retrieve with the most efficiency. When you query a local secondary index, Amazon DynamoDB can access any of the projected attributes, with the same performance characteristics as if those attributes were in a table of their own. If you need to retrieve any attributes that are not projected, Amazon DynamoDB will automatically fetch those attributes from the table.
When you define a local secondary index, you need to specify the attributes that will be projected into the index. At a minimum, each index entry consists of: (1) the table partition key value, (2) an attribute to serve as the index sort key, and (3) the table sort key value.
Beyond the minimum, you can also choose a user-specified list of other non-key attributes to project into the index. You can even choose to project all attributes into the index, in which case the index replicates the same data as the table itself, but the data is organized by the alternate sort key you specify.
Q: What is DynamoDB Fine-Grained Access Control?
Fine Grained Access Control (FGAC) gives a DynamoDB table owner a high degree of control over data in the table. Specifically, the table owner can indicate who (caller) can access which items or attributes of the table and perform what actions (read / write capability). FGAC is used in concert with AWS Identity and Access Management (IAM), which manages the security credentials and the associated permissions.
Q: How often can I change my provisioned throughput?
You can increase your provisioned throughput as often as you want. You can decrease it four times per day. A day is defined according to the GMT time zone. For example, if you decrease the provisioned throughput for your table four times on December 12th, you won’t be able to decrease the provisioned throughput for that table again until 12:01am GMT on December 13th.
Supported languages and platforms: IIS, PHP, Python, Ruby, Node.js, Tomcat
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket and eventual consistency for overwrite PUTS and DELETES in all regions.
- S3: Durable, immediately available, frequently accessed
- S3-IA: Durable, immediately available, infrequently accessed
- Data that is deleted from Standard - IA within 30 days will be charged for a full 30 days
- has a minimum object size of 128KB
- S3-RRS (Reduced Redundancy Storage): Data that is easily reproducible, such as thumbnails
- Glacier: Archived data, where you can wait 3-5 hours before accessing
Cross Region Replication
- Versioning must be enabled on both the source and the destination buckets.
- Regions must be unique (cannot replicate bucket within the same region)
- Files in an existing bucket are not replicated automatically. All subsequently updated files will be replicated automatically.
- You cannot replicate to multiple buckets or use daisy chaining (at this time)
- Delete markers are replicated.
- Deleting individual versions or delete markers will not be replicated
- Can be used in conjunction with versioning.
- Can be applied to current versions and previous versions
- Following actions can now be done:
- Transition to Standard-IA (minimum 128 KB and 30 days after the creation date)
- Archive to the Glacier Storage Class (30 days after IA, if relevant)
- Permanently Delete
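The three actions above can be sketched as a single lifecycle rule. The dictionary follows the shape of an S3 lifecycle configuration as used by the SDKs; the rule ID, the `logs/` prefix and the 365-day expiration are hypothetical values chosen for illustration.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-delete",    # hypothetical rule name
            "Filter": {"Prefix": "logs/"},  # hypothetical prefix
            "Status": "Enabled",
            # Current versions: Standard-IA after 30 days, Glacier 30 days later.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 60, "StorageClass": "GLACIER"},
            ],
            # Previous versions can be transitioned as well.
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"},
            ],
            # Permanently delete after a year (illustrative value).
            "Expiration": {"Days": 365},
        }
    ]
}
```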
- Edge locations are not just READ only, you can write to them too (i.e. put an object on to them).
- Objects are cached for the life of the TTL (Time to Live)
- You can clear cached objects, but you will be charged
- In Transit:
- At Rest
- Server Side Encryption
- S3 Managed Keys - SSE-S3
- AWS Key Management Services, Managed Keys - SSE-KMS
- Server Side Encryption with Customer Provided Keys - SSE-C
- Client Side Encryption
- Default encryption used in S3 is Advanced Encryption Standard(AES) 256
- File Gateway (NFS) - for flat files, stored directly on S3
- Volumes Gateway (iSCSI)
- Stored Volumes - Store your primary data locally, while asynchronously backing up that data to AWS.
- Cached Volumes - Use S3 as your primary storage while retaining frequently accessed data locally in your storage gateway.
- Gateway Virtual Tape Library (VTL) - Used for backup and works with popular backup applications like NetBackup, Backup Exec, Veeam etc.
S3 Link Formats
- Bucket Format - https://s3-eu-west-1.amazonaws.com/acloudguru-website/index.html
- Website Format - http://acloudguru-website.s3-website-eu-west-1.amazonaws.com
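A small sketch that builds both link formats from their parts. The helper names are hypothetical, and the string patterns simply mirror the two example links above; other regions may use a different separator in the hostname.

```python
def bucket_url(region, bucket, key):
    """Bucket (REST endpoint) format: HTTPS, bucket name in the path."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{key}"


def website_url(region, bucket):
    """Static website format: HTTP, bucket name in the hostname."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"


print(bucket_url("eu-west-1", "acloudguru-website", "index.html"))
print(website_url("eu-west-1", "acloudguru-website"))
```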
Cross Origin Resource Sharing(CORS)
Specifying Server-Side Encryption Using the REST API
At the time of object creation—that is, when you are uploading a new object or making a copy of an existing object—you can specify if you want Amazon S3 to encrypt your data by adding the x-amz-server-side-encryption header to the request
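A minimal sketch of the headers involved. The helper name and the key ID are hypothetical; the header names and the `AES256` and `aws:kms` values are the ones the S3 REST API accepts for SSE-S3 and SSE-KMS.

```python
def sse_headers(kms_key_id=None):
    """Request headers asking S3 to encrypt the object at rest."""
    if kms_key_id is None:
        # SSE-S3: keys managed by S3, AES-256.
        return {"x-amz-server-side-encryption": "AES256"}
    # SSE-KMS: keys managed through AWS KMS.
    return {
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": kms_key_id,
    }


s3_managed = sse_headers()
kms_managed = sse_headers("example-key-id")  # placeholder key ID
```

These headers would be added to the PUT or copy request that creates the object.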
S3 Object Tagging
S3 Object Tags are key-value pairs applied to S3 objects which can be created, updated or deleted at any time during the lifetime of the object. With these, you’ll have the ability to create Identity and Access Management (IAM) policies, setup S3 Lifecycle policies, and customize storage metrics. These object-level tags can then manage transitions between storage classes and expire objects in the background.
S3 Analytics - Storage Class Analysis
With storage class analysis, you can analyze storage access patterns and transition the right data to the right storage class. This new S3 Analytics feature automatically identifies infrequent access patterns to help you transition storage to Standard-IA. You can configure a storage class analysis policy to monitor an entire bucket, a prefix, or object tag. Once an infrequent access pattern is observed, you can easily create a new lifecycle age policy based on the results. Storage class analysis also provides daily visualizations of your storage usage on the AWS Management Console that you can export to a S3 bucket to analyze using business intelligence tools of your choice such as Amazon QuickSight.
You can simplify and speed up business workflows and big data jobs using S3 Inventory which provides a scheduled alternative to Amazon S3’s synchronous List API. S3 Inventory provides a CSV (Comma Separated Values) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix.
Request Rate and Performance Considerations
Amazon S3 best practices for optimizing performance depend on your request rates. If your workload in an Amazon S3 bucket routinely exceeds 100 PUT/LIST/DELETE requests per second or more than 300 GET requests per second, follow the guidelines in this topic to ensure the best performance and scalability.
- Default Visibility Timeout is 30 seconds
- Maximum visibility timeout is 12 hours
- You can extend the visibility timeout by using the ChangeMessageVisibility action to specify a new timeout value
- Maximum retention period for an SQS message is 14 days; the default is 4 days
- Maximum SQS message size is 256 KB
Long Polling (Vs Short polling or Standard polling)
While the traditional SQS short polling returns immediately, even if the queue being polled is empty, SQS long polling doesn't return a response until a message arrives in the queue, or the long poll times out.
What is the maximum long poll time out? - 20 seconds
Enabling Long Polling Using the API
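The relevant API actions are ReceiveMessage (WaitTimeSeconds parameter) and CreateQueue or SetQueueAttributes (ReceiveMessageWaitTimeSeconds attribute).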
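As a rough sketch, long polling on a single receive call comes down to setting a wait time of up to 20 seconds. The helper name and the queue URL are hypothetical; `QueueUrl`, `WaitTimeSeconds` and `MaxNumberOfMessages` are ReceiveMessage parameters.

```python
MAX_WAIT_SECONDS = 20  # maximum long poll timeout


def receive_message_params(queue_url, wait_seconds=MAX_WAIT_SECONDS, max_messages=1):
    """Parameters for a ReceiveMessage call that uses long polling."""
    if not 0 <= wait_seconds <= MAX_WAIT_SECONDS:
        raise ValueError("wait_seconds must be between 0 and 20")
    return {
        "QueueUrl": queue_url,
        "WaitTimeSeconds": wait_seconds,  # 0 means short polling
        "MaxNumberOfMessages": max_messages,
    }


params = receive_message_params(
    "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder
)
```

With boto3 installed, the call would look like `boto3.client("sqs").receive_message(**params)`.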
Create an SNS topic first using SNS. Then create and subscribe multiple SQS queues to the SNS topic. Now whenever a message is sent to the SNS topic, the message will be fanned out to the SQS queues, i.e. SNS will deliver the message to all the SQS queues that are subscribed to the topic.
Standard queues provide at-least-once delivery, which means that each message is delivered at least once.
FIFO queues provide exactly-once processing, which means that each message is delivered once and remains available until a consumer processes it and deletes it. Duplicates are not introduced into the queue.
Amazon SQS offers standard as the default queue type. A standard queue lets you have a nearly-unlimited number of transactions per second. Standard queues guarantee that a message is delivered at least once. However, occasionally (because of the highly-distributed architecture that allows high throughput), more than one copy of a message might be delivered out of order. Standard queues provide best-effort ordering which ensures that messages are generally delivered in the same order as they are sent.
The most important features of this queue type are FIFO (first-in-first-out) delivery and exactly-once processing: the order in which messages are sent and received is strictly preserved and a message is delivered once and remains available until a consumer processes and deletes it; duplicates are not introduced into the queue. FIFO queues also allow multiple ordered message groups within a single queue. FIFO queues are limited to 300 transactions per second (TPS) per API action, but have all the capabilities of standard queues.
A single FIFO queue currently supports throughput of up to 300 transactions per second (TPS) per API action
- Instantaneous, push-based delivery (no polling)
- Simple APIs and easy integration with applications
- Flexible message delivery over multiple transport protocols
- Inexpensive, pay-as-you-go model with no up-front costs
- Web-based AWS Management Console offers the simplicity of point-and-click interface
- Protocols include:
- Amazon SQS
- Messages can be customized for each protocol
How Integration Between AD FS and AWS Works:
- The flow is initiated when a user (let’s call him Bob) browses to the ADFS sample site (https://Fully.Qualified.Domain.Name.Here/adfs/ls/IdpInitiatedSignOn.aspx) inside his domain. When you install ADFS, you get a new virtual directory named adfs for your default website, which includes this page
- The sign-on page authenticates Bob against AD. Depending on the browser Bob is using, he might be prompted for his AD username and password.
- Bob’s browser receives a SAML assertion in the form of an authentication response from ADFS.
- Bob’s browser posts the SAML assertion to the AWS sign-in endpoint for SAML (https://signin.aws.amazon.com/saml). Behind the scenes, sign-in uses the AssumeRoleWithSAML API to request temporary security credentials and then constructs a sign-in URL for the AWS Management Console.
- Bob’s browser receives the sign-in URL and is redirected to the console.
- No Transitive Peering
- Think of a VPC as a logical datacenter in AWS
- Consists of IGWs (or Virtual Private Gateways), Route Tables, Network Access Control Lists, Subnets, Security Groups
- 1 Subnet = 1 Availability Zone
- Security Groups are Stateful, Network Access Control Lists are Stateless
- Can peer VPCs both in the same account or different accounts
- Default number of VPCs allowed in each AWS Region: 5
- Preferred by the enterprise
- Scale automatically up to 10 Gbps
- No need to patch
- Not associated with security groups
- Automatically assigned a public IP address
- Remember to update your route tables
NAT vs Bastions
- A NAT is used to provide internet access to EC2 instances in private subnets
- A Bastion is used to securely administer EC2 instances (using SSH or RDP) in private subnets.
Default region for all SDKs is US-EAST-1
- Simple - the default policy, used when you have only 1 web server
- Weighted - 20% traffic to Instance-1 and 80% traffic to Instance-2. For A/B testing.
- Latency - allows you to route your traffic based on the lowest network latency for your end user (i.e. which region will give them the fastest response time).
- Failover - to create an active/passive setup
- Geolocation - your traffic will be sent based on the geographic location of your users (i.e. the location from which DNS queries originate).
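The weighted policy in the list above can be illustrated with a small simulation. This is only a sketch of the 20/80 split using Python's `random.choices`, not how Route 53 resolves queries; the instance names come from the example, and the function name and the seed are assumptions.

```python
import random
from collections import Counter


def pick_target(weights, rng=random):
    """Pick one target with probability proportional to its weight."""
    targets = list(weights)
    return rng.choices(targets, weights=[weights[t] for t in targets], k=1)[0]


weights = {"Instance-1": 20, "Instance-2": 80}
rng = random.Random(0)  # fixed seed so the simulation is repeatable
counts = Counter(pick_target(weights, rng) for _ in range(10_000))
share = counts["Instance-1"] / 10_000  # close to 0.2 over many requests
```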
Alias record vs CNAME - an alias record points to an AWS resource
Each Amazon Route 53 account is limited to a maximum of 500 hosted zones and 10,000 resource record sets per hosted zone.
- Consists of a domain, workers and deciders
- By default, if CloudFormation encounters an error, it will terminate and roll back all resources created up to the point of failure
Configure Sticky Sessions for Your Classic Load Balancer - By default, a Classic Load Balancer routes each request independently to the registered instance with the smallest load. However, you can use the sticky session feature (also known as session affinity), which enables the load balancer to bind a user's session to a specific instance. This ensures that all requests from the user during the session are sent to the same instance.
- You cannot have multiple SSL certificates (for multiple domain names) on a single ELB
- AWS services that use Key value pairs: SNS, SWF, DynamoDB, S3
- Which of these AWS services do not use key value pairs? - Route53