AWS Certified Developer - Associate Exam Notes

August 11, 2017

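挨踢小茶 recently passed the AWS Certified Developer - Associate exam. Here are the notes put together while preparing for it, shared so that others who need them can study from them too.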

Apart from the DynamoDB material, most of this content is also covered in the AWS Solutions Architect Associate exam notes, but the version here is more detailed.

AWS Certified Developer - Associate Exam Guide
Elastic BeanStalk - Youtube
AWS FAQS
Notes from other students: https://bitbucket.org/carlocarbone/aws-certification-study-notes

DynamoDB

Stored on SSD storage
Spread across 3 geographically distinct data centers
Individual attributes have no explicit size limit, but the total value of an item (including all attribute names and values) cannot exceed 400KB

Eventually Consistent Reads (Default)
Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. (Best Read Performance)

Strongly Consistent Reads
A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.

Basics

Tables
Items (think of a row of data in table)
Attributes (think of a column of data in a table)

Primary Keys

Single Attribute (think Unique ID) - Partition Key (Hash Key) composed of one attribute
Composite (think unique ID and a date range) - Partition Key & Sort Key (Hash & Range) composed of two attributes

DynamoDB Index

Partition Key
- Two items cannot have the same partition key
Partition Key and Sort Key (Composite)
- Two items can have the same partition key, but they must have a different sort key
Local Secondary Index
- Has the SAME Partition key, different sort key
- Can ONLY be created when creating a table. They cannot be removed or modified later.
Global Secondary Index
- Has DIFFERENT Partition key and different sort key
- Can be created at table creation or added LATER

DynamoDB Stream

Used to capture any kind of modification of the DynamoDB tables
DynamoDB Streams provides a time-ordered sequence of item-level changes made to data in a table in the last 24 hours. You can access a stream with a simple API call and use it to keep other data stores up-to-date with the latest changes to DynamoDB or to take actions based on the changes made to your table.

DynamoDB Triggers

DynamoDB Triggers is a feature which allows you to execute custom actions based on item-level updates on a DynamoDB table. You can specify the custom action in code.
The custom logic for a DynamoDB trigger is stored in an AWS Lambda function as code. To create a trigger for a given table, you can associate an AWS Lambda function to the stream (via DynamoDB Streams) on a DynamoDB table. When the table is updated, the updates are published to DynamoDB Streams. In turn, AWS Lambda reads the updates from the associated stream and executes the code in the function.
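
The trigger flow described above can be sketched as a small Lambda handler. This is a minimal illustration, not the only way to write one: it assumes the standard DynamoDB Streams event shape (a `Records` list whose entries carry an `eventName` of `INSERT`, `MODIFY` or `REMOVE`), and the sample event and its `Id` attribute are made up for the example.

```python
def lambda_handler(event, context):
    """Count item-level changes delivered by a DynamoDB stream.

    A real trigger would act on each record (for example, update another
    data store); counting keeps this sketch self-contained.
    """
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        event_name = record.get("eventName")
        if event_name in counts:
            counts[event_name] += 1
    return counts


# Hypothetical sample event, shaped like a DynamoDB Streams batch.
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"N": "101"}}}},
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"Id": {"N": "101"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"Id": {"N": "102"}}}},
    ]
}

print(lambda_handler(sample_event, None))
```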

Query VS Scan

A Query operation finds items in a table using only primary key attribute values. You must provide a partition key attribute name and a distinct value to search for.
A Scan operation examines every item in the table. By default, a Scan returns all of the data attributes for every item; however, you can use the ProjectionExpression parameter so that the Scan only returns some of the attributes, rather than all of them.
Query results are sorted by the sort key, in ascending order by default. Set the ScanIndexForward parameter to false to return them in descending order.
Try to use a query operation over a Scan operation as it's more efficient.
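
As a rough sketch of the Query parameters mentioned above, the helper below builds the request for the low-level DynamoDB client. The table name `Orders`, the key name `CustomerId` and the helper itself are hypothetical, and the sketch assumes a string partition key. The boto3 call is left as a comment because it needs AWS credentials.

```python
def build_query_params(table_name, partition_key_name, partition_key_value,
                       descending=False):
    """Build Query parameters for one partition key value (string type assumed)."""
    return {
        "TableName": table_name,
        # A Query must name the partition key and a single value to match.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": partition_key_name},
        "ExpressionAttributeValues": {":pk": {"S": partition_key_value}},
        # True sorts by the sort key ascending, False reverses the order.
        "ScanIndexForward": not descending,
    }


params = build_query_params("Orders", "CustomerId", "c-42", descending=True)
print(params["ScanIndexForward"])

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   response = boto3.client("dynamodb").query(**params)
```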

DynamoDB Read/Write Throughput Calculation

Provisioned Throughput:

Unit of Read Provisioned Throughput
- All reads are rounded up to increments of 4 KB
- One read unit provides 2 eventually consistent reads (the default) per second
- One read unit provides 1 strongly consistent read per second
Unit of Write Provisioned Throughput
- All writes are rounded up to increments of 1 KB
- One write unit provides 1 write per second
- Writes have no consistency mode (eventually consistent vs strongly consistent applies to reads only)

Example question 1:

You have an application that requires to read 5 items of 10 KB per second using eventual consistency. What should you set the read throughput to?

10 KB rounded up to nearest increment of 4 KB is 12 KB
12 KB/4 KB = 3 read units per item

3 x 5 read items = 15
Using eventual consistency we get 15 / 2 = 7.5

Throughput should be integer so 8 units of read throughput is needed

Example question 2:

You have an application that requires to read 5 items of 10 KB per second using strong consistency. What should you set the read throughput to?

10 KB rounded up to the nearest increment of 4 KB is 12 KB
12 KB/4 KB = 3 read units per item

3 x 5 read items = 15
Using strong consistency we DON'T divide by 2

15 units of read throughput is needed

Example question 3:

You have an application that requires to write 5 items, with each item being 10 KB in size per second. What should you set the write throughput to?

Each 10 KB item needs 10 write units (1 KB per write unit)
5 items x 10 write units = 50 write units

Write throughput of 50 Units
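
The three example questions above follow the same arithmetic, which can be sketched in a few lines of Python. The function names are made up for this illustration; the rounding rules are the ones listed in the notes (4 KB read increments, 1 KB write increments, eventually consistent reads costing half).

```python
import math


def read_capacity_units(item_size_kb, items_per_second, strongly_consistent=False):
    """Read units needed, following the exam-style calculation."""
    # Each read is rounded up to the next 4 KB increment.
    units_per_item = math.ceil(item_size_kb / 4)
    total = units_per_item * items_per_second
    if not strongly_consistent:
        # One read unit covers two eventually consistent reads per second.
        total = total / 2
    # Provisioned throughput must be a whole number, so round up.
    return math.ceil(total)


def write_capacity_units(item_size_kb, items_per_second):
    """Write units needed: each write is rounded up to the next 1 KB."""
    return math.ceil(item_size_kb) * items_per_second


print(read_capacity_units(10, 5))                            # example question 1
print(read_capacity_units(10, 5, strongly_consistent=True))  # example question 2
print(write_capacity_units(10, 5))                           # example question 3
```

Running it prints 8, 15 and 50, matching the worked answers.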

400 HTTP Status Code - ProvisionedThroughputExceededException
You exceeded your maximum allowed provisioned throughput for a table or for one or more global secondary indexes.

Steps taken for Web Identity Providers

User Authenticates with ID provider (Facebook, Google)
They are passed a Token by their ID provider
Your code calls the AssumeRoleWithWebIdentity API, providing the provider's token and specifying the ARN of the IAM role
The app can now access DynamoDB for between 15 minutes and 1 hour (1 hour is the default)
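
A minimal sketch of the third step, assuming boto3 is used. The parameter names (`RoleArn`, `RoleSessionName`, `WebIdentityToken`, `DurationSeconds`) are the ones the STS API accepts; the helper functions, the role ARN and the token value are hypothetical, and the duration check simply mirrors the 15 minute to 1 hour range given in these notes.

```python
def build_assume_role_request(role_arn, web_identity_token, session_name,
                              duration_seconds=3600):
    """Build the parameters for an AssumeRoleWithWebIdentity call."""
    # 900 s (15 minutes) to 3600 s (1 hour), per the range in the notes.
    if not 900 <= duration_seconds <= 3600:
        raise ValueError("duration_seconds must be between 900 and 3600")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": web_identity_token,
        "DurationSeconds": duration_seconds,
    }


def fetch_temporary_credentials(request):
    """Send the request to STS (requires boto3 and network access)."""
    import boto3  # imported lazily so the sketch loads without boto3

    sts = boto3.client("sts")
    return sts.assume_role_with_web_identity(**request)["Credentials"]


request = build_assume_role_request(
    role_arn="arn:aws:iam::123456789012:role/ExampleWebIdentityRole",  # made up
    web_identity_token="token-from-identity-provider",                 # made up
    session_name="example-app-session",
)
print(request["DurationSeconds"])
```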

Atomic VS Idempotent (conditional update)

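Idempotent (conditional update): Use a conditional update when the data must be exactly right, as in a banking or voting system. The update is applied only if the condition holds, so retrying it is safe.
Atomic: If the data does not have to be exactly correct, as with a page-visit counter, an atomic counter update is enough. It is not idempotent, so a retried request may be counted twice.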
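
The difference shows up when a request is retried. The toy model below uses a plain Python dict in place of a DynamoDB item, so it is only an illustration of the idea and not DynamoDB code; both function names are made up.

```python
def atomic_increment(item, attribute, amount=1):
    """Unconditional counter update: every call changes the value."""
    item[attribute] = item.get(attribute, 0) + amount
    return item[attribute]


def conditional_update(item, attribute, expected, new_value):
    """Apply the write only if the current value matches what we expect."""
    if item.get(attribute) != expected:
        return False  # condition failed, nothing is written
    item[attribute] = new_value
    return True


# A retried atomic increment is counted twice.
page = {"visits": 10}
atomic_increment(page, "visits")
atomic_increment(page, "visits")  # retry of the same logical request
print(page["visits"])

# A retried conditional update is rejected the second time.
account = {"balance": 100}
first = conditional_update(account, "balance", expected=100, new_value=80)
retry = conditional_update(account, "balance", expected=100, new_value=80)
print(first, retry, account["balance"])
```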

Batch Operations
If your application needs to read multiple items, you can use the BatchGetItem API. A single request can retrieve up to 16 MB of data, which can contain as many as 100 items. In addition, a single request can retrieve items from multiple tables.
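
Because of the 100-item cap, a longer list of keys has to be split across several BatchGetItem requests. A minimal sketch of that splitting step follows; the helper name and the `Id` key attribute are hypothetical, and the actual API call is left out.

```python
def chunk_keys(keys, batch_size=100):
    """Split a list of keys into batches of at most batch_size items."""
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


# 250 made-up keys become three requests of 100, 100 and 50 items.
keys = [{"Id": {"N": str(n)}} for n in range(250)]
batches = chunk_keys(keys)
print([len(batch) for batch in batches])
```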

Q: What are Projections?

The set of attributes that is copied into a local secondary index is called a projection. The projection determines the attributes that you will be able to retrieve with the most efficiency. When you query a local secondary index, Amazon DynamoDB can access any of the projected attributes, with the same performance characteristics as if those attributes were in a table of their own. If you need to retrieve any attributes that are not projected, Amazon DynamoDB will automatically fetch those attributes from the table.

When you define a local secondary index, you need to specify the attributes that will be projected into the index. At a minimum, each index entry consists of: (1) the table partition key value, (2) an attribute to serve as the index sort key, and (3) the table sort key value.

Beyond the minimum, you can also choose a user-specified list of other non-key attributes to project into the index. You can even choose to project all attributes into the index, in which case the index replicates the same data as the table itself, but the data is organized by the alternate sort key you specify.

Q: What is DynamoDB Fine-Grained Access Control?

Fine Grained Access Control (FGAC) gives a DynamoDB table owner a high degree of control over data in the table. Specifically, the table owner can indicate who (caller) can access which items or attributes of the table and perform what actions (read / write capability). FGAC is used in concert with AWS Identity and Access Management (IAM), which manages the security credentials and the associated permissions.

Q: How often can I change my provisioned throughput?

You can increase your provisioned throughput as often as you want. You can decrease it four times per day. A day is defined according to the GMT time zone. For example, if you decrease the provisioned throughput for your table four times on December 12th, you won’t be able to decrease the provisioned throughput for that table again until 12:01am GMT on December 13th.

Elastic Beanstalk

Supported platforms: IIS, PHP, Python, Ruby, Node.js, Tomcat

S3

Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket and eventual consistency for overwrite PUTS and DELETES in all regions.

S3: Durable, immediately available, frequently accessed
S3-IA: Durable, immediately available, infrequently accessed
- Data that is deleted from Standard - IA within 30 days will be charged for a full 30 days
- has a minimum object size of 128KB
S3-RRS (Reduced Redundancy Storage): Data that is easily reproducible, such as thumbnails
Glacier: Archived data, where you can wait 3-5 hours before accessing

Cross Region Replication

Versioning must be enabled on both the source and the destination buckets.
Regions must be unique (cannot replicate bucket within the same region)
Files in an existing bucket are not replicated automatically. All subsequently updated files will be replicated automatically.
You cannot replicate to multiple buckets or use daisy chaining (at this time)
Delete markers are replicated.
Deleting individual versions or delete markers will not be replicated

Lifecycle Management

Can be used in conjunction with versioning.
Can be applied to current versions and previous versions
Following actions can now be done:
- Transition to Standard-IA (objects of at least 128 KB, 30 days after the creation date)
- Archive to the Glacier Storage Class (30 days after IA, if relevant)
- Permanently Delete
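
The three lifecycle actions above can be written down as a lifecycle configuration. The sketch below builds one in the shape that boto3's `put_bucket_lifecycle_configuration` expects; the helper name, rule ID, bucket name and the 365-day expiry are assumptions made for the example, and the day checks just encode the timings from these notes.

```python
def build_lifecycle_configuration(ia_after_days=30, glacier_after_days=60,
                                  delete_after_days=365, prefix=""):
    """Build a rule: Standard -> Standard-IA -> Glacier -> delete."""
    if ia_after_days < 30:
        raise ValueError("Standard-IA transition needs at least 30 days")
    if glacier_after_days < ia_after_days + 30:
        raise ValueError("Glacier transition must be 30+ days after Standard-IA")
    if delete_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "example-tiering-rule",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": delete_after_days},
            }
        ]
    }


config = build_lifecycle_configuration()
print(config["Rules"][0]["Transitions"])

# With boto3 installed and credentials configured, it could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```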

CDN/CloudFront

Edge locations are not just READ only, you can write to them too (ie put an object on to them).
Objects are cached for the life of the TTL (Time to Live)
You can clear cached objects, but you will be charged

Encryption

In Transit:
- SSL/TLS
At Rest
- Server Side Encryption
  - S3 Managed Keys - SSE-S3
  - AWS Key Management Services, Managed Keys - SSE-KMS
  - Server Side Encryption with Customer Provided Keys - SSE-C
- Client Side Encryption
Default encryption used in S3 is Advanced Encryption Standard(AES) 256

Storage Gateways

File Gateway (NFS) - for flat files, stored directly on S3
Volumes Gateway (iSCSI)
- Stored Volumes - Store your primary data locally, while asynchronously backing up that data to AWS.
- Cached Volumes - Use S3 as your primary storage while retaining frequently accessed data locally in your storage gateway.
Gateway Virtual Tape Library (VTL) - Used for backup and uses popular backup applications like NetBackup, Backup Exec, Veeam etc.

S3 Link Formats

Bucket Format - https://s3-eu-west-1.amazonaws.com/acloudguru-website/index.html
Website Format - http://acloudguru-website.s3-website-eu-west-1.amazonaws.com
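
The two formats differ mainly in where the bucket name goes: in the path for the bucket format, in the host name for the website format. A small sketch that reproduces the patterns shown above (the function names are made up, and other regions may use slightly different endpoint spellings than this eu-west-1 style):

```python
def bucket_url(region, bucket, key):
    """Bucket (path-style) format: the bucket name is part of the path."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{key}"


def website_url(region, bucket):
    """Website format: the bucket name is part of the host name."""
    return f"http://{bucket}.s3-website-{region}.amazonaws.com"


print(bucket_url("eu-west-1", "acloudguru-website", "index.html"))
print(website_url("eu-west-1", "acloudguru-website"))
```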

Cross Origin Resource Sharing(CORS)

Specifying Server-Side Encryption Using the REST API
At the time of object creation (that is, when you are uploading a new object or making a copy of an existing object), you can ask Amazon S3 to encrypt your data by adding the x-amz-server-side-encryption header to the request.
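
As an illustration, the header can be assembled like this before it is attached to a PUT or copy request. The helper is hypothetical; `AES256` and `aws:kms` are the header values for SSE-S3 and SSE-KMS, and the key ID shown is a placeholder.

```python
def sse_headers(mode="AES256", kms_key_id=None):
    """Build the server-side encryption headers for an S3 REST request."""
    if mode not in ("AES256", "aws:kms"):
        raise ValueError("mode must be 'AES256' (SSE-S3) or 'aws:kms' (SSE-KMS)")
    headers = {"x-amz-server-side-encryption": mode}
    if mode == "aws:kms" and kms_key_id:
        # Optional: name a specific KMS key instead of the default one.
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers


print(sse_headers())
print(sse_headers("aws:kms", kms_key_id="example-key-id"))  # placeholder key ID
```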

S3 Object Tagging
S3 Object Tags are key-value pairs applied to S3 objects which can be created, updated or deleted at any time during the lifetime of the object. With these, you’ll have the ability to create Identity and Access Management (IAM) policies, setup S3 Lifecycle policies, and customize storage metrics. These object-level tags can then manage transitions between storage classes and expire objects in the background.

S3 Analytics - Storage Class Analysis
With storage class analysis, you can analyze storage access patterns and transition the right data to the right storage class. This new S3 Analytics feature automatically identifies infrequent access patterns to help you transition storage to Standard-IA. You can configure a storage class analysis policy to monitor an entire bucket, a prefix, or object tag. Once an infrequent access pattern is observed, you can easily create a new lifecycle age policy based on the results. Storage class analysis also provides daily visualizations of your storage usage on the AWS Management Console that you can export to a S3 bucket to analyze using business intelligence tools of your choice such as Amazon QuickSight.

S3 Inventory
You can simplify and speed up business workflows and big data jobs using S3 Inventory which provides a scheduled alternative to Amazon S3’s synchronous List API. S3 Inventory provides a CSV (Comma Separated Values) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix.

Request Rate and Performance Considerations
Amazon S3 best practices for optimizing performance depend on your request rates. If your workload in an Amazon S3 bucket routinely exceeds 100 PUT/LIST/DELETE requests per second or more than 300 GET requests per second, follow the guidelines in this topic to ensure the best performance and scalability.

SQS

Default Visibility Timeout is 30 seconds
Maximum visibility timeout is 12 hours
You can extend the visibility timeout by using the ChangeMessageVisibility action to specify a new timeout value
Maximum retention period for an SQS message is 14 days; the default is 4 days
Maximum SQS message size is 256 KB

Long Polling (Vs Short polling or Standard polling)
While the traditional SQS short polling returns immediately, even if the queue being polled is empty, SQS long polling doesn't return a response until a message arrives in the queue, or the long poll times out.
What is the maximum long poll time out? - 20 seconds

Enabling Long Polling Using the API
The following table lists the API actions to use.

API action	Parameter or attribute to set
`ReceiveMessage`	`WaitTimeSeconds` parameter
`CreateQueue`	`ReceiveMessageWaitTimeSeconds` attribute
`SetQueueAttributes`	`ReceiveMessageWaitTimeSeconds` attribute
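
A short sketch of how the table maps to request parameters. The helper names and the queue URL are made up; `WaitTimeSeconds` and `ReceiveMessageWaitTimeSeconds` are the names from the table, and the 20-second cap is the maximum long poll timeout noted above. The boto3 calls are shown only as comments since they need AWS credentials.

```python
MAX_LONG_POLL_SECONDS = 20


def receive_message_params(queue_url, wait_seconds=MAX_LONG_POLL_SECONDS):
    """Per-request long polling: pass WaitTimeSeconds to ReceiveMessage."""
    if not 0 <= wait_seconds <= MAX_LONG_POLL_SECONDS:
        raise ValueError("wait_seconds must be between 0 and 20")
    return {"QueueUrl": queue_url, "WaitTimeSeconds": wait_seconds}


def long_polling_attributes(wait_seconds=MAX_LONG_POLL_SECONDS):
    """Queue-level long polling, for CreateQueue or SetQueueAttributes."""
    if not 0 <= wait_seconds <= MAX_LONG_POLL_SECONDS:
        raise ValueError("wait_seconds must be between 0 and 20")
    # SQS attribute values are passed as strings.
    return {"ReceiveMessageWaitTimeSeconds": str(wait_seconds)}


queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # made up
print(receive_message_params(queue_url))
print(long_polling_attributes())

# With boto3 installed and credentials configured:
#   import boto3
#   sqs = boto3.client("sqs")
#   sqs.receive_message(**receive_message_params(queue_url))
#   sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=long_polling_attributes())
```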

Fanning Out
Create an SNS topic first. Then create multiple SQS queues and subscribe them to the SNS topic. Now whenever a message is sent to the SNS topic, the message will be fanned out to the SQS queues, i.e. SNS will deliver the message to all the SQS queues that are subscribed to the topic.
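
The fan-out pattern can be pictured with a toy in-memory model, where a `Topic` class stands in for the SNS topic and `deque` objects stand in for the SQS queues. None of this talks to AWS; the class and the queue names are invented for the illustration.

```python
from collections import deque


class Topic:
    """Stand-in for an SNS topic that copies each message to every subscriber."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, message):
        # Every subscribed queue receives its own copy of the message.
        for queue in self.subscribers:
            queue.append(message)
        return len(self.subscribers)


orders_topic = Topic("orders")
billing_queue, shipping_queue = deque(), deque()
orders_topic.subscribe(billing_queue)
orders_topic.subscribe(shipping_queue)

delivered = orders_topic.publish("order-1001 created")
print(delivered, list(billing_queue), list(shipping_queue))
```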

Standard queues provide at-least-once delivery, which means that each message is delivered at least once.
FIFO queues provide exactly-once processing, which means that each message is delivered once and remains available until a consumer processes it and deletes it. Duplicates are not introduced into the queue.

Standard Queues
Amazon SQS offers standard as the default queue type. A standard queue lets you have a nearly-unlimited number of transactions per second. Standard queues guarantee that a message is delivered at least once. However, occasionally (because of the highly-distributed architecture that allows high throughput), more than one copy of a message might be delivered out of order. Standard queues provide best-effort ordering which ensures that messages are generally delivered in the same order as they are sent.


FIFO Queues

The most important features of this queue type are FIFO (first-in-first-out) delivery and exactly-once processing: the order in which messages are sent and received is strictly preserved and a message is delivered once and remains available until a consumer processes and deletes it; duplicates are not introduced into the queue. FIFO queues also allow multiple ordered message groups within a single queue. FIFO queues are limited to 300 transactions per second (TPS) per API action, but have all the capabilities of standard queues.


A single FIFO queue currently supports throughput of up to 300 transactions per second (TPS) per API action

SNS

Instantaneous, push-based delivery (no polling)
Simple APIs and easy integration with applications
Flexible message delivery over multiple transport protocols
Inexpensive, pay-as-you-go model with no up-front costs
Web-based AWS Management Console offers the simplicity of point-and-click interface
Protocols include:
- HTTP
- HTTPS
- Email
- Email-JSON
- Amazon SQS
- Application
Messages can be customized for each protocol

IAM

How Integration Between AD FS and AWS Works:

The flow is initiated when a user (let’s call him Bob) browses to the ADFS sample site (https://Fully.Qualified.Domain.Name.Here/adfs/ls/IdpInitiatedSignOn.aspx) inside his domain. When you install ADFS, you get a new virtual directory named adfs for your default website, which includes this page.
The sign-on page authenticates Bob against AD. Depending on the browser Bob is using, he might be prompted for his AD username and password.
Bob’s browser receives a SAML assertion in the form of an authentication response from ADFS.
Bob’s browser posts the SAML assertion to the AWS sign-in endpoint for SAML (https://signin.aws.amazon.com/saml). Behind the scenes, sign-in uses the AssumeRoleWithSAML API to request temporary security credentials and then constructs a sign-in URL for the AWS Management Console.
Bob’s browser receives the sign-in URL and is redirected to the console.

VPC

No Transitive Peering
Think of a VPC as a logical datacenter in AWS
Consists of IGWs (or Virtual Private Gateways), Route Tables, Network Access Control Lists, Subnets, Security Groups
1 Subnet = 1 Availability Zone
Security Groups are Stateful, Network Access Control Lists are Stateless
Can peer VPCs both in the same account or different accounts
Default limit on the number of VPCs in each AWS Region: 5

NAT Gateways

Preferred by the enterprise
Scale automatically up to 10 Gbps
No need to patch
Not associated with security groups
Automatically assigned a public IP address
Remember to update your route tables

NAT vs Bastions

A NAT is used to provide internet traffic to EC2 instances in private subnets
A Bastion is used to securely administer EC2 instances (using SSH or RDP) in private subnets.

SDK

Default region for all SDKs is US-EAST-1

Route53

Routing Policy:

Simple - the default policy; use it when you have only 1 web server
Weighted - for example, 20% of traffic to Instance-1 and 80% to Instance-2. Useful for A/B testing.
Latency - routes your traffic based on the lowest network latency for your end user (i.e. the region that will give them the fastest response time).
Failover - to create an active/passive setup
Geolocation - your traffic will be sent based on the geographic location of your users (i.e. the location from which DNS queries originate).
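
For the weighted policy, each record's share of traffic is its weight divided by the sum of all weights. A quick sketch of that arithmetic, using the 20/80 split from the list above (the function name is made up):

```python
def traffic_shares(weights):
    """Return each record's expected fraction of traffic."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be greater than zero")
    return {name: weight / total for name, weight in weights.items()}


print(traffic_shares({"Instance-1": 20, "Instance-2": 80}))
```

The weights do not have to add up to 100; weights of 1 and 4 give the same 20%/80% split.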

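Alias record VS CNAME - an Alias record points to an AWS resource (such as an ELB, a CloudFront distribution or an S3 website bucket) and, unlike a CNAME, can be used at the zone apex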
Each Amazon Route 53 account is limited to a maximum of 500 hosted zones and 10,000 resource record sets per hosted zone.

SWF

Consists of a domain, workers and deciders

Cloud Formation

By default, if CloudFormation encounters an error, it will stop and roll back all the resources it created

ELB

Configure Sticky Sessions for Your Classic Load Balancer - By default, a Classic Load Balancer routes each request independently to the registered instance with the smallest load. However, you can use the sticky session feature (also known as session affinity), which enables the load balancer to bind a user's session to a specific instance. This ensures that all requests from the user during the session are sent to the same instance.

http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html

General

You cannot have multiple SSL certificates (for multiple domain names) on a single ELB
AWS services that use Key value pairs: SNS, SWF, DynamoDB, S3
Which of these AWS services do not use key value pairs? - Route53