31 Skills: Must-Know Checklists & Resources for IT Professionals.

  • First of all, be sure to understand the importance of the cultural points: read The 15-point DevOps Checklist for more.
  • You should master *nix systems and have a good understanding of how Linux distributions work.
  • Be at ease with the terminal. You may have GUIs to manage your servers, but you have to love the terminal no matter what: it is faster, more secure, and honestly easier once you master it.
  • How to get CPU/system info (cat /proc/version, /proc/cpuinfo, uptime, et al.)
  • How cron jobs work. Set cron jobs for specific days/times/months.
  • Learn the differences between the various *nix OSs and how to identify which one you are running (e.g. cat /etc/lsb-release)
  • Differences between shells: sh, dash, bash, ash, zsh ..
  • How to set and unset environment variables. Exporting a variable is temporary; how do you make it permanent?
  • What are the shell configuration files: ~/.bashrc, ~/.bash_profile, ~/.environment .. How to "source" settings from program initialization files.
  • Knowing Vim, its configuration (.vimrc) and some of its basic tips is a must.
  • How logging works in *nix systems, what logging levels are, and how to work with log management tools (rsyslog, logstash, fluentd, logwatch, awslogs ..)
  • How swapping works. What is swappiness. (swapon -s, /proc/sys/vm/swappiness, sysctl vm.swappiness ..)
  • How to view/set network configuration on a system
  • How to set static/dynamic IP address on a machine with different subnets? (Hint: CIDR)
  • How to view/set/backup your router settings?
  • How does DNS work? How to set up a DNS server (BIND, Unbound, PowerDNS, Dnsmasq ..)? What is the difference between recursive and authoritative DNS? How to troubleshoot DNS (nslookup, dig ..etc)
  • Get familiar with DNS record types: A, AAAA, CNAME, MX, TXT
  • How SSH works, how to debug it and how you can generate ssh keys and do passwordless login to other machines
  • How to set up a web server (Apache, Nginx ..)
  • What are the differences between Nginx and Apache? When should you use Nginx? When Apache? You may use both of them in the same web application; when and how?
  • How to set up a reverse proxy (Nginx ..)
  • How to set up a caching server (Squid, Nginx ..)
  • How to set up a load balancer (HAProxy, Nginx ..)
  • What is an init system? Do you know systemd (used by Ubuntu since 15.04), Upstart (developed by Canonical for Ubuntu), SysV ..
  • Get familiar with Systemd and how to analyze and manage services using commands like systemctl and journalctl
  • Compiling any software from its source (gcc, make and other related stuff)
  • How to compress/decompress a file in different formats via terminal (mostly: tar/tar.gz)
  • How to set up firewalls (iptables, or at least ufw): set rules, list rules, route traffic, block a protocol/port ..
  • Learn the most-used port numbers on which services run by default (like: SSH (22), HTTP (80), HTTPS (443) etc.)
  • Learn how to live debug and trace running application in production servers.
  • Be at ease with scripting languages. Bash is a must (Other scripting languages are very useful like Python, Perl..).
  • Learn how to use at least one of the configuration management and remote execution tools (Ansible, Puppet, SaltStack, Chef ..etc). Base your choice on criteria like: syntax, performance, templating language, push vs pull model, architecture, integration with other tools, scalability, availability ..etc.
  • Learn how to configure and use continuous integration and continuous delivery tools like Jenkins, Travis CI, Buildbot, GoCD. Integrating these tools with others (like Selenium, build tools, configuration management software, Docker, cloud providers' SDKs ..etc) is helpful.
  • Learn the distributed version control system Git and its basic commands (pull/push/commit/clone/branch/merge/log ..etc). Understand Git workflows. Do you know how to revert a Git repository to a previous commit?
  • How to use SSH-keys. Try Github, Bitbucket or Gitlab .. to configure passwordless access to the repo/account
  • Get familiar with tools like Vagrant to help you create distributable and portable development environments.
  • Start looking into infrastructure as code and infrastructure provisioning automation tools like Terraform and Packer
  • Start looking into containers and Docker: its underlying architecture (cgroups and namespaces) and how it works.
  • Start getting familiar with basic Docker commands (logs/inspect/top/ps/rm). Also look into Docker Hub (push/pull an image)
  • Start looking into container orchestration tools: Docker Swarm, Kubernetes, Mesosphere DC/OS, AWS ECS
  • Dive into DB (MySQL or any other which you like)
  • Learn about Redis/Memcache and similar tools
  • What is your backup strategy? How do you test whether a backup is reliable?
  • Do you know ext4, ntfs, fat ? Do you know Union filesystems ?
  • Develop your Cloud Computing skills. Start by choosing a cloud infrastructure provider: Amazon Web Services, Google Cloud Platform, Digitalocean, Microsoft Azure. Or create your own private cloud using OpenStack.
  • What about staging servers? What is your testing strategy: unit testing? End-to-end? Do you really need staging servers? Google "staging servers must die".
  • Read about PaaS/IaaS/SaaS/CaaS/FaaS/DaaS and serverless architecture
  • Learn how to use and configure Cloud resources from your CLI using Cloud Shells or from your programs using Cloud SDKs
  • Are you familiar with the OSI model and the TCP/IP model specifications? What are the differences between TCP and UDP? Do you know VXLAN?
  • Master useful commands like process monitoring commands (ps, top, htop, atop ..), system performance commands (nmon, iostat, sar, vmstat ..) and network troubleshooting and analysis (nmap, tcpdump, ping, traceroute, airmon, airodump ..).
  • Get to know HTTP status codes (2xx, 3xx, 4xx, 5xx)
  • Are you familiar with IDEs (Sublime Text, Atom, Eclipse ..) ?
  • Network packet analysis: tcpdump, Wireshark ..
  • What happens exactly when you type google.com in the browser? From your browser's cache, local DNS cache, local network configuration (hosts file), routing, DNS, network, web protocols and caching systems to web servers (the most basic question, yet difficult if it goes deep).
  • Get familiar with the jumble of kernel versions and how to patch them.
  • Get familiar with how SSL/TLS works and how digital certificates work (HTTPS)
  • Get familiar with secure protocols: TLS, STARTTLS, SSL, HTTPS, SCP, SSH, SFTP, FTPS ..
  • Know the difference between PPTP, OpenVPN, L2TP/IPSec
  • How to generate checksums (MD5, SHA ..) to validate the integrity of any file
  • Get to know the difference between monolithic and microservices architectures.
  • Get to know the pros/cons of microservices architecture and start building similar architectures
  • Do you know what ChatOps is? Have you tried working with one of the known frameworks: Hubot, Lita, Cog?
  • How do you do zero-downtime deployments? What is your strategy for rollbacks, self-healing and auto-scaling?
  • Get familiar with APIs and services: RESTful, REST-like, API gateways, Lambda functions, serverless computing, SOA, SOAP, JMS, CRUD ..
  • Read about stateless and stateful applications
  • Read about DevOps glossary (Google it)
  • How to secure your infrastructure, network and running applications ?
  • Learn how to set up, configure and use some monitoring systems (Nagios, Zabbix, Sensu, Prometheus ..etc)
  • Read about Open Source and how you can contribute to Open source projects. For more info: http://jvns.ca/blog/2016/10/26/a-few-questions-about-open-source/
  • You should be able to run a post-mortem if something bad happens to your systems. Write detailed documentation about what went wrong and how to prevent it from happening again.
  • Try to learn how experts on StackOverflow approach and solve problems. Always remember: it's the technology that keeps changing, not the basics. The basics always remain the same.
  • Make Google, StackOverflow, Quora and other professional forums your friends.
  • Try to establish good development practices and a solid architecture.
  • Learn how to scale at production level.
  • Follow open-source projects (Kubernetes/Docker etc.) or what excites you.
  • Ask questions/doubts/issues on mailing-lists/public forums/tech meetups and learn from those.
  • Follow like-minded folks from the community and be updated with latest tech trends.
  • Follow the engineering blogs of some decent tech companies (we follow Google/Uber/Quora/GitHub/Netflix). This is where you can learn straight from the experts and see how they approach problems.
  • Browse a few aggregators like Reddit, Hacker News, Medium ..etc.
  • Follow like-minded developers and tech companies on Twitter. (I am always reading articles and watching talks/conferences; post-mortems are some of my favorite content. I also follow a few GitHub repos to see what's going on with the technology I use.)
  • Read various technology related blogs and subscribe to DevOps Newsletters.
  • If possible attend local area meetups and conferences. You will get a chance to learn a lot from seniors and others.
  • Learn about scalability and highly distributed systems. How do you keep them up and running all the time?
  • Last but not least… read books.
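A few of the checklist items above (environment variables, cron syntax, tar archives, checksums) can be tried directly in a terminal. A minimal sketch; the variable name, paths and schedule below are illustrative assumptions:

```shell
# Environment variables: `export` lasts only for the current shell session.
# To make a variable permanent, append the export line to ~/.bashrc and `source` it.
export DEPLOY_ENV="staging"
echo "$DEPLOY_ENV"

# Cron syntax reminder: minute hour day-of-month month day-of-week command
# e.g. run a (hypothetical) backup script at 02:30 every Monday:
#   30 2 * * 1 /usr/local/bin/backup.sh

# Compress and decompress with tar (the most common format: .tar.gz).
mkdir -p /tmp/demo /tmp/out
echo "hello" > /tmp/demo/file.txt
tar -czf /tmp/demo.tar.gz -C /tmp demo       # create a gzip-compressed archive
tar -xzf /tmp/demo.tar.gz -C /tmp/out        # extract it elsewhere

# Validate file integrity with a checksum.
orig_sum=$(sha256sum /tmp/demo/file.txt | awk '{print $1}')
copy_sum=$(sha256sum /tmp/out/demo/file.txt | awk '{print $1}')
[ "$orig_sum" = "$copy_sum" ] && echo "checksums match"
```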

Original post: https://hackernoon.com/the-must-know-checklist-for-devops-system-reliability-engineers-f74c1cbf259d

Analyze Data with Presto and Airpal on Amazon EMR | AWS Big Data Blog

Airpal is a web-based query execution tool open-sourced by Airbnb that leverages Presto to facilitate data analysis. Airpal has many helpful features. For example, you can highlight syntax, export results to CSV for download, view query history, save queries, use the Table Finder to search for appropriate tables, and use the Table Explorer to visualize the schema of a table. AWS has created a CloudFormation script that makes it easy to set up Airpal on an Amazon EC2 instance.



How to pass AWS Professional Certification - Cloud7Works Exam Strategy Presentation

Here is a presentation that Cloud7Works gave at a DC AWS tech meetup on how to clear the AWS Professional Certification. Please drop a comment if you have any questions. We will be adding exam notes and some planning in the next posts.


Vagrant vs Docker

Docker = container technology. The idea is that you have isolated workloads on a Linux machine, reusing the existing kernel and its processes (creating a container). We can refer to these containers as virtual environments, not virtual machines.

Vagrant = managed VMs (virtual machines). It allows you to script and package the virtual machine configuration and the provisioning setup. It is designed to run on top of almost any VM tool.
At its core, Vagrant is a simple wrapper around VirtualBox or VMware (and other hypervisor technologies). You can do "Infrastructure as Code" or DevOps, in that your entire virtual machine can be both created and managed with a single Vagrantfile, without ever opening the console for either VirtualBox or VMware.
Virtual machines are full-blown, isolated "virtual" operating systems.
You use Vagrant to deploy to VirtualBox, VMware, AWS, etc.

I think, with all the noise out there, this really shouldn't be a "which should I use" question; they are two DIFFERENT technologies, and the right one depends on your needs. I see them both being used in the same shops, fitting two different needs.

VAGRANT will provide you with a full virtual machine, including the OS. It's great at providing you a Linux environment, for example, when you're on macOS or Windows.
DOCKER is a lightweight virtual environment of sorts, not a virtual machine. It will allow you to build contained architectures faster and cheaper than with Vagrant. Why? Because containers carry far less overhead.

Another way of thinking about it:
Vagrant is a virtual machine manager: it allows us to script out the virtual machine configuration as well as the provisioning. However, it is STILL a virtual machine, depending on VirtualBox, AWS, VMware, and many others, with a huge overhead. It requires a LOT more hard drive space, takes a lot of RAM, and performance, depending on your resources, is sometimes not that great.

Docker, however, uses kernel namespacing (historically via LXC, Linux containers). It means that you are using the same kernel as the host and the same file system. You use a Dockerfile with the docker build command to handle the provisioning and configuration of your container.
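The Dockerfile-based provisioning mentioned above can be sketched in a minimal example (the base image, content path and port are illustrative assumptions, not from the original post):

```dockerfile
# Illustrative only: build a tiny static web-server image.
# Every instruction below becomes a layer in the resulting image.
FROM nginx:alpine

# Bake static content into the image (the ./site directory is hypothetical).
COPY ./site /usr/share/nginx/html

# Document the port the container listens on.
EXPOSE 80
```

You would then build it with `docker build -t mysite .` and run it with `docker run -p 8080:80 mysite`.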

Ref: https://linuxacademy.com/cp/community/view/id/1593/posted/1


What is Puppet

  1. Infrastructure automation and configuration management tool.
  2. Enforces the defined state of the infrastructure
  3. Can automate tasks on thousands of machines
  4. Enables infrastructure as code
  5. Allows configuration consistency across nodes
  6. Enables quick provisioning of new machines in new environments
  7. Allows DevOps admins to write declarative instructions using the Puppet language
  8. Code is written inside classes, and classes are assigned to nodes
  9. Puppet is written in Ruby.
  10. Acceptance testing can be done with Beaker, a product developed by Puppet.
  11. Fundamentally, what we are doing with Puppet is managing resources at a large, automated scale while caring as little as possible about the platform and distribution.

Definitions & Features

  1. Puppet Nodes: Nodes are any virtual or physical systems that are able to run the Puppet agent and are specifically supported by it.
  2. Puppet has a puppet.conf file:
    1. This is the main config file; it contains 4 sections: main, master, agent and user.
    2. Settings are loaded at service start time.
    3. In the agent section, you can define runinterval, which tells Puppet how often the agent daemon runs.
  3. In Puppet you have resources. Every resource is an instance of a resource type.
  4. Resource types can be: file, package, service, etc.
  5. A system configuration is a collection of resources.
  6. A catalog in puppet describes the desired state for each resource on a system.
  7. Puppet doesn't enforce resources from the top down; instead it follows dependency relationships.
  8. Resource meta-parameters can be applied to any resource type:
    1. Require (needs a referenced resource to be applied first)
    2. Before (requests to be applied before a referenced resource)
    3. Subscribe (listens for changes Puppet makes to a referenced resource)
    4. Notify (sends a notification when Puppet changes the containing resource)
    5. Others: schedule, alias, audit, noop, loglevel, tag.
  9. You define variables with $var in Puppet. There are three scopes for variables: top, node, class. Puppet always uses the closest scope to the code being executed.
  10. Profiles are a grouping of technology configurations and class declarations.
  11. A role is based on a business function, e.g. web server, database server, etc.
  12. puppet module is a command used to help manage, build, and download modules from the Puppet Forge.
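Putting the pieces above together (resources, dependency relationships, meta-parameters), here is a minimal, illustrative manifest; the package, file path and service names are assumptions, not from the original notes:

```puppet
# Illustrative only: ensure nginx is installed, its config present, the service running.
package { 'nginx':
  ensure => installed,
}

file { '/etc/nginx/nginx.conf':
  ensure  => file,
  source  => 'puppet:///modules/nginx/nginx.conf',
  require => Package['nginx'],   # meta-parameter: apply the package first
  notify  => Service['nginx'],   # refresh the service when this file changes
}

service { 'nginx':
  ensure => running,
  enable => true,
}
```

Note how ordering comes from the require/notify relationships, not from the order the resources appear in the file.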



Migrating to AWS- Coca Cola

Below is a keynote speech about Coca-Cola moving to AWS.

Key points

  1. Saved 40% in costs by moving to AWS
  2. Used Elastic Beanstalk, CloudFormation, Auto Scaling
  3. Used Splunk for real-time logs
  4. Used Memcache for session handling. The last slide has a link to code for doing so.
  5. Used Windup to perform analysis on the WAR file and detect what needed to be modified in the legacy code.
  6. Used Agile for project management.
  7. Other practices referenced: 12factor.net



AWS Kinesis

Fully managed service for real time processing of streaming data at massive scale.



  • Captures and stores terabytes of data from thousands of sources such as clickstreams, transactions and social media.
  • Kinesis provides the Kinesis Client Library (KCL), which can be used to build applications that power real-time dashboards, generate alerts, and implement dynamic pricing and advertising.
  • Kinesis integrates with S3, EMR, and Redshift.
  • Kinesis allows for parallel processing of the same stream data.
  • Can dynamically adjust the throughput of input data from thousands to millions of transactions per second.

Use Cases

  • Collect log and event data from sources such as servers, desktops, and mobile devices. You can then build Amazon Kinesis Applications to continuously process the data, generate metrics, and power live dashboards.
  • Continuously receive high volume logs generated by your applications or services and build Amazon Kinesis Applications to analyze the logs in real-time and trigger alerts in case of exceptions.
  • Have your mobile applications push data to Amazon Kinesis from hundreds of thousands of devices, making the data available to you as soon as it is produced on the mobile devices.



  • Data records of an Amazon Kinesis stream are accessible for up to 24 hours from the time they are added to the stream.
  • The maximum size of a data blob (the data payload before Base64-encoding) within one put data transaction is 50 kilobytes (KB). 
  • Each shard can support up to 1000 put data transactions per second and 5 read data transactions per second.
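Those per-shard limits make capacity planning simple arithmetic. A quick sketch; the incoming rate below is an assumed example, while the per-shard limit comes from the notes above:

```shell
# Each shard supports up to 1000 put transactions per second,
# so required shards = ceiling(incoming rate / 1000).
puts_per_sec=4200          # assumed incoming put-data transactions per second
puts_per_shard=1000        # per-shard put limit from the service notes

# Integer ceiling division: (4200 + 999) / 1000 = 5
shards=$(( (puts_per_sec + puts_per_shard - 1) / puts_per_shard ))
echo "Shards needed: $shards"
```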

Use Amazon Kinesis for use cases like the following:

  • Routing related data records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
  • Ordering of data records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
  • Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.
  • Ability to consume data records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis stores data for up to 24 hours, you can run the audit application up to 24 hours behind the billing application.

Use Amazon SQS for use cases like the following:

  • Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout.
  • Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes.
  • Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared. With Amazon Kinesis, you can scale up to a sufficient number of shards (note, however, that you'll need to provision enough shards ahead of time).
  • Leveraging Amazon SQS's ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load without any provisioning instructions from you.

AWS Storage Gateway

The AWS Storage Gateway is a service connecting an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and AWS's storage infrastructure.


Ref: Amazon.com


  • Provides low-latency performance by maintaining frequently accessed data on-premises while securely storing all of your data encrypted in Amazon Simple Storage Service (Amazon S3) or Amazon Glacier.
  • AWS Storage Gateway supports three configurations:
    • Gateway-Cached Volumes: provide substantial cost savings on primary storage, minimize the need to scale your on-premises storage, and provide low-latency access to your frequently accessed data.
    • Gateway-Stored Volumes: in the event you need low-latency access to your entire data set, you can configure your gateway to store your primary data locally and asynchronously back up point-in-time snapshots of this data to Amazon S3.
    • Gateway-Virtual Tape Library (VTL): presents a virtual tape infrastructure, letting you replace physical tape backup with virtual tapes stored in Amazon S3 or Amazon Glacier.

Video Links

Excellent intro and architecture overview : https://www.youtube.com/watch?v=Ut5TG1ueU1E (35 min)

How to do it: https://www.youtube.com/watch?v=Bb8nk0oWJbU ( 10 mins)


Basic; Videos (2); Use Cases; Product Overview; FAQ; When to use it and not;

Amazon EMR

Amazon EMR (Elastic MapReduce) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. Amazon EMR uses Hadoop, an open source framework, to distribute your data and processing across a resizable cluster of Amazon EC2 instances.

  • Using EMR, run multiple clusters with different sizes, specs and node types
  • Transient cluster: shut down the cluster when the job is done. Use it when (data load time + processing time) * number of jobs < 24 hrs
  • Alive cluster: the cluster stays around after the job is done and is able to share data between multiple jobs. Use it when (data load time + processing time) * number of jobs > 24 hrs
  • Core nodes: run TaskTrackers and DataNodes (HDFS). You can add core nodes, but cannot remove them.
  • Task nodes: run TaskTrackers but no DataNodes; they read from core-node HDFS. Nodes can be added/removed. Use them for speeding up job processing when you need extra horsepower to pull data from S3.
  • You can use Amazon S3 in place of HDFS. Permanent data (results) is stored in S3, while intermediate results live in HDFS. This way, when the job is done, you can delete the cluster.
    • This way, S3 can share data with multiple clusters; HDFS cannot do this.
    • Don't use S3 when you are processing the same data set more than once.
  • If the job is CPU/memory-bound, data locality doesn't make a difference
  • Use S3 and HDFS together for I/O-intensive workloads: store data in S3, pull it using s3distcp, and process it in HDFS.

  • To add nodes elastically, follow this architecture:
    • Monitor cluster capacity with Amazon CloudWatch
    • Have an SNS topic notify Elastic Beanstalk to deploy an application
    • The application adds the corresponding nodes to your cluster.

Best Practices

  • Use m1.xlarge and larger nodes for production workloads
  • Use CC2 for memory- and CPU-intensive workloads
  • Use CC2/c1.xlarge for CPU-intensive workloads
  • HS1 for HDFS workloads
  • HI1 and HS1 for disk I/O
  • Prefer a smaller cluster of larger nodes over a larger cluster of small nodes.
  • Estimated number of nodes = (Total mappers * Time to process sample files) / (Instance mapper capacity * Desired processing time)
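As a worked example of the node-estimate formula above (every input number below is an illustrative assumption):

```shell
# Estimated nodes = (total mappers * sample processing time)
#                 / (mapper capacity per instance * desired processing time),
# rounded up to a whole node.
total_mappers=200          # assumed mappers needed for the full data set
sample_time_mins=10        # assumed minutes to process the sample files
mapper_capacity=8          # assumed mapper slots per instance
desired_time_mins=60       # assumed target processing time in minutes

# (200 * 10) / (8 * 60) = 2000 / 480, rounded up = 5
nodes=$(( (total_mappers * sample_time_mins + mapper_capacity * desired_time_mins - 1) / (mapper_capacity * desired_time_mins) ))
echo "Estimated nodes: $nodes"
```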

Video Links

EMR Best Practices and Deep Dive


Cloud Hardware Security Module (HSM)

The AWS Cloud HSM service allows you to protect your encryption keys within HSMs designed and validated to government standards for secure key management.


  • You can securely generate, store, and manage the cryptographic keys used for data encryption such that they are accessible only by you. 
  • AWS CloudHSM protects your cryptographic keys with tamper-resistant HSM appliances that are designed to comply with international (Common Criteria EAL4+) and U.S. Government (NIST FIPS 140-2) regulatory standards for cryptographic modules. 
  • By placing CloudHSMs in your VPC near your EC2 instances, you can reduce network latency and increase the performance of your AWS applications that use HSMs.
  • Use the CloudHSM service to support a variety of use cases and applications, such as database encryption, Digital Rights Management (DRM), Public Key Infrastructure (PKI), authentication and authorization, document signing, and transaction processing.
  • Applications use standard cryptographic APIs, in conjunction with HSM client software installed on the application instance, to send cryptographic requests to the HSM
  • CloudHSM must be provisioned inside a VPC.


Best Practices:


Video links:

Encryption and Key Management in AWS (Basics of encryption on client/server side: Until 22:00 mins; HSM : 22:50 – 40:00; Netflix Use case: 46:00)

Cloud Trail

A web service that records AWS API calls for your account and delivers log files to you.


  • Log contains: request, response, ipAddress, time of call.
  • Uses S3, so you can keep policies to delete old log files.
  • Integrates with Splunk, Loggly, Alert Logic.
  • Can aggregate log files from multiple AWS accounts.
  • Used for security analysis, compliance with regulatory standards, tracking changes to AWS resources and troubleshooting operational issues.
  • By default, CloudTrail log files are encrypted using S3 Server-Side Encryption (SSE) and placed into your S3 bucket. You can control access to log files by applying IAM or S3 bucket policies. You can add an additional layer of security by enabling S3 Multi-Factor Authentication (MFA) Delete on your S3 bucket.
  • CloudTrail delivers an event within 15 minutes of the API call.
  • S3 access logging is not handled through CloudTrail; you need to enable server access logging as mentioned here.

Amazon Cognito

Cognito is a simple user identity and data synchronization service that helps you manage and synchronize app data for your users across multiple devices.



  • Simple backend infrastructure to manage users, their data and state.
  • Provides integration with Amazon, Facebook, Google or your own user identity system for identity management
  • Mainly syncs and stores user data (which can be anything) across multiple mobile or connected devices.
  • Supports unauthenticated identities.
  • Provides the AWS Mobile SDK and server-side API calls for syncing data.
  • The Amazon Cognito sync store is a key/value-pair store linked to an Amazon Cognito identity.
  • Each user information store can have a maximum size of 20 MB.
  • By default, Amazon Cognito maintains the last-written version of the data and syncs it to the local Cognito data store accordingly.
  • You are charged only when you call the synchronize() method, not for reads/writes to the local data store.

When to Use



Search Mobile.awsblog.com for "cognito" to find all related articles.

AWS Mobile Analytics

Amazon Mobile Analytics is a service that lets you easily collect, visualize, and understand app usage data at scale. Many mobile app analytics solutions deliver usage data several hours after the events occur. Amazon Mobile Analytics is designed to deliver usage reports within 60 minutes of receiving data from an app so that you can act on the data more quickly.

  • Integrates with Amazon Cognito
  • Integrates with Android SDK
  • Can be used with iOS, Android, and Fire OS apps.
  • Custom events can also be generated. Custom events are defined entirely by you and help track user actions specific to your app or game. The Custom Events report provides a view of how often custom events occur and can be filtered based on custom event attributes and their associated values.

Use Cases


Best Practices:



Demo Link: Here

AWS Pro Exam Details

FAQ : here

No. of questions: 80-100 (no hard numbers anywhere; see my post here)

Time for the exam: 170 mins.

Pass percentage: N/A (I am keeping 75% as my target, because most exams' pass percentage is around 65%; even for the CSA exam it's 65%)

Practice exam # of questions/time: 40Q/90 mins (2.25 mins/Q)

I think the main exam may have 75 questions: the total time limit is 170 mins, and the practice exam gives 2.25 mins per question. Dividing the total time by the time per question gives roughly 75 questions. I may be wrong, but it is something to think about.

AWS Professional Exam Prep - 01

I've been preparing for the AWS Professional exam for the last month and thought I'd blog my preparation and strategies here. I'm going to blog about:

  1. AWS & All its services with main focus on
    1. What is the service about
    2. When to Use the service
    3. What are its best practices
    4. Use cases for the service with links to any video from AWS.
  2. The 6 white papers as discussed in the exam blueprint
  3. The 6 topics, with notes on each, as discussed in the exam blueprint.
  4. At the end, I am going to take a practice exam.

To complete the above four items, I am planning on 72 hours of time, which includes the practice exam. Breakdown of hours below:

  1. 20 hours (Total of 36 services; some services I know really in-depth, so I will allocate less time to them.)
  2. 12 hours (2 hours for each white paper)
  3. 30 hours (5 hours for each topic)

Currently I feel that I might fail the practice exam, but I believe it will show me what is needed to prepare for the real one. Also, I've spent quite a bit of time on the details of each service for the AWS CSA exam, so this time I am going to focus more on architecture aspects and the usability of AWS services for applications. Some of the things I would like to do later:

  1. Listen to AWS videos on use cases of AWS services for different applications/systems.
  2. Read most of the white papers and customer white papers.
  3. Cross-check the exam blueprint to see if I have covered all the topics.

My exam is on the 27th, so I am planning to take two practice tests: one on 10/13 and one on 10/20. That's the plan. My mantra is:

Amazon Webservices Architect Certification

I have recently cleared the Amazon Web Services Certified Solutions Architect Associate (CSA) certification and wanted to pen my thoughts here so that they will be useful for future aspirants. Below are my opinions based on my research and preparation. My background: I am a JEE architect in my current role, and my work involves designing/developing web-based applications. I am also an Oracle Certified Enterprise Architect, which has no bearing on the CSA exam and wouldn't give you an edge for it.

Motivation for AWS CSA: Last year, while browsing for architect certifications, I found the Amazon CSA exam. The exam content, the applicability to real-world applications (rather than just theory), and the recognition were the key motivations for me to take the exam.

My Approach:

  1. Googled blogs/articles by people who had taken the exam and how they succeeded.
  2. Paired with one of my friends who was also interested.
  3. Enrolled in the LinuxAcademy.com AWS CSA course.
  4. Kept a target of 1 week to complete the entire course (around 6 hours) - Round 1 prep.
  5. For round 2, I realized that I needed to deep-dive on Virtual Private Cloud (VPC). So I went through the entire documentation literally line by line, using wiki articles, networking videos on YouTube and every resource I could find to understand the terminology (subnets, CIDR, EIP, IP address formulas, etc.). Initially it was tough, believe me, but I was persistent and had decided that I would understand VPC as well as I could. It took me 3 days to complete the entire VPC section with notes; I read for about 16 hours (took 1 day off from work just to keep the tempo). After the VPC documentation, I went through the VPC FAQ and ended the section with a quiz. I believe the best quiz is to go through the FAQ and try answering the questions one by one without looking at the answers.
  6. Same as VPC, I went deep on EC2 and S3. S3 is very deep, with lots of detail on how to secure a bucket. I loved reading that portion because the design is amazing: you can secure the bucket, the object, and all of S3 at different levels (IAM, bucket-level policy and ACL). There are a couple of rules that are helpful to remember.
  7. For EC2 and S3, I practiced most of the scenarios mentioned in the LinuxAcademy.com course. I even took their developer course, since some classes are in the developer course but not the CSA one, and both courses together give you the complete details about a topic.
  8. After completing all the relevant topics for the exam, I read three white papers: AWS Storage Options, AWS Security Best Practices and Disaster Recovery. AWS Storage Options is an excellent paper; by the end of it, given an AWS scenario, you can confidently pick which storage option is good for the company (cost and other factors).
  9. After completing all the reading, I took all the quizzes and exams from the CSA and developer courses in Linux Academy, skipping the developer-course questions that were not relevant to the CSA.

So, how to pass:

  1. First, believe that you want to pass. I know this might sound silly, but you have to be adamant that you want to pass this exam (if you are coming from my background, with little AWS experience). The exam is not an easy one, but it is not a super hard one either; if you prepare too little, you will fall for the traps. By believing you want to pass, you will come up with a plan for how to ace it. In my case, I took pen and paper and jotted down all the topics I wanted to cover, with time allocated against each (like 2 hours, etc.), and I was always thinking about how I would clear it or how I might lose it. There is a slogan at AWS, "design for failure"; same thing here: "design for success" :)
  2. Time it. You need a plan for how you will complete the 55 questions in 80 minutes. What will you do if you face a tough question? What is your time plan? By what point should you be halfway through the exam? Decide these before you go in. My mantra was simple: skip it if it is tough (mark it for review). My goal was to complete the first pass in 40 mins while giving my best to each question: read a question, eliminate the wrong answers, pick the best one and move on. This usually takes 1 min; if it takes more, check the review-later button. I completed my first pass in 40 mins with 17 questions marked for review, finished those in 20 mins, and went through all the questions once more in the last 20 mins. Timing is very important; some questions are designed to be tough and no one may know the answer, so spending time on them is not advisable.
  3. Five things to do: concept, practice, use case, FAQ, and quiz. For any topic, these five are very important. Get the concept first: take the course and understand the basics, then go to the documentation and read about it. Next is practice: fire up the AWS console and try it, like creating a bucket, a load balancer or auto scaling, VPC ingress, security, etc. For use cases, AWS provides lots of use cases and architecture diagrams; go through them to get the big picture of the technology and its usage. Finally, complete the topic with the FAQ section and a quiz from Linux Academy. The documentation for some topics is very vast; I only read the documentation for the top 3 things in AWS (S3, EC2, VPC). So it's up to you; plan it based on your time.

It took me a month to prepare, with more time on weekends and 5 days of continuous prep (10 hours each day) right before the exam. After completing the exam, I feel very confident using AWS technology, and in the last week we have ported an existing web application to AWS. I am planning to take the DEV/SysOps certifications some time in the next 1-2 months, so that I am comfortable using the AWS CLI/SDK.

Finally, good luck to all who are aspiring to become AWS CSA and I wish you the best.