AWS Kinesis

A fully managed service for real-time processing of streaming data at massive scale.


Features:

  • Captures and stores terabytes of data from thousands of sources such as clickstreams, transactions, and social media (a minimal producer sketch follows this list).
  • Kinesis provides the Kinesis Client Library (KCL), which can be used to build applications that power real-time dashboards, generate alerts, and implement dynamic pricing and advertising.
  • Kinesis integrates with S3, EMR, and Redshift.
  • Kinesis allows for parallel processing of the same stream data.
  • Can dynamically adjust the throughput of input data from thousands to millions of transactions per second.
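As a concrete illustration of getting data into a stream, here is a minimal producer sketch using boto3. The region, the stream name "clickstream", and the event fields are assumptions for illustration, and the stream is assumed to already exist.

```python
# Minimal producer sketch (hypothetical stream name and event shape).
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

event = {"user_id": "u-123", "action": "page_view", "page": "/home"}

# PartitionKey decides which shard receives the record; records that share a
# key land on the same shard and are therefore read back in order.
response = kinesis.put_record(
    StreamName="clickstream",                      # assumed existing stream
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],
)
print(response["ShardId"], response["SequenceNumber"])
```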

Use Cases

  • Collect log and event data from sources such as servers, desktops, and mobile devices. You can then build Amazon Kinesis Applications to continuously process the data, generate metrics, and power live dashboards.
  • Continuously receive high volume logs generated by your applications or services and build Amazon Kinesis Applications to analyze the logs in real-time and trigger alerts in case of exceptions.
  • Have your mobile applications push data to Amazon Kinesis from hundreds of thousands of devices, making the data available to you as soon as it is produced on the mobile devices.
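For the high-volume log and device scenarios above, records are usually written in batches. Below is a hedged sketch using boto3's put_records; the stream name "mobile-events" and the payload shape are assumptions for illustration.

```python
# Batch producer sketch (hypothetical stream name and payload).
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

events = [
    {"device_id": f"device-{i}", "metric": "battery", "value": 40 + i}
    for i in range(10)
]

response = kinesis.put_records(
    StreamName="mobile-events",                    # assumed existing stream
    Records=[
        {
            "Data": json.dumps(e).encode("utf-8"),
            # Keying by device keeps each device's events ordered on one shard.
            "PartitionKey": e["device_id"],
        }
        for e in events
    ],
)
# put_records is not all-or-nothing: FailedRecordCount reports how many
# records were rejected and need to be retried.
print("failed:", response["FailedRecordCount"])
```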


Limitations

  • Data records of an Amazon Kinesis stream are accessible for up to 24 hours from the time they are added to the stream.
  • The maximum size of a data blob (the data payload before Base64-encoding) within one put data transaction is 50 kilobytes (KB). 
  • Each shard can support up to 1000 put data transactions per second and 5 read data transactions per second.
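These per-shard limits drive how many shards a stream needs. A back-of-the-envelope sizing sketch, using a made-up peak ingest rate, might look like this:

```python
# Shard sizing against the per-shard put limit quoted above
# (1,000 put data transactions per second per shard).
import math

expected_puts_per_second = 20_000      # hypothetical peak ingest rate
puts_per_shard_per_second = 1_000      # per-shard put limit from the docs

shards_needed = math.ceil(expected_puts_per_second / puts_per_shard_per_second)
print(shards_needed)  # -> 20 shards to absorb the peak put rate
```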

Use Amazon Kinesis for the following use cases:

  • Routing related data records to the same record processor (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
  • Ordering of data records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
  • Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently (a consumer sketch follows this list).
  • Ability to consume data records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis stores data for up to 24 hours, you can run the audit application up to 24 hours behind the billing application.
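To make the ordering and multi-consumer points concrete, here is a minimal consumer sketch against the low-level API (a production application would typically use the KCL instead). The stream name and region are assumptions, and only the first shard is read.

```python
# Minimal consumer sketch (hypothetical stream name; reads one shard only).
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
stream = "clickstream"                             # assumed existing stream

shard_id = kinesis.describe_stream(StreamName=stream)[
    "StreamDescription"]["Shards"][0]["ShardId"]

# TRIM_HORIZON starts from the oldest record still retained, so a second,
# independent application (e.g. an archiver) can replay the same data.
iterator = kinesis.get_shard_iterator(
    StreamName=stream,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

result = kinesis.get_records(ShardIterator=iterator, Limit=100)
for record in result["Records"]:
    # Records within a shard come back in the order they were written.
    print(record["SequenceNumber"], record["Data"])
```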

Use Amazon SQS for the following use cases:

  • Messaging semantics (such as message-level ack/fail) and visibility timeout. For example, you have a queue of work items and want to track the successful completion of each item independently. Amazon SQS tracks the ack/fail, so the application does not have to maintain a persistent checkpoint/cursor. Amazon SQS will delete acked messages and redeliver failed messages after a configured visibility timeout (see the sketch after this list).
  • Individual message delay. For example, you have a job queue and need to schedule individual jobs with a delay. With Amazon SQS, you can configure individual messages to have a delay of up to 15 minutes.
  • Dynamically increasing concurrency/throughput at read time. For example, you have a work queue and want to add more readers until the backlog is cleared; Amazon SQS lets you do this on demand. With Amazon Kinesis, by contrast, you can only scale up to a sufficient number of shards, and you'll need to provision enough shards ahead of time.
  • Leveraging Amazon SQS's ability to scale transparently. For example, you buffer requests and the load changes as a result of occasional load spikes or the natural growth of your business. Because each buffered request can be processed independently, Amazon SQS can scale transparently to handle the load without any provisioning instructions from you.
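The ack/fail and delay semantics above map onto the SQS API roughly as follows. This is a sketch only: the queue URL is a placeholder for an existing queue, and the "work" is just a print statement.

```python
# SQS sketch showing individual message delay, visibility timeout, and ack.
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/work-queue"  # placeholder

# Individual message delay: this job becomes visible to readers after 60 s.
sqs.send_message(QueueUrl=queue_url, MessageBody="job-42", DelaySeconds=60)

# A reader receives the message; it stays invisible to other readers for the
# visibility timeout while it is being processed.
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    VisibilityTimeout=30,
    WaitTimeSeconds=10,
)
for message in resp.get("Messages", []):
    print("processing", message["Body"])           # do the actual work here
    # Deleting the message is the "ack"; if the worker crashed before this
    # line, SQS would redeliver the message after the visibility timeout.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```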