Question:
I have worked a bit with Kafka in the past, and lately there is a requirement to port part of the data pipeline to AWS Kinesis Streams. I have read that Kinesis is effectively a fork of Kafka and shares many similarities.
However, I have failed to see how we can have multiple consumers reading from the same stream, each with its own offset. There is a sequence number given to each data record, but I couldn’t find anything specific to a consumer (like a Kafka group ID?).
Is it really possible to have different consumers with different ingestion rates over the same AWS Kinesis stream?
Answer:
Yes.
You can have multiple Kinesis Consumer Applications. Let’s say you have 2.
- The first consumer application (I think it is a “consumer group” in Kafka?) can be “first-app” and store its positions in the DynamoDB table “first-app-table”. It can have as many nodes (EC2 instances) as you want.
- The second consumer application can also work on the same stream and store its positions in another DynamoDB table, let’s say “second-app-table”.
Each table will contain “what is the last processed position on shard X for app Y” information. So the 2 applications store checkpoints for the same shards in different places, which makes them independent.
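To make the independence concrete, here is a toy in-memory sketch of the idea. It is not the KCL or the DynamoDB API: the `checkpoint_tables` dict, the `consume` function and the record values are all made up for illustration. It only models “one checkpoint table per application, one entry per shard”.

```python
# Toy in-memory model of per-application checkpoints.
# Not the KCL or DynamoDB API; all names here are hypothetical.
from collections import defaultdict

# One shard of the shared stream: (sequence_number, payload) pairs.
shard_records = [(100, "a"), (101, "b"), (102, "c"), (103, "d")]

# One checkpoint table per application: table name -> {shard_id: last sequence number}.
checkpoint_tables = defaultdict(dict)


def consume(app_table, shard_id, records, max_records):
    """Read up to max_records past this app's checkpoint, then advance it."""
    last_seen = checkpoint_tables[app_table].get(shard_id, -1)
    batch = [r for r in records if r[0] > last_seen][:max_records]
    if batch:
        # Store the sequence number of the last processed record.
        checkpoint_tables[app_table][shard_id] = batch[-1][0]
    return [payload for _, payload in batch]


# Both apps read the same shard, but each advances only its own checkpoint.
first = consume("first-app-table", "shard-0", shard_records, max_records=3)
second = consume("second-app-table", "shard-0", shard_records, max_records=1)
print(first, second)
print(dict(checkpoint_tables))
```

After this runs, the first app’s checkpoint for the shard is 102 and the second app’s is 100, even though both read the same records. Neither application ever looks at the other one’s table.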
About the ingestion rate, there is an “idleTimeBetweenReadsInMillis” setting in consumer applications using the KCL, which is the polling interval for Amazon Kinesis GetRecords calls. For example, the first application can use a poll interval of “2000”, so it will poll the stream’s shards every 2 seconds to check for new records. The second application can use a different value, since each application is configured separately.
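If it helps to see what such a polling interval amounts to, below is a hand-rolled sketch of a loop over one shard. This is not how you configure the KCL (there you just set the property). It assumes a client object with the `get_shard_iterator` and `get_records` methods of the boto3 Kinesis client. The function name `poll_shard` and the parameters `idle_ms` and `max_polls` are made up for the example.

```python
import time


def poll_shard(client, stream_name, shard_id, idle_ms, max_polls, sleep=time.sleep):
    """Poll one shard up to max_polls times, pausing idle_ms between reads.

    `client` is assumed to behave like boto3's Kinesis client.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start from the oldest available record
    )["ShardIterator"]
    records = []
    for _ in range(max_polls):
        response = client.get_records(ShardIterator=iterator, Limit=100)
        records.extend(response["Records"])
        iterator = response.get("NextShardIterator")
        if iterator is None:  # the shard was closed and has been fully read
            break
        sleep(idle_ms / 1000.0)  # this pause plays the role of the idle time
    return records
```

With a real client this would be called as something like `poll_shard(boto3.client("kinesis"), "my-stream", "shardId-000000000000", idle_ms=2000, max_polls=10)`, where the stream name is a placeholder. A second application could run the same loop with a different `idle_ms` and read at its own pace.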
I don’t know Kafka well, but as far as I remember, a Kafka “partition” is a “shard” in Kinesis, and likewise a Kafka “offset” is a “sequence number” in Kinesis. The Kinesis Client Library (KCL) uses the term “checkpoint” for the stored sequence numbers. Like you said, the concepts are similar.