First of all, thank you very much for considering my question. I hope it’s not too silly.
I am just wondering whether there is a way to filter data on a Kinesis stream at the point of reading records out of the stream. The official AWS documentation says the partition key
“allows the consumer that processes a particular shard to be designed with the assumption that records with the same partition key would only be sent to that consumer”
There is no way to specify (either through the REST API or through KCL) which partition key I am interested in reading records for.
Records with the same partition key are hashed to the same shard, but how can we know which shard that is from the partition key alone?
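For reference, here is a minimal sketch of how the key-to-shard mapping could be computed by hand. Kinesis takes the MD5 hash of the partition key as a 128-bit integer and routes the record to the shard whose hash key range contains that value. The shard list below is hypothetical and only mimics the shape that ListShards / DescribeStream return; the function names are my own. It assumes no ExplicitHashKey was set by the producer, and the mapping changes whenever the stream is resharded.

```python
import hashlib

MAX_HASH = 2**128 - 1  # Kinesis hash keys span 0 .. 2^128 - 1


def partition_key_to_hash(partition_key: str) -> int:
    """Return the MD5 digest of the partition key as a 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def find_shard(partition_key: str, shards: list) -> "str | None":
    """Return the id of the open shard whose HashKeyRange contains the key's hash."""
    hash_key = partition_key_to_hash(partition_key)
    for shard in shards:
        # Closed (parent) shards carry an EndingSequenceNumber; skip them.
        if "EndingSequenceNumber" in shard.get("SequenceNumberRange", {}):
            continue
        key_range = shard["HashKeyRange"]
        start = int(key_range["StartingHashKey"])
        end = int(key_range["EndingHashKey"])
        if start <= hash_key <= end:
            return shard["ShardId"]
    return None


# Hypothetical two-shard stream that splits the hash space in half.
EXAMPLE_SHARDS = [
    {
        "ShardId": "shardId-000000000000",
        "HashKeyRange": {
            "StartingHashKey": "0",
            "EndingHashKey": str(MAX_HASH // 2),
        },
    },
    {
        "ShardId": "shardId-000000000001",
        "HashKeyRange": {
            "StartingHashKey": str(MAX_HASH // 2 + 1),
            "EndingHashKey": str(MAX_HASH),
        },
    },
]

if __name__ == "__main__":
    print(find_shard("user-42", EXAMPLE_SHARDS))
```

Knowing the shard only narrows things down: the same shard will normally carry many other partition keys as well, so this does not by itself give a per-key subscription.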
The ultimate question is: how can I create a consumer that only receives data for a particular partition key? Or, more generally, how can I create a consumer that only receives the data it is interested in?
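To make the question concrete, the only approach I can see is filtering on the consumer side after the records have already been read. A rough sketch, assuming each record is a dict with a "PartitionKey" field as in a GetRecords response; the helper name and the sample batch are made up for illustration:

```python
from typing import Iterable, Iterator


def filter_by_partition_key(records: Iterable[dict], wanted_keys: Iterable[str]) -> Iterator[dict]:
    """Yield only the records whose PartitionKey is in wanted_keys."""
    wanted = set(wanted_keys)
    for record in records:
        if record.get("PartitionKey") in wanted:
            yield record


# Hypothetical batch, shaped like the "Records" list of a GetRecords response.
SAMPLE_BATCH = [
    {"SequenceNumber": "1", "PartitionKey": "orders", "Data": b'{"id": 1}'},
    {"SequenceNumber": "2", "PartitionKey": "clicks", "Data": b'{"id": 2}'},
    {"SequenceNumber": "3", "PartitionKey": "orders", "Data": b'{"id": 3}'},
]

if __name__ == "__main__":
    for record in filter_by_partition_key(SAMPLE_BATCH, ["orders"]):
        print(record["SequenceNumber"], record["Data"])
```

The drawback is that the consumer still reads and pays for every record in the shard and simply discards the ones it does not care about, which is exactly what I would like to avoid.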
Thank you very much for taking the time to consider my question and share your thoughts!
UPDATE 2021-02-10:
I reached this conclusion before this date, but only happened to revisit the question now.
For the benefit of those who just read it or started using Kinesis:
I think sharding in general is (or was; I am not sure of its current state) not designed for implementing business logic, but mainly for handling the scaling of data volume (a big data technique, in my simple understanding).
Again, I am not sure about Kinesis today, but the requirement still stands. I guess Kafka is the answer to this question; however, Kafka might still not provide the functionality you need out of the box.
You can use SNS or asynchronous re-invocations of your function.
Read more here where I answered a similar question: https://stackoverflow.com/a/51281888/1988232