I am evaluating Kinesis for stream processing of log files. A separate process, which I can't modify, uploads new logs to an S3 bucket. I want to know if there's a good way to stream new files that show up in the S3 log bucket into a Kinesis stream for processing. All the documentation I've found so far covers using S3 as an output for the stream, not as an input.
My current solution is a machine that constantly polls S3 for new files, downloads each new file locally, and streams it in using the Log4j appender. This seems inefficient. Is there a better way?
I realize this is a really old question, but have a look at AWS Lambda. It’s perfect for your use case, as illustrated here.
In your case, you would set up an S3 event notification so that each new object added to the bucket invokes your Lambda function. In the Lambda function you then write a few lines of code that read the file and send its contents to the Kinesis stream via the PutRecord (or PutRecords, for batches) API.
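A minimal sketch of what that Lambda function could look like in Python with boto3, assuming a Kinesis stream named "log-stream" (a placeholder) and newline-delimited log files:

```python
def build_put_records_batches(lines, partition_key, batch_size=500):
    """Group log lines into PutRecords batches.

    Kinesis caps a single PutRecords call at 500 records, so we
    chunk the records accordingly.
    """
    records = [
        {"Data": line.encode("utf-8"), "PartitionKey": partition_key}
        for line in lines
        if line  # skip empty lines
    ]
    return [records[i:i + batch_size]
            for i in range(0, len(records), batch_size)]


def lambda_handler(event, context):
    import boto3  # available in the AWS Lambda runtime

    s3 = boto3.client("s3")
    kinesis = boto3.client("kinesis")

    # An S3 event notification can carry multiple records.
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]

        # Read the new log file that triggered this invocation.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"] \
                 .read().decode("utf-8")

        # Push the contents to the stream in batches of up to 500.
        for batch in build_put_records_batches(body.splitlines(),
                                               partition_key=key):
            kinesis.put_records(StreamName="log-stream", Records=batch)
```

Using the object key as the partition key keeps each file's lines on one shard in order; if throughput matters more than per-file ordering, you'd want a higher-cardinality key instead.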
Not only will this work for your use case, it also checks off a few buzzwords: "serverless" and "realtime"!