I’m using Kinesis Firehose to copy application logs from CloudWatch Logs into S3 buckets.
- Application logs are written to CloudWatch
- A subscription filter on the log group pushes the log events into a Kinesis stream.
- A Firehose delivery stream uses a Lambda function to decompress and transform each source record.
- Firehose writes the transformed record to an S3 destination with GZIP compression enabled.
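To make the transform step concrete, here is a minimal sketch of the kind of Lambda handler this pipeline uses: it base64-decodes each Firehose record, gunzips the CloudWatch Logs payload, and emits the log messages as newline-delimited text. This is a simplified, hypothetical version of the decompression step, not the exact blueprint code:

```python
import base64
import gzip
import json

def handler(event, context=None):
    """Firehose transform sketch: gunzip CloudWatch Logs payloads.

    Simplified illustration only; the real blueprint handles more cases.
    """
    output = []
    for record in event["records"]:
        compressed = base64.b64decode(record["data"])
        payload = json.loads(gzip.decompress(compressed))
        if payload.get("messageType") != "DATA_MESSAGE":
            # Control messages carry no log data; drop them.
            output.append({"recordId": record["recordId"], "result": "Dropped"})
            continue
        # Emit one newline-terminated line per log event.
        text = "".join(e["message"] + "\n" for e in payload["logEvents"])
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}
```

Note the asymmetry that causes the problem below: the input `data` is gzipped, but the output `data` is plain text, so the response is much larger than the request.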
However, there is a problem with this flow. I often see the Lambda transform function fail because its output exceeds the 6 MiB response payload limit for synchronous Lambda invocation. This makes sense: the input is compressed but the output is not, so the response can be many times larger than the request. Decompressing in the transform still seems like the only way to get the file extension and MIME type set correctly on the resulting object in S3.
Is there any way to deliver the input to the Lambda transform function uncompressed?
This would align the input and output sizes. I have already tried reducing the buffer size on the Firehose delivery stream, but the buffer size limit appears to apply to the compressed data, not the raw data.
No, it doesn’t seem possible to change whether the input from CloudWatch Logs is compressed. CloudWatch Logs will always push GZIP-compressed payloads onto the Kinesis stream.
For confirmation, take a look at AWS's reference implementation for CloudWatch Logs, the kinesis-firehose-cloudwatch-logs-processor newline handler. This handler accepts GZIP-compressed input and returns the decompressed messages as output. To work around the 6 MiB limit and avoid "body size is too long" errors, the reference handler slices its output into two parts: the records that fit within the 6 MiB response, and the remainder. The remainder is re-inserted into the stream with PutRecordBatch, so it is transformed again in a later invocation.