I’m running a SELECT Athena query on an S3 bucket manifest. I then want to use the results of that query, in .csv format, in an S3 Batch operation.
My query runs fine and S3 Batch can read the .csv output, but because the first row contains the column headers, S3 Batch throws an unrecoverable error: it reads the header as a record and concludes that the manifest refers to multiple buckets.
How can I easily strip the column headers out of my results? I would prefer to do it in SQL. The file is too large for standard Unix tools to be practical. I could use AWS Glue, but that seems like overkill for just suppressing headers in a SQL query.
Here’s a hacky way to get around it: alias the columns so that the header row contains your real bucket name and a dummy key.
SELECT bucket as "my-bucket-name", key as "fakekey"
This makes the header row look like every other row in the file, so it will not break the S3 Batch copy job. You will just get one failed record, for the nonexistent key fakekey.
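To see why the alias trick works, here is a rough local simulation in Python. It does not talk to AWS at all. It assumes the results file is a CSV with every field quoted and that the manifest reader treats every line, including the first, as a bucket,key record. The helper names (render_results, distinct_buckets) and the sample keys are made up for illustration.

```python
import csv
import io


def render_results(header, rows):
    """Build CSV text with a header row, every field quoted (assumed output shape)."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def distinct_buckets(manifest_csv):
    """Read every line as a bucket,key record, as a headerless manifest would be read."""
    reader = csv.reader(io.StringIO(manifest_csv))
    return {row[0] for row in reader if row}


# Hypothetical sample rows standing in for the real query results.
rows = [
    ("my-bucket-name", "photos/a.jpg"),
    ("my-bucket-name", "photos/b.jpg"),
]

plain = render_results(("bucket", "key"), rows)
aliased = render_results(("my-bucket-name", "fakekey"), rows)

# With the default header, the word "bucket" is counted as a second bucket name.
print(len(distinct_buckets(plain)))    # 2
# With the aliased header, every line names the same bucket.
print(len(distinct_buckets(aliased)))  # 1
```

With the aliased header the only oddity left is the first line, which points at the key fakekey in the real bucket. That is the single failed record mentioned above.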