Amazon S3 Select From not working

Question:

Amazon S3 has a new feature called “Select From”, which lets you run simple SQL queries against simple data files such as CSV or JSON. So I thought I’d try it.

I created and uploaded the following CSV to my S3 bucket in Oregon (I consider this file to be extremely simple):

I indicated this was CSV with a header row and issued the following SQL:

select * from s3object s

…which worked as expected, returning:

Then I tried one of the provided sample queries, which failed:

…the error message was “Some headers in the query are missing from the file. Please check the file and try again.”

I also tried the following, each time receiving the same error:

So any time my query references a column, whether by name or by number, in either the SELECT or the WHERE clause, I get the “Some headers in the query are missing from the file” error. The AWS documentation provides no follow-up information on this error.

So my question is, what’s wrong? Is there an undocumented requirement about the column headers? Is there an undocumented way to reference columns? Does the “Select From” feature have a bug in it?

Answer:

I did the following:

  • Created a file with the contents you show above
  • Entered S3 Select on the file, and ticked “File has header row”
  • Changed no other settings

These queries did NOT work:

The reason they didn’t work is that the file contains headers, so the columns have actual names.
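For anyone who wants to try the same thing outside the console, here is a minimal sketch, assuming Python with boto3. The bucket, key, and column name are hypothetical placeholders (they are not the ones from the question), and I am assuming the console's “File has header row” tick box corresponds to the API's `FileHeaderInfo` setting. With `USE`, the first line supplies the column names, so the query refers to columns by header name; with `IGNORE`, the first line is skipped and columns are referenced by position (`_1`, `_2`, and so on).

```python
def build_select_request(bucket, key, expression, header_info="USE"):
    """Build keyword arguments for boto3's S3 select_object_content call.

    header_info is one of "USE", "IGNORE", or "NONE".
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": header_info},
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


def run_select(request):
    """Send the request to S3 and return matching rows as text.

    Needs boto3, AWS credentials, and an object that really exists.
    """
    import boto3  # imported here so the builder above works without boto3

    response = boto3.client("s3").select_object_content(**request)
    # The response payload is an event stream; row data arrives in "Records" events.
    chunks = [
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    ]
    return b"".join(chunks).decode("utf-8")


# Hypothetical bucket, key, and column name; substitute your own.
by_name = build_select_request(
    "example-bucket",
    "example.csv",
    "SELECT * FROM s3object s WHERE s.\"example_column\" = 'example value'",
    header_info="USE",
)
by_position = build_select_request(
    "example-bucket",
    "example.csv",
    "SELECT s._1 FROM s3object s",
    header_info="IGNORE",
)
```

Calling `run_select(by_name)` or `run_select(by_position)` would send the query. Building the request separately makes it easy to check which header mode each query is paired with before anything is sent to S3.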

These queries DID work:

When I treated the last two queries as strings, they returned the row as expected:
