Question:
I have a file in S3 with the following data:
name,age,gender
jill,30,f
jack,32,m
And a Redshift external table to query that data using Spectrum:
create external table spectrum.customers (
    "name" varchar(50),
    "age" int,
    "gender" varchar(1))
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location 's3://...';
When querying the data I get the following result:
select * from spectrum.customers;

name,age,g
jill,30,f
jack,32,m
Is there an elegant way to skip the header row as part of the external table definition, similar to the tblproperties ("skip.header.line.count"="1")
option in Hive? Or is my only option (at least for now) to filter out the header rows as part of the select statement?
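For reference, a minimal sketch of that filter-based fallback, assuming no real row has the literal value 'name' in the name column:

-- Hypothetical fallback: filter the header row out in the query itself.
select *
from spectrum.customers
where "name" <> 'name';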
Answer:
Answered this in: How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.
This works in Redshift:
You want to use table properties ('skip.header.line.count'='1'), along with any other properties you need, e.g. 'numRows'='100'.
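If the table has already been created, the property can also be set without recreating it; a minimal sketch using Redshift's ALTER TABLE ... SET TABLE PROPERTIES clause for external tables, applied to the customers table from the question:

-- Set the header-skip property on the existing external table
-- instead of dropping and recreating it.
alter table spectrum.customers
set table properties ('skip.header.line.count'='1');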
Here’s a sample:
create external table exreddb1.test_table (
    ID BIGINT,
    NAME VARCHAR)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');
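Applied to the table from the question, the definition would look something like this (the S3 location is left elided as in the original):

-- The original definition plus the header-skip property.
create external table spectrum.customers (
    "name" varchar(50),
    "age" int,
    "gender" varchar(1))
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile
location 's3://...'
table properties ('skip.header.line.count'='1');

With the header skipped, select * from spectrum.customers should return only the jill and jack rows.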