Question:
I’ve loaded tab-separated files into S3 with this folder structure under the bucket:
bucket -> se -> y=2013 -> m=07 -> d=14 -> h=00
Each subfolder contains one file that represents one hour of my traffic.
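For example, the full S3 key for one hourly file would look something like this (the file name traffic.tsv is hypothetical):

s3://bi_data/se/y=2013/m=07/d=14/h=00/traffic.tsv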
I then created an EMR workflow to run in interactive mode with Hive.
When I log in to the master node and start Hive, I run this command:
CREATE EXTERNAL TABLE se (
  id bigint,
  oc_date timestamp
)
PARTITIONED BY (y string, m string, d string, h string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';
I get this error message:
FAILED: Error in metadata: java.lang.IllegalArgumentException: The bucket name parameter must be specified when listing objects in a bucket
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Can anybody help?
UPDATE
Even if I try to use string fields only, I get the same error.
Create table with strings:
CREATE EXTERNAL TABLE se (
  id string,
  oc_date string
)
PARTITIONED BY (y string, m string, d string, h string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3://bi_data';
Hive version 0.8.1.8
Answer:
So, the solution is that I had made two mistakes:
- When specifying only the bucket name, the S3 path must end with a trailing slash.
- The underscore is also a problem: bucket names must be DNS-compliant, so 'bi_data' is invalid.
Hope this helps someone.
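For reference, here is a sketch of the corrected statement, assuming the data is moved to a hypothetical DNS-compliant bucket named bi-data (note the hyphen instead of the underscore, and the trailing slash):

CREATE EXTERNAL TABLE se (
  id string,
  oc_date string
)
PARTITIONED BY (y string, m string, d string, h string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
-- Trailing slash is required when the location is just the bucket root;
-- the bucket name must be DNS-compliant (no underscores).
LOCATION 's3://bi-data/';

Since the table is partitioned, Hive still won't see the hourly files until the partitions are registered, for example:

ALTER TABLE se ADD PARTITION (y='2013', m='07', d='14', h='00')
LOCATION 's3://bi-data/se/y=2013/m=07/d=14/h=00/';

If I recall correctly, EMR's Hive also supports ALTER TABLE se RECOVER PARTITIONS; to pick up all partition directories at once, provided the table's LOCATION points at the directory containing them (here that would be s3://bi-data/se/ rather than the bucket root).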