Athena and S3 Inventory. HIVE_BAD_DATA: Field size’s type LONG in ORC is incompatible with type varchar defined in table schema

Question:

I’m trying to understand how to work with S3 Inventory.
I’m following this tutorial.

After loading the inventory lists into my table, I try to query it and run into two issues.

1) SELECT key, size FROM table;
The size column shows the same magic number, 4923069104295859283, for every record.

2) SELECT * FROM table;
Query Id: cf07c309-c685-4bf4-9705-8bca69b00b3c.

This query fails with the error from the title:

HIVE_BAD_DATA: Field size’s type LONG in ORC is incompatible with type varchar defined in table schema

Here is my table schema:

Answer:

Running the following command on any of the ORC files coming from the inventory generated by Amazon S3 will give you the actual structure of your inventory:

> hive --orcfiledump ~/Downloads/017c2014-1205-4431-a30d-2d9ae15492d6.orc

Processing data file /tmp/017c2014-1205-4431-a30d-2d9ae15492d6.orc [length: 4741786]
Structure for /tmp/017c2014-1205-4431-a30d-2d9ae15492d6.orc
File Version: 0.12 with ORC_135
Rows: 223473
Compression: ZLIB
Compression size: 262144
Type: struct<bucket:string,key:string,size:bigint,last_modified_date:timestamp,e_tag:string,storage_class:string,is_multipart_uploaded:boolean,replication_status:string,encryption_status:string>
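To get one of those .orc files locally in the first place, the AWS CLI is enough; with a typical inventory configuration the data files sit under the data/ prefix of the inventory destination (the paths below are placeholders for your own setup):

aws s3 cp s3://your-inventory-destination-bucket/your-source-bucket/your-config-id/data/017c2014-1205-4431-a30d-2d9ae15492d6.orc ~/Downloads/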

It appears that the example provided by AWS here assumes that your inventory covers all versions of the objects in your bucket, not just the current versions.
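For comparison, an inventory configured for all object versions carries extra version-related fields (version_id, is_latest, is_delete_marker), so its structure would look roughly like this (a sketch based on the S3 Inventory field list, not an actual dump):

Type: struct<bucket:string,key:string,version_id:string,is_latest:boolean,is_delete_marker:boolean,size:bigint,last_modified_date:timestamp,e_tag:string,storage_class:string,is_multipart_uploaded:boolean,replication_status:string,encryption_status:string>

If Athena matches the ORC columns by position, pointing a table built from that longer column list at a current-versions-only inventory shifts everything out of alignment, which would explain both the varchar/LONG error and the nonsensical size values.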

The right table structure for Athena, for an encrypted bucket, is then:
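A DDL along these lines should match the structure shown by the dump; the table name and LOCATION are placeholders, and dt is the partition column used by the Hive-compatible inventory layout (adjust both to your own configuration):

CREATE EXTERNAL TABLE s3_inventory(
  `bucket` string,
  key string,
  size bigint,
  last_modified_date timestamp,
  e_tag string,
  storage_class string,
  is_multipart_uploaded boolean,
  replication_status string,
  encryption_status string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 's3://your-inventory-destination-bucket/your-source-bucket/your-config-id/hive/';

After loading the partitions (MSCK REPAIR TABLE s3_inventory; works with the Hive-compatible layout), SELECT key, size FROM s3_inventory LIMIT 10; returns proper bigint sizes instead of the magic number.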
