Question:
I create a table in Athena with below structure
1 2 3 4 5 6 7 8 9 |
CREATE EXTERNAL TABLE s3_json_objects ( devId string, type string, status string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( 'ignore.malformed.json' = 'true') LOCATION 's3://mybucket/folder1/data/athena_test/'; |
S3 bucket objects contains JSON structure like this
123456 {"devId": "00abcdef1122334401","type": "lora","status": "huihuhukiyg"}
However below SQL working correctly and return the correct result for only count
1 2 |
SELECT count(*) as total_s3_objects FROM "athena_db"."s3_json_objects" |
BUT whenever I query below SQL select statement to fetch the
JSON values from S3, It’s returns result sets with empty values for columns
1234 SELECT devid FROM "athena_db"."s3_json_objects"SELECT json_extract(devid , '$.devid') as Id FROM "athena_db"."s3_json_objects"SELECT * FROM "athena_db"."s3_json_objects"
Also, I review these links before post this question on StackOverflow and AWS Athena doc
Can’t read json file via Amazon Athena
AWS Athena json_extract query from string field returns empty values
Any comments or suggestions would be much appreciated.
Answer:
The JSON must be in a single line, as mentioned in this page of the AWS Athena documentation. You can have multiple JSON objects on separate lines, but each complete object must only span one line.
Example (this could all be in one S3 object):
1 2 3 |
{"devId": "a1", "type": "b1", "status": "c1"} {"devId": "a2", "type": "b2", "status": "c2"} |