Question:
I have an Athena table partitioned by year, month, day with the following definition
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE EXTERNAL TABLE `my_table`( `price` double) PARTITIONED BY ( `year` int, `month` int, `day` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
I need to query it between dates. The options, as I see it, for example are:
1 2 3 4 |
SELECT avg(price) FROM my_table WHERE year = 2018 AND month = 1 |
Result: Run time: 4.89 seconds, Data scanned: 20.72MB
1 2 3 4 5 6 7 8 |
SELECT avg(price) FROM my_table WHERE cast(date_parse(concat(cast(year as varchar(4)),'-', cast(month as varchar(2)),'-', cast(day as varchar(2)) ), '%Y-%m-%d') as date) BETWEEN Date '2018-01-01' AND Date '2018-01-31' |
Result: Run time: 8.64 seconds, Data scanned: 20.72MB
So, I guess Athena is smart enough to use the partitioning feature even when casting the concatenated partitioned columns, so why does it take approx. twice the time? What exactly is going on in the backstage?
Many thanks.
Answer:
Athena will use a filterPredicate in that case, you can use EXPLAIN ANALYSE statement to check this:
1 2 3 |
EXPLAIN ANALYZE SELECT count(*) FROM "db"."table" where year||month||day >= '20220629'; |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
... - ScanFilterProject[table = awsdatacatalog:HiveTableHandle{schemaName=db, tableName=table, analyzePartitionValues=Optional.empty}, grouped = false, filterPredicate = ("concat"("concat"("year", "month"), "day") >= CAST('20220629' AS varchar))] => [[]] CPU: 2.57s (99.04%), Output: 12424 rows (0B) Input avg.: 49.11 rows, Input std.dev.: 54.32% LAYOUT: db.table month := month:string:-1:PARTITION_KEY :: [[06], [07]] year := year:string:-1:PARTITION_KEY :: [[2022]] day := day:string:-1:PARTITION_KEY :: [[05], [06], [07], [11], [12], [13], [14], [15], [16], [17], [18], [19], [29], [30]] Input: 12424 rows (5.68kB), Filtered: 0.00% ... |