loading Redshift from S3 (with partitions)

Question:

My S3 file (s3://MY_s3/stage/my_s3_file) has a partition on load_dt:

S3 Structure:

The actual files are under load_dt partitions like 000024 & 000036.

The COPY command works fine if I am not keeping load_dt in Redshift table, but when I am adding load_dt in Redshift table the COPY command fails due to data error as the input layout & target layout are not matching (with extra load_dt on target).

The hive table which creates the S3 file shows partitioned load_dt at the end.

How to make this COPY command work with load_dt being on target Redshift?

I even tried changing S3 locations to s3://MY_s3/stage/my_s3_file/load_dt but no luck.

Answer:

I think I found the answer for my case.

I was not able to load the hive partition because Hive store that partition value in Hive metadata , that’s why the partition column was not there in S3 file.

Now I am adding a new column to S3 through Hive as Load_Dt_New so the S3 file would have the required column for my Redshift COPY command to work.

Leave a Reply