AWS Glue Job – Convert CSV to Parquet

Question:

I am trying to convert about 1.5 GB of gzipped CSV into Parquet using AWS Glue. The script below is an autogenerated Glue job for that task. It seems to take a very long time: I’ve waited hours with 10 DPUs and never seen it finish or produce any output data.

Does anyone have experience converting 1.5 GB or more of gzipped CSV into Parquet? Is there a better way to accomplish this conversion?

I have TBs of data to convert, so it is concerning that converting a few GB takes this long.

My Glue Job Logs have thousands of entries like:

AWS Autogenerated Glue Job Code:

Answer:

Yes. I recently found that using Spark DataFrames instead of Glue’s DynamicFrames is significantly faster.
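
As a rough illustration of the DataFrame approach, here is a minimal sketch rather than the exact script used for this answer. It assumes the CSV files have a header row and that reading every column as a string is acceptable. The function name `convert_csv_to_parquet` and the S3 paths in the comments are hypothetical placeholders. Spark decompresses `.gz` files automatically based on the file extension.

```python
def convert_csv_to_parquet(spark, input_path, output_path, header=True):
    """Read CSV (plain or gzipped) with the Spark DataFrame reader and write Parquet.

    `spark` is an existing SparkSession. This sketch does not use DynamicFrames.
    """
    # inferSchema=False avoids an extra pass over the data; all columns are
    # read as strings (an assumption of this sketch, adjust as needed).
    df = spark.read.csv(input_path, header=header, inferSchema=False)

    # "overwrite" replaces any existing output at output_path.
    df.write.mode("overwrite").parquet(output_path)
    return df


# Inside a Glue job, the SparkSession is available from the GlueContext:
#
#   from pyspark.context import SparkContext
#   from awsglue.context import GlueContext
#
#   glue_context = GlueContext(SparkContext.getOrCreate())
#   spark = glue_context.spark_session
#   convert_csv_to_parquet(
#       spark,
#       "s3://example-bucket/input/",    # hypothetical input location
#       "s3://example-bucket/output/",   # hypothetical output location
#   )
```

If typed columns are needed, passing an explicit schema to the reader is usually cheaper than schema inference, since inference requires an additional scan of the input.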
