AWS Glue Grok Pattern, timestamp with milliseconds

Question:

I need to define a grok pattern in an AWS Glue Classifier to capture the datestamp with milliseconds in the datetime column of a file (which the AWS Glue Crawler converts to a string). I used the DATESTAMP_EVENTLOG pattern predefined in AWS Glue and tried to add the milliseconds to the pattern.

Classification: datetime

Grok pattern: %{DATESTAMP_EVENTLOG:string}

Custom patterns:

I still could not get the pattern to work. Any ideas?

Answer:

A common misconception about Classifiers: they are for specifying file formats, in addition to the built-in ones like JSON, CSV, etc., and not for specifying parse formats for individual data types.

As user @lilline suggests, the best way to change a data type is with an ApplyMapping function.

When creating a Glue Job, you can select the option "A proposed script generated by AWS Glue".

Then when selecting the table from the Glue Catalog as a source, you can make changes to the datatypes, column names, etc.

The output code might look something like the following:
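A minimal sketch of such a script fragment, not the exact generated output. The id column and the datasource0 / applymapping1 names are assumptions for illustration; only updateddateutc comes from the question. Each mapping is a (source name, source type, target name, target type) tuple, and the awsglue import is kept inside the function because that module is only available in the Glue job runtime.

```python
# Mapping tuples: (source_name, source_type, target_name, target_type).
# "id" is a hypothetical extra column; "updateddateutc" is the column
# being changed from string to timestamp.
MAPPINGS = [
    ("id", "string", "id", "string"),
    ("updateddateutc", "string", "updateddateutc", "timestamp"),
]


def apply_casts(datasource0):
    """Apply the mappings to a DynamicFrame read from the Glue Catalog."""
    # Imported here because awsglue only exists inside a Glue job.
    from awsglue.transforms import ApplyMapping

    return ApplyMapping.apply(
        frame=datasource0,
        mappings=MAPPINGS,
        transformation_ctx="applymapping1",  # assumed context name
    )
```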

This effectively casts the updateddateutc string to a timestamp.

To create a Classifier, you would need to specify each individual column in the file, not just the datetime column.
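To illustrate why the pattern has to cover the whole line, here is a rough analogy using Python's re module rather than grok syntax (grok patterns are essentially named regular expressions). The line layout, the id and name columns, and the timestamp format are all assumptions, since the actual file format is not shown in the question.

```python
import re

# Hypothetical line layout: id,name,timestamp-with-milliseconds
# Every column needs its own named group; matching only the
# timestamp would not describe the file format.
LINE_PATTERN = re.compile(
    r"^(?P<id>\d+),"
    r"(?P<name>[^,]*),"
    r"(?P<updateddateutc>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})$"
)


def parse_line(line):
    """Return a dict of column values, or None if the line does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None
```

A line that contains only the timestamp does not match, which mirrors what happens when a classifier pattern describes a single column instead of the full record.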
