Amazon Redshift: Copying Data Between Databases

Question:

I am looking to copy data between databases on Amazon Redshift. Before this, I was copying data from a Redshift database to a PostgreSQL database hosted on an EC2 instance for analytical purposes. I had a Ruby script that would do it using the dblink extension.

But now, since the data has become unmanageable on the PostgreSQL instance, we have decided to copy it to a separate database on the same Redshift cluster.

Bump! What I found was that dblink is not available for PostgreSQL 8.0 (the version Redshift is based on). Is there a way I can get this working on Redshift?

I don’t want to use the COPY command and add a layer of S3 in between, or even AWS Data Pipeline. You see, it’s the same cluster, just different databases.

In case there is no other way, should I use Data Pipeline, or UNLOAD to S3 and then COPY back into Redshift (roughly the round trip sketched below)? Or is it essentially the same in the end?
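
For context, here is roughly what that S3 round trip would look like; the bucket, table, and IAM role names below are made up:

    -- Connected to the source database on the cluster:
    UNLOAD ('SELECT * FROM source_table')
    TO 's3://my-staging-bucket/source_table_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER '|' GZIP;

    -- Reconnect to the target database on the same cluster, then:
    COPY target_table
    FROM 's3://my-staging-bucket/source_table_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER '|' GZIP;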

P.S. Any help/leads would be appreciated. I’m willing to do the hard work myself; I just need a heads-up.

Answer:

This now seems possible (since June 3, 2016).

See: https://blogs.aws.amazon.com/bigdata/post/Tx1GQ6WLEWVJ1OX/JOIN-Amazon-Redshift-AND-Amazon-RDS-PostgreSQL-WITH-dblink

In the above article, they give an example of copying from Redshift to another database:

I’ll annotate with (postgres cluster) and (redshift cluster) for clarity.

Copying data from Amazon Redshift to RDS PostgreSQL

The SQL to create the table: (postgres cluster)

[…]
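
The article’s SQL is elided above. As a rough sketch of the setup it describes, something like the following would run on the postgres cluster; the server, user, table, and column names here are my own placeholders, not the article’s:

    -- Run on the RDS PostgreSQL instance (postgres cluster).
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE EXTENSION IF NOT EXISTS dblink;

    -- Point a foreign server at the Redshift cluster endpoint.
    CREATE SERVER redshift_server
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'my-cluster.abc123.us-east-1.redshift.amazonaws.com',
                 port '5439', dbname 'analytics', sslmode 'require');

    -- Map the local PostgreSQL user to a Redshift user.
    CREATE USER MAPPING FOR pg_user
        SERVER redshift_server
        OPTIONS (user 'rs_user', password 'rs_password');

    -- Local table that will receive the copied rows.
    CREATE TABLE sales_copy (
        sale_id   bigint,
        sale_time timestamp,
        amount    numeric(12,2)
    );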

The following query shows how to insert data into the tables using the dblink function:

The copy is from (redshift cluster) to (postgres cluster).
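
A sketch of that insert, with the same placeholder names as above; everything inside the dollar-quoted $REDSHIFT$ string executes on the redshift cluster, and the rows it returns are inserted locally:

    -- Run on the postgres cluster.
    INSERT INTO sales_copy
    SELECT *
    FROM dblink('redshift_server', $REDSHIFT$
            SELECT sale_id, sale_time, amount
            FROM sales
            WHERE sale_time >= '2016-06-01'
        $REDSHIFT$)
        AS t (sale_id bigint, sale_time timestamp, amount numeric(12,2));

Note the column definition list on the dblink call: dblink returns a generic record set, so PostgreSQL needs the column names and types spelled out.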

Edit: I’ve used this in production to load tables with many millions of rows daily, and I haven’t had a single issue related to it. So I would recommend it as the preferred method for loading from Redshift to RDS PostgreSQL (as opposed to COPYing, or worse, loading the data into memory with an ORM).
