Question:
Basically I want to pg_dump my RDS database to S3 using AWS Data Pipeline. I am not 100% sure if this is possible. I got to the stage where the SqlDataNode wants a selectQuery, at which point I am wondering what to do.
Below is my template so far:
```yaml
AWSTemplateFormatVersion: "2010-09-09"  # the only valid template format version
Description: RDS to S3 Dump
Parameters:
  RDSInstanceID:
    Description: "Instance ID of RDS to Dump from"
    Type: String
  DatabaseName:
    Description: "Name of the Database to Dump"
    Type: String
  Username:
    Description: "Database Username"
    Type: String
  Password:
    Description: "Database password"
    Type: String
    NoEcho: true
Resources:
  RDSToS3Dump:
    Type: "AWS::DataPipeline::Pipeline"
    Properties:
      Name: "RDSToS3Dump"
      Description: "Pipeline to backup RDS data to S3"
      Activate: true
      ParameterObjects:
        - name: "SourceRDSTable"
          type: "SqlDataNode"
          Database: !Ref DatabaseName
        - name: !Ref DatabaseName
          type: "RdsDatabase"
          databaseName: !Ref DatabaseName
          username: !Ref Username
          password: !Ref Password
          rdsInstanceId: !Ref RDSInstanceID
        - name: "S3OutputLocation"
          type: "S3DataNode"
          filePath: # TODO: S3 Bucket here parameterized? Will actually need to create one.
        - name: "RDStoS3CopyActivity"
          type: "CopyActivity"
          input: "SourceRDSTable"
          output: "S3OutputLocation"
          # TODO: do we need a runsOn?
```
Answer:
As mentioned in another answer, AWS Data Pipeline only allows you to dump individual tables, not the entire DB. If you really want to use pg_dump to dump the entire contents of your DB to S3 using AWS CloudFormation, you can use Lambda-backed custom resources. Going down that route, you'll have to write a Lambda function that:
- Connects to the DB
- Takes a dump of the DB using pg_dump
- Uploads it to S3
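The steps above could be sketched roughly like this in Python. Everything here beyond the three steps is an assumption: the environment-variable names, the bucket/key layout, and the port are all hypothetical, the `pg_dump` binary must be bundled with the function (e.g. via a Lambda layer), and `boto3` is assumed present because the Lambda runtime ships it.

```python
# Sketch of the Lambda body for the custom resource (assumptions noted above).
import os
import subprocess
from datetime import datetime, timezone


def build_dump_command(host, port, dbname, user, out_path):
    """Assemble the pg_dump argv; the password goes in PGPASSWORD, never argv."""
    return [
        "pg_dump",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--format", "custom",  # compressed custom-format archive
        "--file", out_path,
        dbname,
    ]


def handler(event, context):
    # Hypothetical configuration via environment variables
    host = os.environ["DB_HOST"]
    dbname = os.environ["DB_NAME"]
    user = os.environ["DB_USER"]
    password = os.environ["DB_PASSWORD"]
    bucket = os.environ["DUMP_BUCKET"]

    out_path = "/tmp/dump.pgdump"  # /tmp is Lambda's writable scratch space
    cmd = build_dump_command(host, 5432, dbname, user, out_path)
    subprocess.run(cmd, env=dict(os.environ, PGPASSWORD=password), check=True)

    # Date-stamped key so successive dumps don't overwrite each other
    key = "{}/{}.pgdump".format(dbname, datetime.now(timezone.utc).date())
    import boto3  # provided by the Lambda runtime
    boto3.client("s3").upload_file(out_path, bucket, key)
    return {"Bucket": bucket, "Key": key}
```

Note that a real Lambda-backed custom resource must also report success or failure back to CloudFormation by POSTing to the pre-signed response URL in the event (the `cfnresponse` helper module handles this); that plumbing is omitted from the sketch.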