Using Airflow to orchestrate data pipelines
May 6, 2021•236 words
(this blog post is auto-generated via gpt-neo)
In many organizations, data pipelines move data with as little manual intervention as possible, keeping the time to value of the process low. The example that follows demonstrates how to set up and orchestrate a data pipeline on AWS with AWS CloudFormation.
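Since this post is about Airflow, here is a minimal sketch of what the orchestration layer might look like. The DAG name, stack name, template file, schedule, and the PIPELINE_ID environment variable are all illustrative assumptions, not values taken from a real setup.

```python
# Minimal Airflow DAG sketch (DAG name, stack name, template file, and
# schedule are illustrative assumptions).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="rds_transfer_pipeline",
    start_date=datetime(2021, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Create or update the CloudFormation stack that defines the pipeline.
    deploy_stack = BashOperator(
        task_id="deploy_stack",
        bash_command=(
            "aws cloudformation deploy "
            "--stack-name data-pipeline-stack "
            "--template-file pipeline.yaml"
        ),
    )

    # Activate the Data Pipeline once the stack is in place.
    activate_pipeline = BashOperator(
        task_id="activate_pipeline",
        bash_command="aws datapipeline activate-pipeline --pipeline-id $PIPELINE_ID",
    )

    deploy_stack >> activate_pipeline
```

Airflow only sequences the steps here; the actual resources are defined in the CloudFormation template and the Data Pipeline itself.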
Note
It’s important to pay close attention to the AWS documentation when setting up or updating any AWS resource with CloudFormation. For some properties, an update cannot be applied in place: CloudFormation replaces the resource, creating a new one and deleting the old one at a point during the stack update that you do not directly control. Before applying an update, check each property’s update behavior in the documentation, and preview the update with a change set so you can see whether a resource will be replaced rather than updated in place.
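One practical way to follow this advice is to create a change set and inspect whether each resource will be modified in place or replaced before executing it. The sketch below uses boto3; the stack name, change set name, and template path are assumptions for illustration.

```python
# Preview a stack update with a change set before applying it (boto3 sketch;
# the stack name, change set name, and template path are assumptions).
import boto3

cfn = boto3.client("cloudformation")

with open("pipeline.yaml") as f:  # hypothetical template file
    template_body = f.read()

cfn.create_change_set(
    StackName="data-pipeline-stack",  # hypothetical stack name
    TemplateBody=template_body,
    ChangeSetName="preview-update",
    ChangeSetType="UPDATE",
)

# Wait for the change set to finish being computed.
waiter = cfn.get_waiter("change_set_create_complete")
waiter.wait(ChangeSetName="preview-update", StackName="data-pipeline-stack")

# Inspect each proposed change; Replacement is "True", "False",
# or "Conditional" for Modify actions.
result = cfn.describe_change_set(
    ChangeSetName="preview-update", StackName="data-pipeline-stack"
)
for change in result["Changes"]:
    rc = change["ResourceChange"]
    print(rc["LogicalResourceId"], rc["Action"], rc.get("Replacement"))
```

If Replacement is "True" or "Conditional" for a resource you cannot afford to recreate, adjust the template before executing the change set.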
In this example, you will use an AWS Data Pipeline resource (called Source) to transfer data from one database to another. The pipeline copies the data into the destination database; after the transfer completes, the transferred data is dropped from the source database.
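To make the copy-then-delete semantics concrete, here is a rough sketch in plain Python using psycopg2, independent of the Data Pipeline resource itself. The connection strings and the events table are hypothetical.

```python
# Sketch of the transfer-then-drop step using psycopg2 directly
# (connection strings and the "events" table are illustrative assumptions).
import psycopg2

SOURCE_DSN = "host=source-db.example.com dbname=app user=etl"
DEST_DSN = "host=dest-db.example.com dbname=warehouse user=etl"

with psycopg2.connect(SOURCE_DSN) as src, psycopg2.connect(DEST_DSN) as dst:
    with src.cursor() as src_cur, dst.cursor() as dst_cur:
        # Read the rows to transfer from the source database.
        src_cur.execute("SELECT id, payload FROM events")
        rows = src_cur.fetchall()

        if rows:
            # Copy the rows into the destination database.
            dst_cur.executemany(
                "INSERT INTO events (id, payload) VALUES (%s, %s)", rows
            )

            # Once transferred, drop the rows from the source database.
            ids = [row[0] for row in rows]
            src_cur.execute("DELETE FROM events WHERE id = ANY(%s)", (ids,))
```

In the actual setup, the Data Pipeline resource performs this step; the sketch only illustrates the behavior described above.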
Note
The example in this post uses PostgreSQL, although any database engine supported by Amazon RDS will work.