🌱 PoC ETL from Azure Storage to CosmosDB

PoC to transfer a CSV file from Azure Storage to Azure CosmosDB.

TL;DR

Concept

Azure Storage Blob → Azure Data Factory → CosmosDB

Deployment

Create a resource group and deploy the template:

az group create --name poc-datafactory --location "East US"

az deployment group create \
    --resource-group poc-datafactory \
    --template-file poc.bicep \
    --parameters dataFactoryName=etl
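
After the deployment finishes, you can list what was created with:

az resource list \
    --resource-group poc-datafactory \
    --output table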

There are four available parameters, all optional (an example overriding all of them follows the list):

  • location defaults to the resource group's location
  • dataFactoryName defaults to an auto-generated string
  • storageAccountName defaults to an auto-generated string
  • databaseAccountName defaults to an auto-generated string
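
To pin all of them, pass the extra key=value pairs to the same deployment command. The account names below are only illustrative; storage and Cosmos DB account names must be globally unique:

az deployment group create \
    --resource-group poc-datafactory \
    --template-file poc.bicep \
    --parameters location="East US" \
        dataFactoryName=etl \
        storageAccountName=pocetlstorage \
        databaseAccountName=pocetlcosmos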

All other entities are named after their type, e.g.:

resource databaseContainer 'Microsoft.DocumentDB/databaseAccounts/sqlDatabases/containers@2022-05-15' = {
  parent: database
  name: 'databasecontainer'
  // ...remaining properties omitted
}

Testing

  1. Upload a CSV file to the storage container (a sample file and the equivalent CLI commands are shown below)
    1. Must be delimited by semicolons (;)
    2. Must contain a header row
    3. Must contain the name, protein and rating fields
  2. Manually trigger the pipeline inside Data Factory
  3. Check the output inside the Cosmos database
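
A minimal CSV that satisfies those constraints could look like this (the rows are made up):

name;protein;rating
Chickpeas;19;4.5
Lentils;9;4.2

The upload and the manual run can also be done from the CLI. Replace the placeholders with the names generated by the template; the pipeline command requires the Azure CLI datafactory extension:

az storage blob upload \
    --account-name <storageAccountName> \
    --container-name <containerName> \
    --file sample.csv \
    --name sample.csv \
    --auth-mode login

az datafactory pipeline create-run \
    --resource-group poc-datafactory \
    --factory-name etl \
    --name <pipelineName>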

Clean up

Delete the entire resource group to avoid unnecessary costs:

az group delete --name poc-datafactory
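
The command prompts for confirmation; add --yes to skip the prompt and --no-wait to return immediately:

az group delete --name poc-datafactory --yes --no-wait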

Next

  • [ ] Automate a trigger based on file upload events
  • [ ] Expose mappings and translators as parameters at the pipeline execution level
  • [ ] Stress test transferring to multiple outputs

Resources


🌱 Seedlings are ideas I have only just had and that still need cultivation; they have not been reviewed or refined yet.

