Transformations
Arch supports data transformations with dbt-core and a compatible adapter (e.g. the Postgres adapter). Currently the Arch team manages onboarding your project to the platform and providing you access, but the roadmap includes features to make this self-serve in the near future.
Onboarding a dbt Project Repository
To onboard a dbt project to your Arch account, navigate to the dbt projects page and click the Create dbt Project button.
You'll be prompted to select a public or private git repository that contains your dbt project.
Connecting a Git Repository
Public
Enter a git HTTP(S) URL, for example: https://github.com/meltano/meltano
Private
Select a git repository from the drop-down menu if you've previously registered it with Arch.
Otherwise, select Create New Private Repository, which will create a new git repository connection.
The URL should be in SSH form, for example: ssh://git@github.com/meltano/private_repo
Private git repositories are currently supported using GitHub and GitLab deploy keys. See the GitHub or GitLab docs for details on creating your key pair and configuring your public key.
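Since public and private repositories expect different URL shapes, it can help to check a URL before pasting it in. The following is a minimal sketch, assuming only the two forms shown above (an HTTP(S) URL for public repositories and an ssh:// URL for private ones); `classify_repo_url` is a hypothetical helper, not part of Arch.

```python
import re

# Hypothetical helper: matches only the two URL forms shown in the examples above.
HTTPS_PATTERN = re.compile(r"^https?://[^/\s]+/\S+$")
SSH_PATTERN = re.compile(r"^ssh://[^@\s]+@[^/\s]+/\S+$")


def classify_repo_url(url: str) -> str:
    """Return "public" for HTTP(S) URLs, "private" for ssh:// URLs, else raise."""
    if HTTPS_PATTERN.match(url):
        return "public"
    if SSH_PATTERN.match(url):
        return "private"
    raise ValueError(f"Unrecognized repository URL form: {url!r}")


if __name__ == "__main__":
    print(classify_repo_url("https://github.com/meltano/meltano"))
    print(classify_repo_url("ssh://git@github.com/meltano/private_repo"))
```

This sketch deliberately rejects the scp-style shorthand (git@github.com:owner/repo) because the examples here use the explicit ssh:// form.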
Configure Info
Select the git revision that you want Arch to track. This can be a branch name, a version tag, a commit SHA, etc. Then, optionally, provide a relative path to the dbt project if it's in a subdirectory of the repository. This is the directory that contains your dbt_project.yml file.
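If you're unsure which relative path to enter, you can look for dbt_project.yml in a local clone. This is a minimal sketch using only the standard library; `find_dbt_project_dirs` is a hypothetical helper. It skips dbt_packages and dbt_modules, the directories where `dbt deps` installs packages (each of which has its own dbt_project.yml).

```python
from pathlib import Path
from typing import List

# Directories that contain installed packages or VCS data, not your own project.
EXCLUDED_DIRS = {"dbt_packages", "dbt_modules", ".git"}


def find_dbt_project_dirs(repo_root: str) -> List[str]:
    """Return repo-relative paths of directories containing a dbt_project.yml.

    An empty string means the project is at the repository root, so no
    relative path needs to be provided.
    """
    root = Path(repo_root)
    found: List[str] = []
    for project_file in sorted(root.rglob("dbt_project.yml")):
        parent_parts = project_file.relative_to(root).parts[:-1]
        if any(part in EXCLUDED_DIRS for part in parent_parts):
            continue  # an installed package, not the project itself
        rel = project_file.parent.relative_to(root)
        found.append("" if rel == Path(".") else rel.as_posix())
    return found


if __name__ == "__main__":
    print(find_dbt_project_dirs("."))
```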
Review and Submit
Finally, review your info and submit!
Creating a Transform
Similar to pipelines, you can configure dbt transformations to run on a schedule in Arch. Select a tenant and create a transform. You'll be asked to choose a dbt project, the dbt commands to run, and a schedule. You can add more commands as needed and configure them to run in any order.
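To make the "ordered commands" idea concrete, here is a hypothetical local model of a transform. It is not Arch's API: the `Transform` class, its field names, and the cron-style schedule string are all assumptions for illustration. It turns each configured command into a `dbt` invocation (using the real `--project-dir` CLI flag) and preserves the configured order.

```python
import shlex
import subprocess
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transform:
    """Hypothetical local model of a transform; not Arch's actual API."""

    tenant: str
    project_dir: str  # directory containing dbt_project.yml
    schedule: str  # assumed cron expression; Arch's schedule format may differ
    commands: List[str] = field(default_factory=list)  # e.g. "run --select staging"

    def invocations(self) -> List[List[str]]:
        """Build one dbt argv per configured command, in the configured order."""
        return [
            ["dbt", *shlex.split(command), "--project-dir", self.project_dir]
            for command in self.commands
        ]

    def run(self) -> None:
        """Run each command in order; check=True raises on the first failure."""
        for argv in self.invocations():
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    nightly = Transform(
        tenant="acme",  # hypothetical tenant name
        project_dir="transform",
        schedule="0 6 * * *",
        commands=["deps", "run --select staging", "test"],
    )
    for argv in nightly.invocations():
        print(" ".join(argv))
```

Stopping at the first failing command is a choice made in this sketch so that, for example, `dbt test` doesn't run against models that failed to build.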
dbt Using Arch Managed Storage
Access
You have full access to all the data in your Arch-managed database instances. Arch provides this access through users and roles that default to restrictive permissions for safety while remaining flexible.
Development
After you have sources configured in Arch, you can start transforming the raw data using dbt.
You will be provided with a development dbt user that has read access to your Arch source data and write access only to a dbt_dev schema.
This allows you to safely run dbt locally and develop your models using the raw data provided by Arch sources without any risk to your production data.
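For a sense of what "read the sources, write only dbt_dev" means at the database level, the sketch below builds Postgres GRANT statements for such a user. It is an illustration under assumptions, not how Arch actually provisions users: the user and source schema names are hypothetical, and identifiers are interpolated unquoted, so only trusted simple names should be passed.

```python
from typing import Iterable, List


def dev_user_grants(
    user: str, source_schemas: Iterable[str], dev_schema: str = "dbt_dev"
) -> List[str]:
    """Build Postgres GRANTs: read-only on source schemas, create on one dev schema.

    Identifiers are not quoted, so pass only trusted, simple names.
    """
    statements: List[str] = []
    for schema in source_schemas:
        # USAGE lets the user look up objects in the schema; SELECT lets it read them.
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {user};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};")
    # CREATE on the dev schema lets dbt build its tables and views there.
    statements.append(f"GRANT USAGE, CREATE ON SCHEMA {dev_schema} TO {user};")
    return statements


if __name__ == "__main__":
    for stmt in dev_user_grants("dbt_dev_user", ["raw_source"]):  # hypothetical names
        print(stmt)
```

In Postgres, `GRANT SELECT ON ALL TABLES IN SCHEMA` covers only tables and views that already exist; tables created later need `ALTER DEFAULT PRIVILEGES` (or a fresh grant) to be readable.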
Production
Once you're ready to put your dbt project into production, you'll get your dbt project configured in Arch: provide Arch with access to your git repository, then set commands and schedules.
Arch will manage running your dbt models and populating the production schema, named dbt.
By default, only Arch has write access to the production dbt schema, but elevated permissions can be granted to admin users.
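The dbt_dev and dbt schema names interact with dbt's default schema resolution: a model is built in its target's schema unless it sets a custom schema, in which case dbt appends the custom name as a suffix. Below is a Python mirror of dbt's default `generate_schema_name` macro, assuming your project does not override that macro and that the development and production targets use dbt_dev and dbt as their schemas.

```python
from typing import Optional


def generate_schema_name(target_schema: str, custom_schema: Optional[str] = None) -> str:
    """Mirror dbt's default generate_schema_name macro.

    No custom schema: the model goes to the target schema.
    Custom schema: the model goes to "<target_schema>_<custom_schema>".
    """
    if custom_schema is None:
        return target_schema
    return f"{target_schema}_{custom_schema.strip()}"


if __name__ == "__main__":
    print(generate_schema_name("dbt_dev"))
    print(generate_schema_name("dbt", "staging"))
```

Because a custom schema such as staging resolves to dbt_dev_staging in development and dbt_staging in production, which are different schemas from dbt_dev and dbt, it's worth confirming those names are covered by your access before relying on custom schemas.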
Multi-Tenant Projects
dbt was designed mostly for single-tenant use, so we've provided some examples of how it can be used in multi-tenant situations. Arch places no restrictions on how your dbt project is structured, so these examples are provided only as a reference and are not required.
The idea is to build your Arch transformation components as generic dbt packages that are reusable across clients. These packages can be imported into your client-specific dbt projects to avoid duplicating logic and to reduce the maintenance burden. For example, if you have 50 clients using the same source staging logic and want to add a new column, instead of updating all 50 projects you can update the shared model in one place and let the change propagate to every client project that imports that base dbt package. On top of the generic shared packages, you can build additional models with client-specific logic.
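Each client project imports the shared package through its packages.yml, and `dbt deps` installs it. The sketch below renders that file for a client project using dbt's `git` and `revision` package keys; the helper name and the repository URL in the usage example are hypothetical.

```python
from typing import Dict


def client_packages_yml(shared_packages: Dict[str, str]) -> str:
    """Render a packages.yml that pulls each shared git package at a given revision."""
    lines = ["packages:"]
    for git_url, revision in shared_packages.items():
        lines.append(f'  - git: "{git_url}"')
        lines.append(f"    revision: {revision}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical shared package repository and tag.
    print(client_packages_yml(
        {"https://github.com/example-org/arch-shared-staging.git": "v1.2.0"}
    ))
```

The revision choice controls how changes propagate. Pinning a tag means each client picks up a shared change only when its revision is bumped. Pointing at a branch means every client receives the change on its next `dbt deps`, which is closer to the "update once, propagate everywhere" flow described above but gives less control per client.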
In addition, when you have a large repository of dbt projects, you'll likely prefer a single shared connection profile that is hydrated from environment variables rather than managing a distinct profile for each component or client dbt project.
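A shared profile relies on dbt's `env_var()` Jinja function, which dbt resolves when it runs. The sketch below renders such a profiles.yml for the Postgres adapter and checks that the required variables are set before a run. The profile name, target name, and variable names are assumptions; the `DBT_ENV_SECRET_` prefix is used for the password because dbt scrubs values of variables with that prefix from its logs.

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical variable names; choose whatever naming your deployment uses.
REQUIRED_VARS = ("DBT_HOST", "DBT_USER", "DBT_ENV_SECRET_PASSWORD", "DBT_DBNAME", "DBT_SCHEMA")


def shared_profile_yml(profile_name: str) -> str:
    """Render one profiles.yml whose connection fields dbt reads from env vars at runtime."""
    lines = [
        profile_name + ":",
        "  target: default",
        "  outputs:",
        "    default:",
        "      type: postgres",
        "      host: \"{{ env_var('DBT_HOST') }}\"",
        "      port: \"{{ env_var('DBT_PORT', '5432') | as_number }}\"",
        "      user: \"{{ env_var('DBT_USER') }}\"",
        "      password: \"{{ env_var('DBT_ENV_SECRET_PASSWORD') }}\"",
        "      dbname: \"{{ env_var('DBT_DBNAME') }}\"",
        # Setting the schema per environment lets one profile serve dbt_dev and dbt.
        "      schema: \"{{ env_var('DBT_SCHEMA') }}\"",
        "      threads: 4",
    ]
    return "\n".join(lines) + "\n"


def missing_vars(env: Mapping[str, str], required: Iterable[str] = REQUIRED_VARS) -> List[str]:
    """List required variables that are unset or empty so a run can fail fast."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    print(shared_profile_yml("arch_shared"))
    print("missing:", missing_vars(os.environ))
```

Each client project would then reference the shared profile with `profile: arch_shared` in its dbt_project.yml, and dbt can be pointed at the shared file with `--profiles-dir` or the `DBT_PROFILES_DIR` environment variable.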
See the example multi-tenant Arch project for more details.