
Transformations

Arch supports data transformations with dbt-core and a compatible adapter (e.g. the Postgres adapter). Currently the Arch team manages onboarding your project to the platform and providing you with access, but the roadmap includes features to make this self-serve in the near future.

Onboarding a dbt Project Repository

To onboard a dbt project to your Arch account, navigate to the dbt projects page and click the Create dbt Project button. You'll be prompted to select a public or private git repository that contains your dbt project.

Connecting a Git Repository

Public

Enter the repository's HTTP(S) URL, for example: https://github.com/meltano/meltano.


Private

Select a git repository from the drop-down menu if you've previously registered it with Arch.


Otherwise, select Create New Private Repository, which creates a new git repository connection. The URL should be in SSH form, for example: ssh://git@github.com/meltano/private_repo.


Private git repositories are currently supported using GitHub and GitLab deploy keys. See the GitHub or GitLab docs for details on creating your key pair and configuring your public key.

Configure Info

Select the git branch that you want Arch to track. This can be a branch name, a version tag, a commit SHA, etc. Then optionally provide a relative path to the dbt project if it's in a subdirectory. This should be the directory that contains your dbt_project.yml file.
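
As an illustration of the file Arch looks for at that path, a minimal dbt_project.yml might look like the sketch below. The project name, the profile name, and the transform/ subdirectory mentioned in the comment are hypothetical placeholders, not values that Arch requires.

```yaml
# transform/dbt_project.yml
# Hypothetical layout: the dbt project lives in a "transform" subdirectory,
# so the relative path entered in Arch would be "transform".
name: arch_example          # assumed project name
version: "1.0.0"
config-version: 2
profile: arch_example       # must match a profile name in profiles.yml
model-paths: ["models"]
```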


Review and Submit

Finally review your info and submit!


Creating a Transform

Similar to pipelines, you can configure dbt transformations to run on a schedule in Arch. Select a tenant and create a transform. You'll be asked to choose a dbt project, the dbt commands to run, and a schedule. You can add further commands as needed and arrange them to run in any order.


dbt Using Arch Managed Storage

Access

You have full access to all the data in your Arch managed database instances. Arch grants this access through users and roles that are restrictive by default, for safety, but flexible enough to be adjusted.

Development

After you have sources configured in Arch, you can start to transform the raw data using dbt. You will be provided with a development dbt user that has read access on your Arch source data and write access only on a dbt_dev schema. This allows you to safely run dbt locally and develop your models using the raw data provided by Arch sources without any risk to your production data.
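
For local development, a profiles.yml entry for the Postgres adapter might look roughly like the sketch below. Only the dbt_dev schema name comes from this page; the profile name, host, user, database name, and the DBT_DEV_PASSWORD environment variable are placeholders to be replaced with the credentials Arch provides.

```yaml
# ~/.dbt/profiles.yml (sketch; all connection values are placeholders)
arch_example:
  target: dev
  outputs:
    dev:
      type: postgres
      host: your-arch-db-host.example.com   # placeholder host
      port: 5432
      user: your_dbt_dev_user               # the development dbt user from Arch
      password: "{{ env_var('DBT_DEV_PASSWORD') }}"  # assumed variable name
      dbname: your_database                 # placeholder database name
      schema: dbt_dev                       # the only schema this user can write to
      threads: 4
```

With a profile like this in place, running dbt locally builds models into dbt_dev and leaves the production schema untouched.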

Production

Once you're ready to put your dbt project into production, you will configure your dbt project in Arch, provide Arch with access to the git repository, and set the commands and schedules. Arch will manage running your dbt models and populating the production dbt schema, which is named dbt. By default, only Arch has write access on the production dbt schema, but elevated permissions can be granted to admin users.

Multi Tenant Projects

dbt was mostly designed for single-tenant use, so we've provided some examples of how it can be leveraged in multi-tenant situations. Arch places no restrictions on how your dbt project is structured, so these examples are a reference only, not a requirement.

The concept is that you can build your Arch transformation components as dbt packages that are generic and reusable across clients. These packages can be imported into your client-specific dbt projects to avoid duplicate logic and reduce the maintenance burden. For example, if you have 50 clients using the same source staging logic and you want to add a new column, instead of updating all 50 projects you can update the shared model in one place and let it propagate to all client projects that import that base dbt package. On top of the generic shared packages, you can build additional models with client-specific logic.
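
One way a client project could import such a shared package is through dbt's packages.yml, sketched below. The repository URL, tag, and subdirectory are hypothetical; the git, revision, and subdirectory keys are standard dbt package options.

```yaml
# packages.yml in a client-specific dbt project (sketch)
packages:
  - git: "https://github.com/example-org/shared-dbt-packages.git"  # hypothetical shared repo
    revision: v1.2.0               # pin to a tag so each client upgrades deliberately
    subdirectory: "staging_base"   # hypothetical package folder inside the repo
```

Running dbt deps installs the package. Pinning revision to a tag means a change to the shared model reaches a client project when that project bumps the tag, while tracking a branch instead would pick up changes on the next dbt deps.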

In addition, when you have a large repository of dbt projects, you'll likely prefer a single shared connection profile that is hydrated from environment variables, instead of managing a distinct profile for each component or client dbt project.
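
A shared profile of this kind could be sketched with dbt's env_var function as follows. The profile name and every environment variable name here are assumptions chosen for illustration; the linked example project may use different ones.

```yaml
# profiles.yml shared by every client project (sketch; variable names are assumed)
shared_arch_profile:
  target: "{{ env_var('DBT_TARGET', 'dev') }}"
  outputs:
    dev:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      port: "{{ env_var('DBT_PORT', '5432') | as_number }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_DBNAME') }}"
      schema: dbt_dev
      threads: 4
    prod:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      port: "{{ env_var('DBT_PORT', '5432') | as_number }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_DBNAME') }}"
      schema: dbt
      threads: 4
```

Each client project then sets profile: shared_arch_profile in its dbt_project.yml, and the tenant-specific connection details are supplied through the environment at run time.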

See the example multi-tenant Arch project for more details.