Understanding Dagster for Arch Users

This guide helps you understand Dagster concepts by relating them to what you already know from Arch.

Core Concepts Mapping

Arch → Dagster

Arch Concept      | Dagster Equivalent    | What It Means
Pipeline          | Job                   | A collection of tasks that run together
Transform/Extract | Asset                 | A data artifact produced by your code
Schedule          | Schedule              | Time-based pipeline triggers
Pipeline State    | Asset Materialization | Record of when data was last updated
Orchestration     | Dagster Daemon        | Background process managing runs

Key Dagster Concepts

For more detail on any of these concepts, see the Dagster documentation.

Assets

An asset represents a logical unit of data such as a table, dataset, or machine learning model. Assets can depend on other assets, and those dependencies form the data lineage for your pipelines. Assets are the core abstraction in Dagster, so most other Dagster concepts, including jobs, runs, and schedules, operate on them.

Jobs

Jobs are collections of assets that run together. Your Meltano + dbt pipelines are packaged as jobs.

Runs

Each execution of a job creates a run with:

  • Unique run ID
  • Logs and status
  • Asset materialization records

How Your Pipelines Work in Dagster

Meltano Integration

# Your Meltano taps and targets are wrapped as Dagster assets
from dagster import asset

@asset
def salesforce_data():
    # Runs: meltano run tap-salesforce target-snowflake
    pass

dbt Integration

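# Your dbt models become individual assets
from dagster_dbt import dbt_assets

# dbt_manifest: the manifest.json that dbt generates for your project
@dbt_assets(manifest=dbt_manifest)
def my_dbt_assets():
    # Simplified: the real body invokes dbt to build these models
    pass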

The Dagster UI

  1. Assets: View all your data assets and their status
  2. Jobs: See and launch your pipelines
  3. Runs: Monitor active and historical runs
  4. Schedules: Manage automated triggers

Asset Graph

The asset graph shows:

  • Data lineage (what depends on what)
  • Materialization status (when last updated)
  • Dependencies between Meltano and dbt
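To make "what depends on what" concrete, here is a plain-Python sketch of a lineage graph and the order in which its assets would have to be built. It uses only the standard library and is not how Dagster stores the graph internally; the asset names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each asset maps to the set of assets it depends on.
lineage = {
    "salesforce_raw": set(),              # loaded by Meltano
    "stg_accounts": {"salesforce_raw"},   # dbt staging model
    "account_summary": {"stg_accounts"},  # dbt mart model
}

# A topological order lists every asset after all of its upstream assets.
order = list(TopologicalSorter(lineage).static_order())
print(order)
```

Here the Meltano-loaded asset comes first and the dbt models follow, which is the Meltano-to-dbt dependency the graph view displays.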

Key Differences from Arch

1. Asset-Centric View

  • Arch: Pipeline-centric (focus on the process)
  • Dagster: Asset-centric (focus on the data produced)

2. Observability

  • Arch: Limited visibility into dependencies
  • Dagster: Full lineage and dependency tracking

3. Failure Handling

  • Arch: Retry entire pipeline
  • Dagster: Retry specific assets or downstream dependencies
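The idea behind retrying an asset together with its downstream dependencies can be sketched in plain Python as a graph traversal. This illustrates the concept only and is not Dagster's implementation; the graph and the function `assets_to_retry` are hypothetical.

```python
from collections import deque

# Hypothetical graph: each asset maps to the assets directly downstream of it.
downstream = {
    "salesforce_raw": ["stg_accounts", "stg_contacts"],
    "stg_accounts": ["account_summary"],
    "stg_contacts": [],
    "account_summary": [],
}


def assets_to_retry(failed: str) -> list[str]:
    """Return the failed asset plus everything downstream of it, breadth-first."""
    seen = [failed]
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in downstream[current]:
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


print(assets_to_retry("stg_accounts"))
```

If `stg_accounts` fails, only it and `account_summary` are selected; `salesforce_raw` and `stg_contacts` are left alone, which is the contrast with rerunning an entire pipeline.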

Next Steps