Understanding Dagster for Arch Users
This guide helps you understand Dagster concepts by relating them to what you already know from Arch.
Core Concepts Mapping
Arch → Dagster
| Arch Concept | Dagster Equivalent | What It Means |
|---|---|---|
| Pipeline | Job | A collection of tasks that run together |
| Transform/Extract | Asset | A data artifact produced by your code |
| Schedule | Schedule | Time-based pipeline triggers |
| Pipeline State | Asset Materialization | Record of when data was last updated |
| Orchestration | Dagster Daemon | Background process managing runs |
Key Dagster Concepts
For more detail on these concepts, see the Dagster documentation.
Assets
An asset represents a logical unit of data such as a table, dataset, or machine learning model. Assets can depend on other assets, and those dependencies form the data lineage for your pipelines. Assets are the core abstraction in Dagster: the other concepts in this guide (jobs, runs, and schedules) are all described in terms of the assets they materialize.
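To make the dependency idea concrete, here is a minimal plain-Python sketch. It does not use Dagster's API; it only models assets as a mapping from each asset to the assets it depends on and derives an order in which they could be materialized. The asset names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names; each key maps to the set of assets it depends on.
UPSTREAM = {
    "raw_accounts": set(),
    "stg_accounts": {"raw_accounts"},
    "account_metrics": {"stg_accounts"},
}


def materialization_order(upstream):
    """Return asset names ordered so each asset comes after its dependencies."""
    return list(TopologicalSorter(upstream).static_order())


print(materialization_order(UPSTREAM))
# ['raw_accounts', 'stg_accounts', 'account_metrics']
```

Dagster derives this ordering for you from the dependencies declared on each asset, so you never have to write it by hand.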
Jobs
Jobs are collections of assets that run together. Your Meltano + dbt pipelines are packaged as jobs.
Runs
Each execution of a job creates a run with:
- Unique run ID
- Logs and status
- Asset materialization records
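As a rough mental model of what a run carries, the sketch below uses a plain Python dataclass. This is not Dagster's internal representation; the `RunRecord` class, the `execute` helper, the job and asset names, and the status strings are all illustrative assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """Illustrative stand-in for the information attached to one run."""
    job_name: str
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: str = "STARTED"
    logs: list = field(default_factory=list)
    materializations: list = field(default_factory=list)


def execute(job_name, assets):
    """Simulate one run: record a log line and a materialization per asset."""
    run = RunRecord(job_name)
    for name in assets:
        run.logs.append(f"materializing {name}")
        run.materializations.append(name)
    run.status = "SUCCESS"
    return run


first = execute("daily_sync", ["raw_accounts", "stg_accounts"])
second = execute("daily_sync", ["raw_accounts", "stg_accounts"])
print(first.run_id != second.run_id)  # each execution gets its own ID
```

Running the same job twice produces two separate records, which is why the Runs view can show the history of a job rather than only its latest state.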
How Your Pipelines Work in Dagster
Meltano Integration
    from dagster import asset

    # Your Meltano taps and targets are wrapped as Dagster assets
    @asset
    def salesforce_data():
        # Simplified placeholder. The real asset runs:
        #   meltano run tap-salesforce target-snowflake
        pass
dbt Integration
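    from dagster import AssetExecutionContext
    from dagster_dbt import DbtCliResource, dbt_assets

    # Your dbt models become individual assets
    # dbt_manifest: path to your dbt project's target/manifest.json
    @dbt_assets(manifest=dbt_manifest)
    def my_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
        yield from dbt.cli(["build"], context=context).stream()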
The Dagster UI
Main Navigation
- Assets: View all your data assets and their status
- Jobs: See and launch your pipelines
- Runs: Monitor active and historical runs
- Schedules: Manage automated triggers
Asset Graph
The asset graph shows:
- Data lineage (what depends on what)
- Materialization status (when last updated)
- Dependencies between Meltano and dbt assets
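One way to picture the Meltano-to-dbt dependencies the graph surfaces is as edges that cross from one tool to the other. The plain-Python sketch below is not Dagster code; the asset names and the `TOOL` labels are hypothetical.

```python
# Hypothetical assets, each labeled with the tool that produces it.
TOOL = {
    "raw_accounts": "meltano",
    "stg_accounts": "dbt",
    "account_metrics": "dbt",
}

# Each key maps to the set of assets it depends on.
UPSTREAM = {
    "raw_accounts": set(),
    "stg_accounts": {"raw_accounts"},
    "account_metrics": {"stg_accounts"},
}


def cross_tool_edges(upstream, tool):
    """Return (upstream, downstream) pairs whose assets come from different tools."""
    return sorted(
        (dep, asset)
        for asset, deps in upstream.items()
        for dep in deps
        if tool[dep] != tool[asset]
    )


print(cross_tool_edges(UPSTREAM, TOOL))
# [('raw_accounts', 'stg_accounts')]
```

In the asset graph these are the edges where data loaded by Meltano is handed off to a dbt model.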
Key Differences from Arch
1. Asset-Centric View
- Arch: Pipeline-centric (focus on the process)
- Dagster: Asset-centric (focus on the data produced)
2. Observability
- Arch: Limited visibility into dependencies
- Dagster: Full lineage and dependency tracking
3. Failure Handling
- Arch: Retry the entire pipeline
- Dagster: Retry only the failed assets and the assets downstream of them
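The asset-level retry described above can be sketched in plain Python as a graph walk: start at the failed asset and collect everything that depends on it, directly or indirectly. This is an illustration of the idea, not Dagster's implementation, and the asset names are hypothetical.

```python
from collections import deque

# Hypothetical asset graph; each key maps to the set of assets it depends on.
UPSTREAM = {
    "raw_accounts": set(),
    "raw_orders": set(),
    "stg_accounts": {"raw_accounts"},
    "stg_orders": {"raw_orders"},
    "account_metrics": {"stg_accounts", "stg_orders"},
}


def retry_selection(upstream, failed):
    """Return the failed asset plus every asset downstream of it."""
    # Invert the mapping so each asset lists the assets that depend on it.
    downstream = {name: set() for name in upstream}
    for asset, deps in upstream.items():
        for dep in deps:
            downstream[dep].add(asset)

    # Breadth-first walk from the failed asset.
    selected = {failed}
    queue = deque([failed])
    while queue:
        for child in downstream[queue.popleft()]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


print(sorted(retry_selection(UPSTREAM, "stg_orders")))
# ['account_metrics', 'stg_orders']
```

Here a failure in `stg_orders` leaves `raw_accounts`, `raw_orders`, and `stg_accounts` untouched, whereas a pipeline-level retry would rerun all five.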