Troubleshooting Guide
Quick Diagnosis Flowchart
Pipeline Failed?
├── Check Runs Page → Red Status?
│   ├── Yes → Check Error Logs
│   └── No → Check Asset Status
├── Data Missing?
│   ├── Check Asset Freshness
│   └── Verify Dependencies
└── Performance Issue?
    ├── Compare Run Duration
    └── Check Resource Usage
Common Issues After Migration
1. "No module named 'meltano'" Error
Symptom:
ModuleNotFoundError: No module named 'meltano'
Solution:
- The Meltano runtime is installed and managed by the platform
- This error should not occur in normal operation
- If it appears, contact support immediately
2. Missing Historical Data
Symptom:
- Dashboards show gaps
- Reports missing historical trends
Explanation:
- Historical run information not migrated
- Data in warehouse remains intact
- Only Dagster run history is new
Solution:
- Historical data still in your warehouse
- Create manual references if needed
- Document cutover date
3. Schedule Running at Wrong Time
Symptom:
- Pipeline runs at unexpected hour
- Missing scheduled runs
Check:
To view the schedule definition, open the Schedules page → Your Schedule → Details
Common Causes:
- Timezone differences (Arch vs Dagster)
- Cron expression interpretation
- Daylight saving time
Fix:
- Verify timezone in deployment settings
- Adjust cron expression if needed
- Account for UTC vs local time
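The UTC-versus-local gotcha is easy to verify yourself. This sketch (using only the standard library's `zoneinfo`; the timezone and dates are illustrative) shows why a 9:00 local schedule lands at different UTC hours across a daylight-saving switch:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_hour_for_local(hour: int, tz_name: str, on_date: str) -> int:
    """Return the UTC hour at which a local-time schedule fires on a given date."""
    y, m, d = map(int, on_date.split("-"))
    local = datetime(y, m, d, hour, tzinfo=ZoneInfo(tz_name))
    return local.astimezone(ZoneInfo("UTC")).hour

# A 9:00 America/New_York schedule fires at different UTC hours across DST:
winter = utc_hour_for_local(9, "America/New_York", "2024-01-15")  # EST (UTC-5) -> 14
summer = utc_hour_for_local(9, "America/New_York", "2024-07-15")  # EDT (UTC-4) -> 13
```

If your cron expression is interpreted in UTC but was written for local time, this one-hour seasonal drift is exactly the "wrong time" symptom above.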
Meltano-Specific Issues
State Management Problems
"State lock file exists"
Meaning: Previous run didn't complete cleanly
Solution:
- Wait 5 minutes (auto-cleanup)
- Check for active runs
- Contact support if persists
"Bookmark not found"
Meaning: First run or state reset
Solution:
- Normal for first run
- Will extract all data
- State created automatically
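Meltano manages state for you, but the bookmark behavior described above can be sketched in plain Python. This is an illustrative model only (the `extract` function, record shape, and `bookmark` key are hypothetical, not Meltano's actual internals); it shows why the first run extracts everything and later runs extract only newer records:

```python
def extract(records, state):
    """Hypothetical bookmark-based incremental extraction sketch.

    ISO date strings compare correctly as plain strings, so a string
    bookmark works here; real taps typically use timestamps.
    """
    bookmark = state.get("bookmark")  # None on the very first run
    if bookmark is None:
        new = list(records)  # first run: full extraction
    else:
        new = [r for r in records if r["updated_at"] > bookmark]
    if new:
        state["bookmark"] = max(r["updated_at"] for r in new)
    return new

rows = [{"id": 1, "updated_at": "2024-01-01"},
        {"id": 2, "updated_at": "2024-01-02"}]
state = {}
first = extract(rows, state)   # full extraction; bookmark advances
second = extract(rows, state)  # nothing newer than the bookmark
```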
Connection Errors
"Connection refused"
psycopg2.OperationalError: connection refused
Check:
- Source system status
- Network connectivity
- Credential rotation
- Firewall rules
Quick Test:
- Try running a simple query
- Check other extractors
- Verify from UI console
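Transient "connection refused" errors often clear on their own, which is why a retry with backoff is the usual pattern. The platform's own retry behavior may differ; this is just a minimal sketch of the idea (the `flaky` callable simulates a source that recovers on the third attempt):

```python
import time

def call_with_retry(fn, attempts=3, base_delay=0.01):
    """Retry a callable with exponential backoff on connection errors."""
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection refused")
    return "ok"

result = call_with_retry(flaky)  # succeeds on the third attempt
```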
dbt-Specific Issues
Model Compilation Errors
"Model not found"
Compilation Error in model 'my_model'
Model 'my_model' not found
Causes:
- Model file missing
- Incorrect ref() syntax
- Model in different schema
Fix:
- Verify model file exists
- Check model naming
- Review dbt_project.yml
"Permission denied"
Database Error in model 'my_model'
permission denied for schema analytics
Solution:
- Warehouse permissions unchanged
- Check role assignments
- Verify schema access
Dependency Issues
Circular Dependencies
Found a cycle: model_a → model_b → model_a
Fix:
- Review model relationships
- Break cycle with staging model
- Restructure dependencies
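dbt detects the cycle for you, but it can help to reason about it as a graph problem: a cycle is a "back edge" found during depth-first traversal. A small stdlib sketch (the graph shape `{model: [upstream models]}` is an assumption for illustration, not dbt's manifest format):

```python
def find_cycle(deps):
    """Return True if the dependency graph {model: [upstreams]} contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {n: WHITE for n in deps}

    def visit(n):
        color[n] = GRAY
        for up in deps.get(n, []):
            if color.get(up, WHITE) == GRAY:
                return True  # back edge to an in-progress node: cycle
            if color.get(up, WHITE) == WHITE and visit(up):
                return True
        color[n] = BLACK
        return False

    return any(visit(n) for n in deps if color[n] == WHITE)

# The error above, model_a -> model_b -> model_a:
cyclic = find_cycle({"model_a": ["model_b"], "model_b": ["model_a"]})
# Breaking the cycle with a shared staging model removes it:
acyclic = find_cycle({"stg": [], "model_a": ["stg"], "model_b": ["stg"]})
```

The second graph illustrates the recommended fix: both models read from a staging model instead of from each other.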
Performance Issues
Slow Extractions
Diagnosis:
- Compare with Arch timing
- Check data volume growth
- Review extraction logs
Common Causes:
- No incremental state
- API rate limiting
- Large backfill
Solutions:
- Verify state is working
- Check API quotas
- Consider batching
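"Consider batching" in practice means splitting one large backfill into bounded chunks so each request stays under API limits. A minimal sketch (the `batched` helper is hypothetical; real taps expose batching through their own configuration):

```python
def batched(ids, size):
    """Split a list of record ids into fixed-size batches for extraction."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

# Ten records in batches of four: two full batches plus a remainder.
batches = batched(list(range(10)), 4)
```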
dbt Models Taking Longer
Check:
-- Example for Snowflake (query_history is a table function there);
-- adapt the view/function name to your warehouse
SELECT query_text, start_time, total_elapsed_time
FROM TABLE(information_schema.query_history())
WHERE query_text ILIKE '%your_model%'
ORDER BY start_time DESC;
Optimize:
- Review model SQL
- Check for missing indexes
- Analyze query plans
- Consider incremental models
Asset Materialization Issues
"Upstream asset not materialized"
Meaning: Dependency hasn't run successfully
Fix:
- Materialize upstream asset first
- Check dependency chain
- Run full pipeline
"Asset check failed"
Example:
from dagster import AssetCheckResult, asset_check

@asset_check(asset="my_asset")
def row_count_check(context) -> AssetCheckResult:
    # Validation failed: report the check as not passed
    return AssetCheckResult(passed=False)
Resolution:
- Review check logic
- Verify data quality
- Adjust thresholds
- Skip check if needed
Authentication Issues
"Invalid credentials"
For Dagster UI:
- Reset password via login page
- Check account status
- Verify email address
For Data Sources:
- Credentials managed by platform
- No user action needed
- Report to support
Session Timeouts
Symptom: Logged out frequently
Normal Behavior:
- 12-hour session timeout
- Security feature
- Cannot be modified
Error Message Decoder
Dagster Errors
| Error | Meaning | Action |
|---|---|---|
| DagsterExecutionError | Asset failed to materialize | Check logs |
| DagsterResourceError | Resource configuration issue | Contact support |
| DagsterTypeError | Data type mismatch | Review asset output |
| DagsterInvariantViolation | System constraint violated | Report bug |
Platform Errors
| Error | Meaning | Action |
|---|---|---|
| K8s pod error | Compute resource issue | Auto-retry, then support |
| Timeout exceeded | Run took too long | Optimize or increase limit |
| Memory limit exceeded | Out of memory | Reduce data size or contact support |
Getting Help Effectively
Information to Gather
Run Information
- Run ID (from URL)
- Failure time
- Asset name
- Error message
Context
- Recent changes made
- First occurrence?
- Affecting all runs?
- Specific to asset?
Logs
- Copy error text
- Include stack trace
- Note step that failed
Support Template
Issue: [Brief description]
Run ID: [From URL]
Asset: [Name]
Error: [Message]
First seen: [Date/time]
Frequency: [Always/Sometimes]
Recent changes: [What changed]
Prevention Strategies
Daily Checks
- Monitor asset freshness
- Review failed runs
- Check schedule health
- Validate critical data
Weekly Review
- Performance trends
- Error patterns
- Resource usage
- Update documentation
Before Making Changes
- Test in development
- Review dependencies
- Plan rollback strategy
- Monitor first run
Emergency Procedures
Critical Pipeline Down
Immediate Actions
- Screenshot error
- Note exact time
- Check all pipelines
- Alert stakeholders
Diagnosis
- Recent changes?
- Source system up?
- Partial success?
- Alternative path?
Escalation
- Use emergency contact
- Provide run ID
- Share error details
- Available for call
Data Corruption
Stop the Bleeding
- Pause schedules
- Prevent propagation
- Document scope
Recovery
- Identify last good state
- Plan restoration
- Test fix carefully
- Gradual rollout
FAQ
Q: Why can't I see historical runs from Arch?
A: Run history starts fresh in Dagster; your data remains in the warehouse.

Q: Can I modify Meltano configurations?
A: No, Meltano configs are managed by the platform. Use dbt for transformations.

Q: How do I add a new data source?
A: Contact support to add new Meltano taps.

Q: Why is my pipeline slower than Arch?
A: Check whether it is running a full extraction instead of an incremental one.

Q: Can I run pipelines locally?
A: No, compute runs on managed cloud infrastructure.