Troubleshooting

This section provides solutions to common issues you may encounter when using SparkForge.

Common Issues

Pipeline Initialization Errors

Problem: PipelineBuilder fails to initialize

Solutions:
- Ensure the SparkSession is properly configured and active
- Verify that the schema name is valid and accessible
- Check that quality thresholds are between 0 and 100
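The schema-name and threshold checks above can be run before the builder is constructed. The following is a minimal sketch, not part of the SparkForge API: `check_schema_name` and `check_thresholds` are hypothetical helpers, and the rule that a schema name is a plain identifier (letters, digits, underscores) is an assumption that may be stricter than what your catalog accepts.

```python
import re

# Hypothetical pre-flight helpers; these are NOT part of the SparkForge API.
# Assumption: a valid schema name is a plain identifier.
_SCHEMA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_schema_name(schema):
    """Return True if the schema name is a non-empty plain identifier."""
    return isinstance(schema, str) and _SCHEMA_NAME.fullmatch(schema) is not None


def check_thresholds(**thresholds):
    """Return the names of thresholds that fall outside the 0-100 range."""
    return [
        name
        for name, value in thresholds.items()
        if not isinstance(value, (int, float)) or not 0 <= value <= 100
    ]
```

Calling `check_thresholds(bronze=95.0, silver=120, gold=99.0)` reports `silver` as the out-of-range value, which narrows down an initialization failure before any Spark work starts.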

Validation Failures

Problem: Data validation fails with low quality scores

Solutions:
- Review validation rules for correctness
- Check data source quality
- Adjust quality thresholds if appropriate
- Investigate specific validation failures
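To investigate specific failures, it often helps to compute a pass rate per rule and compare each one against the threshold, rather than looking only at the overall score. The sketch below illustrates the idea in plain Python on a list of dictionaries as a stand-in for a DataFrame; `rule_pass_rates`, `failing_rules`, and the sample rows and rule names are all hypothetical and do not reflect how SparkForge computes its quality scores internally.

```python
def rule_pass_rates(rows, rules):
    """Return {rule_name: percentage of rows that satisfy the rule}."""
    total = len(rows)
    rates = {}
    for name, predicate in rules.items():
        passed = sum(1 for row in rows if predicate(row))
        rates[name] = 100.0 * passed / total if total else 0.0
    return rates


def failing_rules(rates, threshold):
    """Return the sorted names of rules whose pass rate is below the threshold."""
    return sorted(name for name, rate in rates.items() if rate < threshold)


# Illustrative sample data (made up for this sketch).
sample_rows = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": None, "amount": 5.0},
    {"user_id": 3, "amount": -2.0},
    {"user_id": 4, "amount": 7.5},
]
sample_rules = {
    "user_id_not_null": lambda row: row["user_id"] is not None,
    "amount_non_negative": lambda row: row["amount"] >= 0,
}

rates = rule_pass_rates(sample_rows, sample_rules)
print(failing_rules(rates, threshold=80.0))
```

A rule that fails on most rows usually points to an incorrect rule or a changed source format, while a rule that fails on only a few rows more often points to genuine data quality problems.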

Problem: ValidationError during step construction

Solutions:
- Ensure all step types have non-empty validation rules
- Verify transform functions are callable and properly defined
- Check that source dependencies are correctly specified
- For existing tables, ensure proper configuration

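from pyspark.sql import functions as F
# BronzeStep is assumed to be already imported from SparkForge

# ❌ Raises ValidationError: rules must not be empty
BronzeStep(name="events", rules={})

# ✅ Correct: at least one validation rule is defined
BronzeStep(
    name="events",
    rules={"user_id": [F.col("user_id").isNotNull()]}
)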

Execution Timeouts

Problem: Pipeline steps timeout during execution

Solutions:
- Increase timeout settings in ExecutionConfig
- Optimize transformation functions
- Consider using parallel execution mode
- Check resource availability
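Before raising timeouts, it is worth finding out which transformation is actually slow. The following is a minimal sketch using only the standard library; the `timed` decorator, the `step_timings` dictionary, and the `clean_events` function are hypothetical names for illustration and are not provided by SparkForge.

```python
import time
from functools import wraps

# Hypothetical registry of wall-clock durations, keyed by step name.
step_timings = {}


def timed(step_name):
    """Decorator that records how long the wrapped function takes, in seconds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the function raises.
                step_timings[step_name] = time.perf_counter() - start
        return wrapper
    return decorator


@timed("clean_events")
def clean_events(rows):
    """Illustrative transform: drop rows without a user_id."""
    return [row for row in rows if row.get("user_id") is not None]
```

Note that Spark transformations are lazy: timing a function that only builds a DataFrame measures plan construction, not execution. To measure real cost, time the point where an action (such as a write or a count) runs.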

Memory Issues

Problem: OutOfMemoryError during execution

Solutions:
- Increase Spark driver and executor memory
- Optimize DataFrame operations
- Use caching strategically
- Consider data partitioning
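Memory and partitioning settings can be collected in one place and applied to the session builder before the session is created. This is a sketch under stated assumptions: `spark.driver.memory`, `spark.executor.memory`, and `spark.sql.shuffle.partitions` are standard Spark configuration keys, but the values shown are illustrative only, and `apply_settings` is a hypothetical helper that works with any builder exposing a `config(key, value)` method, such as `SparkSession.builder`.

```python
# Illustrative values only; size these for your own cluster and data volume.
MEMORY_SETTINGS = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
}


def apply_settings(builder, settings):
    """Apply each key/value pair to a builder that exposes config(key, value)."""
    for key, value in settings.items():
        builder = builder.config(key, value)
    return builder


# Usage with PySpark (requires pyspark to be installed):
#   from pyspark.sql import SparkSession
#   spark = apply_settings(
#       SparkSession.builder.appName("my_pipeline"), MEMORY_SETTINGS
#   ).getOrCreate()
```

Driver memory only takes effect if it is set before the driver JVM starts, so when a session already exists or the job is launched through `spark-submit`, pass it on the command line (for example with `--driver-memory`) instead.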

Getting Help

If you encounter issues not covered here:

- Check the Quick Start Guide for basic setup
- Review the User Guide for detailed usage patterns
- Look at Examples for working code samples
- Consult the API Reference for method documentation

For additional troubleshooting information, see the complete guide: TROUBLESHOOTING.md