Troubleshooting¶
This section provides solutions to common issues you may encounter when using SparkForge.
Common Issues¶
Pipeline Initialization Errors¶
Problem: PipelineBuilder fails to initialize
Solutions:
- Ensure the SparkSession is properly configured and active
- Verify that the schema name is valid and accessible
- Check that quality thresholds are between 0 and 100
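The schema and threshold checks above can be run before constructing the builder. The following is a minimal sketch, not part of SparkForge: the helper `check_builder_args`, its argument names, and the rule that a schema name consists of letters, digits, and underscores are all assumptions made for illustration.

```python
import re


def check_builder_args(schema, thresholds):
    """Return a list of problems found in the arguments (empty if none).

    Hypothetical pre-flight helper, not part of SparkForge.
    `thresholds` maps a label (e.g. "bronze") to a percentage.
    """
    problems = []
    # Assumption: a plain schema name is letters, digits, and underscores.
    if not isinstance(schema, str) or not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", schema):
        problems.append(f"invalid schema name: {schema!r}")
    for label, value in thresholds.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{label} threshold is not a number: {value!r}")
        elif not 0 <= value <= 100:
            problems.append(f"{label} threshold out of range 0-100: {value}")
    return problems


print(check_builder_args("analytics", {"bronze": 95.0, "silver": 98.0, "gold": 120}))
```

To confirm that a session is active, `SparkSession.getActiveSession()` (available in PySpark 3.0 and later) returns `None` when no session is active in the current thread.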
Validation Failures¶
Problem: Data validation fails with low quality scores
Solutions:
- Review validation rules for correctness
- Check data source quality
- Adjust quality thresholds if appropriate
- Investigate specific validation failures
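One way to investigate specific failures is to count, per rule, how many rows of a small sample fail it. The sketch below uses plain Python predicates as stand-ins for the Spark column rules and operates on rows collected to the driver as dicts. The helpers `rule_failures` and `quality_rate` and the sample data are hypothetical, and the sketch does not reproduce how SparkForge computes its own quality score.

```python
def rule_failures(rows, rules):
    """Count, for each named rule, how many rows fail it."""
    return {
        name: sum(1 for row in rows if not check(row))
        for name, check in rules.items()
    }


def quality_rate(rows, rules):
    """Percentage of rows that pass every rule (100.0 for an empty sample)."""
    if not rows:
        return 100.0
    valid = sum(1 for row in rows if all(check(row) for check in rules.values()))
    return 100.0 * valid / len(rows)


# Sample rows as plain dicts, e.g. [r.asDict() for r in df.limit(1000).collect()]
sample = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": None, "amount": 5.0},
    {"user_id": "u3", "amount": -2.0},
    {"user_id": "u4", "amount": 7.5},
]

# Plain-Python stand-ins for the Spark column rules (illustrative only)
rules = {
    "user_id_not_null": lambda r: r["user_id"] is not None,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}

print(rule_failures(sample, rules))
print(quality_rate(sample, rules))
```

A rule that accounts for most of the failures is the first candidate to review, either because the rule is too strict or because the source data is genuinely poor.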
Problem: ValidationError during step construction
Solutions: - Ensure all step types have non-empty validation rules - Verify transform functions are callable and properly defined - Check that source dependencies are correctly specified - For existing tables, ensure proper configuration
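from pyspark.sql import functions as F
# BronzeStep is provided by SparkForge; import it as in your pipeline code

# ❌ This will raise ValidationError: rules must not be empty
BronzeStep(name="events", rules={})

# ✅ This is correct: at least one rule is defined
BronzeStep(
    name="events",
    rules={"user_id": [F.col("user_id").isNotNull()]},
)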
Execution Timeouts¶
Problem: Pipeline steps timeout during execution
Solutions:
- Increase timeout settings in ExecutionConfig
- Optimize transformation functions
- Consider using parallel execution mode
- Check resource availability
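Before raising timeouts, it can help to find out which step is actually slow. The following is a rough sketch using only the standard library. The `timed` context manager, the `slowest` helper, and the step names are hypothetical, and the `time.sleep` calls stand in for real transformations.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(step_name):
    """Record the wall-clock duration of a block under `step_name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step_name] = time.perf_counter() - start


def slowest(n=3):
    """Return the n slowest recorded steps, longest first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:n]


with timed("clean_events"):
    time.sleep(0.05)  # stand-in for a transform followed by an action
with timed("aggregate"):
    time.sleep(0.01)

print(slowest())
```

Because Spark evaluates transformations lazily, the timed block needs to include an action (for example a `count()` or a write); otherwise only the time to build the query plan is measured.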
Memory Issues¶
Problem: OutOfMemoryError during execution
Solutions:
- Increase Spark driver and executor memory
- Optimize DataFrame operations
- Use caching strategically
- Consider data partitioning
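Memory and partitioning settings are standard Spark properties and can be passed at launch time. The sketch below builds the corresponding `spark-submit` arguments from a dict. The helper `to_submit_args`, the script name `pipeline.py`, and all values are illustrative assumptions, not recommendations.

```python
def to_submit_args(conf):
    """Turn a dict of Spark properties into spark-submit `--conf` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Illustrative values only; size them to your cluster and data volume.
memory_conf = {
    "spark.driver.memory": "4g",
    "spark.executor.memory": "8g",
    "spark.sql.shuffle.partitions": "400",
}

# pipeline.py is a placeholder for your own entry-point script.
print(" ".join(["spark-submit"] + to_submit_args(memory_conf) + ["pipeline.py"]))
```

Setting these at launch matters for the driver in particular: in client mode the driver JVM is already running by the time application code executes, so `spark.driver.memory` set from inside the application has no effect.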
Getting Help¶
If you encounter issues not covered here:
- Check the Quick Start Guide for basic setup
- Review the User Guide for detailed usage patterns
- Look at Examples for working code samples
- Consult the API Reference for method documentation
For additional troubleshooting information, see the complete guide: TROUBLESHOOTING.md