7 Raster to Matrix Profile Representation Framework
This documentation describes a workflow for converting soil profile data between shapefile, raster, and matrix representations, enabling flexible transitions between spatial and computational formats.
7.1 Framework Overview
The framework provides a pipeline for converting between three key representations:
1. Spatial vector data (shapefiles)
2. Rasterized profile data (GeoTIFF)
3. Matrix-based computational format (NumPy/JSON)
7.2 Data Processing Clusters
7.3 Vector to Raster Conversion
Source Code: vector_raster_conversion.py
Converts soil profile shapefiles into rasterized representations with proper coordinate transformation.
7.3.1 Key Functions
rasterize_soil_profile(): Main conversion function
- Transforms vector geometries to pixel space
- Handles horizon mapping and indexing
- Creates verification plots
- Preserves horizon metadata in GeoTIFF tags
7.3.2 Usage Example
shp_path = "HS 2-2-combined.shp"
output_raster_path = "HS 2-2-combined_raster.tif"
width_cm = 220
depth_cm = 100
horizon_mapping, raster_data = rasterize_soil_profile(
    shp_path,
    output_raster_path,
    width_cm,
    depth_cm
)
Example output:
Raster statistics:
Unique values in result: [ 1 2 3 4 5 6 7 8 9 10 11 12]
Horizon mapping: {
'doe_CF': 1,
'doe_Wf': 2,
'doe_air': 3,
...
}
7.4 Raster Processing
Source Code: raster_processing.py
Handles post-processing of rasterized profiles, including air space management and data cleaning.
7.4.1 Key Functions
process_raster(): Post-processes raster data
- Removes air spaces
- Preserves horizon tags
- Maintains data integrity
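A minimal sketch of the "removes air spaces" step: the air horizon index (passed as na_value in the usage example) is masked to the raster's nodata value. The function name remove_air and the default indices are assumptions for illustration, not the framework's actual implementation.

```python
import numpy as np

def remove_air(band, air_index=3, nodata=0):
    """Mask the air-space horizon index to the nodata value;
    every other horizon index passes through unchanged."""
    out = band.copy()
    out[out == air_index] = nodata
    return out

band = np.array([[3, 1], [2, 3]], dtype="uint8")
print(remove_air(band))  # air cells (3) become nodata (0)
```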
7.4.2 Usage Example
input_tif = "HS 2-2-combined_raster.tif"
output_tif = "HS 2-2-combined_raster_processed.tif"
processed_path = process_raster(input_tif, output_tif, na_value=3)
7.5 Matrix Generation
Source Code: matrix_generation.py
Creates efficient matrix representations from processed raster data.
7.5.1 Key Functions
create_horizon_string_matrix(): Creates string-based matrix
matrix_to_ranges(): Converts to efficient range representation
save_horizon_json(): Serializes to compressed JSON format
7.5.2 Usage Example
raster_path = "HS 2-2-combined_raster_processed.tif"
horizon_matrix = create_horizon_string_matrix(raster_path)
ranges = save_horizon_json(horizon_matrix, "horizon_ranges.json")
Example output:
Number of unique patterns: 91
Horizon row ranges:
doe_CF: rows 21-67 (appears in 25 rows)
doe_Wf: rows 3-99 (appears in 84 rows)
...
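The row-range summary above can be computed from the string matrix in a few lines. A sketch (horizon_row_ranges is a hypothetical helper name, not part of the framework):

```python
def horizon_row_ranges(matrix):
    """For each horizon, find the first and last row it occupies and
    how many rows contain it. Empty strings (no data) are skipped."""
    rows_with = {}
    for r, row in enumerate(matrix):
        for h in set(row):
            if h:
                rows_with.setdefault(h, []).append(r)
    return {h: (rs[0], rs[-1], len(rs)) for h, rs in rows_with.items()}

matrix = [["doe_CF", ""], ["doe_CF", "doe_Wf"], ["", "doe_Wf"]]
for h, (first, last, n) in sorted(horizon_row_ranges(matrix).items()):
    print(f"{h}: rows {first}-{last} (appears in {n} rows)")
```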
7.6 Data Format Specifications
7.6.1 Shapefile Requirements
- Must contain horizon identifiers
- Should have clear horizon boundaries
- Coordinates in consistent units
7.6.2 Raster Format
- GeoTIFF with horizon mapping in tags
- One band containing horizon indices
- No data value for air/empty space
7.6.3 Matrix Format
- NumPy array of horizon strings
- Empty values represented as ''
- Consistent dimensions (depth × width)
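Such a string matrix can be built from the index raster by inverting the horizon mapping stored in the GeoTIFF tags. A sketch, assuming this inversion approach (indices_to_strings is an illustrative name; the actual construction happens inside create_horizon_string_matrix()):

```python
import numpy as np

def indices_to_strings(band, horizon_mapping, nodata=0):
    """Invert the name -> index mapping and look up every pixel,
    producing a depth x width matrix of horizon strings ('' for nodata)."""
    inverse = {idx: name for name, idx in horizon_mapping.items()}
    inverse[nodata] = ""
    # otypes pins a fixed-width string dtype (names assumed < 32 chars)
    lookup = np.vectorize(inverse.get, otypes=["U32"])
    return lookup(band)

band = np.array([[1, 0], [2, 2]])
print(indices_to_strings(band, {"doe_CF": 1, "doe_Wf": 2}))
```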
7.6.4 JSON Output Format
{
"0-10": [["horizon1", 10]],
"11-20": [["horizon2", 5], ["horizon3", 5]]
}
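One way to arrive at this structure, assuming the keys are ranges of consecutive identical rows and the values are run-length-encoded row patterns (consistent with the "unique patterns" and "row ranges" statistics in 7.5.2). rows_to_ranges below is a hypothetical re-implementation sketch, not the framework's matrix_to_ranges():

```python
import itertools
import json

def rows_to_ranges(matrix):
    """Group consecutive identical rows, then run-length encode each
    shared row pattern into [horizon, count] pairs keyed by row range."""
    ranges, start = {}, 0
    for i in range(1, len(matrix) + 1):
        if i == len(matrix) or matrix[i] != matrix[start]:
            rle = [[h, len(list(g))] for h, g in itertools.groupby(matrix[start])]
            ranges[f"{start}-{i - 1}"] = rle
            start = i
    return ranges

matrix = (
    [["horizon1"] * 10] * 11                      # rows 0-10
    + [["horizon2"] * 5 + ["horizon3"] * 5] * 10  # rows 11-20
)
print(json.dumps(rows_to_ranges(matrix)))  # matches the example above
```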
7.7 Storage Efficiency
The framework achieves significant compression:
Matrix size: 1,760,000 bytes (22,000 elements)
JSON size: 25,837 bytes (1,004 elements)
Compression ratio: ~0.015x (the JSON is roughly 1.5% of the in-memory matrix size)
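These figures are consistent with a 100 × 220 matrix (22,000 elements) stored with a fixed-width NumPy string dtype; assuming <U20 (20 characters × 4 bytes = 80 bytes per element, which reproduces the byte count above):

```python
# 22,000 horizon strings at an assumed dtype of <U20: 80 bytes each
matrix_bytes = 22_000 * 80          # 1,760,000 bytes
json_bytes = 25_837                 # size of horizon_ranges.json
ratio = json_bytes / matrix_bytes
print(f"compression ratio: {ratio:.3f}x")  # -> compression ratio: 0.015x
```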
7.8 Technical Requirements
Required Python packages:
- geopandas
- rasterio
- numpy
- matplotlib
- shapely
Install via:
pip install geopandas rasterio numpy matplotlib shapely
7.9 Implementation Notes
7.9.1 Coordinate Transformation
- Preserves relative positions
- Scales to specified dimensions
- Handles both horizontal and vertical transformations
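The transformation can be sketched as a linear scale from the shapefile's bounding box onto the pixel grid, with the y axis flipped so that row 0 is the ground surface. The bounds, grid size, and function name here are illustrative assumptions, not taken from the source:

```python
def make_to_pixel(bounds, width_cm, depth_cm):
    """Return a function mapping shapefile (x, y) coordinates onto a
    width_cm x depth_cm pixel grid (1 px = 1 cm), flipping y so that
    row 0 is the top of the profile. A sketch; the real transform
    lives inside rasterize_soil_profile()."""
    minx, miny, maxx, maxy = bounds
    sx = width_cm / (maxx - minx)   # horizontal scale (px per map unit)
    sy = depth_cm / (maxy - miny)   # vertical scale (px per map unit)
    def to_pixel(x, y):
        return (x - minx) * sx, (maxy - y) * sy  # (col, row)
    return to_pixel

to_pixel = make_to_pixel((0.0, 0.0, 2.0, 1.0), width_cm=200, depth_cm=100)
print(to_pixel(1.0, 1.0))  # centre of the top edge -> (100.0, 0.0)
```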
7.9.2 Data Quality
- Validates horizon mappings
- Verifies rasterization results
- Tracks pattern statistics
7.9.3 Best Practices
- Verify shapefile quality before processing
- Check horizon mapping consistency
- Review verification plots
- Validate pattern statistics
- Compare compression results
7.10 Advanced Topics
7.10.1 Extending the Framework
The workflow can be adapted for:
- Different coordinate systems
- Various horizon classification schemes
- Additional metadata preservation
- Custom visualization needs
7.10.2 Performance Optimization
Consider:
- Batch processing for multiple profiles
- Memory-efficient processing for large datasets
- Parallel processing options
- Custom compression schemes