Alaska Soil Data Bank Project Metadata - 5 Alaska Soil Data Bank (AKSDB) Data Model Documentation

5.1 Table of Contents

Introduction
Related Data Models
Core Concepts
Data Levels
Data Dictionary and Thesaurus
Metadata and Annotations
Data Quality and Standards
Implementation Notes
Style Guidelines

5.2 Introduction

The Alaska Soil Data Bank (AKSDB) project aims to acquire, curate, and harmonize non-NRCS legacy soil data across Alaska. Unlike similar efforts in the conterminous United States that primarily rely on USDA-NRCS NASIS and KSSL databases, Alaska presents unique challenges due to significant gaps in NASIS/KSSL coverage and the existence of numerous diverse non-NRCS datasets with limited spatial overlap.

The AKSDB data model has been developed around several key principles:
- Transparent sourcing of raw data
- No modification of source data on ingestion
- Enhanced metadata enrichment
- Transparent harmonization via scripting
- Delivery of publicly available datasets to WoSIS/NASIS/ISCN

5.3 Related Data Models

The AKSDB data model draws inspiration from several existing soil data frameworks:

NASIS (National Soil Information System)
- Primary influence for pedon and site concepts
- Provides foundation for field key standardization
- Used as alignment target for eventual data export
- References:
  - NASIS Overview
  - NASIS Database Metadata
WoSIS (World Soil Information Service)
- Primary influence for dataset concepts
- Provides foundation for standardization approaches
- Offers established controlled vocabulary framework
SOC-DRaH and SOC-DRaH2
- Early iterations of soil carbon data harmonization
- Used for International Soil Carbon Network (ISCN) data compilation
- Provides baseline for metadata handling
Soil-DRaH
- Most recent iteration of harmonization approaches
- Introduces level-based data processing concept
- Provides template for data annotations
- Reference:
  - Repository

5.4 Core Concepts

5.4.1 Dataset Concept

Table: Dataset Structure

Field                       | Description
---------------------------|-------------
dataset_iid                | Primary identifier for the dataset
dataset_sub_iid            | Optional identifier for subset within compilation datasets
version                    | Version of the dataset
date                       | Date of dataset snapshot/receipt
license_file              | URL link to associated license
publication_date          | Date of formal publication (if applicable)
reference                 | Citation or reference information

5.4.2 Pedon Concept

Table: Pedon Structure

Field                       | Description
---------------------------|-------------
dataset_iid_ref            | Foreign key link to dataset
dataset_sub_iid_ref        | Foreign key link to dataset subset (if applicable)
dataset_peiid              | Unique pedon identifier
dts_pedon                  | Observation date in YYYY-MM-DD format
lat                       | Latitude (WGS 84)
lon                       | Longitude (WGS 84)
tax_order                 | Taxonomic order (US Soil Taxonomy)
tax_suborder              | Taxonomic suborder
tax_grtgrp                | Great group taxonomy 
tax_subgrp                | Subgroup taxonomy
o_thick_surf              | Surface organic layer thickness (cm)
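o_thick_cum40             | Cumulative organic layer thickness within the upper 40 cm (cm)
ph_10                     | Soil pH at 10 cm depth
ph_30                     | Soil pH at 30 cm depth
ec_10                     | Soil electrical conductivity (EC) at 10 cm depth
ec_30                     | Soil electrical conductivity (EC) at 30 cm depth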
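
If a summary field such as o_thick_cum40 is derived from horizon records rather than supplied by the source, the calculation might look like the following minimal sketch. It assumes that organic horizons are those whose hz_name begins with "O" and that hz_dept and hz_depb (see the Horizon Concept) are in cm. The function name and the identification rule are illustrative assumptions, not part of the AKSDB specification.

```python
def organic_thickness_to_depth(horizons, limit_cm=40.0):
    """Sum the thickness of organic horizons within the upper `limit_cm`.

    `horizons` is a list of dicts with hz_name, hz_dept, and hz_depb keys.
    Assumption: a horizon is organic when hz_name starts with "O".
    """
    total = 0.0
    for hz in horizons:
        if not hz["hz_name"].upper().startswith("O"):
            continue
        # Clip the horizon to the 0..limit_cm window before measuring it.
        top = max(hz["hz_dept"], 0.0)
        bottom = min(hz["hz_depb"], limit_cm)
        if bottom > top:
            total += bottom - top
    return total
```

A buried organic horizon that straddles the 40 cm limit contributes only the part above the limit.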

5.4.3 Horizon Concept

Table: Horizon Structure

Field                       | Description
---------------------------|-------------
dataset_peiid_ref          | Foreign key link to pedon
dataset_hziid              | Unique horizon identifier
hz_name                    | Horizon designation
hz_seq                     | Horizon sequence number
hz_dept                    | Top depth (cm)
hz_depb                    | Bottom depth (cm)
text_field                 | Field texture class
text_mod_field             | Field texture modifier
sand_pct_field             | Field-estimated sand percent
clay_pct_field             | Field-estimated clay percent
grpct_field               | Field-estimated gravel percent
cbpct_field               | Field-estimated cobble percent
stpct_field               | Field-estimated stone percent
blpct_field               | Field-estimated boulder percent
ph_field                  | Field pH measurement
ec_field                  | Field EC measurement

5.4.4 Sample Concept

Table: Sample Structure 

Field                       | Description
---------------------------|-------------
dataset_hziid_ref          | Foreign key link to horizon(s)
dataset_peiid_ref          | Foreign key link to pedon
samp_dept                  | Sample top depth (cm)
samp_depb                  | Sample bottom depth (cm)
sand_pct_lab               | Lab-measured sand percent
clay_pct_lab               | Lab-measured clay percent
silt_pct_lab               | Lab-measured silt percent
soct_lab                   | Lab-measured organic carbon percent
sicpct_lab                 | Lab-measured inorganic carbon percent 
tcpct_lab                  | Lab-measured total carbon percent
tnpct_lab                  | Lab-measured total nitrogen percent
loipct_lab                 | Lab-measured loss on ignition percent
ph_lab                     | Lab-measured pH
ec_lab                     | Lab-measured EC
grpct_lab                  | Lab-measured gravel percent
cbpct_lab                  | Lab-measured cobble percent
cftot_pct_lab             | Lab-measured total coarse fragment percent
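
The four concepts above form a chain of foreign keys: dataset, pedon, horizon, sample. The following minimal Python sketch illustrates that chain with a subset of the fields and a simple check for records whose parent is missing. The class layout and the find_orphans helper are illustrative assumptions. For simplicity each sample is linked to a single horizon, although the Sample table allows a link to more than one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    dataset_iid: str
    version: str
    date: str
    dataset_sub_iid: Optional[str] = None


@dataclass
class Pedon:
    dataset_iid_ref: str
    dataset_peiid: str
    lat: float
    lon: float


@dataclass
class Horizon:
    dataset_peiid_ref: str
    dataset_hziid: str
    hz_name: str
    hz_dept: float
    hz_depb: float


@dataclass
class Sample:
    dataset_hziid_ref: str
    dataset_peiid_ref: str
    samp_dept: float
    samp_depb: float


def find_orphans(datasets, pedons, horizons, samples):
    """Return (kind, identifier) pairs for records with an unresolved foreign key.

    Samples have no identifier of their own in this sketch, so they are
    reported by list position.
    """
    dataset_ids = {d.dataset_iid for d in datasets}
    pedon_ids = {p.dataset_peiid for p in pedons}
    horizon_ids = {h.dataset_hziid for h in horizons}
    orphans = []
    for p in pedons:
        if p.dataset_iid_ref not in dataset_ids:
            orphans.append(("pedon", p.dataset_peiid))
    for h in horizons:
        if h.dataset_peiid_ref not in pedon_ids:
            orphans.append(("horizon", h.dataset_hziid))
    for i, s in enumerate(samples):
        if s.dataset_hziid_ref not in horizon_ids or s.dataset_peiid_ref not in pedon_ids:
            orphans.append(("sample", i))
    return orphans
```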

5.5 Data Levels

The AKSDB implements a hierarchical system of five data levels (Level 0 through Level 4):

5.5.1 Level 0 (Raw Data)

- Original data as delivered
- No modifications from source
- Requires three components:
  1. Raw data tables
  2. Metadata (created or linked)
  3. Data dictionary (annotations file)

5.5.2 Level 1 (Validated Data)

- Data format validation
- Standard treatment of missing values
- Removal of empty/null columns
- Basic format correctness checks
- Raw data values maintained
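
As a rough illustration of the Level 1 steps, the sketch below treats a table as a list of dictionaries, maps a set of missing-value tokens to None, and drops columns that are empty in every row. The token list and the function name are assumptions made for this example. The actual AKSDB rules may differ.

```python
# Assumed missing-value tokens, compared case-insensitively after trimming.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-9999"}


def to_level1(rows):
    """Standardize missing values and remove columns with no data.

    Non-missing values are passed through unchanged, so raw values are kept.
    """
    cleaned = [
        {
            key: None
            if value is None or str(value).strip().lower() in MISSING_TOKENS
            else value
            for key, value in row.items()
        }
        for row in rows
    ]
    if not cleaned:
        return []
    # Keep a column only if at least one row has a non-missing value in it.
    keep = [k for k in cleaned[0] if any(r.get(k) is not None for r in cleaned)]
    return [{k: r.get(k) for k in keep} for r in cleaned]
```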

5.5.3 Level 2 (Standardized Data)

- Field names standardized using the annotation file
- Original data values maintained
- Structured according to AKSDB data model concepts
- Field names mapped to the controlled vocabulary
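
The renaming step can be sketched as a lookup from original column names to AKSDB field keys. The example assumes annotation rows are available as dictionaries with the table_id, column_id, and aksdb_field_key fields described in the Data Dictionary section, and that columns without an annotation keep their original name. Both function names are hypothetical.

```python
def build_field_map(annotations, table_id):
    """Map original column names to AKSDB field keys for one raw table."""
    field_map = {}
    for row in annotations:
        if row["table_id"] == table_id and row["aksdb_field_key"]:
            field_map[row["column_id"]] = row["aksdb_field_key"]
    return field_map


def to_level2(rows, field_map):
    """Rename columns using the field map and leave the values untouched."""
    return [
        {field_map.get(column, column): value for column, value in row.items()}
        for row in rows
    ]
```

Only the keys change. Every value is carried over as delivered, which matches the requirement that Level 2 maintains original data values.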

5.5.4 Level 3 (Quality Controlled)

- QA/QC checks implemented
- Documentation of all modifications
- Includes:
  - Horizon depth validation
  - Value range checks
  - Typo corrections
  - Data quality annotations
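
Two of the checks listed above, horizon depth validation and value range checks, might be sketched as follows. The rules shown (bottom depth must exceed top depth, adjacent horizons must neither overlap nor leave a gap) and the range limits are assumptions chosen for illustration. They are not documented AKSDB thresholds.

```python
def check_horizon_depths(horizons):
    """Flag inverted, overlapping, or gapped horizons within one pedon.

    `horizons` is a list of dicts with dataset_hziid, hz_dept, and hz_depb.
    Returns a list of tuples describing each issue found.
    """
    issues = []
    ordered = sorted(horizons, key=lambda h: h["hz_dept"])
    for h in ordered:
        if h["hz_depb"] <= h["hz_dept"]:
            issues.append(("inverted", h["dataset_hziid"]))
    for upper, lower in zip(ordered, ordered[1:]):
        if lower["hz_dept"] < upper["hz_depb"]:
            issues.append(("overlap", upper["dataset_hziid"], lower["dataset_hziid"]))
        elif lower["hz_dept"] > upper["hz_depb"]:
            issues.append(("gap", upper["dataset_hziid"], lower["dataset_hziid"]))
    return issues


# Assumed plausible ranges; percentages are bounded by definition.
RANGES = {
    "ph_field": (0.0, 14.0),
    "sand_pct_field": (0.0, 100.0),
    "clay_pct_field": (0.0, 100.0),
}


def check_ranges(record, ranges=RANGES):
    """Return the field names whose values fall outside the allowed range."""
    flagged = []
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            flagged.append(field)
    return flagged
```

Both functions only report problems. Any correction would be applied and documented separately, consistent with the requirement to document all modifications.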

5.5.5 Level 4 (Harmonized Data)

- Multiple datasets integrated
- Standardized formats and units
- Ready for modeling applications
- Complete provenance tracking
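
A small sketch of what harmonization could involve: horizon depths from several datasets are converted to a common unit (cm) and each output row records which dataset it came from and the unit it was delivered in. The input layout, the conversion table, and the source_depth_unit column are assumptions for this example only.

```python
# Conversion factors to centimetres for the units assumed in this sketch.
TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54}


def harmonize_depths(datasets):
    """Combine horizon depths from several datasets into one table in cm.

    `datasets` maps a dataset identifier to a (depth_unit, rows) pair, where
    each row is a dict with hz_dept and hz_depb in the stated unit.
    """
    combined = []
    for dataset_iid, (depth_unit, rows) in datasets.items():
        factor = TO_CM[depth_unit]  # raises KeyError for an unknown unit
        for row in rows:
            combined.append(
                {
                    "dataset_iid_ref": dataset_iid,  # provenance link
                    "hz_dept": row["hz_dept"] * factor,
                    "hz_depb": row["hz_depb"] * factor,
                    "source_depth_unit": depth_unit,  # unit as delivered
                }
            )
    return combined
```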

5.6 Data Dictionary and Thesaurus

The AKSDB implements two distinct but related concepts:

5.6.1 Data Dictionary (Annotations File)

Table: Annotation File Structure 

Field                       | Description
---------------------------|-------------
dataset_id                 | Dataset identifier
dataset_sub_id             | Dataset subset identifier
table_id                   | Raw table/file name including extension
column_id                  | Original column name
aksdb_field_key            | Standardization key
is_type                    | Data type descriptor (identifier/description/unit/method/value)
with_entry                | Content for the is_type

The annotation file follows the naming convention: <dataset_id>_annotations (all lowercase)
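
The structure and naming convention above can be illustrated with a short Python sketch. It assumes the annotation file is stored as comma-separated text with a header row matching the field names in the table. The storage format, the helper names, and the example dataset identifier are assumptions. No file extension is appended because none is specified.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "dataset_id", "dataset_sub_id", "table_id", "column_id",
    "aksdb_field_key", "is_type", "with_entry",
]
ALLOWED_IS_TYPES = {"identifier", "description", "unit", "method", "value"}


def annotation_file_name(dataset_id):
    """Build the annotation file base name: <dataset_id>_annotations, lowercase."""
    return f"{dataset_id.lower()}_annotations"


def read_annotations(text):
    """Parse annotation rows from CSV text and validate their structure."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing annotation columns: {missing}")
    rows = list(reader)
    # Data rows start on line 2 because line 1 holds the header.
    for line_no, row in enumerate(rows, start=2):
        if row["is_type"] not in ALLOWED_IS_TYPES:
            raise ValueError(f"line {line_no}: unknown is_type {row['is_type']!r}")
    return rows
```

For a hypothetical dataset identifier "ExampleDS", annotation_file_name returns "exampleds_annotations".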

5.6.2 Thesaurus

Table: Thesaurus Structure

Field                       | Description
---------------------------|-------------
aksdb_field_key            | Standardized field identifier
name                       | Human-readable field name
full_name                 | Complete descriptive name
synonyms                   | List of equivalent terms
descriptor                 | Detailed field description
enforced_standards        | Required standards for Level 1+
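
One way the synonyms field could be used is to suggest a standard key for an unfamiliar column name. The sketch below builds a case-insensitive index from thesaurus entries. The entries and synonyms shown are invented for illustration and are not taken from the AKSDB thesaurus.

```python
# Hypothetical thesaurus entries, reduced to the fields used here.
EXAMPLE_THESAURUS = [
    {"aksdb_field_key": "ph_field", "name": "Field pH",
     "synonyms": ["ph", "field_ph"]},
    {"aksdb_field_key": "hz_dept", "name": "Horizon top depth",
     "synonyms": ["top_depth", "upper_depth", "hzdept"]},
]


def build_synonym_index(thesaurus):
    """Index every key and synonym (lowercased) to its aksdb_field_key."""
    index = {}
    for entry in thesaurus:
        key = entry["aksdb_field_key"]
        for term in [key, *entry["synonyms"]]:
            normalized = term.strip().lower()
            # A term that points at two different keys would be ambiguous.
            if normalized in index and index[normalized] != key:
                raise ValueError(f"ambiguous synonym: {normalized!r}")
            index[normalized] = key
    return index


def suggest_field_key(column_name, index):
    """Return the matching aksdb_field_key, or None if the name is unknown."""
    return index.get(column_name.strip().lower())
```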

5.7 Metadata and Annotations

5.7.1 Dataset Package Components

Raw Data Files
- Original format preserved
- No modifications on ingestion
Metadata
- XML format using EML (Ecological Metadata Language)
- Either linked (published data) or generated
- Follows EML schema and best practices
Annotation File
- Maps raw fields to standard vocabulary
- Documents data types and constraints
- Provides field-level metadata

5.7.2 Standardization Process

1. Validate data format (Level 1)
2. Apply standardization rules (Level 2)
3. Implement QA/QC (Level 3)
4. Harmonize across datasets (Level 4)

5.8 Data Quality and Standards

5.8.1 Quality Control Measures

- Horizon depth consistency checks
- Value range validation
- Unit standardization
- Coordinate system verification
- Temporal data validation

5.8.2 Standards Enforcement

- WGS 84 coordinate system
- Standard date formats (YYYY-MM-DD)
- Controlled vocabulary for field names
- Required metadata elements
- Documentation of modifications
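
Two of these standards lend themselves to simple automated checks. The sketch below tests whether a string is a valid calendar date in YYYY-MM-DD form and whether a coordinate pair lies within the valid latitude and longitude ranges. The function names are hypothetical. Note that a range check cannot confirm the datum: it catches swapped or out-of-range values, not coordinates recorded in another reference system.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(text):
    """True if `text` is a real calendar date written exactly as YYYY-MM-DD."""
    if not isinstance(text, str) or not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False  # right shape, but not a real date (e.g. 2019-02-30)
    return True


def is_valid_lat_lon(lat, lon):
    """True if latitude and longitude fall inside the geographic value ranges."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0
```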

5.8.3 Quality Annotations

- Confidence levels for measurements
- Data source reliability
- Spatial accuracy assessments
- Temporal precision indicators
- Modification tracking

5.9 Implementation Notes

The AKSDB data model is implemented with several key considerations:

Flexibility
- Accommodates diverse data sources
- Handles varying levels of detail
- Supports multiple data types
Traceability
- Maintains links to source data
- Documents all transformations
- Preserves original values
Interoperability
- Compatible with NASIS conventions
- Supports WoSIS export
- Enables ISCN integration
Scalability
- Handles large datasets
- Supports incremental updates
- Enables distributed processing

The model continues to evolve as new datasets are incorporated and additional requirements are identified. Regular review and updates ensure the model remains aligned with project goals and community needs.

5.10 Style Guidelines

- Use lowercase for file and field names
- Separate words with underscores
- Apply consistent naming conventions across all components
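
A helper along the following lines could apply the lowercase and underscore rules to an arbitrary name. The function name and the decision to collapse every run of non-alphanumeric characters into a single underscore are assumptions for this sketch.

```python
import re


def to_aksdb_name(name):
    """Lowercase a name and join its words with single underscores."""
    # Replace each run of characters other than letters and digits with "_".
    collapsed = re.sub(r"[^A-Za-z0-9]+", "_", name.strip())
    return collapsed.strip("_").lower()
```

For example, to_aksdb_name("Sand Pct-Lab") gives "sand_pct_lab".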