5  Alaska Soil Data Bank (AKSDB) Data Model Documentation

5.1 Table of Contents

  1. Introduction
  2. Related Data Models
  3. Core Concepts
  4. Data Levels
  5. Data Dictionary and Thesaurus
  6. Key Data Structures
  7. Metadata and Annotations
  8. Data Quality and Standards
  9. Implementation Notes

5.2 Introduction

The Alaska Soil Data Bank (AKSDB) project aims to acquire, curate, and harmonize non-NRCS legacy soil data across Alaska. Unlike similar efforts in the conterminous United States that primarily rely on USDA-NRCS NASIS and KSSL databases, Alaska presents unique challenges due to significant gaps in NASIS/KSSL coverage and the existence of numerous diverse non-NRCS datasets with limited spatial overlap.

The AKSDB data model has been developed with several key principles:

  • Transparent sourcing of raw data
  • No modification of source data on ingestion
  • Enhanced metadata enrichment
  • Transparent harmonization via scripting
  • Delivery of publicly available datasets to WoSIS/NASIS/ISCN

5.4 Core Concepts

5.4.1 Dataset Concept

Table: Dataset Structure

Field                       | Description
---------------------------|-------------
dataset_iid                | Primary identifier for the dataset
dataset_sub_iid            | Optional identifier for subset within compilation datasets
version                    | Version of the dataset
date                       | Date of dataset snapshot/receipt
license_file              | URL link to associated license
publication_date          | Date of formal publication (if applicable)
reference                 | Citation or reference information

5.4.2 Pedon Concept

Table: Pedon Structure

Field                       | Description
---------------------------|-------------
dataset_iid_ref            | Foreign key link to dataset
dataset_sub_iid_ref        | Foreign key link to dataset subset (if applicable)
dataset_peiid              | Unique pedon identifier
dts_pedon                  | Observation date in YYYY-MM-DD format
lat                        | Latitude (WGS 84)
lon                        | Longitude (WGS 84)
tax_order                  | Taxonomic order (US Soil Taxonomy)
tax_suborder               | Taxonomic suborder
tax_grtgrp                 | Taxonomic great group
tax_subgrp                 | Taxonomic subgroup
o_thick_surf               | Surface organic layer thickness (cm)
o_thick_cum40              | Cumulative organic layer thickness to 40 cm depth
ph_10                      | Soil pH at 10 cm depth
ph_30                      | Soil pH at 30 cm depth
ec_10                      | Soil EC at 10 cm depth
ec_30                      | Soil EC at 30 cm depth

5.4.3 Horizon Concept

Table: Horizon Structure

Field                       | Description
---------------------------|-------------
dataset_peiid_ref          | Foreign key link to pedon
dataset_hziid              | Unique horizon identifier
hz_name                    | Horizon designation
hz_seq                     | Horizon sequence number
hz_dept                    | Top depth (cm)
hz_depb                    | Bottom depth (cm)
text_field                 | Field texture class
text_mod_field             | Field texture modifier
sand_pct_field             | Field-estimated sand percent
clay_pct_field             | Field-estimated clay percent
grpct_field                | Field-estimated gravel percent
cbpct_field                | Field-estimated cobble percent
stpct_field                | Field-estimated stone percent
blpct_field                | Field-estimated boulder percent
ph_field                   | Field pH measurement
ec_field                   | Field EC measurement
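
The pedon fields o_thick_surf and o_thick_cum40 summarize information that is also carried by the horizon records. The following Python sketch shows one plausible way to derive them from the horizon table. It rests on assumptions not stated in this document: organic horizons are taken to be those whose hz_name starts with "O", depths are in cm measured downward from the soil surface, and the surface organic layer is the unbroken run of organic horizons at the top of the profile. The function names are hypothetical.

```python
def is_organic(hz_name):
    # Assumption: organic horizons carry a designation starting with "O".
    return hz_name.strip().upper().startswith("O")


def organic_thickness(horizons, limit_cm=40.0):
    """Return (o_thick_surf, o_thick_cum40) for the horizons of one pedon."""
    ordered = sorted(horizons, key=lambda h: h["hz_dept"])
    surf = 0.0
    cum = 0.0
    in_surface_run = True
    for hz in ordered:
        top, bottom = hz["hz_dept"], hz["hz_depb"]
        if is_organic(hz["hz_name"]):
            if in_surface_run:
                surf += bottom - top
            # Count only the part of the horizon that lies above the depth limit.
            cum += max(0.0, min(bottom, limit_cm) - min(top, limit_cm))
        else:
            # The first non-organic horizon ends the surface organic layer.
            in_surface_run = False
    return surf, cum


horizons = [
    {"hz_name": "Oi", "hz_dept": 0, "hz_depb": 8},
    {"hz_name": "Oe", "hz_dept": 8, "hz_depb": 15},
    {"hz_name": "A", "hz_dept": 15, "hz_depb": 30},
    {"hz_name": "Oab", "hz_dept": 30, "hz_depb": 50},
]
o_thick_surf, o_thick_cum40 = organic_thickness(horizons)
```

In this example the two surface O horizons give a surface thickness of 15 cm, and the buried organic horizon contributes only its upper 10 cm to the 40 cm cumulative total.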

5.4.4 Sample Concept

Table: Sample Structure 

Field                       | Description
---------------------------|-------------
dataset_hziid_ref          | Foreign key link to horizon(s)
dataset_peiid_ref          | Foreign key link to pedon
samp_dept                  | Sample top depth (cm)
samp_depb                  | Sample bottom depth (cm)
sand_pct_lab               | Lab-measured sand percent
clay_pct_lab               | Lab-measured clay percent
silt_pct_lab               | Lab-measured silt percent
soct_lab                   | Lab-measured organic carbon percent
sicpct_lab                 | Lab-measured inorganic carbon percent 
tcpct_lab                  | Lab-measured total carbon percent
tnpct_lab                  | Lab-measured total nitrogen percent
loipct_lab                 | Lab-measured loss on ignition percent
ph_lab                     | Lab-measured pH
ec_lab                     | Lab-measured EC
grpct_lab                  | Lab-measured gravel percent
cbpct_lab                  | Lab-measured cobble percent
cftot_pct_lab              | Lab-measured total coarse fragments percent
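
The four concepts are chained by foreign keys: samples reference horizons and pedons, horizons reference pedons, and pedons reference datasets. As an illustration, the sketch below checks that every child record points at an existing parent. The helper find_orphans and all identifier values are hypothetical; only the field names come from the tables above.

```python
def find_orphans(children, parents, fk_field, pk_field):
    """Return child records whose foreign key has no matching parent key."""
    parent_keys = {p[pk_field] for p in parents}
    return [c for c in children if c[fk_field] not in parent_keys]


pedons = [{"dataset_iid_ref": "ds01", "dataset_peiid": "ds01_p1"}]
horizons = [
    {"dataset_peiid_ref": "ds01_p1", "dataset_hziid": "ds01_p1_h1"},
    # This horizon refers to a pedon that is not in the pedon table.
    {"dataset_peiid_ref": "ds01_p9", "dataset_hziid": "ds01_p9_h1"},
]
samples = [{"dataset_hziid_ref": "ds01_p1_h1", "dataset_peiid_ref": "ds01_p1"}]

orphan_horizons = find_orphans(horizons, pedons, "dataset_peiid_ref", "dataset_peiid")
orphan_samples = find_orphans(samples, horizons, "dataset_hziid_ref", "dataset_hziid")
```

A sample that spans several horizons would need a list-valued dataset_hziid_ref; this sketch assumes one horizon per sample for brevity.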

5.5 Data Levels

The AKSDB organizes data into five processing levels, numbered 0 to 4, each building on the one before:

5.5.1 Level 0 (Raw Data)

  • Original data as delivered
  • No modifications from source
  • Requires three components:
    1. Raw data tables
    2. Metadata (created or linked)
    3. Data dictionary (annotations file)

5.5.2 Level 1 (Validated Data)

  • Data format validation
  • Standard treatment of missing values
  • Removal of empty/null columns
  • Basic format correctness checks
  • Raw data values maintained
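
A minimal sketch of the Level 1 steps, written in plain Python for illustration. The set of missing-value tokens is an assumption (each source dataset may use its own codes), and the function names are hypothetical. Non-missing values are passed through as delivered, so the raw data values are maintained.

```python
# Assumption: tokens that mean "no data"; adjust per dataset.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-9999", "-999"}


def standardize_missing(value):
    """Map common missing-value tokens to None; leave everything else untouched."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().lower() in MISSING_TOKENS:
        return None
    return value


def to_level1(rows):
    """Apply missing-value treatment, then drop columns that are empty in every row."""
    cleaned = [{k: standardize_missing(v) for k, v in row.items()} for row in rows]
    columns = []
    for row in cleaned:
        for key in row:
            if key not in columns:
                columns.append(key)
    keep = [k for k in columns if any(r.get(k) is not None for r in cleaned)]
    return [{k: r.get(k) for k in keep} for r in cleaned]


raw = [
    {"Site": "A1", "pH": "5.4", "Notes": "NA", "Depth": "-9999"},
    {"Site": "A2", "pH": "n/a", "Notes": "", "Depth": "10"},
]
level1 = to_level1(raw)
```

Here the Notes column is removed because it is empty in every row, while Depth is kept with a single missing value.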

5.5.3 Level 2 (Standardized Data)

  • Field names mapped to the AKSDB controlled vocabulary using the annotation file
  • Maintains original data values
  • Structured according to AKSDB data model concepts
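
The Level 2 renaming step can be sketched as a lookup from original column names to AKSDB field keys. The annotation rows, table name, and column names below are invented for illustration, and the way is_type and with_entry are filled in is an assumption. Columns without an annotation keep their original name, and no values are changed.

```python
annotations = [
    {"dataset_id": "ds01", "table_id": "pedons.csv", "column_id": "Site",
     "aksdb_field_key": "dataset_peiid", "is_type": "identifier", "with_entry": ""},
    {"dataset_id": "ds01", "table_id": "pedons.csv", "column_id": "pH_10cm",
     "aksdb_field_key": "ph_10", "is_type": "value", "with_entry": ""},
]


def build_rename_map(annotations, table_id):
    """Map original column names to AKSDB field keys for one raw table."""
    return {
        a["column_id"]: a["aksdb_field_key"]
        for a in annotations
        if a["table_id"] == table_id and a["aksdb_field_key"]
    }


def to_level2(rows, rename_map):
    """Rename annotated columns; values are passed through unchanged."""
    return [{rename_map.get(k, k): v for k, v in row.items()} for row in rows]


rows = [{"Site": "A1", "pH_10cm": "5.4", "Observer": "JD"}]
level2 = to_level2(rows, build_rename_map(annotations, "pedons.csv"))
```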

5.5.4 Level 3 (Quality Controlled)

  • QA/QC checks implemented
  • Documentation of all modifications
  • Includes:
    • Horizon depth validation
    • Value range checks
    • Typo corrections
    • Data quality annotations
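
As an example of a Level 3 check, the sketch below validates the horizon depths of a single pedon. It assumes depths in cm increasing downward and a populated hz_seq; the specific rules (no inverted horizons, no overlaps, no gaps) and the message texts are illustrative choices, not the project's definitive rule set. Problems are reported rather than fixed, so that any correction can be documented separately.

```python
def check_horizon_depths(horizons):
    """Return (dataset_hziid, message) pairs for the horizons of one pedon."""
    issues = []
    ordered = sorted(horizons, key=lambda h: h["hz_seq"])
    previous = None
    for hz in ordered:
        top, bottom = hz["hz_dept"], hz["hz_depb"]
        if top is None or bottom is None:
            issues.append((hz["dataset_hziid"], "missing depth"))
            previous = None  # cannot compare the next horizon against this one
            continue
        if bottom <= top:
            issues.append((hz["dataset_hziid"], "bottom depth not below top depth"))
        if previous is not None:
            if top < previous["hz_depb"]:
                issues.append((hz["dataset_hziid"], "overlaps previous horizon"))
            elif top > previous["hz_depb"]:
                issues.append((hz["dataset_hziid"], "gap above horizon"))
        previous = hz
    return issues


pedon_horizons = [
    {"dataset_hziid": "h1", "hz_seq": 1, "hz_dept": 0, "hz_depb": 10},
    {"dataset_hziid": "h2", "hz_seq": 2, "hz_dept": 8, "hz_depb": 25},
    {"dataset_hziid": "h3", "hz_seq": 3, "hz_dept": 30, "hz_depb": 30},
]
issues = check_horizon_depths(pedon_horizons)
```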

5.5.5 Level 4 (Harmonized Data)

  • Multiple datasets integrated
  • Standardized formats and units
  • Ready for modeling applications
  • Complete provenance tracking
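
To illustrate how integration, unit standardization, and provenance tracking can fit together, the sketch below merges horizon records from two hypothetical datasets, converts depths to cm, and keeps the source values alongside the converted ones. The input layout (a depth_unit per dataset) and the provenance structure are assumptions made for this example.

```python
# Conversion factors to centimeters; an unknown unit raises KeyError on purpose.
TO_CM = {"cm": 1.0, "in": 2.54}


def harmonize(datasets):
    """Merge records from several datasets into one list with depths in cm."""
    merged = []
    for dataset_iid, info in datasets.items():
        factor = TO_CM[info["depth_unit"]]
        for rec in info["records"]:
            merged.append({
                "dataset_iid_ref": dataset_iid,
                "dataset_hziid": rec["dataset_hziid"],
                "hz_dept": rec["hz_dept"] * factor,
                "hz_depb": rec["hz_depb"] * factor,
                # Provenance: keep the original values and their unit.
                "provenance": {
                    "source_unit": info["depth_unit"],
                    "source_hz_dept": rec["hz_dept"],
                    "source_hz_depb": rec["hz_depb"],
                },
            })
    return merged


datasets = {
    "ds01": {"depth_unit": "cm",
             "records": [{"dataset_hziid": "ds01_h1", "hz_dept": 0, "hz_depb": 10}]},
    "ds02": {"depth_unit": "in",
             "records": [{"dataset_hziid": "ds02_h1", "hz_dept": 0, "hz_depb": 4}]},
}
harmonized = harmonize(datasets)
```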

5.6 Data Dictionary and Thesaurus

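The AKSDB distinguishes two related concepts: a per-dataset data dictionary (the annotations file), which describes each raw column, and a thesaurus, which defines the standardized field keys that the annotations refer to.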

5.6.1 Data Dictionary (Annotations File)

Table: Annotation File Structure 

Field                       | Description
---------------------------|-------------
dataset_id                 | Dataset identifier
dataset_sub_id             | Dataset subset identifier
table_id                   | Raw table/file name including extension
column_id                  | Original column name
aksdb_field_key            | Standardization key
is_type                    | Data type descriptor (identifier/description/unit/method/value)
with_entry                | Content for the is_type

The annotation file follows the naming convention: <dataset_id>_annotations (all lowercase)
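
Because each annotation row carries one is_type/with_entry pair, a single raw column can be described by several rows. The sketch below builds the file name from the convention above and groups the rows per column. The CSV format, the ".csv" extension, and the example rows are assumptions; the document fixes only the field names and the file name stem.

```python
import csv
import io
from collections import defaultdict


def annotation_filename(dataset_id, extension=".csv"):
    # Assumption: the extension is not specified by the naming convention.
    return f"{dataset_id.lower()}_annotations{extension}"


def read_annotations(handle):
    """Group annotation rows by (table_id, column_id) into {is_type: with_entry}."""
    columns = defaultdict(dict)
    for row in csv.DictReader(handle):
        key = (row["table_id"], row["column_id"])
        columns[key]["aksdb_field_key"] = row["aksdb_field_key"]
        columns[key][row["is_type"]] = row["with_entry"]
    return dict(columns)


example = io.StringIO(
    "dataset_id,dataset_sub_id,table_id,column_id,aksdb_field_key,is_type,with_entry\n"
    "ds01,,horizons.csv,pH_H2O,ph_lab,value,\n"
    "ds01,,horizons.csv,pH_H2O,ph_lab,method,1:1 water\n"
    "ds01,,horizons.csv,pH_H2O,ph_lab,unit,unitless\n"
)
annotated = read_annotations(example)
```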

5.6.2 Thesaurus

Table: Thesaurus Structure

Field                       | Description
---------------------------|-------------
aksdb_field_key            | Standardized field identifier
name                       | Human-readable field name
full_name                 | Complete descriptive name
synonyms                   | List of equivalent terms
descriptor                 | Detailed field description
enforced_standards        | Required standards for Level 1+
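
One use of the synonyms field is to suggest an aksdb_field_key for a raw column name while an annotation file is being written. The sketch below shows such a lookup; the thesaurus entries, the normalization rule, and the function names are hypothetical, and a suggestion would still need review by a person.

```python
thesaurus = [
    {"aksdb_field_key": "ph_lab", "name": "pH (lab)",
     "synonyms": ["ph", "ph_h2o", "soil_ph"]},
    {"aksdb_field_key": "clay_pct_lab", "name": "Clay (lab)",
     "synonyms": ["clay", "clay_percent", "pct_clay"]},
]


def normalize(term):
    """Lowercase a term and replace spaces with underscores before matching."""
    return term.strip().lower().replace(" ", "_")


def build_synonym_index(thesaurus):
    """Map every field key and synonym to its aksdb_field_key."""
    index = {}
    for entry in thesaurus:
        for term in [entry["aksdb_field_key"], *entry["synonyms"]]:
            index[normalize(term)] = entry["aksdb_field_key"]
    return index


def suggest_field_key(raw_column, index):
    """Return the matching aksdb_field_key, or None when nothing matches."""
    return index.get(normalize(raw_column))


index = build_synonym_index(thesaurus)
suggestion = suggest_field_key("Soil pH", index)
```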

5.7 Metadata and Annotations

5.7.1 Dataset Package Components

  1. Raw Data Files
    • Original format preserved
    • No modifications on ingestion
  2. Metadata
  3. Annotation File
    • Maps raw fields to standard vocabulary
    • Documents data types and constraints
    • Provides field-level metadata

5.7.2 Standardization Process

  1. Validate data format (Level 1)
  2. Apply standardization rules (Level 2)
  3. Implement QA/QC (Level 3)
  4. Harmonize across datasets (Level 4)

5.8 Data Quality and Standards

5.8.1 Quality Control Measures

  • Horizon depth consistency checks
  • Value range validation
  • Unit standardization
  • Coordinate system verification
  • Temporal data validation

5.8.2 Standards Enforcement

  • WGS 84 coordinate system
  • Standard date formats (YYYY-MM-DD)
  • Controlled vocabulary for field names
  • Required metadata elements
  • Documentation of modifications
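
Two of these standards can be checked mechanically on a pedon record, as in the sketch below. The function names and message texts are hypothetical. Note that a range check can only show that coordinates are plausible latitude/longitude values; it cannot confirm that they are referenced to WGS 84, which has to come from the dataset metadata.

```python
import re
from datetime import datetime

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(text):
    """True only for zero-padded, real calendar dates in YYYY-MM-DD form."""
    if not isinstance(text, str) or not DATE_PATTERN.fullmatch(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_pedon_standards(pedon):
    """Return a list of problems found in one pedon record."""
    problems = []
    if not is_iso_date(pedon.get("dts_pedon")):
        problems.append("dts_pedon is not a valid YYYY-MM-DD date")
    lat, lon = pedon.get("lat"), pedon.get("lon")
    if lat is None or not -90 <= lat <= 90:
        problems.append("lat outside [-90, 90]")
    if lon is None or not -180 <= lon <= 180:
        problems.append("lon outside [-180, 180]")
    return problems


ok = check_pedon_standards({"dts_pedon": "2019-07-04", "lat": 64.84, "lon": -147.72})
bad = check_pedon_standards({"dts_pedon": "07/04/2019", "lat": 164.84, "lon": -147.72})
```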

5.8.3 Quality Annotations

  • Confidence levels for measurements
  • Data source reliability
  • Spatial accuracy assessments
  • Temporal precision indicators
  • Modification tracking

5.9 Implementation Notes

The AKSDB data model is implemented with several key considerations:

  1. Flexibility
    • Accommodates diverse data sources
    • Handles varying levels of detail
    • Supports multiple data types
  2. Traceability
    • Maintains links to source data
    • Documents all transformations
    • Preserves original values
  3. Interoperability
    • Compatible with NASIS conventions
    • Supports WoSIS export
    • Enables ISCN integration
  4. Scalability
    • Handles large datasets
    • Supports incremental updates
    • Enables distributed processing

The model continues to evolve as new datasets are incorporated and additional requirements are identified. Regular review and updates ensure the model remains aligned with project goals and community needs.

5.10 Style Guidelines

  • Use lowercase for file and field names
  • Separate words with underscores
  • Keep naming conventions consistent across all components
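
The guidelines above can be applied to field names with a small helper such as the following. This is a sketch with a hypothetical function name; how punctuation and repeated separators are handled is an assumption, and file names would need their extension split off first.

```python
import re


def to_aksdb_name(name):
    """Lowercase a field name and separate words with single underscores."""
    # Replace any run of characters other than letters and digits with one underscore.
    name = re.sub(r"[^A-Za-z0-9]+", "_", name.strip())
    return name.strip("_").lower()


examples = [to_aksdb_name(n) for n in ["Clay Pct (Lab)", "pH 10", "hz-name"]]
```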