Soil Sampling Design Methods Including Spatial Coverage, Maxvol Optimal Design, Simple Random, and Stratified Sampling
The soilsampling package provides methods for designing soil sampling schemes. It implements:
- Simple Random Sampling: Uniform random selection of sampling locations
- Stratified Random Sampling: Random sampling within compact geographical strata
- Spatial Coverage Sampling: Purposive sampling at stratum centroids for optimal coverage
- Maxvol Optimal Design Sampling: Feature-based sampling using D-optimal experimental design
- Composite Sampling: Sampling from equal-area strata for combined samples
- Validation Metrics: Assess stratification quality through coverage efficiency and distance statistics
# Using pak (fastest)
pak::pak("ccarbajal16/soilsampling")
# Or using devtools
devtools::install_github("ccarbajal16/soilsampling")# Install from local directory
install.packages(".", repos = NULL, type = "source")
# Or using devtools
devtools::install_local("path/to/soilsampling")- R (>= 4.1.0)
- sf (>= 1.0-12)
- ggplot2 (>= 3.3.0)
library(soilsampling)
library(sf)
# Create a study area
poly <- st_polygon(list(rbind(
c(0, 0), c(100, 0), c(100, 50), c(0, 50), c(0, 0)
)))
study_area <- st_sf(geometry = st_sfc(poly))
# Set seed for reproducibility
set.seed(42)
# Spatial coverage sampling
samples <- ss_coverage(study_area, n_strata = 25, n_try = 5)
ss_plot(samples)
# Get coordinates
coords <- ss_to_data_frame(samples)
head(coords)Select locations uniformly at random:
samples <- ss_random(study_area, n = 30)
ss_plot_samples(samples)Create strata first, then sample randomly within each:
# Create 25 compact strata
strata <- ss_stratify(study_area, n_strata = 25, n_try = 5)
# Take 1 random sample per stratum
samples <- ss_stratified(strata, n_per_stratum = 1)
ss_plot(strata, samples = samples)Place samples at stratum centroids for optimal spatial coverage:
# Direct approach
samples <- ss_coverage(study_area, n_strata = 25, n_try = 5)
# Or use pre-computed strata
strata <- ss_stratify(study_area, n_strata = 25, n_try = 5)
samples <- ss_coverage(strata)
ss_plot(samples)Incorporate existing sampling locations:
# Existing sample locations
prior_pts <- st_as_sf(
data.frame(x = c(25, 75), y = c(25, 25)),
coords = c("x", "y")
)
# Create stratification around prior points
strata <- ss_stratify(
study_area,
n_strata = 20,
prior_points = prior_pts,
n_try = 5
)
samples <- ss_coverage(strata)
ss_plot(samples) # Prior points shown with different symbolCreate strata of equal size:
samples <- ss_coverage_equal_area(study_area, n_strata = 25, n_try = 5)For combining samples from multiple locations:
# Create 3 composite samples from 20 equal-area strata
samples <- ss_composite(study_area, n_strata = 20, n_composites = 3)
ss_plot(samples) # Different symbols for each compositeFeature-based sampling using D-optimal experimental design to maximize diversity:
# Create a grid with terrain features
strata <- ss_stratify(study_area, n_strata = 100, n_try = 5)
cells_sf <- strata$cells
# Add terrain features (in practice, extract from DEM)
coords <- st_coordinates(cells_sf)
cells_sf$elevation <- coords[,2] + rnorm(nrow(coords), 0, 5)
cells_sf$slope <- abs(rnorm(nrow(coords), 5, 2))
cells_sf$twi <- rnorm(nrow(coords), 8, 1.5)
# Select 20 optimal samples based on features
samples <- ss_maxvol(
cells_sf,
n = 20,
features = c("elevation", "slope", "twi"),
min_dist = 5, # Minimum distance between samples
normalize = TRUE, # Normalize features to equal scales
add_coords = TRUE # Include coordinates as features
)
ss_plot_samples(samples)The maxvol algorithm selects locations that maximize the determinant (volume) of the feature submatrix, ensuring maximum diversity in feature space. This is particularly effective when:
- Features explain soil-forming factors
- You need deterministic (non-random) point selection
- You want optimal coverage with few samples
The package provides validation metrics to assess the quality of your stratification:
Evaluate how uniformly cells are distributed across strata:
# Create stratification
strata <- ss_stratify(study_area, n_strata = 25, n_try = 5)
# Assess coverage efficiency
efficiency <- ss_coverage_efficiency(strata)
print(efficiency)Output metrics:
- CV (Coefficient of Variation): Lower values indicate more uniform distribution
- Min/Max Ratio: Ratio of smallest to largest stratum (1.0 = perfect equality)
- Gini Coefficient: Standard inequality measure (0 = perfect equality)
- Efficiency Score: Qualitative assessment (Excellent/Good/Fair/Poor)
Comparing stratification methods:
# Regular stratification
strata_regular <- ss_stratify(study_area, n_strata = 20, n_try = 5)
# Equal-area stratification
strata_equal <- ss_stratify(study_area, n_strata = 20,
equal_area = TRUE, n_try = 5)
# Compare efficiency
eff_regular <- ss_coverage_efficiency(strata_regular)
eff_equal <- ss_coverage_efficiency(strata_equal)
cat("Regular CV:", eff_regular$cv, "\n")
cat("Equal-area CV:", eff_equal$cv, "\n")
# Equal-area should have much lower CV (< 0.05 vs 0.15-0.30)Compute distance from cells or samples to stratum centroids:
# For stratification
dist_summary <- ss_distance_summary(strata)
print(dist_summary)
# For samples (works with coverage or stratified samples)
samples <- ss_coverage(strata)
dist_samples <- ss_distance_summary(samples, actual_distance = TRUE)
print(dist_samples)Output (per stratum):
- Mean, SD, min, max, median distances to centroid
- Helps identify dispersed or compact strata
- Default: squared distances (consistent with MSSD metric)
Use cases:
- Compare different stratification runs (
n_tryvalues) - Verify equal-area maintains compactness
- Identify strata with unusual geometry
# Read study area
study_area <- st_read("path/to/study_area.shp")
# Create sampling design
set.seed(42)
samples <- ss_coverage(study_area, n_strata = 50, n_try = 10)
# Export as CSV
coords <- ss_to_data_frame(samples)
write.csv(coords, "sampling_points.csv", row.names = FALSE)
# Export as GeoPackage
samples_sf <- ss_to_sf(samples)
st_write(samples_sf, "sampling_points.gpkg")| Function | Description |
|---|---|
ss_stratify() |
Create compact geographical strata using k-means |
| Function | Description |
|---|---|
ss_random() |
Simple random sampling |
ss_stratified() |
Stratified random sampling |
ss_coverage() |
Spatial coverage sampling (centroids) |
ss_coverage_equal_area() |
Coverage sampling with equal-area strata |
ss_maxvol() |
Maxvol optimal design sampling (feature-based) |
ss_composite() |
Composite sampling |
| Function | Description |
|---|---|
ss_plot() |
Plot stratification and/or samples |
ss_plot_samples() |
Plot sampling points only |
| Function | Description |
|---|---|
ss_coverage_efficiency() |
Assess uniformity of cell distribution across strata |
ss_distance_summary() |
Compute distance statistics from cells/samples to centroids |
| Function | Description |
|---|---|
ss_to_sf() |
Convert to sf object |
ss_to_data_frame() |
Convert to data frame with coordinates |
ss_summary() |
Get summary statistics |
ss_get_samples() |
Extract samples sf object |
ss_n_strata() |
Get number of strata |
ss_n_samples() |
Get number of samples |
ss_area() |
Get stratum areas |
ss_relative_area() |
Get relative stratum areas |
| Method | Best For | Inference Type |
|---|---|---|
ss_coverage() |
Interpolation (kriging) | Model-based |
ss_maxvol() |
Feature diversity, D-optimal design | Model-based |
ss_stratified() |
Estimating means/totals | Design-based |
ss_random() |
Simple design-based analysis | Design-based |
ss_composite() |
Reducing laboratory costs | Design-based |
The package uses pure R implementations of k-means algorithms:
-
Transfer Algorithm (default): Standard k-means that iteratively transfers cells to nearer cluster centers. Creates compact but potentially unequal-sized strata.
-
Swop Algorithm (
equal_area = TRUE): Modified k-means that swaps cells between clusters to maintain equal sizes while optimizing compactness.
Both algorithms minimize the Mean Squared Shortest Distance (MSSD) between cells and their assigned cluster centroids.
- Use multiple tries: Set
n_try = 10or higher to avoid local optima - Set a seed: Use
set.seed()for reproducible results - Adjust resolution: Use
n_cellsparameter to control grid density - Check convergence: The output includes a
convergedflag - Validate your stratification: Use
ss_coverage_efficiency()to ensure uniform strata - Compare alternatives: Test different
n_stratavalues and assess efficiency
-
de Gruijter, J.J., Brus, D.J., Bierkens, M.F.P., and Knotters, M. (2006). Sampling for Natural Resource Monitoring. Springer, Berlin.
-
Walvoort, D.J.J., Brus, D.J., and de Gruijter, J.J. (2010). An R package for spatial coverage sampling and random sampling from compact geographical strata by k-means. Computers & Geosciences 36, 1261-1267. DOI: 10.1016/j.cageo.2010.04.005
-
Petrovskaia, N., Korveh, K., and Maas, E. (2021). Optimal soil sampling design based on the maxvol algorithm. Geoderma 381, 114733. DOI: 10.1016/j.geoderma.2020.114733
GPL (>= 3)
