Python client for the USDA-NRCS Soil Data Access (SDA) web service and other National Cooperative Soil Survey data sources.
soildb
provides Python access to the USDA Soil Data Access (SDA) web
service https://sdmdataaccess.nrcs.usda.gov/.
Query soil survey data, export to pandas/polars DataFrames, and handle spatial queries.
pip install soildb
For spatial functionality:
pip install soildb[spatial]
For all optional features support:
pip install soildb[all]
- Query soil survey data from SDA
- Export to pandas and polars DataFrames
- Build custom SQL queries with fluent interface
- Spatial queries with points, bounding boxes, and polygons
- Bulk data fetching with automatic pagination
- Async I/O for high performance and concurrency
This is a basic example of building a custom query and getting the results:
from soildb import Query
query = (Query()
.select("mukey", "muname", "musym")
.from_("mapunit")
.inner_join("legend", "mapunit.lkey = legend.lkey")
.where("areasymbol = 'IA109'")
.limit(5))
# inspect query
print(query.to_sql())
result = await soildb.SDAClient().execute(query)
df = result.to_pandas()
print(df.head())
SELECT TOP 5 mukey, muname, musym FROM mapunit INNER JOIN legend ON mapunit.lkey = legend.lkey WHERE areasymbol = 'IA109'
mukey muname musym
0 408337 Colo silty clay loam, channeled, 0 to 2 percen... 1133
1 408339 Colo silty clay loam, 0 to 2 percent slopes 133
2 408340 Colo silty clay loam, 2 to 4 percent slopes 133B
3 408345 Clarion loam, 9 to 14 percent slopes, moderate... 138D2
4 408348 Harpster silt loam, 0 to 2 percent slopes 1595
You may have noticed that we need to await
the query execution result.
All soildb functions are async. Here’s how to run them in different environments like Jupyter notebooks, VSCode, or regular Python scripts.
import asyncio
import soildb
async def main():
# Your async code here
mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
df = mapunits.to_pandas()
return df
# Handle different environments
try:
# Check if there's already an event loop (Jupyter, etc.)
loop = asyncio.get_running_loop()
import nest_asyncio
nest_asyncio.apply()
result = loop.run_until_complete(main())
except RuntimeError:
# No existing loop, use asyncio.run()
result = asyncio.run(main())
result
For comprehensive async usage, see the Async Programming Guide.
soildb provides several high-level functions for common tasks:
mapunits = await soildb.get_mapunit_by_areasymbol("IA109")
df = mapunits.to_pandas()
print(f"Found {len(df)} map units")
df.head()
Found 80 map units
mukey | musym | muname | mukind | muacres | areasymbol | areaname | |
---|---|---|---|---|---|---|---|
0 | 408333 | 1032 | Spicer silty clay loam, 0 to 2 percent slopes | Consociation | 1834 | IA109 | Kossuth County, Iowa |
1 | 408334 | 107 | Webster clay loam, 0 to 2 percent slopes | Consociation | 46882 | IA109 | Kossuth County, Iowa |
2 | 408335 | 108 | Wadena loam, 0 to 2 percent slopes | Consociation | 807 | IA109 | Kossuth County, Iowa |
3 | 408336 | 108B | Wadena loam, 2 to 6 percent slopes | Consociation | 1103 | IA109 | Kossuth County, Iowa |
4 | 408337 | 1133 | Colo silty clay loam, channeled, 0 to 2 percen... | Consociation | 1403 | IA109 | Kossuth County, Iowa |
If you have suggestions for new convenience functions please file a “feature request” on GitHub.
soildb also offers support for queries by location via
spatial_query()
. You can specify arbitrary geometry to target several
spatial and tabular types of results.
import asyncio
async def spatial_query_example():
from soildb import spatial_query
# Point query
async with soildb.SDAClient() as client:
response = await spatial_query(
geometry="POINT (-93.6 42.0)",
table="mupolygon",
spatial_relation="intersects"
)
df = response.to_pandas()
print(f"Point query found {len(df)} results")
return df
# Handle different environments
try:
# Check if there's already an event loop (Jupyter, etc.)
loop = asyncio.get_running_loop()
import nest_asyncio
nest_asyncio.apply()
result = loop.run_until_complete(spatial_query_example())
except RuntimeError:
# No existing loop, use asyncio.run()
result = asyncio.run(spatial_query_example())
result
Point query found 1 results
mukey | areasymbol | musym | nationalmusym | muname | mukind | |
---|---|---|---|---|---|---|
0 | 411278 | IA169 | 1314 | fsz1 | Hanlon-Spillville complex, channeled, 0 to 2 p... | Complex |
soildb makes it easy to retrieve large datasets efficiently, using concurrent requests and built-in functions that automatically handle pagination.
import asyncio
async def bulk_fetch_example():
from soildb import fetch_by_keys, get_mukey_by_areasymbol
# Get mukeys for multiple areas concurrently
areas = ["IA109", "IA113", "IA117"]
mukeys_tasks = [
get_mukey_by_areasymbol([area])
for area in areas
]
# Execute all mukey requests concurrently
mukeys_results = await asyncio.gather(*mukeys_tasks)
# Flatten the results (each task returns a list)
all_mukeys = []
for mukeys in mukeys_results:
all_mukeys.extend(mukeys)
print(f"Found {len(all_mukeys)} mukeys across {len(areas)} areas")
# Fetch data in chunks automatically
response = await fetch_by_keys(
all_mukeys,
"component",
key_column="mukey",
chunk_size=100,
columns=["mukey", "cokey", "compname", "localphase", "comppct_r"]
)
df = response.to_pandas()
print(f"Fetched {len(df)} component records")
return df
# Handle different environments
try:
# Check if there's already an event loop (Jupyter, etc.)
loop = asyncio.get_running_loop()
import nest_asyncio
nest_asyncio.apply()
result = loop.run_until_complete(bulk_fetch_example())
except RuntimeError:
# No existing loop, use asyncio.run()
result = asyncio.run(bulk_fetch_example())
result.head(10)
Found 410 mukeys across 3 areas
Fetching 410 keys in 5 chunks of 100
Fetched 1067 component records
mukey | cokey | compname | localphase | comppct_r | |
---|---|---|---|---|---|
0 | 408333 | 25562547 | Kingston | <NA> | 2 |
1 | 408333 | 25562548 | Okoboji | <NA> | 5 |
2 | 408333 | 25562549 | Spicer | <NA> | 90 |
3 | 408333 | 25562550 | Madelia | <NA> | 3 |
4 | 408334 | 25562837 | Okoboji | <NA> | 5 |
5 | 408334 | 25562838 | Glencoe | <NA> | 3 |
6 | 408334 | 25562839 | Canisteo | <NA> | 2 |
7 | 408334 | 25562840 | Webster | <NA> | 85 |
8 | 408334 | 25562841 | Nicollet | <NA> | 5 |
9 | 408335 | 25562135 | Biscay | <NA> | 1 |
The component
table has a hierarchical relationship:
- mukey (map unit key) is the parent
- cokey (component key) is the child
So when fetching components, you typically want to filter by mukey to get all components for specific map units.
The specialized fetch_component_by_mukey()
convenience function
handles this, but above we use the lower-level fetch_by_keys()
with
the "mukey"
as the key_column
to achieve the same result and
demonstrate pagination over chunks with 100
rows each.
See the examples/
directory and documentation
for detailed usage patterns.
This project is licensed under the MIT License. See the LICENSE file for details.