Migrate Nutter to Databricks Python SDK #88

alexott · 2025-11-25T08:46:41Z

Overview

This PR migrates Nutter from the legacy databricks-cli SDK to the official Databricks Python SDK, bringing modern API support, improved type safety, and new compute capabilities.

🎯 Major Changes

1. SDK Migration (`databricks-cli` → `databricks-sdk`)

Dependencies Updated:

✅ Replaced databricks-api with databricks-sdk
✅ Removed legacy requests dependency
✅ Added jsonpickle for serialization
✅ Updated pytest from 5.0.1 to 9.0.1

Core API Changes:

Authentication: Migrated from a custom auth config to the SDK's unified authentication
API Calls: Updated from REST API wrappers to typed SDK methods
Response Objects: Replaced raw JSON dictionaries with typed dataclass objects (Run, RunTask, RunState, etc.)
Jobs API: Switched from submit_run() to submit_and_wait() for synchronous execution
Workspace API: Now returns typed ObjectInfo objects instead of JSON
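
To illustrate the move from raw JSON dictionaries to typed objects, here is a minimal sketch. The classes below are simplified local stand-ins, not the SDK's own `Run` and `RunState` (those live in `databricks.sdk.service.jobs` and carry many more fields). The helpers `describe_run_from_json` and `describe_run` are hypothetical and are not taken from Nutter's code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RunLifeCycleState(Enum):
    """Local stand-in for the SDK enum of the same name."""
    RUNNING = "RUNNING"
    TERMINATED = "TERMINATED"


class RunResultState(Enum):
    """Local stand-in for the SDK enum of the same name."""
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


@dataclass
class RunState:
    life_cycle_state: Optional[RunLifeCycleState] = None
    result_state: Optional[RunResultState] = None


@dataclass
class Run:
    run_id: Optional[int] = None
    state: Optional[RunState] = None


def describe_run_from_json(run: dict) -> str:
    # Old style: nested dictionary lookups; a mistyped key only shows up at runtime.
    state = run.get("state", {})
    return f"{run.get('run_id')}: {state.get('life_cycle_state')}/{state.get('result_state')}"


def describe_run(run: Run) -> str:
    # New style: attribute access on typed objects; .value yields the enum's string.
    state = run.state or RunState()
    life_cycle = state.life_cycle_state.value if state.life_cycle_state else None
    result = state.result_state.value if state.result_state else None
    return f"{run.run_id}: {life_cycle}/{result}"
```

Both helpers produce the same summary string for the same run. The typed version lets an IDE or type checker flag a misspelled attribute before the code runs.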

Files Removed:

common/authconfig.py - Replaced by SDK authentication
common/httpretrier.py - SDK handles retries
tests/databricks/test_authconfig.py
tests/databricks/test_httpretrier.py
tests/databricks/test_utils.py - No longer needed with typed objects

2. Cluster Management Enhancements

2.1 Cluster Name Support 🆕

Users can now specify clusters by name instead of remembering cluster IDs:

# New: Use cluster name
nutter run test_pattern --cluster_name "My Test Cluster"

# Traditional: Use cluster ID (still supported)
nutter run test_pattern --cluster_id 0123-12334-tonedabc

Features:

✅ Case-insensitive cluster name matching
✅ Automatic cluster ID resolution via clusters.list() API
✅ Validates uniqueness (errors if multiple clusters have the same name)
✅ Backward compatible with positional cluster_id argument
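
The name-resolution rules listed above (case-insensitive match, error on duplicates) could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `resolve_cluster_id` and `ClusterInfo` are hypothetical names, and the function only assumes each item exposes `cluster_id` and `cluster_name` attributes, as the SDK's `ClusterDetails` objects do.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ClusterInfo:
    """Hypothetical stand-in exposing the two attributes the resolver reads."""
    cluster_id: Optional[str]
    cluster_name: Optional[str]


def resolve_cluster_id(clusters: Iterable, cluster_name: str) -> str:
    """Return the ID of the single cluster whose name matches, ignoring case."""
    wanted = cluster_name.casefold()
    matches = [
        c.cluster_id
        for c in clusters
        if (c.cluster_name or "").casefold() == wanted
    ]
    if not matches:
        raise ValueError(f"No cluster named {cluster_name!r} was found")
    if len(matches) > 1:
        # Ambiguous names are rejected rather than silently picking one.
        raise ValueError(
            f"Cluster name {cluster_name!r} is ambiguous: {len(matches)} clusters match"
        )
    return matches[0]
```

With the SDK installed and authentication configured, the `clusters` argument could be the iterator returned by `WorkspaceClient().clusters.list()`.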

2.2 Serverless Compute Support 🚀

Run tests on Databricks serverless compute without requiring a cluster:

# Run tests on serverless compute
nutter run test_pattern --serverless 1

# Recursive execution with serverless
nutter run dataload/ --serverless 1 --recursive

# Parallel execution with serverless
nutter run dataload/ --serverless 1 --recursive --max_parallel_tests 2

Features:

✅ Specify serverless environment version as integer
✅ No cluster required - tests run on ephemeral serverless compute
✅ Full feature parity with cluster-based execution
✅ Mutually exclusive with cluster options (validated)

3. Enhanced CLI Flexibility

Compute Options (mutually exclusive, one required):

--cluster_id - Specify cluster ID directly
--cluster_name - Specify cluster by name (auto-resolved)
--serverless - Use serverless compute (NEW)

Validation:

✅ Ensures exactly one compute option is specified
✅ Clear error messages for invalid configurations
✅ Validates serverless parameter is an integer
✅ Validates cluster_id is not empty
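
A minimal sketch of the validation described above, assuming the three options arrive as optional values from the argument parser. The function name `select_compute` and the error wording are hypothetical; Nutter's actual CLI code may structure this differently.

```python
from typing import Optional, Tuple, Union


def select_compute(cluster_id: Optional[str] = None,
                   cluster_name: Optional[str] = None,
                   serverless: Union[str, int, None] = None) -> Tuple[str, Union[str, int]]:
    """Return (option_name, value) for the single compute option supplied."""
    candidates = {
        "cluster_id": cluster_id,
        "cluster_name": cluster_name,
        "serverless": serverless,
    }
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        got = ", ".join(provided) or "none"
        raise ValueError(
            "Specify exactly one of --cluster_id, --cluster_name, --serverless "
            f"(got: {got})"
        )
    name, value = next(iter(provided.items()))
    if name == "serverless":
        # The serverless environment version must parse as an integer.
        try:
            return name, int(value)
        except (TypeError, ValueError):
            raise ValueError(f"--serverless must be an integer, got {value!r}") from None
    if name == "cluster_id" and not str(value).strip():
        raise ValueError("--cluster_id must not be empty")
    return name, value
```

Returning the option name alongside the value lets the caller branch once on which kind of compute to request.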

4. Test Suite Overhaul

Test Updates:

✅ Updated all tests to use typed SDK objects
✅ Fixed mocking to work with new SDK methods
✅ Added 13 new tests for cluster name and serverless features
✅ Fixed test hanging issues with authentication mocks
✅ Resolved PytestCollectionWarnings
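
One way to keep such tests away from real authentication is to pass in a mocked client. The sketch below uses only `unittest.mock` and is an assumption about the general approach, not a copy of Nutter's tests; `find_cluster_names` is a hypothetical function under test.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def find_cluster_names(client) -> list:
    """Code under test: reads cluster names through the client it is given."""
    return sorted(c.cluster_name for c in client.clusters.list())


def test_find_cluster_names_uses_mocked_client():
    # The mock replaces the workspace client, so no credentials or network are needed.
    client = MagicMock()
    client.clusters.list.return_value = [
        SimpleNamespace(cluster_id="0123-a", cluster_name="beta"),
        SimpleNamespace(cluster_id="0456-b", cluster_name="alpha"),
    ]
    assert find_cluster_names(client) == ["alpha", "beta"]
    client.clusters.list.assert_called_once_with()
```

Because the client is injected instead of constructed inside the function, the test never triggers the SDK's credential lookup.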

Test Coverage:

242 tests passing (up from 234)
Added tests for cluster name resolution (6 tests)
Added tests for serverless compute (7 tests)
All tests run in ~3.3 seconds

5. Code Quality Improvements

Type Safety:

Uses the SDK's typed dataclasses throughout (Run, RunTask, RunState, etc.)
Better IDE autocomplete and type checking
Eliminated raw JSON dictionary manipulation

Error Handling:

More specific error messages
Better validation at CLI and API layers
Graceful handling of edge cases

Code Cleanup:

Removed obsolete utility functions
Simplified authentication handling
Streamlined API client implementation
Added telemetry with product version tracking

📊 Statistics

41 files changed
1,862 insertions(+)
1,692 deletions(-)

Net Change: +170 lines (1,862 insertions minus 1,692 deletions)
Test Coverage: 242 tests passing

🔄 Breaking Changes

None!

This migration is 100% backward compatible for end users:

✅ Existing CLI commands work unchanged
✅ Positional cluster_id argument still supported
✅ All existing features preserved
✅ Environment variable authentication works the same

New features are purely additive:

--cluster_name flag (optional)
--serverless flag (optional)

📚 Documentation Updates

✅ Updated README with new compute options
✅ Added examples for cluster name usage
✅ Added examples for serverless compute
✅ Updated CLI flags documentation
✅ Added notes about mutual exclusivity

🧪 Testing

All tests pass without issues:

$ pytest tests/
242 passed in 3.34s

Tested scenarios:

✅ Cluster ID execution (existing functionality)
✅ Cluster name resolution
✅ Serverless compute execution
✅ Validation and error handling
✅ Authentication with the SDK
✅ All existing test scenarios

🚀 Benefits

Modern SDK: Official Databricks SDK with active maintenance and new features
Type Safety: Strongly typed API responses improve code reliability
Better DX: Using cluster names instead of IDs improves the developer experience
Serverless: Support for modern Databricks serverless compute
Future-Proof: Easy to adopt new Databricks features as they're added to the SDK
Simplified Code: Removed custom authentication and retry logic

📋 Migration Notes for Contributors

If you're working on Nutter code:

Import from databricks.sdk instead of databricks_cli
Use typed SDK objects (Run, ClusterDetails, etc.) instead of JSON dicts
Authentication is handled automatically by the SDK
API calls return typed objects, not raw responses

✅ Checklist

🙏 Acknowledgments

This migration enables Nutter to leverage the modern Databricks platform capabilities while maintaining full backward compatibility for existing users.

alexott added 7 commits August 16, 2023 15:38

First more or less working version on Databricks SDK

ecb9867

got it running without errors, but tests need to be fixed

c39fe58

Add SDK telemetry

c9d3db0

fixing setup.py

e0c2430

use .value instead of .name for enums

5b0fcb7

Fix all tests

9ef6c42

Add more options

6cb4a85
