@alexott commented Nov 25, 2025

Overview

This PR migrates Nutter from the legacy databricks-cli SDK to the official Databricks Python SDK, bringing modern API support, improved type safety, and new compute capabilities.

🎯 Major Changes

1. SDK Migration (databricks-cli → databricks-sdk)

Dependencies Updated:

  • ✅ Replaced databricks-api with databricks-sdk
  • ✅ Removed legacy requests dependency
  • ✅ Added jsonpickle for serialization
  • ✅ Updated pytest from 5.0.1 to 9.0.1

Core API Changes:

  • Authentication: Migrated from custom auth config to the SDK's unified authentication
  • API Calls: Updated from REST API wrappers to typed SDK methods
  • Response Objects: Replaced raw JSON dictionaries with typed dataclass objects (Run, RunTask, RunState, etc.)
  • Jobs API: Switched from submit_run() to submit_and_wait() for synchronous execution
  • Workspace API: Now returns typed ObjectInfo objects instead of JSON
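
To illustrate the move from raw JSON dictionaries to typed objects, here is a minimal sketch. `RunSketch`, `RunStateSketch`, and `ResultState` are hypothetical stand-ins defined only for this example; they mirror the shape of the SDK's `Run` and `RunState` dataclasses but are not the SDK types themselves, and neither helper function is taken from the Nutter code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResultState(Enum):
    # Hypothetical stand-in for the SDK's run result enum; two values shown.
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"


@dataclass
class RunStateSketch:
    # Hypothetical stand-in for the SDK's RunState dataclass.
    result_state: Optional[ResultState] = None
    state_message: Optional[str] = None


@dataclass
class RunSketch:
    # Hypothetical stand-in for the SDK's Run dataclass.
    run_id: Optional[int] = None
    state: Optional[RunStateSketch] = None


def succeeded_from_dict(run: dict) -> bool:
    # Legacy style: string keys, so a typo silently yields None.
    return run.get("state", {}).get("result_state") == "SUCCESS"


def succeeded_from_typed(run: RunSketch) -> bool:
    # Typed style: attribute access that IDEs and type checkers can verify.
    return run.state is not None and run.state.result_state is ResultState.SUCCESS
```

With typed objects, a misspelled field name fails loudly (or is flagged by the type checker) instead of quietly evaluating to `None`.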

Files Removed:

  • common/authconfig.py - Replaced by SDK authentication
  • common/httpretrier.py - SDK handles retries
  • tests/databricks/test_authconfig.py
  • tests/databricks/test_httpretrier.py
  • tests/databricks/test_utils.py - No longer needed with typed objects

2. Cluster Management Enhancements

2.1 Cluster Name Support 🆕

Users can now specify clusters by name instead of remembering cluster IDs:

# New: Use cluster name
nutter run test_pattern --cluster_name "My Test Cluster"

# Traditional: Use cluster ID (still supported)
nutter run test_pattern --cluster_id 0123-12334-tonedabc

Features:

  • ✅ Case-insensitive cluster name matching
  • ✅ Automatic cluster ID resolution via clusters.list() API
  • ✅ Validates uniqueness (errors if multiple clusters have the same name)
  • ✅ Backward compatible with positional cluster_id argument
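
The resolution rules above (case-insensitive match, error on duplicates) can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `resolve_cluster_id` is a hypothetical helper name, and it accepts any iterable of objects exposing `cluster_id` and `cluster_name`, which is the shape of the items yielded by the SDK's `clusters.list()`.

```python
from types import SimpleNamespace
from typing import Iterable


def resolve_cluster_id(clusters: Iterable, cluster_name: str) -> str:
    """Return the ID of the single cluster whose name matches, ignoring case."""
    wanted = cluster_name.casefold()
    matches = [
        c.cluster_id
        for c in clusters
        if (c.cluster_name or "").casefold() == wanted
    ]
    if not matches:
        raise ValueError(f"No cluster named '{cluster_name}' was found")
    if len(matches) > 1:
        # Names are not guaranteed to be unique, so refuse to guess.
        raise ValueError(
            f"Cluster name '{cluster_name}' is ambiguous; matching IDs: {matches}"
        )
    return matches[0]


# Stand-in objects; in real use the iterable would come from the SDK,
# for example WorkspaceClient().clusters.list().
example_clusters = [
    SimpleNamespace(cluster_id="id-1", cluster_name="My Test Cluster"),
    SimpleNamespace(cluster_id="id-2", cluster_name="Nightly"),
]
resolved = resolve_cluster_id(example_clusters, "my test cluster")
```

Failing on ambiguous names, rather than picking the first match, keeps a test run from silently landing on the wrong cluster.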

2.2 Serverless Compute Support 🚀

Run tests on Databricks serverless compute without requiring a cluster:

# Run tests on serverless compute
nutter run test_pattern --serverless 1

# Recursive execution with serverless
nutter run dataload/ --serverless 1 --recursive

# Parallel execution with serverless
nutter run dataload/ --serverless 1 --recursive --max_parallel_tests 2

Features:

  • ✅ Specify serverless environment version as integer
  • ✅ No cluster required - tests run on ephemeral serverless compute
  • ✅ Full feature parity with cluster-based execution
  • ✅ Mutually exclusive with cluster options (validated)

3. Enhanced CLI Flexibility

Compute Options (mutually exclusive, one required):

  1. --cluster_id - Specify cluster ID directly
  2. --cluster_name - Specify cluster by name (auto-resolved)
  3. --serverless - Use serverless compute (NEW)

Validation:

  • ✅ Ensures exactly one compute option is specified
  • ✅ Clear error messages for invalid configurations
  • ✅ Validates serverless parameter is an integer
  • ✅ Validates cluster_id is not empty
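
A minimal sketch of these validation rules, assuming the three options arrive as keyword arguments that default to `None`. `validate_compute_options` is a hypothetical name and the error messages are illustrative; the checks in this PR may be structured differently.

```python
from typing import Optional, Union


def validate_compute_options(
    cluster_id: Optional[str] = None,
    cluster_name: Optional[str] = None,
    serverless: Optional[Union[int, str]] = None,
) -> str:
    """Return the name of the single compute option in use, or raise ValueError."""
    options = {
        "cluster_id": cluster_id,
        "cluster_name": cluster_name,
        "serverless": serverless,
    }
    provided = [name for name, value in options.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of --cluster_id, --cluster_name, --serverless; "
            f"got {len(provided)}: {provided}"
        )
    chosen = provided[0]
    if chosen == "cluster_id" and not str(cluster_id).strip():
        raise ValueError("--cluster_id must not be empty")
    if chosen == "serverless":
        # bool is a subclass of int, so reject it explicitly.
        is_int = isinstance(serverless, int) and not isinstance(serverless, bool)
        is_digit_string = isinstance(serverless, str) and serverless.strip().isdigit()
        if not (is_int or is_digit_string):
            raise ValueError("--serverless must be an integer environment version")
    return chosen
```

Returning the chosen option name lets the caller branch once on the compute type after validation has passed.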

4. Test Suite Overhaul

Test Updates:

  • ✅ Updated all tests to use typed SDK objects
  • ✅ Fixed mocking to work with new SDK methods
  • ✅ Added 13 new tests for cluster name and serverless features
  • ✅ Fixed test hanging issues with authentication mocks
  • ✅ Resolved PytestCollectionWarnings

Test Coverage:

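  • 242 tests passing (up from 234; a net gain of 8, since the 13 new tests are partly offset by the tests deleted with the removed modules)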
  • Added tests for cluster name resolution (6 tests)
  • Added tests for serverless compute (7 tests)
  • All tests run in ~3.3 seconds

5. Code Quality Improvements

Type Safety:

  • Using the SDK's typed dataclasses throughout (Run, RunTask, RunState, etc.)
  • Better IDE autocomplete and type checking
  • Eliminated raw JSON dictionary manipulation

Error Handling:

  • More specific error messages
  • Better validation at CLI and API layers
  • Graceful handling of edge cases

Code Cleanup:

  • Removed obsolete utility functions
  • Simplified authentication handling
  • Streamlined API client implementation
  • Added telemetry with product version tracking

📊 Statistics

41 files changed
1,862 insertions(+)
1,692 deletions(-)

Net Change: +170 lines
Test Coverage: 242 tests passing

🔄 Breaking Changes

None!

This migration is 100% backward compatible for end users:

  • ✅ Existing CLI commands work unchanged
  • ✅ Positional cluster_id argument still supported
  • ✅ All existing features preserved
  • ✅ Environment variable authentication works the same

New features are purely additive:

  • --cluster_name flag (optional)
  • --serverless flag (optional)

📚 Documentation Updates

  • ✅ Updated README with new compute options
  • ✅ Added examples for cluster name usage
  • ✅ Added examples for serverless compute
  • ✅ Updated CLI flags documentation
  • ✅ Added notes about mutual exclusivity

🧪 Testing

All tests pass without issues:

$ pytest tests/
242 passed in 3.34s

Tested scenarios:

  • ✅ Cluster ID execution (existing functionality)
  • ✅ Cluster name resolution
  • ✅ Serverless compute execution
  • ✅ Validation and error handling
  • ✅ Authentication with the SDK
  • ✅ All existing test scenarios

🚀 Benefits

  1. Modern SDK: Official Databricks SDK with active maintenance and new features
  2. Type Safety: Strongly typed API responses improve code reliability
  3. Better DX: Using cluster names instead of IDs improves the developer experience
  4. Serverless: Support for modern Databricks serverless compute
  5. Future-Proof: Easy to adopt new Databricks features as they're added to the SDK
  6. Simplified Code: Removed custom authentication and retry logic

📋 Migration Notes for Contributors

If you're working on Nutter code:

  • Import from databricks.sdk instead of databricks_cli
  • Use typed SDK objects (Run, ClusterDetails, etc.) instead of JSON dicts
  • Authentication is handled automatically by the SDK
  • API calls return typed objects, not raw responses
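
For contributors new to the SDK, the notes above might translate into something like the sketch below. It is an illustration, not code from this PR: `run_notebook_and_wait`, the run name, and the task key are hypothetical, and the imports are kept inside the function only so the snippet can be loaded without the SDK installed.

```python
import datetime


def run_notebook_and_wait(notebook_path: str, cluster_id: str, timeout_minutes: int = 20):
    """Submit a one-time notebook run on an existing cluster and wait for it."""
    # Imported lazily so this module loads even where the SDK is absent.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    # With no arguments, the client relies on the SDK's unified
    # authentication (environment variables, configuration profiles, ...).
    w = WorkspaceClient()

    # submit_and_wait() blocks until the run reaches a terminal state
    # and returns a typed Run object rather than a JSON dictionary.
    run = w.jobs.submit_and_wait(
        run_name="nutter-sketch",  # hypothetical run name
        tasks=[
            jobs.SubmitTask(
                task_key="nutter_sketch",  # hypothetical task key
                existing_cluster_id=cluster_id,
                notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
            )
        ],
        timeout=datetime.timedelta(minutes=timeout_minutes),
    )
    result_state = run.state.result_state if run.state else None
    return run.run_id, result_state
```

Calling it requires a reachable workspace and valid credentials, so treat it as a reading aid rather than something to paste into the test suite.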

✅ Checklist

  • All tests passing
  • Documentation updated
  • Backward compatibility maintained
  • New features added (cluster name, serverless)
  • Code quality improvements
  • No breaking changes

🙏 Acknowledgments

This migration enables Nutter to leverage the modern Databricks platform capabilities while maintaining full backward compatibility for existing users.
