Migrate Nutter to Databricks Python SDK #88
Overview
This PR migrates Nutter from the legacy `databricks-cli` SDK to the official Databricks Python SDK, bringing modern API support, improved type safety, and new compute capabilities.
🎯 Major Changes
1. SDK Migration (`databricks-cli` → `databricks-sdk`)
Dependencies Updated:
- `databricks-api` with `databricks-sdk`
- `requests` dependency
- `jsonpickle` for serialization
- `pytest` from 5.0.1 to 9.0.1
Core API Changes:
- (`Run`, `RunTask`, `RunState`, etc.)
- `submit_run()` to `submit_and_wait()` for synchronous execution
- `ObjectInfo` objects instead of JSON
Files Removed:
- `common/authconfig.py` - Replaced by SDK authentication
- `common/httpretrier.py` - SDK handles retries
- `tests/databricks/test_authconfig.py`
- `tests/databricks/test_httpretrier.py`
- `tests/databricks/test_utils.py` - No longer needed with typed objects
2. Cluster Management Enhancements
2.1 Cluster Name Support 🆕
Users can now specify clusters by name instead of remembering cluster IDs:
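The name lookup itself might be sketched as below. This is an illustration, not the PR's code: `resolve_cluster_id` is a hypothetical helper, and raising on zero or multiple matches is an assumed policy. The only SDK behavior relied on is that `clusters.list()` yields objects exposing `cluster_id` and `cluster_name`; a stand-in client is used so the example runs without a workspace.

```python
from types import SimpleNamespace


def resolve_cluster_id(client, cluster_name):
    """Return the ID of the single cluster whose name equals cluster_name.

    `client` is anything exposing `clusters.list()`, for example a
    `databricks.sdk.WorkspaceClient`. Hypothetical helper for illustration.
    """
    matches = [
        cluster.cluster_id
        for cluster in client.clusters.list()
        if cluster.cluster_name == cluster_name
    ]
    if not matches:
        raise ValueError(f"No cluster named {cluster_name!r} was found")
    if len(matches) > 1:
        # Assumed policy: cluster names are not unique, so refuse to guess.
        raise ValueError(f"{len(matches)} clusters are named {cluster_name!r}")
    return matches[0]


# Stand-in for a real WorkspaceClient; the IDs and names are made up.
fake_client = SimpleNamespace(
    clusters=SimpleNamespace(
        list=lambda: [
            SimpleNamespace(cluster_id="0123-456789-abc", cluster_name="ci-tests"),
            SimpleNamespace(cluster_id="0456-987654-def", cluster_name="adhoc"),
        ]
    )
)

resolved = resolve_cluster_id(fake_client, "ci-tests")
print(resolved)
```

With a real workspace, the same function would be called with `WorkspaceClient()` in place of `fake_client`.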
Features:
- `clusters.list()` API
2.2 Serverless Compute Support 🚀
Run tests on Databricks serverless compute without requiring a cluster:
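A rough sketch of what "without requiring a cluster" could mean at the request level. It assumes, without confirmation from this PR, that a notebook task submitted with no cluster field is scheduled on serverless compute in workspaces where serverless is enabled. Plain dicts with Jobs API `runs/submit` field names stand in for the SDK's typed objects so the example has no dependencies; `build_submit_task` and the default task key are hypothetical names.

```python
def build_submit_task(notebook_path, cluster_id=None, task_key="nutter_test"):
    """Build one task entry for a one-time run submission.

    Hypothetical helper: when cluster_id is None, no compute field is set,
    which is assumed here to select serverless compute.
    """
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
    }
    if cluster_id is not None:
        # Classic path: run on an existing all-purpose cluster.
        task["existing_cluster_id"] = cluster_id
    return task


# The notebook path and cluster ID below are made-up examples.
serverless_task = build_submit_task("/tests/test_dataload")
cluster_task = build_submit_task("/tests/test_dataload", cluster_id="0123-456789-abc")
print(serverless_task)
print(cluster_task)
```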
Features:
3. Enhanced CLI Flexibility
Compute Options (mutually exclusive, one required):
- `--cluster_id` - Specify cluster ID directly
- `--cluster_name` - Specify cluster by name (auto-resolved)
- `--serverless` - Use serverless compute (NEW)
Validation:
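The "mutually exclusive, one required" rule could be checked along these lines. This is a minimal sketch rather than Nutter's implementation: `validate_compute_options` and its error message are hypothetical, and only the three flag names come from this PR.

```python
def validate_compute_options(cluster_id=None, cluster_name=None, serverless=False):
    """Return the single compute flag that was set, or raise ValueError.

    Hypothetical helper: exactly one of the three options must be provided.
    """
    provided = [
        flag
        for flag, is_set in (
            ("--cluster_id", cluster_id is not None),
            ("--cluster_name", cluster_name is not None),
            ("--serverless", bool(serverless)),
        )
        if is_set
    ]
    if len(provided) != 1:
        given = ", ".join(provided) if provided else "none"
        raise ValueError(
            "Exactly one of --cluster_id, --cluster_name, --serverless "
            f"is required (got: {given})"
        )
    return provided[0]


chosen = validate_compute_options(cluster_name="ci-tests")
print(chosen)
```

Collecting the flags that were set into a list keeps the error message specific about which conflicting options were passed.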
4. Test Suite Overhaul
Test Updates:
Test Coverage:
5. Code Quality Improvements
Type Safety:
- (`Run`, `RunTask`, `RunState`, etc.)
Error Handling:
Code Cleanup:
📊 Statistics
Files Modified: 41
Net Change: +170 lines
Test Coverage: 242 tests passing
🔄 Breaking Changes
None!
This migration is 100% backward compatible for end users:
- `cluster_id` argument still supported
New features are purely additive:
- `--cluster_name` flag (optional)
- `--serverless` flag (optional)
📚 Documentation Updates
🧪 Testing
All tests pass without issues:
$ pytest tests/
242 passed in 3.34s
Tested scenarios:
🚀 Benefits
📋 Migration Notes for Contributors
If you're working on Nutter code:
- `databricks.sdk` instead of `databricks_cli`
- (`Run`, `ClusterDetails`, etc.) instead of JSON dicts
✅ Checklist
🙏 Acknowledgments
This migration enables Nutter to leverage the modern Databricks platform capabilities while maintaining full backward compatibility for existing users.