[SIP-11] Proposal for deprecating the native Druid NoSQL connector

### Motivation

Superset currently supports two engine connectors for querying datasources: SQLAlchemy and the Druid REST API. The latter was the initial use case for Superset, i.e., a UI for visualizing Druid datasources.

Since version [0.10.0](https://github.com/apache/incubator-druid/releases/tag/druid-0.10.0), Druid has included a built-in SQL server, which has a SQLAlchemy binding provided by the [pydruid](https://github.com/druid-io/pydruid) library (courtesy of @betodealmeida and @mistercrunch). The proposed change is therefore to deprecate the REST API interface in favor of a single interface (SQLAlchemy) to all engines. Note that all future engines (there has been mention of adding support for Elasticsearch) would require a SQLAlchemy dialect.
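For illustration, a minimal sketch of what the single SQLAlchemy interface looks like for Druid. It assumes pydruid's documented URI form (`druid://host:port/druid/v2/sql/`, with `druid+https` for TLS) and the default broker port 8082. The helper name `druid_sqlalchemy_uri` is hypothetical and not part of Superset or pydruid.

```python
def druid_sqlalchemy_uri(host: str, port: int = 8082, secure: bool = False) -> str:
    """Build a SQLAlchemy URI pointing at Druid's SQL endpoint.

    Hypothetical helper: the scheme and path follow pydruid's SQLAlchemy
    dialect, and 8082 is assumed to be the broker port.
    """
    scheme = "druid+https" if secure else "druid"
    return f"{scheme}://{host}:{port}/druid/v2/sql/"


uri = druid_sqlalchemy_uri("localhost")
print(uri)

# With SQLAlchemy and pydruid installed, the URI is used like any other engine:
#   from sqlalchemy import create_engine
#   engine = create_engine(uri)
```

From Superset's point of view, such a URI would be registered as a regular database, so Druid goes through the same code path as every other SQLAlchemy engine.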

There is a significant amount of overhead in supporting both connectors, including:

#### Code

From a code perspective, each connector needs to define similar views and models. The [Druid](https://github.com/apache/incubator-superset/tree/master/superset/connectors/druid) connector alone comprises around 2,000 lines of code. There is additional frontend logic which needs to construct filters, metrics, etc. for both the Druid REST API and SQLAlchemy. Note that there are [74](https://github.com/apache/incubator-superset/search?q=druid&unscoped_q=druid) files (including documentation) which reference Druid in the repo.

#### Models

In addition to code overhead each connector defines its own models and database tables:

Druid: 
- `clusters`
- `datasources`
- `columns`
- `metrics`

SQLAlchemy: 
- `dbs` 
- `tables`
- `table_columns`
- `sql_metric`

This split complicates logic: the `slices` table does not have a SQLAlchemy relationship to a "datasource" table, because the datasource type determines the association. This results in denormalized tables with potentially incorrect values, e.g., the `slices` table contains the `datasource_name` column for the FAB CRUD views, but this may not accurately reflect the underlying datasource name.
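The problem can be reproduced with a small, self-contained sketch. The schemas below are heavily simplified assumptions (only a few columns, in-memory SQLite) and `resolve_name` is a hypothetical helper. The point is only to show that the lookup table depends on `datasource_type` and that the denormalized name can go stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schemas: one table per connector.
    CREATE TABLE datasources (id INTEGER PRIMARY KEY, datasource_name TEXT);
    CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT);
    -- No foreign key on datasource_id: the target depends on datasource_type.
    CREATE TABLE slices (
        id INTEGER PRIMARY KEY,
        datasource_id INTEGER,
        datasource_type TEXT,
        datasource_name TEXT
    );
    INSERT INTO datasources VALUES (1, 'events');
    INSERT INTO tables VALUES (1, 'orders');
    INSERT INTO slices VALUES (10, 1, 'druid', 'events'), (11, 1, 'table', 'orders');
""")

# Renaming the Druid datasource does not touch the copy stored on the slice.
conn.execute("UPDATE datasources SET datasource_name = 'events_v2' WHERE id = 1")

# Hypothetical mapping from datasource type to (table, name column).
NAME_LOOKUP = {
    "druid": ("datasources", "datasource_name"),
    "table": ("tables", "table_name"),
}


def resolve_name(slice_id):
    """Return (name stored on the slice, name in the underlying table)."""
    ds_id, ds_type, stored = conn.execute(
        "SELECT datasource_id, datasource_type, datasource_name "
        "FROM slices WHERE id = ?",
        (slice_id,),
    ).fetchone()
    table, column = NAME_LOOKUP[ds_type]
    (actual,) = conn.execute(
        f"SELECT {column} FROM {table} WHERE id = ?", (ds_id,)
    ).fetchone()
    return stored, actual


print(resolve_name(10))  # stored and actual names differ after the rename
print(resolve_name(11))  # stored and actual names still agree
```

With a single connector, `datasource_id` could be a real foreign key and the name could be read through the relationship instead of being copied.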

#### Proposed Change

The proposed change is to deprecate, and ultimately remove, all Druid REST logic from the codebase. This would significantly simplify and streamline a number of facets of Superset by ensuring that all engines connect via a SQLAlchemy dialect.

Currently there is support for syncing/refreshing Druid datasources associated with the REST API connector, which I suspect is leveraged by a number of organizations. [SIP-7](https://github.com/apache/incubator-superset/issues/5842) discusses "refreshing" of Superset datasources.

Note this would be a breaking change for any organization using a Druid version earlier than `0.10.0`. Also, there may be some instances of post-aggregate Druid functions which are not supported in Druid SQL.

#### New or Changed Public Interfaces

There would be no new or changed public interfaces.

#### New dependencies

There would be no new dependencies.

#### Migration Plan and Compatibility

A non-trivial database migration would be required, including:
- Migrating all records in the Druid tables listed above to the equivalent SQLAlchemy tables.
- Updating existing slices to reference the new SQLAlchemy representation of the Druid datasource.
- Re-normalizing the `slices` table.
- Updating chart data to remove the obsolete `table__` or `druid__` prefixes.
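As a rough sketch of the first two steps, the following copies Druid datasources into the SQLAlchemy `tables` table and repoints the affected slices. The schemas are simplified assumptions (in-memory SQLite, only a few columns) and `migrate_druid_datasources` is a hypothetical name. A real migration would also need to cover clusters, columns, metrics, and the chart data itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schemas.
    CREATE TABLE datasources (id INTEGER PRIMARY KEY, datasource_name TEXT);
    CREATE TABLE tables (id INTEGER PRIMARY KEY, table_name TEXT);
    CREATE TABLE slices (
        id INTEGER PRIMARY KEY,
        datasource_id INTEGER,
        datasource_type TEXT
    );
    INSERT INTO datasources VALUES (1, 'events'), (2, 'clicks');
    INSERT INTO tables VALUES (1, 'orders');
    INSERT INTO slices VALUES (10, 1, 'druid'), (11, 2, 'druid'), (12, 1, 'table');
""")


def migrate_druid_datasources(conn):
    """Copy each Druid datasource into `tables` and repoint its slices.

    Returns a mapping of old Druid datasource id -> new `tables` id.
    """
    id_map = {}
    rows = conn.execute(
        "SELECT id, datasource_name FROM datasources ORDER BY id"
    ).fetchall()
    for old_id, name in rows:
        cur = conn.execute("INSERT INTO tables (table_name) VALUES (?)", (name,))
        id_map[old_id] = cur.lastrowid

    # Filtering on datasource_type = 'druid' keeps already-migrated slices
    # from being matched again when old and new ids overlap.
    for old_id, new_id in id_map.items():
        conn.execute(
            "UPDATE slices SET datasource_id = ?, datasource_type = 'table' "
            "WHERE datasource_type = 'druid' AND datasource_id = ?",
            (new_id, old_id),
        )
    conn.commit()
    return id_map


id_map = migrate_druid_datasources(conn)
print(id_map)
print(conn.execute("SELECT * FROM slices ORDER BY id").fetchall())
```

Keeping the old-to-new id mapping around would matter in practice, since any chart data that embeds a datasource id would need the same translation.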

#### Rejected Alternatives

None.

to: @betodealmeida @graceguo-supercat @kristw @michellethomas @mistercrunch @timifasubaa 

