Skip to content

Conversation

david-leifker
Copy link
Collaborator

SQL Setup Upgrade System

Overview

This PR introduces a comprehensive SQL setup upgrade system for DataHub that enables automated database initialization, table creation, and user management for both MySQL and PostgreSQL databases. The system supports both traditional authentication and IAM-based authentication, making it suitable for various deployment scenarios.

Key Features

🗄️ Database Support

  • MySQL: Full support with SSL configuration options
  • PostgreSQL: Full support with database creation capabilities
  • Auto-detection: Automatically detects database type from JDBC URL configuration

🔧 Core Functionality

  • Table Creation: Automated creation of metadata_aspect_v2 table with proper schema
  • Database Creation: PostgreSQL database creation with proper permissions
  • User Management: Optional creation of application users with configurable permissions
  • CDC Support: Optional Change Data Capture (CDC) user creation for data streaming
  • IAM Authentication: Support for AWS IAM-based authentication for user creation

Environment Variables

Core Configuration

Variable Default Description
CREATE_TABLES true Enable table creation
CREATE_DB true Enable database creation (PostgreSQL only)
CREATE_USER false Enable user creation
CDC_MCL_PROCESSING_ENABLED false Enable CDC user creation
CDC_USER datahub_cdc CDC username
CDC_PASSWORD datahub_cdc CDC password
IAM_ROLE - IAM role for user creation (required if IAM enabled)

Database Connection

The system automatically extracts connection details from Spring Ebean configuration:

  • Host/Port: Extracted from JDBC URL
  • Database Name: Extracted from JDBC URL
  • Credentials: Uses ebean.username and ebean.password
  • IAM Auth: Uses ebean.useIamAuth setting

Usage Examples

Basic Table Creation

# MySQL
./gradlew runSqlSetupMysql

# PostgreSQL
./gradlew runSqlSetupPostgres

With User Creation

# Traditional authentication
MYSQL_USERNAME=datahub MYSQL_PASSWORD=datahub CREATE_USER=true ./gradlew runSqlSetupMysql

# IAM authentication
MYSQL_USERNAME=datahub IAM_ROLE=DataHubRole CREATE_USER=true ./gradlew runSqlSetupMysql

With CDC Support

CDC_MCL_PROCESSING_ENABLED=true CDC_USER=datahub_cdc CDC_PASSWORD=datahub_cdc ./gradlew runSqlSetupMysql

@github-actions github-actions bot added docs Issues and Improvements to docs product PR or Issue related to the DataHub UI/UX devops PR or Issue related to DataHub backend & deployment labels Oct 17, 2025
mockSetupArgs.createUser = true;
mockSetupArgs.iamAuthEnabled = false;
mockSetupArgs.createUserUsername = "testuser";
mockSetupArgs.createUserPassword = "testpass";
Copy link

@aikido-pr-checks aikido-pr-checks bot Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposed secret in datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/sqlsetup/CreateUsersStepTest.java - low severity
Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches.
View details in Aikido Security

args.cdcUser = "custom_cdc";
args.cdcPassword = "custom_password";
args.createUserUsername = "testuser";
args.createUserPassword = "testpass";
Copy link

@aikido-pr-checks aikido-pr-checks bot Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposed secret in datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/sqlsetup/SqlSetupArgsTest.java - low severity
Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches.
View details in Aikido Security

@Test
public void testGetCreateTraditionalUserSqlMysql() throws Exception {
mockSetupArgs.createUserUsername = "mysqluser";
mockSetupArgs.createUserPassword = "mysqlpass";
Copy link

@aikido-pr-checks aikido-pr-checks bot Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposed secret in datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/sqlsetup/CreateUsersStepTest.java - low severity
Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches.
View details in Aikido Security

args.iamAuthEnabled = false;
args.createUser = true;
args.createUserUsername = null; // Missing createUserUsername
args.createUserPassword = "testpass";
Copy link

@aikido-pr-checks aikido-pr-checks bot Oct 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposed secret in datahub-upgrade/src/test/java/com/linkedin/datahub/upgrade/sqlsetup/config/SqlSetupConfigTest.java - low severity
Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches.
View details in Aikido Security

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Oct 17, 2025
Copy link

alwaysmeticulous bot commented Oct 17, 2025

✅ Meticulous spotted 0 visual differences across 1001 screens tested: view results.

Meticulous evaluated ~8 hours of user flows against your PR.

Expected differences? Click here. Last updated for commit fb48432. This comment will update as new commits are pushed.

Copy link
Collaborator

@abedatahub abedatahub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gotten through most of it. Mostly stylistic feedback so far.


import lombok.Data;

@Data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer immutable types. Replace with the following?

  @Value
  @Builder
  @ToString(exclude = "errorMessage")

Comment on lines +7 to +11
public int tablesCreated = 0;
public int usersCreated = 0;
public boolean cdcUserCreated = false;
public long executionTimeMs = 0;
public String errorMessage;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should not be public if we've got @Data.

public String errorMessage;

@Override
public String toString() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exclude errorMessage? In any case, lombok can do that for us.


import org.testng.annotations.Test;

public class SqlSetupResultTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test file should not exist at all. This is why we use Lombok.


import lombok.Data;

@Data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should prefer @Value. The fields should not be public.

}

void createDatabaseIfNotExists(String databaseName, DatabaseType dbType) throws SQLException {
if (DatabaseType.POSTGRES.equals(dbType)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see these conditionals at a few places. How about extracting an interface and having separate impls for Postgres vs Mysql?

import java.util.List;
import javax.annotation.Nullable;

public class SqlSetup implements Upgrade {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc.

import org.testng.annotations.Test;

@TestPropertySource(properties = {"entityService.impl=ebean"})
public class SqlSetupUpgradeConfigTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this test is pure maintenance burden and should be deleted. It'll never catch real bugs; all it does is give false confidence about test coverage :)

import org.testng.annotations.Test;

/** Unit tests for DatabaseType enum. */
public class DatabaseTypeTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude got really intense with unit testing this class. I think this is a serious problem. I asked claude to write down my arguments against over-testing here:

OVER_TESTING_FEEDBACK.md

PTAL and we can discuss.


import org.testng.annotations.Test;

public class SqlSetupArgsTest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another test class that we should drop.

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Oct 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

devops PR or Issue related to DataHub backend & deployment docs Issues and Improvements to docs pending-submitter-response Issue/request has been reviewed but requires a response from the submitter product PR or Issue related to the DataHub UI/UX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants