Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(batch): support mysql_query for mysql batch ingestion #19071

Merged
merged 48 commits into from
Nov 1, 2024

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented Oct 23, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

We support mysql_query in this PR. We extract common deserialization logic between mysql cdc parser and mysql_query (introduced in this PR). We also handle serde for bit. bit(1) will just be treat as boolean. bit(>1) will be treated as bytea.

I also added a local-inline-source-test profile, which complements ci-inline-source-test, so we can easily run source tests locally.

You may consult the table in the release notes for the column type mapping for MySQL -> RisingWave after this PR.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

This feature should be a TECHNICAL PREVIEW. See below.

This PR introduces the mysql_query tvf. Its syntax will very likely change down the road. So the interface and examples you see below will be obsolete in subsequent versions. That being said the core functionality will still exist.

Type mapping table. The only changes are for bit(1) and bit(>1). Previously both of these map to NULL. Now these are supported.

Bool and boolean are mapped to smallint as well, because they are actually an alias of tinyint(1). If we choose to map them to boolean, it means if someone declared a tinyint(1) intentionally, it will be converted to boolean as well.

Instead we just opt for a more consistent approach: all tinyint, including (bool/boolean) will map to smallint, users may convert bit(1) into RW boolean if they wish to instead.

MySQL Type RisingWave Type
bit(1) boolean
bit(>1) bytea
bool/boolean smallint
tinyint smallint
smallint smallint
mediumint int
int int
bigint bigint
float float32
double float64
decimal decimal
numeric decimal
year int
date date
time time
datetime timestamp
timestamp timestamptz
varchar varchar
char varchar
json jsonb
blob bytea
tinyblob bytea
mediumblob bytea
longblob bytea
array unsupported
enum unsupported
set unsupported
geometry unsupported
null unsupported

After this PR we support mysql_query.

Syntax:

mysql_query(hostname varchar, port varchar, username varchar, password varchar, database_name varchar, query varchar)

Example

Mysql:

CREATE TABLE test (
    id bigint primary key,
    v0 bit,
    v1 bool,
    v2 tinyint(1),
    v3 tinyint(2),
    v4 smallint,
    v5 mediumint,
    v6 integer,
    v7 bigint,
    v8 float,
    v9 double,
    v10 numeric(4, 2),
    v11 decimal(4, 2),
    v12 char(255),
    v13 varchar(255),
    v14 bit(10),
    v15 tinyblob,
    v16 blob,
    v17 mediumblob,
    v18 longblob,
    v19 date,
    v20 time,
    v21 timestamp,
    v22 json
);
INSERT INTO test SELECT
  1 as id,
  true as v0,
  true as v1,
  2 as v2,
  3 as v3,
  4 as v4,
  5 as v5,
  6 as v6,
  7 as v7,
  1.08 as v8,
  1.09 as v9,
  1.10 as v10,
  1.11 as v11,
  'char' as v12,
  'varchar' as v13,
  b'1010' as v14,
  x'16' as v15,
  x'17' as v16,
  x'18' as v17,
  x'19' as v18,
  '2021-01-01' as v19,
  '12:34:56' as v20,
  '2021-01-01 12:34:56' as v21,
  JSON_OBJECT('key1', 1, 'key2', 'abc');

rw:

select * from mysql_query('$MYSQL_HOST', '$MYSQL_TCP_PORT', '$RISEDEV_MYSQL_USER', '$MYSQL_PWD', 'tvf', 'select * from test;');
----
1 t 1 2 3 4 5 6 7 1.08 1.09 1.10 1.11 char varchar \x000a \x16 \x17 \x18 \x19 2021-01-01 12:34:56 2021-01-01 12:34:56+00:00 {"key1": 1, "key2": "abc"}

Breaking changes

If user had a column with datatype bit in their source previously, this can cause inconsistency. Because previously it would be deserialized as NULL. After this PR, it correctly converts it to either bytea or boolean.

Copy link

gitguardian bot commented Oct 28, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
14320510 Triggered Generic Password c139f84 e2e_test/source_inline/tvf/mysql_query.slt View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @StrikeW to see whether there some duplications

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I unified both in risingwave_common.

@kwannoel kwannoel added the user-facing-changes Contains changes that are visible to users label Oct 30, 2024
@kwannoel kwannoel marked this pull request as ready for review October 30, 2024 12:36
@kwannoel kwannoel requested a review from a team as a code owner October 30, 2024 12:36
@kwannoel kwannoel requested a review from MrCroxx October 30, 2024 12:36
@graphite-app graphite-app bot requested review from a team October 30, 2024 12:37
Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM.

src/batch/src/executor/mysql_query.rs Outdated Show resolved Hide resolved
src/frontend/src/expr/table_function.rs Show resolved Hide resolved
src/frontend/src/expr/table_function.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@StrikeW StrikeW left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/common/src/mysql.rs Outdated Show resolved Hide resolved
Comment on lines 136 to +140
TableFunctionToFileScanRule::create(),
// Apply postgres query rule next
TableFunctionToPostgresQueryRule::create(),
// Apply mysql query rule next
TableFunctionToMySqlQueryRule::create(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to merge all these into one rule?

Inside the rule just match logical_table_function.table_function.function_type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for CONNECTION first: #19222.

We can refactor pg and mysql then.

@kwannoel kwannoel requested a review from xxchan November 1, 2024 02:21
@@ -174,8 +174,8 @@ SELECT
c_binary_255
FROM rw_mysql_types_test order by c_boolean;
----
0 NULL NULL NULL -8388608 -2147483647 9223372036854775806 -10 -10000 -10000 c d \x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
1 NULL -128 -32767 -8388608 -2147483647 -9223372036854775807 -10 -10000 -10000 a b \x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0 f NULL NULL -8388608 -2147483647 9223372036854775806 -10 -10000 -10000 c d \x000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this changed? Could you please note the "changes" of the "Type mapping table" in the release note?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the release notes, previously only mentioned them in the PR description. It's because we support bit conversion now.

@kwannoel kwannoel mentioned this pull request Nov 1, 2024
30 tasks
@kwannoel kwannoel added this pull request to the merge queue Nov 1, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants