Skip to content

Migrate datafusion/sql tests to insta, part6 #15578

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

qstommyshu
Copy link
Contributor

@qstommyshu qstommyshu commented Apr 4, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

This is the part 5 of #15484 breakdown, as the code changes in #15484 is too large. Part1: #15497, Part2: #15499, Part3: #15533, Part4: 15548, Part5: #15567

Are these changes tested?

Yes, I manually tested the before/after changes.

Are there any user-facing changes?

No

@github-actions github-actions bot added the sql SQL Planner label Apr 4, 2025
@qstommyshu qstommyshu marked this pull request as ready for review April 4, 2025 19:18
@qstommyshu
Copy link
Contributor Author

qstommyshu commented Apr 4, 2025

This is the last PR for #15397

The migration is mostly done. There is only one really big test roundtrip_statement_with_dialect() that I think it would be too big to break down into individual test cases with assert_snapshot! inline assertion (I'm thinking if there is a way to to break it down without increasing 1000+ lines of test code).

Please let me know if you have any suggestion @alamb @blaginin .

@qstommyshu
Copy link
Contributor Author

qstommyshu commented Apr 4, 2025

Also, I found the test cases are a bit here and there, but I think the sql test suite are too large to restructure now. I'm thinking maybe we can try to use modules to separate different tests in future?

Here is an example:

pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

pub fn multiply(a: i32, b: i32) -> i32 {
    a * b
}

#[cfg(test)]
mod tests {
    use super::*;

    // add test suite
    mod add_tests {
        use super::*;

        #[test]
        fn test_add_positive() {
            assert_eq!(add(2, 3), 5);
        }

        #[test]
        fn test_add_negative() {
            assert_eq!(add(-2, -3), -5);
        }
    }

    // multiply test suite
    mod multiply_tests {
        use super::*;

        #[test]
        fn test_multiply_positive() {
            assert_eq!(multiply(4, 5), 20);
        }

        #[test]
        fn test_multiply_mixed() {
            assert_eq!(multiply(-3, 6), -18);
        }
    }

    // integration test suite
    mod integration_tests {
        use super::*;

        #[test]
        fn test_combined_operations() {
            let sum = add(2, 3);
            let product = multiply(sum, 4);
            assert_eq!(product, 20);
        }
    }
}

@qstommyshu
Copy link
Contributor Author

qstommyshu commented Apr 6, 2025

Hi @alamb and @blaginin

I found a way to migrate roundtrip_statement_with_dialect() now, just want to confirm if you like to proceed with it.

What I did below is to incorporate all core test logics into a macro, and try to minimize the lines of code (so that we don't have too much lines of code increase in this migration) for each test case but still expressive, by adding the variable name before each variable in the macro (because rust analyzer doesn't indicate types for macro).

#[macro_export]
macro_rules! roundtrip_statement_with_dialect_helper {
    (
        query: $sql:expr,
        parser_dialect: $parser_dialect:expr,
        unparser_dialect: $unparser_dialect:expr,
        expected: @ $expected:literal $(,)?
    ) => {{
        let statement = Parser::new(&$parser_dialect)
            .try_with_sql($sql)?
            .parse_statement()?;

        let state = MockSessionState::default()
            .with_aggregate_function(max_udaf())
            .with_aggregate_function(min_udaf())
            .with_expr_planner(Arc::new(CoreFunctionPlanner::default()))
            .with_expr_planner(Arc::new(NestedFunctionPlanner));

        let context = MockContextProvider { state };
        let sql_to_rel = SqlToRel::new(&context);
        let plan = sql_to_rel
            .sql_statement_to_plan(statement)
            .unwrap_or_else(|e| panic!("Failed to parse sql: {}\n{e}", $sql));

        let unparser = Unparser::new(&$unparser_dialect);
        let roundtrip_statement = unparser.plan_to_sql(&plan)?;

        let actual = &roundtrip_statement.to_string();
        insta::assert_snapshot!(actual, @ $expected);
    }};
}
// each test case would look like this
#[test]
fn roundtrip_statement_with_dialect_1() -> Result<(), DataFusionError> {
    roundtrip_statement_with_dialect_helper!(
        query: "select min(ta.j1_id) as j1_min from j1 ta order by min(ta.j1_id) limit 10;",
        parser_dialect: MySqlDialect {},
        unparser_dialect: UnparserMySqlDialect {},
        expected: @"SELECT `j1_min` FROM (SELECT min(`ta`.`j1_id`) AS `j1_min`, min(`ta`.`j1_id`) FROM `j1` AS `ta` ORDER BY min(`ta`.`j1_id`) ASC) AS `derived_sort` LIMIT 10",
    );
    Ok(())
}

Note: we can't put this macro in a for loop because inline assert_snapshot!($actual, @ $expected) only takes literal as expected argument, so we can't store the expected value in a variable and then pass to the macro (expr != literal).

And I feel like this is not a standard way to write test, not sure if I should go head and do it or not. Please let me know if you want to proceed with this way.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me -- thank you @qstommyshu

I had some minor suggestions but nothing required in my opinion

cc @blaginin

@qstommyshu
Copy link
Contributor Author

Hi @alamb and @blaginin

I found a way to migrate roundtrip_statement_with_dialect() now, just want to confirm if you like to proceed with it.

What I did below is to incorporate all core test logics into a macro, and try to minimize the lines of code (so that we don't have too much lines of code increase in this migration) for each test case but still expressive, by adding the variable name before each variable in the macro (because rust analyzer doesn't indicate types for macro).

#[macro_export]
macro_rules! roundtrip_statement_with_dialect_helper {
    (
        query: $sql:expr,
        parser_dialect: $parser_dialect:expr,
        unparser_dialect: $unparser_dialect:expr,
        expected: @ $expected:literal $(,)?
    ) => {{
        let statement = Parser::new(&$parser_dialect)
            .try_with_sql($sql)?
            .parse_statement()?;

        let state = MockSessionState::default()
            .with_aggregate_function(max_udaf())
            .with_aggregate_function(min_udaf())
            .with_expr_planner(Arc::new(CoreFunctionPlanner::default()))
            .with_expr_planner(Arc::new(NestedFunctionPlanner));

        let context = MockContextProvider { state };
        let sql_to_rel = SqlToRel::new(&context);
        let plan = sql_to_rel
            .sql_statement_to_plan(statement)
            .unwrap_or_else(|e| panic!("Failed to parse sql: {}\n{e}", $sql));

        let unparser = Unparser::new(&$unparser_dialect);
        let roundtrip_statement = unparser.plan_to_sql(&plan)?;

        let actual = &roundtrip_statement.to_string();
        insta::assert_snapshot!(actual, @ $expected);
    }};
}
// each test case would look like this
#[test]
fn roundtrip_statement_with_dialect_1() -> Result<(), DataFusionError> {
    roundtrip_statement_with_dialect_helper!(
        query: "select min(ta.j1_id) as j1_min from j1 ta order by min(ta.j1_id) limit 10;",
        parser_dialect: MySqlDialect {},
        unparser_dialect: UnparserMySqlDialect {},
        expected: @"SELECT `j1_min` FROM (SELECT min(`ta`.`j1_id`) AS `j1_min`, min(`ta`.`j1_id`) FROM `j1` AS `ta` ORDER BY min(`ta`.`j1_id`) ASC) AS `derived_sort` LIMIT 10",
    );
    Ok(())
}

Note: we can't put this macro in a for loop because inline assert_snapshot!($actual, @ $expected) only takes literal as expected argument, so we can't store the expected value in a variable and then pass to the macro (expr != literal).

And I feel like this is not a standard way to write test, not sure if I should go head and do it or not. Please let me know if you want to proceed with this way.

Just want to confirm, do I need to migrate the big test case roundtrip_statement_with_dialect() ?

@alamb
Copy link
Contributor

alamb commented Apr 6, 2025

Just want to confirm, do I need to migrate the big test case roundtrip_statement_with_dialect() ?

IN my opinion, it is is note required. If you want to rewrite it I am sure it would be appreciated by future contributors

Another pattern is to put the entire test into a structure that has a Display impl and then print out that display impl in the snapshot. I did something like this in pydantic#16

So instead of

// each test case would look like this
#[test]
fn roundtrip_statement_with_dialect_1() -> Result<(), DataFusionError> {
    roundtrip_statement_with_dialect_helper!(
        query: "select min(ta.j1_id) as j1_min from j1 ta order by min(ta.j1_id) limit 10;",
        parser_dialect: MySqlDialect {},
        unparser_dialect: UnparserMySqlDialect {},
        expected: @"SELECT `j1_min` FROM (SELECT min(`ta`.`j1_id`) AS `j1_min`, min(`ta`.`j1_id`) FROM `j1` AS `ta` ORDER BY min(`ta`.`j1_id`) ASC) AS `derived_sort` LIMIT 10",
    );
    Ok(())
}
struct TestResult { 
  input_query: String,
  parser_dialect: &Dialect,
  unparser_dialect: &Dialect,
  output_query: String 
}

impl TestResult {
  fn try_new(
  input_query: String,
  parser_dialect: &Dialect,
  unparser_dialect: &Dialect,
  ) -> Result<Self> { 
   // do unparse , etc
  }
}

impl Display for TestResult { ... }

// tests make a TestResult and compare the string result
#[test]
fn roundtrip_statement_with_dialect_1() -> Result<(), DataFusionError> {
    let result =


    insta::assert_snapshot!(
        TestResult::try_new(
          select min(ta.j1_id) as j1_min from j1 ta order by min(ta.j1_id) limit 10;",
          MySqlDialect {},
           UnparserMySqlDialect {},
        },
        @r"
        query: "select min(ta.j1_id) as j1_min from j1 ta order by min(ta.j1_id) limit 10;",
        parser_dialect: MySqlDialect {},
        unparser_dialect: UnparserMySqlDialect {},
        output "SELECT `j1_min` FROM (SELECT min(`ta`.`j1_id`) AS `j1_min`, min(`ta`.`j1_id`) FROM `j1` AS `ta` ORDER BY min(`ta`.`j1_id`) ASC) AS `derived_sort` LIMIT 10",
         );
}
}

@qstommyshu
Copy link
Contributor Author

Got it, Thanks @alamb for your thoughts.

I will update roundtrip_statement_with_dialect(). I will probably go with the macro approach because the macro approach is essentially the same as the original test, so I can almost follow the original TestStatementWithDialect structure. Plus, I don't need to create a new struct and impl traits.

I can foresee the code changes will be large, so I will leave it to the next PR (part 7, should be the real last part). Please merge this PR if you think the change is good.

@alamb alamb merged commit 4ccd455 into apache:main Apr 7, 2025
28 checks passed
@alamb
Copy link
Contributor

alamb commented Apr 7, 2025

Thanks @qstommyshu !

nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* WIP

* migrate tests in `diagnostic.rs` to `insta`

* Migrate simple test case in `plan_to_sql.rs` to `insta`

* Migrate `sql_round_trip` test cases in `plan_to_sql.rs` to `insta`

* Migrate table reference tests in `plan_to_sql.rs` to use `insta` for snapshot testing

* Refactor roundtrip expression tests in `plan_to_sql.rs` to use snapshot assertions with `insta`

* Update roundtrip expression tests in `plan_to_sql.rs` to use updated snapshot assertions

* resolve comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants