Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsed results of FROM Clause in SQL query #3

Closed
JasperGuo opened this issue Oct 12, 2018 · 2 comments
Closed

Parsed results of FROM Clause in SQL query #3

JasperGuo opened this issue Oct 12, 2018 · 2 comments

Comments

@JasperGuo
Copy link

Hi,

Thanks for your great job in building such a dataset for nl2sql.
I am now conducting a characteristics study of the sql on it.
I find that some parsed results of FROM Clause in SQL Query are confusing.

Here is an example.
The FROM Clause in the query contains two join operations and three tables involved.
But in the parsed results of table_units, only two of the tables are involved.

    {
        "db_id": "department_management",
        "query": "SELECT DISTINCT T1.creation FROM department AS T1 JOIN management AS T2 ON T1.department_id  =  T2.department_id JOIN head AS T3 ON T2.head_id  =  T3.head_id WHERE T3.born_state  =  'Alabama'",
        "sql": {
            "except": null,
            "from": {
                "conds": [
                    [
                        false,
                        2,
                        [
                            0,
                            [
                                0,
                                1,
                                false
                            ],
                            null
                        ],
                        [
                            0,
                            11,
                            false
                        ],
                        null
                    ]
                ],
                "table_units": [
                    [
                        "table_unit",
                        0
                    ],
                    [
                        "table_unit",
                        2
                    ]
                ]
            },
           ...
        }
    }

Based on my further inspection, it may be caused by the order of JOIN and ON clauses.
Here is a clear and correct example.

{
        "db_id": "bike_1",
        "query": "SELECT count(*) FROM station AS T1 JOIN trip AS T2 JOIN station AS T3 JOIN trip AS T4 ON T1.id  =  T2.start_station_id AND T2.id  =  T4.id AND T3.id  =  T4.end_station_id WHERE T1.city  =  \"Mountain View\" AND T3.city  =  \"Palo Alto\"",
        "sql": {
            "except": null,
            "from": {
                "conds": [
                    [
                        false,
                        2,
                        [
                            0,
                            [
                                0,
                                1,
                                false
                            ],
                            null
                        ],
                        [
                            0,
                            16,
                            false
                        ],
                        null
                    ],
                    "and",
                    [
                        false,
                        2,
                        [
                            0,
                            [
                                0,
                                12,
                                false
                            ],
                            null
                        ],
                        [
                            0,
                            12,
                            false
                        ],
                        null
                    ],
                    "and",
                    [
                        false,
                        2,
                        [
                            0,
                            [
                                0,
                                1,
                                false
                            ],
                            null
                        ],
                        [
                            0,
                            19,
                            false
                        ],
                        null
                    ]
                ],
                "table_units": [
                    [
                        "table_unit",
                        0
                    ],
                    [
                        "table_unit",
                        2
                    ],
                    [
                        "table_unit",
                        0
                    ],
                    [
                        "table_unit",
                        2
                    ]
                ]
            },
            ...
        }
    },

Please let me know if I misunderstand something of the parsed results.

Great Thanks!

@taoyds
Copy link
Owner

taoyds commented Oct 17, 2018

Hi,
We generated train.json and dev.json using the old version of the SQL parse script process_sql.py. The old parsed results are inputs for some baselines so that we didn't update them in released files. If you would like to update the parsed sql results, we provide the following script using the latest process_sql.py (with a few modifications compared to the old version) to get the right parsed results:
https://github.com/taoyds/spider/blob/master/preprocess/parse_raw_json.py

Best,
Tao

@JasperGuo
Copy link
Author

Thanks Tao! I will give it a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants