
tidb fails to push down json_extract, resulting in query OOM or server crash #48981

Open
0xdeafbeef opened this issue Nov 28, 2023 · 4 comments
Labels
may-affects-5.3, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, may-affects-7.5

Comments

@0xdeafbeef

1. Minimal reproduce step (Required)

import pymysql
import json
import os
import random

# Database connection parameters.
# Note: no 'db' key here; the database is created below, and connecting
# to a not-yet-existing database would fail.
db_params = {
    'host': 'localhost',
    'user': 'your_username',
    'password': 'your_password',
    'charset': 'utf8mb4',
    'cursorclass': pymysql.cursors.DictCursor
}

# Initialize connection
connection = pymysql.connect(**db_params)

# Function to create database and table
def create_database_and_table():
    with connection.cursor() as cursor:
        cursor.execute("CREATE DATABASE IF NOT EXISTS your_database;")
        cursor.execute("USE your_database;")
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS blocks (
                workchain TINYINT NOT NULL,
                shard BIGINT UNSIGNED NOT NULL,
                seqno INT UNSIGNED NOT NULL,
                root_hash VARBINARY(32) NOT NULL,
                shards_info JSON NULL,
                PRIMARY KEY (workchain, shard, seqno),
                CONSTRAINT blocks_root_hash UNIQUE (root_hash)
            );
        """)
    connection.commit()

# Function to generate a random hex string
def generate_hex_string(length=64):
    return ''.join(random.choice('0123456789abcdef') for _ in range(length))

# Function to generate fake shards_info JSON
def generate_shards_info(fake=True):
    if fake:
        # Generate random shards_info that does not match the query condition
        shards_info = [{"info": {"rootHash": [generate_hex_string(2)]}} for _ in range(random.randint(1, 5))]
    else:
        # Generate shards_info that matches the query condition
        shards_info = [{"info": {"rootHash": ["01"]}}]  # This will match the UNHEX condition in the SQL query
    return json.dumps(shards_info)

# Function to generate and insert fake data
def generate_fake_data(num_records, insert_matching_row=True):
    # Define the statement and the shard constant up front so both branches
    # below can use them (in the original snippet, `sql` and `matching_shard`
    # were only defined inside the `if` branch).
    sql = """
    INSERT INTO blocks (workchain, shard, seqno, root_hash, shards_info)
    VALUES (%s, %s, %s, %s, %s)
    """
    shard = 0x8000000000000000
    with connection.cursor() as cursor:
        if insert_matching_row:
            # Insert one matching row
            matching_workchain = -1
            matching_seqno = random.randint(0, 4294967295)
            matching_root_hash = os.urandom(32)
            matching_shards_info = generate_shards_info(fake=False)
            cursor.execute(sql, (matching_workchain, shard, matching_seqno,
                                 matching_root_hash, matching_shards_info))

        # Insert fake rows
        for _ in range(num_records - 1):
            fake_workchain = random.randint(-128, 127)
            fake_seqno = random.randint(0, 4294967295)
            fake_root_hash = os.urandom(32)
            fake_shards_info = generate_shards_info()
            cursor.execute(sql, (fake_workchain, shard, fake_seqno,
                                 fake_root_hash, fake_shards_info))

    connection.commit()

# Create database and table
create_database_and_table()

# Generate 1,000,000 fake records (10**6), with 1 matching row
generate_fake_data(10**6)

# Close the connection
connection.close()

The query that triggers the OOM:

SELECT root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND UNHEX(JSON_UNQUOTE(JSON_EXTRACT(shards_info,
                                      '$[0].info.rootHash[0]'))) =
      0x1
LIMIT 1;
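
To inspect the plan TiDB chooses without executing the query, one can prefix it with EXPLAIN (a minimal sketch; plain EXPLAIN is standard TiDB syntax and does not run the statement):

EXPLAIN
SELECT root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND UNHEX(JSON_UNQUOTE(JSON_EXTRACT(shards_info, '$[0].info.rootHash[0]'))) = 0x1
LIMIT 1;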

ddebbdd3fb95f2b16335f77890ad99b1ddec5992a4670105079a6a768c36bd5c.txt

Using

/*+ READ_FROM_STORAGE(TIFLASH[blocks]) */

reduces memory usage from 8 GB to 980 KB.
be3ba3f6fe2722f7b62db5fc771750f47e5484ba9426644ca165b25fe8113c9e.txt
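
For reference, the full query with the hint in place would look like this (a sketch; the hint goes right after the SELECT keyword, per TiDB's hint syntax):

SELECT /*+ READ_FROM_STORAGE(TIFLASH[blocks]) */ root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND UNHEX(JSON_UNQUOTE(JSON_EXTRACT(shards_info, '$[0].info.rootHash[0]'))) = 0x1
LIMIT 1;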

2. What did you expect to see? (Required)

I expected a simple streaming search with the JSON_UNQUOTE filter pushed down to TiKV, which was successfully done with TiFlash.

3. What did you see instead (Required)

HY000][1105] Your query has been cancelled due to exceeding the allowed memory limit for the tidb-server instance and this query is currently using the most memory. Please try narrowing your query scope or increase the tidb_server_memory_limit and try again.[conn=2819858710]

A server with the following version

Release Version: v6.5.2
Edition: Community
Git Commit Hash: 29116c0256c52b224da2b34d712c1063d171c0ad
Git Branch: heads/refs/tags/v6.5.2
UTC Build Time: 2023-04-19 10:52:06
GoVersion: go1.19.8
Race Enabled: false
TiKV Min Version: 6.2.0-alpha
Check Table Before Drop: false
Store: tikv

crashes with OOM.

I have no info about memory usage because of the crash, but it produces the same execution plan.

4. What is your TiDB version? (Required)

Release Version: v7.4.0
Edition: Community
Git Commit Hash: 38cb4f3312be9199a983c0ef282d2ea2e28a7824
Git Branch: heads/refs/tags/v7.4.0
UTC Build Time: 2023-10-10 14:18:50
GoVersion: go1.21.1
Race Enabled: false
Check Table Before Drop: false
Store: tikv
@0xdeafbeef added the type/bug (The issue is confirmed as a bug.) label Nov 28, 2023
@ti-chi-bot added the may-affects-5.3, may-affects-5.4, may-affects-6.1, may-affects-6.5, may-affects-7.1, and may-affects-7.5 labels Nov 29, 2023
@zanmato1984
Contributor

Hi @0xdeafbeef, thanks for the report.

First, a quick answer to the question of why the same query sometimes gets canceled and sometimes crashes (killed by the OS's oom-killer). Our memory tracking mechanism triggers query cancellation when it detects that memory consumption exceeds the specified memory limit, but that cancel action can come later than the OS noticing that the TiDB process is using up the system's physical memory. That's why a crash can happen.
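
The limit itself is controlled by the tidb_server_memory_limit system variable mentioned in the error message. As a hedged sketch (the variable exists in recent TiDB versions; the value below is just an example, not a recommendation), it can be inspected and raised like this:

-- Check the current instance-level memory limit.
SHOW VARIABLES LIKE 'tidb_server_memory_limit';

-- Raise it (example value; pick one suited to the machine).
SET GLOBAL tidb_server_memory_limit = '16GB';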

Now about the push down. There are several aspects to consider:

  1. The optimizer (cost model) decides which plan is better, i.e., using index lookup (this way no filter can be pushed down) or table full scan + filter push down. So pushing down filters like json_extract doesn't necessarily outperform the alternatives. Actually, in your case, given enough memory on the machine, it is probably better to use index lookup rather than table full scan + filter push down (1m index rows + 0.86m table rows vs. 453m table rows), though the full scan might use less memory than the index lookup.
  2. Unfortunately, the optimizer doesn't take memory consumption into account, so it might actually emit a plan that is optimal in terms of performance but requires more resources such as memory.
  3. The functions json_extract and json_unquote can actually be pushed down to TiKV, but the function unhex can't. (However, unhex can be pushed down to TiFlash.)
  4. There are optimizer hints to ignore an index, see https://docs.pingcap.com/tidb/stable/optimizer-hints#ignore_indext1_name-idx1_name--idx2_name- (syntax sketched below). But given 3, this hint won't result in the filter being pushed down either.
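
For illustration, a sketch of the hint syntax from the linked docs; using `primary` as the index name assumes the plan reads through the clustered primary index, so substitute the actual index name shown in your plan if it differs:

SELECT /*+ IGNORE_INDEX(blocks, primary) */ root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND UNHEX(JSON_UNQUOTE(JSON_EXTRACT(shards_info, '$[0].info.rootHash[0]'))) = 0x1
LIMIT 1;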

We'll be adding push down support for unhex in TiKV to address point 3. I'm also removing the bug label.

@zanmato1984 removed the type/bug (The issue is confirmed as a bug.), sig/execution (SIG execution), and severity/major labels Nov 29, 2023
@0xdeafbeef
Author

@zanmato1984 thank you for the detailed answer!
One thing I don't understand: even if TiKV doesn't support UNHEX push down, shouldn't it return the string produced by json_extract and then let TiDB filter the rows using UNHEX? This way, it would use a reasonable amount of memory.

I think the absence of UNHEX push down is not the issue. The problem is that the index lookup consumes too much memory.

For example,

SELECT root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND UNHEX(HEX(file_hash)) = 0x0000000
LIMIT 1;

It gives the same plan, but memory usage during IndexLookUp_18 is 34 MB instead of 8 GB.

0e2568b4f83ea4a007d589652c090ed695b8a6315cc6b7039647f6ba4a895601.txt

plan

@0xdeafbeef
Author

I accidentally ran the query on a smaller database copy. I re-ran the test on the real one. It uses 89 MB for the index, which is reasonable.

@zanmato1984
Contributor

@zanmato1984 thank you for the detailed answer! One thing I don't understand: even if TiKV doesn't support UNHEX push down, shouldn't it return the string produced by json_extract and then let TiDB filter the rows using UNHEX? This way, it would use a reasonable amount of memory.

Filter push down is actually an "all-or-nothing" situation: an unsupported function nested at any level of an expression tree effectively blocks the whole expression tree from being pushed down.
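
To illustrate the rule with this issue's query: unhex is the only non-pushable node, so rewriting the predicate to compare against the hex string directly keeps the entire expression tree pushable (a sketch, assuming the stored rootHash values are always lowercase two-digit hex like "01"):

SELECT root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND JSON_UNQUOTE(JSON_EXTRACT(shards_info, '$[0].info.rootHash[0]')) = '01'
LIMIT 1;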

I think the absence of UNHEX push down is not the issue. The problem is that the index lookup consumes too much memory.

For example,

SELECT root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND UNHEX(HEX(file_hash)) = 0x0000000
LIMIT 1;

It gives the same plan, but memory usage during IndexLookUp_18 is 34 MB instead of 8 GB.

0e2568b4f83ea4a007d589652c090ed695b8a6315cc6b7039647f6ba4a895601.txt

plan

I would think otherwise. My guess is that the time-consuming JSON functions in the Selection operator slowed down the consumption of the data output by IndexLookUp, which caused the high memory consumption within IndexLookUp. So pushing down unhex, and consequently the whole filter, will actually help with the memory issue.
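
One way to check this hypothesis (a sketch; TiDB's EXPLAIN ANALYZE actually executes the statement and reports per-operator execution time and memory) is to compare the memory column of the IndexLookUp and Selection operators between the slow query and a hinted or rewritten variant:

EXPLAIN ANALYZE
SELECT root_hash
FROM blocks
WHERE workchain = -1
  AND shard = 0x8000000000000000
  AND UNHEX(JSON_UNQUOTE(JSON_EXTRACT(shards_info, '$[0].info.rootHash[0]'))) = 0x1
LIMIT 1;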
