Conversation

@max-hoffman
Contributor

@max-hoffman max-hoffman commented Feb 22, 2024

Prevent writing rows that exceed the maximum size of a prolly serial message. When writing fields, we use the current buffer size, the new field length, and the serial message metadata size to validate inserts.

fixes: #7487
fixes: #7524

companion: dolthub/go-mysql-server#2342

@max-hoffman max-hoffman changed the title from "[store] row length guards for serial messages" to "[store] row length guards for prolly serial messages" Feb 22, 2024

// MaxTupleDataSize is the maximum KV length considering the extra
// flatbuffer metadata required to serialize the message. This number
// is only useful for checking the "last row" per-es, because every field
Contributor

"per-es" is this a typo?

// is only useful for checking the "last row" per-es, because every field
// has offsets but field count, content hash, etc are global properties
// of the message.
// (uint16) - (2 kv offsets) - (field count) - (content hash) - (node count) - (tree level)
Contributor

Not sure I understand why there are exactly two kv offsets in the message?

@max-hoffman
Contributor Author

@nicktobey I made two notable changes:

  • The row length check is now the responsibility of the integrator. The memory db now more closely simulates what InnoDB does, and Dolt's limit will be a bit lower because our refs are addresses rather than pointers and flatbuffers carry extra metadata.
  • I tried naming the metadata offsets so they are more self-documenting, and moved the field-specific offsets to PutField, which can skip the first row's offsets.
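
As a rough illustration of the "named metadata offsets" idea, the sketch below spells out the budget from the code comment, `(uint16) - (2 kv offsets) - (field count) - (content hash) - (node count) - (tree level)`, as named constants, and pairs it with an integrator-side length check. Every byte size here is an assumption made for the example, and `checkRowLength` is a hypothetical helper; none of these values or names are confirmed by the PR.

```go
package main

import "fmt"

// All sizes are illustrative assumptions, not Dolt's actual values.
const (
	maxOffset       = 1<<16 - 1 // largest value a uint16 offset can hold
	kvOffsetsSize   = 2 * 2     // assumed: two offsets of 2 bytes each
	fieldCountSize  = 2         // assumed
	contentHashSize = 20        // assumed
	nodeCountSize   = 2         // assumed
	treeLevelSize   = 1         // assumed

	// messageMetadataSize groups the message-global overhead.
	messageMetadataSize = kvOffsetsSize + fieldCountSize +
		contentHashSize + nodeCountSize + treeLevelSize

	// maxTupleDataSize is what remains for key and value bytes.
	maxTupleDataSize = maxOffset - messageMetadataSize
)

// checkRowLength is a hypothetical integrator-side guard that rejects
// a key/value pair too large to fit in a single message.
func checkRowLength(keyLen, valueLen int) error {
	if total := keyLen + valueLen; total > maxTupleDataSize {
		return fmt.Errorf("row length %d exceeds limit %d", total, maxTupleDataSize)
	}
	return nil
}

func main() {
	fmt.Println(maxTupleDataSize)
	fmt.Println(checkRowLength(100, 200))
	fmt.Println(checkRowLength(60000, 6000))
}
```

Naming each term keeps the subtraction readable and makes it obvious which overheads are global to the message and which would instead belong with per-field accounting.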

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
8cc4acd ok 5937457
version total_tests
8cc4acd 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
659999e ok 5937457
version total_tests
659999e 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
ea92f6b ok 5937457
version total_tests
ea92f6b 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
8a6c533 ok 5937457
version total_tests
8a6c533 5937457
correctness_percentage
100.0

@max-hoffman max-hoffman merged commit 6f61ac5 into main Feb 27, 2024
@coffeegoddd
Contributor

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000
version result total
2a4f295 ok 5937457
version total_tests
2a4f295 5937457
correctness_percentage
100.0

@coffeegoddd
Contributor

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000
version result total
01100cd ok 5937457
version total_tests
01100cd 5937457
correctness_percentage
100.0

@github-actions

@coffeegoddd DOLT

| test_name | detail | row_cnt | sorted | mysql_time | sql_mult | cli_mult |
|---|---|---|---|---|---|---|
| batching | LOAD DATA | 10000 | 1 | 0.05 | 0.6 | |
| batching | batch sql | 10000 | 1 | 0.08 | 1.25 | |
| batching | by line sql | 10000 | 1 | 0.07 | 1.43 | |
| blob | 1 blob | 200000 | 1 | 0.91 | 3.14 | 3.49 |
| blob | 2 blobs | 200000 | 1 | 0.88 | 4.07 | 4.67 |
| blob | no blob | 200000 | 1 | 0.88 | 1.28 | 1.42 |
| col type | datetime | 200000 | 1 | 0.81 | 1.74 | 1.98 |
| col type | varchar | 200000 | 1 | 0.68 | 1.96 | 2.06 |
| config width | 2 cols | 200000 | 1 | 0.77 | 2.18 | 1.32 |
| config width | 32 cols | 200000 | 1 | 1.8 | 1.41 | 2.58 |
| config width | 8 cols | 200000 | 1 | 1.07 | 1.2 | 1.44 |
| pk type | float | 200000 | 1 | 0.83 | 1.2 | 1.27 |
| pk type | int | 200000 | 1 | 0.81 | 1.19 | 1.44 |
| pk type | varchar | 200000 | 1 | 1.53 | 0.95 | 0.94 |
| row count | 1.6mm | 1600000 | 1 | 5.58 | 1.47 | 1.53 |
| row count | 400k | 400000 | 1 | 1.42 | 1.4 | 1.46 |
| row count | 800k | 800000 | 1 | 2.81 | 1.45 | 1.5 |
| secondary index | four index | 200000 | 1 | 3.53 | 1.08 | 0.93 |
| secondary index | no secondary | 200000 | 1 | 0.9 | 1.27 | 1.36 |
| secondary index | one index | 200000 | 1 | 1.1 | 1.49 | 1.51 |
| secondary index | two index | 200000 | 1 | 1.95 | 1.22 | 1.12 |
| sorting | shuffled 1mm | 1000000 | 0 | 5.26 | 1.76 | 1.82 |
| sorting | sorted 1mm | 1000000 | 1 | 5.31 | 1.75 | 1.81 |

@github-actions

@coffeegoddd DOLT

| name | detail | mean_mult |
|---|---|---|
| dolt_blame_basic | system table | 1.41 |
| dolt_blame_commit_filter | system table | 3.63 |
| dolt_commit_ancestors_commit_filter | system table | 0.84 |
| dolt_commits_commit_filter | system table | 0.87 |
| dolt_diff_log_join_from_commit | system table | 2.06 |
| dolt_diff_log_join_to_commit | system table | 2.05 |
| dolt_diff_table_from_commit_filter | system table | 1.12 |
| dolt_diff_table_to_commit_filter | system table | 1.15 |
| dolt_diffs_commit_filter | system table | 1.03 |
| dolt_history_commit_filter | system table | 1.42 |
| dolt_log_commit_filter | system table | 0.9 |

@github-actions

@coffeegoddd DOLT

| name | add_cnt | delete_cnt | update_cnt | latency |
|---|---|---|---|---|
| adds_only | 60000 | 0 | 0 | 0.73 |
| adds_updates_deletes | 60000 | 60000 | 60000 | 3.82 |
| deletes_only | 0 | 60000 | 0 | 1.87 |
| updates_only | 0 | 0 | 60000 | 2.47 |

@Hydrocharged Hydrocharged deleted the max/row-length-guards branch February 27, 2024 09:29
Development

Successfully merging this pull request may close these issues.

Large row sizes panic on insert
Rows that are too large will panic

3 participants