feat(text-splitters): add MySQL Language support for RecursiveCharacterTextSplitter Class with delimiter aware splitting #34077
+95
−0
Description:
Enhanced `RecursiveCharacterTextSplitter` for `Language.MYSQL` by adding a comprehensive list of SQL separators covering DDL, DML, and control-flow constructs. This lets MySQL scripts be split along logical boundaries while keeping structural blocks intact. It especially helps with scripts that contain stored procedures, triggers, functions, and multiline statements.

Additionally, improved delimiter handling so that MySQL-specific separators are tried before the default newline and space separators, which minimizes incorrect splits. Updated the unit tests accordingly and adjusted `chunk_size` to avoid unintended recursive breaking during splitting.

Issue: N/A
Dependencies: None
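The separator-ordering behavior described above can be illustrated with a simplified, self-contained sketch of recursive character splitting. This is not LangChain's implementation (it omits chunk overlap and other options). `MYSQL_SEPARATORS` and `split_recursive` are hypothetical names, and the separator list is illustrative and may differ from the one this PR adds. Separators are tried from coarsest (statement keywords) to finest (newline, space, then a hard character cut), and only pieces that still exceed `chunk_size` are split further.

```python
from typing import List

# Hypothetical separator list for illustration only; the list actually added
# for Language.MYSQL in this PR may differ. Coarse, statement-level
# separators come first; generic fallbacks ("\n\n", "\n", " ", "") come last.
MYSQL_SEPARATORS: List[str] = [
    "\nDELIMITER ",
    "\nCREATE PROCEDURE ",
    "\nCREATE FUNCTION ",
    "\nCREATE TRIGGER ",
    "\nCREATE TABLE ",
    "\nINSERT INTO ",
    "\nSELECT ",
    "\nUPDATE ",
    "\nDELETE ",
    "\n\n",
    "\n",
    " ",
    "",
]


def split_recursive(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters (no overlap).

    The first separator that occurs in the text is used; any piece that is
    still longer than chunk_size is split again with the remaining, finer
    separators. Whitespace-only chunks are dropped.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []

    # Choose the first (coarsest) separator present in the text.
    sep, finer = "", []
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, finer = candidate, separators[i + 1:]
            break

    if sep == "":
        # Last resort: hard cut into fixed-size slices.
        slices = [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]
        return [s for s in slices if s.strip()]

    # Re-attach the separator to the start of each following piece so that
    # keywords such as "CREATE PROCEDURE" stay with their statement.
    parts = text.split(sep)
    pieces = [parts[0]] + [sep + part for part in parts[1:]]

    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if len(current) + len(piece) <= chunk_size:
            # Greedily merge adjacent pieces while they fit.
            current += piece
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too large on its own: recurse with finer separators.
            chunks.extend(split_recursive(piece, finer, chunk_size))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    script = (
        "CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50));\n"
        "INSERT INTO users VALUES (1, 'alice');\n"
        "SELECT name FROM users WHERE id = 1;"
    )
    for chunk in split_recursive(script, MYSQL_SEPARATORS, chunk_size=70):
        print(repr(chunk.strip()))
```

With `chunk_size=70`, each of the three statements in the sample script becomes its own chunk. Lowering `chunk_size` below the length of a statement forces the sketch to fall back to newline and space separators for that statement, which is the kind of over-splitting a too-small `chunk_size` causes.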
Test Notes
For the MySQL test `test_mysql_query_splits`, the previous configuration used `chunk_size=100`, but the first expected block exceeds 135 characters. Because that block is larger than `chunk_size`, the recursive splitter fell back to finer separators (newlines and spaces) and broke it into several fragments. Updated example usage: