
@Emmanuel-Arokiaraj

Description:
Enhances RecursiveCharacterTextSplitter for Language.MYSQL by adding a comprehensive list of SQL separators covering DDL, DML, and control-flow constructs. This lets MySQL scripts be split along logical boundaries, particularly scripts containing stored procedures, triggers, functions, and multiline statements, so that statement blocks stay intact.

Also changes delimiter handling so that MySQL-specific separators are evaluated before the default newline and space separators, which reduces splits in the middle of a statement. Unit tests are updated accordingly, and chunk_size is raised in the MySQL test to avoid unintended recursive splitting.
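
To illustrate why separator order matters, here is a minimal pure-Python sketch of ordered recursive splitting. It is not the library's implementation: the function `split_recursive`, the sample script, and both separator lists are hypothetical, and the sketch deliberately omits the step in which the real splitter merges small adjacent pieces back together up to chunk_size.

```python
from typing import List


def split_recursive(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split on the first separator found in `text`; recurse on oversized pieces.

    Simplified sketch: unlike the real splitter, small pieces are never merged.
    """
    if len(text) <= chunk_size:
        stripped = text.strip()
        return [stripped] if stripped else []
    for i, sep in enumerate(separators):
        if sep in text:
            pieces = text.split(sep)
            # Re-attach the separator so keywords such as INSERT are not lost.
            pieces = [pieces[0]] + [sep + p for p in pieces[1:]]
            chunks: List[str] = []
            for piece in pieces:
                # Oversized pieces may only use the lower-priority separators.
                chunks.extend(split_recursive(piece, separators[i + 1:], chunk_size))
            return chunks
    # No separator applies: return the oversized text unchanged.
    return [text.strip()]


script = (
    "CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50));\n"
    "INSERT INTO users (id, name)\nVALUES (1, 'Ada');\n"
    "SELECT id, name\nFROM users\nWHERE id = 1;"
)

# Hypothetical separator lists, not the ones added by this PR.
KEYWORDS_FIRST = ["\nCREATE ", "\nINSERT ", "\nSELECT ", "\n", " "]
NEWLINE_FIRST = ["\n", " "]

keyword_chunks = split_recursive(script, KEYWORDS_FIRST, chunk_size=60)
newline_chunks = split_recursive(script, NEWLINE_FIRST, chunk_size=60)

print(len(keyword_chunks))  # one chunk per statement
print(len(newline_chunks))  # one chunk per line
```

With the keyword separators tried first, each statement comes out as its own chunk. With newline first, the multiline INSERT and SELECT statements are cut at every line break.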

Issue: N/A

Dependencies: None


Test Notes

For the MySQL test test_mysql_query_splits, the previous configuration used chunk_size=100, but the first expected block exceeds 135 characters. Because the splitter recurses on any piece longer than chunk_size, that block was broken up further than intended. Updated example usage:

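from langchain_text_splitters import Language, RecursiveCharacterTextSplitter

# Language.MYSQL is the enum member introduced by this PR.
# chunk_size=500 leaves room for the first expected block (over 135 characters).
splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.MYSQL,
    chunk_size=500,
    chunk_overlap=0,
)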
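
The reasoning behind the chunk_size change can be checked with a quick length comparison. The sketch below is illustrative only: `fits_in_one_chunk` and the stored-procedure text are hypothetical and are not taken from the test suite. A block longer than chunk_size is handed to lower-priority separators, so it cannot be expected to come back whole.

```python
def fits_in_one_chunk(block: str, chunk_size: int) -> bool:
    """Return True if `block` is short enough to be emitted without recursion."""
    return len(block) <= chunk_size


# Hypothetical expected block, comparable in size to the one described above.
expected_first_block = (
    "CREATE PROCEDURE add_user(IN p_name VARCHAR(50))\n"
    "BEGIN\n"
    "    INSERT INTO users (name) VALUES (p_name);\n"
    "    SELECT LAST_INSERT_ID() AS new_id;\n"
    "END"
)

for size in (100, 500):
    verdict = "fits" if fits_in_one_chunk(expected_first_block, size) else "is split further"
    print(f"chunk_size={size}: block {verdict}")
```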

@github-actions bot added the text-splitters (Related to the package `text-splitters`) and fix labels Nov 23, 2025
@Emmanuel-Arokiaraj changed the title from "fix(text-splitters): improve MySQL delimiter handling and enhance SQL splitting accuracy" to "feat(text-splitters): add MySQL Language support for RecursiveCharacterTextSplitter Class with delimiter aware splitting" Nov 23, 2025
@github-actions github-actions bot added feature and removed fix labels Nov 23, 2025