Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(datetime): enhance datetime parsing and validation #2129

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

CoreyWinkelmannPP
Copy link
Contributor

@CoreyWinkelmannPP CoreyWinkelmannPP commented Nov 8, 2024

Add validation and parsing for DateColumnDef and DateTimeColumnDef to handle invalid date formats and zero dates as null. Introduce a DateValidator with regex matching for date formats. Extend tests to cover invalid dates and malformed datetime strings.

MySQL and MariaDB allow invalid dates by default unless strict SQL mode is on, enabling entries like 2020-00-01 to be stored without error, which can lead to data processing issues. In older systems, data may be in poor condition, making it difficult to switch to strict mode, so these cases often need handling when using CDC.

Add validation and parsing for DateColumnDef and DateTimeColumnDef to handle invalid date formats and zero dates as null. Introduce a DateValidator with regex matching for date formats. Extend tests to cover invalid dates and malformed datetime strings.
@osheroff
Copy link
Collaborator

what was the behavior of the bootstrapper wrt to "0000-00-00" style dates (if the as-null flag was off) before this change?

I know, it absolutely sucks to output invalid datetimes into the stream but all things equal the 0000-00-00 case is so common I'd prefer not to change that behavior without changing a flag. I'm good with your other changes that nullify stuff like zero months tho.

@CoreyWinkelmannPP
Copy link
Contributor Author

That's a good question. I will try and run through those scenarios when I get a chance today to get an idea of what it did and how the change will affect it. Might be necessary to call out the all zero path and allow it to run the normal path through without validating it from there.

@CoreyWinkelmannPP
Copy link
Contributor Author

I isolated the date validation logic outside of the all zero case and I believe kept the logic of the all zero case the same as what existed before. From my testing locally it seems to work in the same way as before.

else
return appendFractionalSeconds("0000-00-00 00:00:00", 0, getColumnLength());
} else {
if ( !DateValidator.isValidDateTime(dateString) )
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the delay, I'm coming back to this PR since I'm prepping a release of your mariadb stuff and some other stuff too. I'm a little worried about running a regex in a very hot code-path...

But I guess this only runs on bootstrapping so it might be ok? can you confirm that?

if (value == null) {
return null;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mildly, same concerns here -- is this just for bootstrapping?

@osheroff
Copy link
Collaborator

would be nice to slap some integration tests around this code (esp checking behavior in bootstrapping, which I believe is where this change targets).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants