Improve merge.data.table error messages for missing keys #6713
Hi @aitap,
It looks like the atime tests will only work with branches created inside the Rdatatable/data.table repository, not the outside forks. Unless you are a member of a team with write access to create new branches, there may be nothing you can fix. Apologies for the poke, @Anirban166, could you help with this? If this is not by design (e.g. performance tests are relatively expensive and therefore must be run only from local branches), could the action create a local branch from
Hi @MichaelChirico,
You can ignore the atime issue.
@MichaelChirico, thanks for the clarification.
inst/tests/tests.Rraw
error = 'must be valid column names in x and y')
test(1962.021, {
  if (!"z" %in% colnames(DT1) || !"z" %in% colnames(DT2)) {
    stop("The columns listed in `by` are missing from either x or y: z")
Please remove the backticks; they are for markdown, not error messages.
Hi Toby, sorry, I disagree.
The backticks serve to highlight that this is a code object, and not a plain English word. Without them, a reader can easily be confused into thinking there's some grammatical mistake "in by", or otherwise struggle to parse the message they're given.
Of course, we could choose some other convention (single/double quotes, e.g.), and we should try and pick one and stick to it throughout the codebase... but that's a separate issue.
Personally, these days I am using `arg=` for function arguments to highlight that (1) it's code with the backticks and (2) it's a keyword argument with =.
inst/tests/tests.Rraw
test(1601.4, merge(DT0, DT0, by="a"),
     warning="Neither of the input data.tables to join have columns.",
     error="Elements listed in `by`")
     error="The following columns are missing:\n - From `x`: a\n - From `y`: a")
Please remove the newlines from error messages.
Remove, or add? I find the current output hard to read:
The following columns are missing: - From x: a
I would find this much more readable (possibly indenting the second line):
The following columns are missing:
- From x: a
At a higher level, I wonder if translation would be easier if we instead structured the message like so:
The following columns are missing from x: ...
The following columns are missing from y: ...
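A minimal base-R sketch of the one-sentence-per-table structure suggested above. The helper name `missing_cols_message()` and its arguments are hypothetical, not data.table code. Each message is a complete literal template, which is the property that should make it easier to translate than a sentence assembled from fragments.

```r
# Hypothetical helper, not data.table's implementation: report each input
# that lacks join columns in its own complete sentence.
missing_cols_message <- function(by.x, by.y, names_x, names_y) {
  missing_x <- setdiff(by.x, names_x)
  missing_y <- setdiff(by.y, names_y)
  msgs <- character(0)
  # Each template is a full literal sentence, so a translator sees the
  # whole sentence rather than a fragment like "missing from %s".
  if (length(missing_x) > 0L)
    msgs <- c(msgs, sprintf("The following columns are missing from x: %s",
                            toString(missing_x)))
  if (length(missing_y) > 0L)
    msgs <- c(msgs, sprintf("The following columns are missing from y: %s",
                            toString(missing_y)))
  # One sentence per line; returns "" when nothing is missing.
  paste(msgs, collapse = "\n")
}

# Same situation as test 1601.4: neither input has column "a".
msg <- missing_cols_message("a", "a", character(0), character(0))
writeLines(msg)
# The following columns are missing from x: a
# The following columns are missing from y: a
```

Keeping "x" and "y" inside the literal strings, instead of substituting them with `%s`, trades a little duplication for messages that translate as whole sentences.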
The longer sentence structure would definitely be easier to translate.
R/merge.R
if (!all(by.x %chin% nm_x)) {
  missing_in_x <- setdiff(by.x, nm_x)
  stopf("The following columns listed in `by.x` are missing from `x`: %s",
        toString(missing_in_x))
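For reference, the check in the R/merge.R excerpt can be sketched as a self-contained base-R function covering both inputs. This is an illustration under stated assumptions, not the PR's code: `check_by_columns()` is a hypothetical name, `%in%` stands in for data.table's `%chin%`, and `stop(sprintf(...), call. = FALSE)` stands in for data.table's internal `stopf()`.

```r
# Hypothetical base-R version of the by.x / by.y validation.
# Assumptions: %in% replaces data.table's %chin%, and
# stop(sprintf(...)) replaces data.table's internal stopf().
check_by_columns <- function(by.x, by.y, nm_x, nm_y) {
  if (!all(by.x %in% nm_x)) {
    missing_in_x <- setdiff(by.x, nm_x)
    stop(sprintf("The following columns listed in `by.x` are missing from `x`: %s",
                 toString(missing_in_x)), call. = FALSE)
  }
  if (!all(by.y %in% nm_y)) {
    missing_in_y <- setdiff(by.y, nm_y)
    stop(sprintf("The following columns listed in `by.y` are missing from `y`: %s",
                 toString(missing_in_y)), call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the error message instead of aborting.
res <- tryCatch(check_by_columns(c("id", "z"), "id", c("id", "v"), c("id", "w")),
                error = conditionMessage)
print(res)
# [1] "The following columns listed in `by.x` are missing from `x`: z"
```

Checking `x` first means a call with problems in both inputs only reports `x`; collecting both before stopping is the alternative discussed in the thread about message structure.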
brackify instead of toString?
The brackify function adds brackets around column names in error messages, which may not match the format expected in your test cases. Should I change the format of the test cases?
Yes, use brackify. It provides nice formatting and also some simple truncation mechanism in case missing_in_x happens to have 10s or dozens of elements.
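A rough sketch of the kind of formatting being recommended: wrap the names in brackets and truncate long vectors. `brackify_sketch()` is a hypothetical stand-in, not data.table's internal `brackify()`; its cutoff of 10 and the trailing "..." marker are assumptions chosen for illustration.

```r
# Hypothetical stand-in for data.table's internal brackify().
# The default cutoff and the "..." marker are illustrative assumptions.
brackify_sketch <- function(x, cutoff = 10L) {
  # Keep only the first `cutoff` names and flag that more were dropped.
  if (length(x) > cutoff) x <- c(x[seq_len(cutoff)], "...")
  sprintf("[%s]", toString(x))
}

brackify_sketch(c("a", "b"))
# "[a, b]"
brackify_sketch(paste0("col", 1:25))
# "[col1, col2, col3, col4, col5, col6, col7, col8, col9, col10, ...]"
```

In the error message, `toString(missing_in_x)` would then become `brackify_sketch(missing_in_x)`, so 25 missing columns print as the first ten names followed by `...` rather than all 25.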
I added documentation explaining that atime failures are expected in forks: https://github.com/Rdatatable/data.table/wiki/Performance-testing#can-not-be-run-from-forks
I believe the problem is not the "relatively expensive" part, but rather that the action requires permission to upload artifacts to the rdatatable/data.table repo.
Codecov Report
All modified and coverable lines are covered by tests ✅
@@           Coverage Diff           @@
##           master    #6713   +/-   ##
=======================================
  Coverage   98.62%   98.62%
=======================================
  Files          79       79
  Lines       14640    14642    +2
=======================================
+ Hits        14438    14440    +2
  Misses        202      202
Hi @MichaelChirico and @tdhock,
Thank you for the PR! Please be sure to take a look at the final touch-up edits I've just made. A lot of it is minor points, but it does make review a lot easier if everything is "neat as a pin".
Thank you for reviewing the PR and making those touch-up edits. I apologize for not having everything as polished as it should have been. It stems from the coding style I've been used to since I started coding, but I now understand how much a clean, consistent style helps the review process. I'll be more mindful of these details going forward so they don't cost you extra effort. Thank you for your patience and understanding!
In this PR, I have made the error message more informative in two ways. It now states:
- which column(s) are missing;
- which data.table (x or y) is missing the column(s).
Closes #6556