

@venom1204 (Contributor) commented Jan 8, 2025

This PR makes the error message more informative by reporting two things:

- which column(s) are missing, and
- which data.table is missing them.
Closes #6556

@venom1204 marked this pull request as draft January 8, 2025 14:32
@venom1204 (Contributor Author)

Hi @aitap,
I'm running into a failure in the atime performance test and could use your guidance. If you have a moment, could you help me figure out how to resolve it? Thank you!

@aitap (Contributor) commented Jan 8, 2025

It looks like the atime tests will only work with branches created inside the Rdatatable/data.table repository, not the outside forks. Unless you are a member of a team with write access to create new branches, there may be nothing you can fix.

Apologies for the poke, @Anirban166, could you help with this? If this is not by design (e.g. performance tests are relatively expensive and therefore must be run only from local branches), could the action create a local branch from refs/pull/<ID>/head instead of relying on ${GITHUB_HEAD_REF} (which seems to contain the branch name from the remote repository)?

@venom1204 (Contributor Author)

Hi @MichaelChirico,
Sorry to interrupt, but could you take a look at the atime performance test failure? It seems to be related to how the tests are triggered for pull requests from external forks. Any guidance on resolving it would be much appreciated.

@MichaelChirico (Member)

You can ignore the atime issue.

@venom1204 marked this pull request as ready for review January 10, 2025 08:45
@venom1204 (Contributor Author)

@MichaelChirico, thanks for the clarification. Could you please review the changes in this PR?

error = 'must be valid column names in x and y')
test(1962.021, {
if (!"z" %in% colnames(DT1) || !"z" %in% colnames(DT2)) {
stop("The columns listed in `by` are missing from either x or y: z")
Member


Please remove the backticks; they are for markdown, not error messages.

Member


Hi Toby, sorry, I disagree.

The backticks serve to highlight that this is a code object, and not a plain English word. Without them, a reader can easily be confused into thinking there's some grammatical mistake "in by", or otherwise struggle to parse the message they're given.

Of course, we could choose some other convention (single/double quotes, e.g.), and we should try and pick one and stick to it throughout the codebase... but that's a separate issue.

Personally, these days I am using `arg=` for function arguments to highlight that (1) it's code with the backticks and (2) it's a keyword argument with =.

test(1601.4, merge(DT0, DT0, by="a"),
warning="Neither of the input data.tables to join have columns.",
error="Elements listed in `by`")
error="The following columns are missing:\n - From `x`: a\n - From `y`: a")
Member


Please remove newlines from error messages.

Member


Remove, or add? I find the current output hard to read:

The following columns are missing: - From x: a

I would find this much more readable (possibly indenting the second line):

The following columns are missing:
- From x: a

At a higher level, I wonder if translation would be easier if we instead structured the message like so:

The following columns are missing from x: ...
The following columns are missing from y: ...
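A minimal sketch of the per-table structure suggested above, assuming base R only. The helper name `missing_by_messages` is hypothetical and this is not data.table's actual implementation; it simply shows how each table gets its own complete sentence, so every message can be translated independently instead of being stitched together from fragments.

```r
# Hypothetical helper, NOT data.table's implementation: builds one
# self-contained sentence per table so each message translates on its own.
missing_by_messages <- function(by.x, by.y, nm_x, nm_y) {
  missing_x <- setdiff(by.x, nm_x)  # join columns absent from x
  missing_y <- setdiff(by.y, nm_y)  # join columns absent from y
  msgs <- character(0)
  if (length(missing_x) > 0L)
    msgs <- c(msgs, sprintf("The following columns listed in `by.x` are missing from `x`: %s",
                            toString(missing_x)))
  if (length(missing_y) > 0L)
    msgs <- c(msgs, sprintf("The following columns listed in `by.y` are missing from `y`: %s",
                            toString(missing_y)))
  msgs
}

# Example: "z" is missing from x, and "a" is missing from y
msgs <- missing_by_messages(by.x = c("a", "z"), by.y = c("a", "z"),
                            nm_x = c("a", "b"), nm_y = c("z", "b"))
if (length(msgs) > 0L) message(paste(msgs, collapse = "\n"))
```

In real code the collected messages would be passed to an error call rather than `message()`; the demo uses `message()` only so the script keeps running.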

@aitap (Contributor), Jan 24, 2025

The longer sentence structure would definitely be easier to translate.

R/merge.R (outdated)
if (!all(by.x %chin% nm_x)) {
missing_in_x <- setdiff(by.x, nm_x)
stopf("The following columns listed in `by.x` are missing from `x`: %s",
toString(missing_in_x))
Member


brackify instead of toString?

Contributor Author


brackify wraps the column names in brackets, so the error message would no longer match the expected format in the test cases. Should I update the test cases?

Member


Yes, use brackify. It provides nice formatting and also a simple truncation mechanism in case missing_in_x happens to have dozens of elements.
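To illustrate the formatting and truncation described above, here is a rough stand-in written in base R. `brackify_sketch` is hypothetical, and the cutoff of 10 items is an assumption chosen for illustration; data.table's internal `brackify()` may differ in its details.

```r
# Hypothetical stand-in for data.table's internal brackify(); the real helper
# may differ. Wraps items in brackets and truncates long vectors with "...".
brackify_sketch <- function(x, cutoff = 10L) {
  if (length(x) > cutoff) x <- c(x[seq_len(cutoff)], "...")  # keep first `cutoff` items
  sprintf("[%s]", toString(x))
}

brackify_sketch(c("a", "z"))          # "[a, z]"
brackify_sketch(paste0("col", 1:25))  # first 10 names, then "..."
```

Truncating keeps the error message readable even when a user passes a long `by` vector that is mostly absent from the table.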

@tdhock (Member) commented Jan 13, 2025

I added documentation explaining that atime failures are expected in forks: https://github.com/Rdatatable/data.table/wiki/Performance-testing#can-not-be-run-from-forks

> If this is not by design (e.g. performance tests are relatively expensive and therefore must be run only from local branches),

I believe the problem is not the "relatively expensive" part, but rather that the action requires permission to upload artifacts to the Rdatatable/data.table repo.

@codecov bot commented Jan 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.62%. Comparing base (2714cb4) to head (d3f466f).
Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6713   +/-   ##
=======================================
  Coverage   98.62%   98.62%           
=======================================
  Files          79       79           
  Lines       14640    14642    +2     
=======================================
+ Hits        14438    14440    +2     
  Misses        202      202           


@venom1204 (Contributor Author)

Hi @MichaelChirico and @tdhock,
I have implemented all the changes you suggested. Could you please review the updates and let me know if there's anything else I can improve?
Thank you!

@MichaelChirico changed the title from "feat: improve merge.data.table error messages for missing keys (#6556)" to "Improve merge.data.table error messages for missing keys" Jan 27, 2025
@MichaelChirico (Member)

Thank you for the PR! Please be sure to take a look at the final touch-up edits I've just made. A lot of it is minor points, but it does make review a lot easier if everything is "neat as a pin".

@MichaelChirico merged commit fd2a915 into Rdatatable:master Jan 27, 2025
8 of 9 checks passed
@venom1204 (Contributor Author) commented Jan 27, 2025

Thank you for reviewing the PR and making those touch-up edits, and sorry that it wasn't as polished as it should have been. I'm used to a different coding style, but I now see how much a clean, consistent style helps the review process, and I'll pay closer attention to these details in future so they don't create extra work for you. Thank you for your patience!



Development

Successfully merging this pull request may close these issues.

[feature request] diagnostic for merge.data.table when by = key is not present in dt being merged

5 participants