
automatically convert posixlt to posixct in set/:=/<- #6299

Merged: 12 commits, Aug 28, 2024

ben-schwen
Member

Closes #1724

ben-schwen marked this pull request as draft July 17, 2024 23:09

github-actions bot commented Jul 17, 2024

Comparison Plot

Generated via commit 013837f

Download link for the artifact containing the test results: atime-results.zip

Time taken to finish the standard R installation steps: 11 minutes and 32 seconds

Time taken to run atime::atime_pkg on the tests: 5 minutes and 47 seconds

@tdhock
Member

tdhock commented Jul 18, 2024

Related to an issue I had many years ago: #2068 (comment).
It sounds like this PR would fix that issue.
Briefly, I expected any POSIXlt RHS of := to be converted to POSIXct before being stored in a data.table column, as happens when we call data.table().
However, I believe the current logic is: whenever strptime() is used in DT[, j], wrap it in as.POSIXct(), as we see below:

> data.table(x=strptime("2024-01-01","%Y-%m-%d"))
            x
       <POSc>
1: 2024-01-01
Warning message:
In as.data.table.list(x, keep.rownames = keep.rownames, check.names = check.names,  :
  POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.
> data.table()[, x := strptime("2024-01-01","%Y-%m-%d")]
Warning message:
In strptime("2024-01-01", "%Y-%m-%d") :
  strptime() usage detected and wrapped with as.POSIXct(). This is to minimize the chance of assigning POSIXlt columns, which use 40+ bytes to store one date (versus 8 for POSIXct). Use as.POSIXct() (which will call strptime() as needed internally) to avoid this warning.
> class(strptime("2024-01-01","%Y-%m-%d"))
[1] "POSIXlt" "POSIXt" 

So I think what you are doing is great; you may want to look at what data.table() does to make sure the messages and code are consistent.
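A minimal base-R sketch of the class-based behavior described above, using a hypothetical helper `coerce_rhs()` (not part of data.table, and not the C code this PR adds): any POSIXlt value is converted regardless of how it was produced, whereas a check keyed on strptime() calls in the j expression would miss a value built with as.POSIXlt(). The warning text is illustrative only.

```r
# Hypothetical helper, not data.table's API: convert a POSIXlt RHS to POSIXct
# based on the value's class rather than on how the j expression was written.
coerce_rhs <- function(value) {
  if (inherits(value, "POSIXlt")) {
    warning("POSIXlt value detected and converted to POSIXct.", call. = FALSE)
    value <- as.POSIXct(value)
  }
  value  # non-POSIXlt values pass through unchanged
}

# Both calls return POSIXlt, but a check that only looks for strptime()
# in the expression would catch the first and miss the second.
via_strptime <- strptime("2024-01-01", "%Y-%m-%d", tz = "UTC")
via_as_lt    <- as.POSIXlt("2024-01-01", tz = "UTC")

ct1 <- suppressWarnings(coerce_rhs(via_strptime))
ct2 <- suppressWarnings(coerce_rhs(via_as_lt))
print(class(ct1))  # "POSIXct" "POSIXt"

# Storage comparison: POSIXct is one double per timestamp, POSIXlt keeps a
# list of broken-down components, so the gap grows with vector length.
n <- 10000L
lt_vec <- strptime(rep("2024-01-01", n), "%Y-%m-%d", tz = "UTC")
ct_vec <- as.POSIXct(lt_vec)
print(object.size(lt_vec))
print(object.size(ct_vec))
```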

ben-schwen marked this pull request as ready for review July 18, 2024 13:18
NEWS.md (outdated, resolved)
src/assign.c (outdated)
@@ -428,6 +428,10 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values)
}
if (any_duplicated(cols,FALSE)) error(_("Can't assign to the same column twice in the same query (duplicates detected)."));
if (!isNull(newcolnames) && !isString(newcolnames)) error(_("newcolnames is supplied but isn't a character vector"));
if (Rf_inherits(values, "POSIXlt")) {
warning(_("Values of type POSIXlt detected and converted to POSIXct. We do not recommend the use of POSIXlt at all because it uses 40+ bytes to store one date. Use as.POSIXct() to avoid this warning."));
Member

Compare with the warning from data.table():

POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.

Can the two warnings be made exactly the same? (easier for translators probably)

Member

(easier for translators probably)

Possibly, but this one is thrown from C while the other is thrown from R, so they currently live in different translation domains (R-data.table.pot vs. data.table.pot), and a given translator might not even know about both messages. I see two ways to align them:

  1. Have the R message thrown from a C wrapper, like posixlt_warning = function() .Call(Cposixlt_warning).
  2. Call gettext(., domain="data.table") on the exact string used in C.

The former adds a little clutter to our .so, while the latter is somewhat fragile because it depends on the exact wording. I'm not sure either is better, or whether either is worth investing in in the first place.
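A minimal sketch of option 2, assuming a hypothetical R-side helper `warn_posixlt()` and a msgid copied verbatim from the C warning in src/assign.c. gettext() returns the English string unchanged when no matching translation exists, so a wording mismatch fails silently rather than loudly, which is the fragility mentioned above.

```r
# Hypothetical R-side helper for option 2: look up the exact msgid used in
# src/assign.c so both warnings share one entry in data.table.pot.
# If the C wording ever changes, the lookup silently falls back to English.
posixlt_msgid <- "Values of type POSIXlt detected and converted to POSIXct. We do not recommend the use of POSIXlt at all because it uses 40+ bytes to store one date. Use as.POSIXct() to avoid this warning."

warn_posixlt <- function() {
  # "data.table" is the C-level domain; R-level strings live in "R-data.table"
  warning(gettext(posixlt_msgid, domain = "data.table"), call. = FALSE)
}

# Capture the (possibly translated) message instead of emitting it
msg <- tryCatch(warn_posixlt(), warning = conditionMessage)
print(msg)
```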

Member Author

I'm not sure aligning them is worth the effort unless we can ensure both are translated by the same translator (or via the same translation string).

inst/tests/tests.Rraw (outdated, resolved)
NEWS.md (outdated, resolved)
NEWS.md (outdated, resolved)
MichaelChirico added this to the 1.17.0 milestone Aug 6, 2024
@MichaelChirico
Member

Marking this for 1.17.0: I suspect it could cause downstream breakages, so it's better to defer until after the release.

Member

MichaelChirico left a comment

Let's see what revdeps find!

MichaelChirico merged commit be97437 into master Aug 28, 2024
7 of 8 checks passed
MichaelChirico deleted the set_posixct branch August 28, 2024 03:56
@tdhock
Member

tdhock commented Aug 29, 2024

Revdeps checked today found no problems: https://rcdata.nau.edu/genomic-ml/data.table-revdeps/analyze/2024-08-28/

@iago-pssjd
Contributor

Does it also close #4800?

@ben-schwen
Member Author

@iago-pssjd Nope, since this only covers operations on a data.table, not the setDT() conversion.

Successfully merging this pull request may close these issues.

:= should as.POSIXct when RHS is POSIXlt