-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUGZILLA #17546] extend readtable with a hook for column type detection #6720
Comments
Please attach your patch and then we can iterate on it. METADATA
|
Created attachment 2414 [details] METADATA
INCLUDED PATCH
|
Created attachment 2415 [details] This patch illustrates the use of colConvert. METADATA
INCLUDED PATCH
|
Created attachment 2416 [details] This is my v2 of the patch. METADATA
INCLUDED PATCH
|
Created attachment 2417 [details] the test for v2 patch. METADATA
INCLUDED PATCH
|
Created attachment 2418 [details] This v3 version combines code & test in 1 patch. In my opinion, the API change is correct. METADATA
INCLUDED PATCH
|
There are two basic patterns to override fallback:
As you've discovered, the downside to #2 is that there is some state that has to be passed to the delegate. For example, with callNextMethod(), the name of the generic and other information is needed to find the next method (it gets those from the calling frames, which is complicated and not recommended). State should be localized to the extent possible. Global variables add a lot of complexity. One option would to pass a closure as a second argument to the coercion function. The function would pass the data (potentially transformed) to the closure to execute the default behavior. The closure would take the place of the index argument, which would no longer be needed. METADATA
|
pattern #2, delegation to the fallback, makes the core code very The downside in this situation is the legacy set of parameters that have Defining a closure as argument to the override must be coded in I don't feel like creating the closure in the user workspace is a On the other hand, I pass the index argument to the benefit of the If having the fallback as a closure would work, I'd propose 3 arguments, METADATA
|
Every argument adds complexity to the contract. I just noticed that the patch only accepts a single function for coercion. Shouldn't the user be able to specify a function for each column? It could recycle for convenience. Names on the list could correspond to column names. METADATA
|
My colleagues at MOTUS typically try to digest timeuse files in .csv with
Defining for each of their file a list of conversion per column is strict but error-prone, and higly subject to copy-paste problems. Having 1 function that probes 1 of 2 well-defined formats ressembles much more the current behaviour (ie. distinguishing numbers from factor etc) and contributes to the usability.
It felt logical with my C background. BTW, I looked at closures, but I did not immediately get how that would solve my problem avoiding global variables. Is there a good example that duckduckgo missed. METADATA
|
Created attachment 2420 [details] V4 of the patch replaces the index argument with a fallback closure. METADATA
INCLUDED PATCH
|
I'm having some doubts about the use case. Not sure how typical it is to have a large number of identically typed columns to convert. It might be better to write a reusable function (ideally in a package) for your colleagues that parses the relevant file format, converting after reading. METADATA
|
Thanks for taking time to review it. METADATA
|
I would like an enhancement to readtable that avoids the builtin type detection.
I think this is best done using a new parameter that points to a function that will be called for each column and produces the converted column.
I particularly want to use this to read .csv files with most columns datetime stamps in DD/MM/YYYY HH:MM format. This is not very exotic.
I see no real alternatives to this approach, only incomplete fragments.
The real, stupid alternative is to redefine read.table in my workspace.
I think this is not good practise.
METADATA
The text was updated successfully, but these errors were encountered: