-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benjamin's Initial Evaluation Brain Dump #133
Comments
@nutterb, sorry for not responding to your thoughtful post until now. I've read it several times since September. In short, I like your ideas and think they're all worth discussing. Some of these are large/mature/encapsulated enough they're spun off into entire issues. Some specific reactions. New Functions
New Features
Code ImprovementsAs long as the test suite has appropriate coverage of these areas, and doesn't break, I'm all for it. I had no idea I think the lowest hanging fruit could still be associated with structuring network calls so they're less chatty & wasteful. But I don't have any ideas beyond EAV; and even that hits a limit and needs batching. Other NotesI agree with your two points. So far the EAV playground has been beating the flat export almost every time, even with non-sparse data. But I'd love to know more and describe the performance envelope to users. Thanks again for all the thought you put into this, @nutterb. |
I think this is something that the A crude, silly example is below. A less trivial example is the interaction between the sprinkle_bg function and the index_to_sprinkle
The But as you mentioned, these kinds of optimizations are not going to make a big difference to the user. optimizing the calls to the API will help a lot, as will optimizing the construction of data sets from batches. This is just one of those things that I change when I see them. passing error messages with
|
I apologize for the length but I wanted to make this available for review and discussion. I thought it would be good to get some feedback on ideas before I do any code work. i can break these up into separate issues when there's a specific aspect needing focus.
Also, be honest and direct in your responses. Part of this process will be transitioning me away from my
redcapAPI
mindset to theREDCapR
mindset. I promise not to have hurt feelings if you don't want to incorporate anything in this list.Observed Structure
Stand-alone functions
checkbox_choices
regex_named_captures
redcap_column_sanitize
replace_nas_with_explicit
The API Calls
This is the code I used to plot out how the existing API calls interact with each other. The blue boxes are my proposals for new functions. A child node indicates a function that is called within its parent.
*Less complex Systems *
populate_project_simple
callsredcap_project
retrieve_token_mssql
callsretrieve_credential_mssql
retrieve_credential_local
validate_no_logical
andvalidate_no_uppercase
callvalidate_for_write
New functions
sanitize_token
As of 6.9.5,
It wouldn't be hard to write a function to sanitize the token. It would
look something like:
This has the same effect as the code introduce through Issue #103
(#103), and has the added
benefit of providing consistency of tokens into earlier version of
REDCap.
api_call
It might seem silly, but if the
httr
interface ever undergoes achange, or if there's ever a decision to change the package making the
web call, putting all of the
POST
calls into one function means onlyhave to change that one function, instead of finding all of the calls in
the package.
redcap_read_batch
Would it simplify the interface for the user to have
redcap_read
bethe driver for exporting data, and to make it a wrapper for
redcap_read_oneshot
and a new functionredcap_read_batch
?redcapAPI::exportRecords
usesbatch.size = -1
to indicate reading inone shot. Something similar could be done to direct traffic between the
subfunctions without affecting backward compatibility (though
redcap_read_oneshot
would need to continue to be exported.)redcap_write_batch
Similar concept to
redcap_read_batch
. Just make it one function to call for the user.New Features
fields to factors, Date, or POSIXct objects, as appropriate. I
propose a logical
format_data
argument toredcap_read
thatformats the data (similar to how
redcapAPI
formats) whenTRUE
. Anew function
format_data
would be the engine to do this work.redcap_read
andredcap_write
.redcapAPI
has ameta_data = NULL
argument thatallows the user to pass in the meta data data.frame if it exists. If
given, there is no need to download the meta data again. If
NULL
,the meta data is downloaded. Not strictly necessary, but can save a
bit of time. I'd consider this pretty low priority, because it is a
feature that can be added without affect back compatibility.
table, the arms table, the mappings table. I don't know that these
get used much, but they can be handy for formatting event names
and such.
mixed feelings about this. I put a lot of work into doing this in
redcapAPI
, but the REDCap validations return some pretty goodmessages as well.
Code Improvements
Sections of code like
can be optimized with
It isn't a big deal, (see benchmarking below, but as a package matures,
I think it's a good thing to incorporate some efficiency gains)
Other Notes
RODBCext
to manage credentials, it wouldn't behard to write up instructions for acessing credentials from any
SQL database. Users would just need to know their driver. We could
probably make an adaptation that pulls from a CSV using the
sqldf
package. It would make a consistent style of management thatwould encourage best practices, even for users who don't have the
benefit of SQL databases.
eav
of alarge data file is faster than the
csv
export. How large does adatabase have to be with the
eav
format before the server timesout? If we find the
eav
to be faster, perhaps we could up thebatch number from 100 to something larger (fewer calls to the API
would mean less wait time for the user, but I suspect you would want
to continue batching to make the server available to other users.
Upping the default batch number seems like a good compromise.)
I should also add that at one point I had compiled a list of the API changes announced on the group. I was making notes on what changes I might incorporate into
redcapAPI
and how those changes could affect the R package. The link to the spreadsheet ishttps://docs.google.com/spreadsheets/d/1NMdpb-k5nvrVxF0gvnpIfQyP4vZIpQgUxoLUKJuv2L0/edit?usp=sharing
The text was updated successfully, but these errors were encountered: