-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redesign ReadStatMeta
and add ReadStatColMeta
for DataAPI.jl v1.13
#6
Conversation
@nalimilan Regarding performance, as you have mentioned this several times. Do you think anyone will notice such delay if one has 10,000 column data frame to work with:
|
(note - we do not materialize it in DataFrames.jl unless needed so normally even this cost is not paid) |
An overhead of about 1ms sounds reasonable (and 10,000 columns is relatively large). But it would be worth benchmarking the PR to check this. |
Thanks for the great work on metadata! With the v1.12 release of Since the changes made here are already outdated, I am starting to rewrite the part on metadata. There is no good reason to keep the old design with |
@junyuan-chen thank you! Just please note that in JuliaData/DataAPI.jl#56 some minimal extensions to the metadata API are added (nothing breaking, just small changes that make using metadata more convenient; they should be merged probably in the coming week). Thank you! |
@bkamins Thanks for mentioning that! I will keep that in mind. |
Use the `metadata` function recently added to DataAPI to expose the contents of `ReadStatMeta`. This will allow e.g. `DataFrame` to import the metadata. The implementation is not super efficient as a new `Dict` is allocated for each column. If this turns out to be a problem it would be possible to create a special `AbstractDict` type that would expose the data without allocating, but this may be overkill. Also, if users mutate the returned metadata `Dict`s, the changes will have no effect on `ReadStatMeta`, except if they mutate the `Dicts` holding value labels.
f422864
to
3ec444b
Compare
Project.toml
Outdated
@@ -13,7 +13,7 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" | |||
|
|||
[compat] | |||
CategoricalArrays = "0.10" | |||
DataAPI = "1.6" | |||
DataAPI = "1.11" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prepare this PR against DataAPI.jl 1.13 (it will be released this week)
src/table.jl
Outdated
@@ -123,6 +123,53 @@ Retrieve the metadata parsed from a data file. | |||
""" | |||
getmeta(tb::ReadStatTable) = getfield(tb, :meta) | |||
|
|||
hasmetadata(tb::ReadStatTable) = getmeta(tb) !== nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be metadatasupport
and it takes type as an argument.
src/table.jl
Outdated
hasmetadata(tb::ReadStatTable) = getmeta(tb) !== nothing | ||
hasmetadata(tb::ReadStatTable, col::Symbol) = getmeta(tb) !== nothing && haskey(tb, col) | ||
|
||
function metadata(tb::ReadStatTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also style
kwarg should be supported.
src/table.jl
Outdated
return ret | ||
end | ||
|
||
function metadata(tb::ReadStatTable, col::Symbol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be colmetadata
@nalimilan - do you want me to update this PR after DataAPI.jl 1.13 is out, or you want to do it (to practice the API 😄)? |
@bkamins Thanks for looking into that! But, I forgot to mention that the one I pushed yesterday is still the old one written back in May. I have started rewriting it for the |
Codecov Report
@@ Coverage Diff @@
## main #6 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 5 5
Lines 237 300 +63
=========================================
+ Hits 237 300 +63
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
My attempt on the rewrite is now complete (except for documentation pages). Here is a summary of main changes:
|
DataAPI.metadata
ReadStatMeta
and add ReadStatColMeta
for DataAPI.jl v1.13
Sorry, I had missed this. Thanks for finishing it! It would make sense to mark labels with the |
This is what RData.jl also does - right? |
@nalimilan @bkamins Sure. Let me do this change for the next release. |
Use the
metadata
function that is going to be added to DataAPI (JuliaData/DataFrames.jl#3055) to expose the contents ofReadStatMeta
. This will allow e.g.DataFrame
to import the metadata.The implementation is not super efficient as a new
Dict
is allocated for each column. If this turns out to be a problem it would be possible to create a specialAbstractDict
type that would expose the data without allocating, but this may be overkill. Also, if users mutate the returned metadataDict
s, the changes will have no effect onReadStatMeta
, except if they mutate theDict
s holding value labels. We could copy theseDict
s for safety, as mutating them for one variable will affect all other variables that use the same value labels.Cc: @bkamins