Contains for categorical Dimensions should call Base.contains on the Strings #532

Closed
felixcremer opened this issue Sep 6, 2023 · 3 comments · Fixed by #534

Comments

@felixcremer
Contributor

I am surprised that the Contains selector for Categorical values calls At rather than the contains function.
My use case is that I have a long Variable dimension with many differently named variables, some of which are similar or follow a certain naming pattern.
I then want to select a certain group of variables, as in this shortened example.
I would expect the Contains(value) selector to behave like Where(contains(value)).
I can open a PR with these changes, but I might need some help making the Vector case work.

julia> arr = DimArray(rand(10,10,4), (X(1:10), Y(1:10), Dim{:Variable}(["root_moisture", "soil_moisture", "air_temperature", "something"])))
10×10×4 DimArray{Float64,3} with dimensions: 
  X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Dim{:Variable} Categorical{String} String["root_moisture", "soil_moisture", "air_temperature", "something"] Unordered
[:, :, 1]
     1          2          3         4         5         6         7           8          9         10
  1  0.0451999  0.721679   0.472552  0.172361  0.838639  0.748815  0.00979697  0.0228791  0.312279   0.254207
  ⋮                                            ⋮                                                     ⋮
 10  0.0962899  0.0916193  0.856692  0.725752  0.530497  0.891864  0.307378    0.40408    0.429365   0.0391044
[and 3 more slices...]
julia> arr[Variable=Where(contains("moisture"))]
10×10×2 DimArray{Float64,3} with dimensions: 
  X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Y Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  Dim{:Variable} Categorical{String} String["root_moisture", "soil_moisture"] Unordered
[:, :, 1]
     1          2          3         4         5         6         7           8          9         10
  1  0.0451999  0.721679   0.472552  0.172361  0.838639  0.748815  0.00979697  0.0228791  0.312279   0.254207
  ⋮                                            ⋮                                                     ⋮
 10  0.0962899  0.0916193  0.856692  0.725752  0.530497  0.891864  0.307378    0.40408    0.429365   0.0391044
[and 1 more slices...]
julia> arr[Variable=Contains("moisture")] # This fails
ERROR: ArgumentError: moisture not found in ["root_moisture", "soil_moisture", "air_temperature", "something"]
Stacktrace:
@rafaqz
Owner

rafaqz commented Sep 6, 2023

Contains means an interval containing a point. Having it run contains on strings could be confusing. Using Where is probably better.
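A minimal sketch of the Where approach, reusing the array from the example above. The single-pattern call is taken from the issue itself; the `matches_any` helper is hypothetical (it is not part of DimensionalData) and is only one possible way to cover the Vector case with several patterns.

```julia
using DimensionalData

# Same layout as the array in the original example.
arr = DimArray(rand(10, 10, 4), (X(1:10), Y(1:10),
    Dim{:Variable}(["root_moisture", "soil_moisture", "air_temperature", "something"])))

# Single pattern: keep every variable whose name contains "moisture".
moist = arr[Variable=Where(contains("moisture"))]

# Hypothetical helper (an assumption, not a DimensionalData function):
# builds a predicate that is true when a name contains any of the patterns.
matches_any(patterns) = name -> any(p -> contains(name, p), patterns)

# Several patterns: keep variables matching "moisture" or "temperature".
subset = arr[Variable=Where(matches_any(["moisture", "temperature"]))]
```

Because Where accepts any predicate on the lookup values, the string matching stays in user code and Contains keeps its interval meaning.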

@felixcremer
Contributor Author

Then the docstring should state that one should use Where for this case, and Contains should not silently try to use At but rather throw an informative error message.

I think the overlap with Intervals is not so bad, because Categorical and Intervals are expected to do different things and Contains has a clear meaning for Strings.

@rafaqz
Owner

rafaqz commented Sep 6, 2023

Let's make the docstring clearer.
