-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discrete transform #70
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that addition! I really think this is a good extension to the binary transformation.
I have some issues with the standard handling of the input and the restrictions to the given values:
- why is it necessary for the thresholds to be related to the given values?
- I think a discrete transformation could also be called with a single number, that says how many values there should be in the end. Like: divide it in 5 values, following the original field.
- When giving values and thresholds, I would just check, if
len(values) == len(thresholds)+1
and not that the corresponding thresholds are between the current values, because in the end the given values could be chosen independent of the given field.
What do you think?
srf = gs.SRF(model, seed=20170519) | ||
srf.structured([x, y]) | ||
# create 5 equidistanly spaced values | ||
discrete_values = np.linspace(np.min(srf.field), np.max(srf.field), 5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values are equidistantly, yes. But they don't need to be equally distributed in the end. I think, this could be misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean it's misleading for this specific example? Would " create 5 eq. spaced values for this example" be better?! Or do you prefer an example with non-eq. spaced values?
else: | ||
if thresholds is None: | ||
# just in case, sort the values | ||
values = np.sort(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if I want the values to BE unsorted? Why do the given values need to be in relation to the values of the given field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, at least for the way I implemented the transformation, the values have to be monotonically increasing. I'm also not sure of how to interpret non monotonically increasing values.
gstools/transform/field.py
Outdated
values = np.array(values) | ||
thresholds = np.array(thresholds) | ||
for i in range(len(thresholds)): | ||
if not (values[i] <= thresholds[i] < values[i + 1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this restriction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How else would you define an array of thresholds, which subdivides an array of values? - It's a bit like the nodes and the edges of a graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The binary
transformation does the following:
- get a
divide
value to select "lower" and "upper" values - replace the lower and upper values with given values
The divide values is unrelated to the given lower and upper values, which will be set in the field. It is only related to the input-field
if thresholds is None: | ||
# just in case, sort the values | ||
values = np.sort(values) | ||
thresholds = (values[1:] + values[:-1]) / 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the natural way would be to us thresholds, so that the resulting ratios between the given values are even. Maybe I don't get the aim of this transformation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was also not exactly sure how to handle the kind of input arguments the best.
But at least in my use case I'm just interested in the number of "value classes". The non-arithmetic mean thresholds are just a little generalisation, which was easy to implement.
If the function would just take an array of thresholds, what values would you assign to all the field values lying in between two thresholds?
Thank's for the input. I think I've addressed 1) with my other comments. At first I also wanted to simply take an integer, but by taking an array of values, it becomes so much more flexible and I simply hope, that functions like But if the values are not related to the field anymore, then this should not be a field transformation, but a new kind of field generation. If you want to extend the value range of the field, it could simply be multiplied by a constant factor before transforming it. |
Now, arbitrary values can be assigned to the value classes which are separated by the thresholds.
Wow, yesterday I had a pretty big brain fart... Ready for merging? |
No problem :-) Only thing I would argue, is the standard calculation of the threshold. I would propose the following way: from scipy.special import erfinv
n = len(values)
p = np.arange(1, n) / n # n-1 equal subdivisions of [0, 1]
# use quantile of the normal distribution to get equal portions for each value
thresholds = fld.mean + fld.model.sill * np.sqrt(2) * erfinv(2 * p - 1) Then we also don't need to sort the values. |
Or we provide both ways with a switch. Then we could both be happy |
I get your point in wanting to relate the thresholds to a Gaussian process. But do you think there are application cases for discrete distributions where one would want that? At least in my use case, I'd only want to have the arithmetic mean of the given values as a threshold. Do you have an idea of how to concisely describe the difference of the two threshold calculation methods? |
For example to get a facies distribution, where each facies has a number and they should be equally distributed and you want them to be next to each other in the order you have given them.
Then the switch could state that. like
One connects the given values to the field and does sth. like a nearest neighbor interpolation (dividing areas by the arithmetic mean of the given values). The other chunks the given field in equal portions and equips each area with a corresponding given value. EDITOr we could use the |
If we use the if thresholds == "arithmetic":
# just in case, sort the values
values = np.sort(values)
thresholds = (values[1:] + values[:-1]) / 2
elif thresholds == "equal":
n = len(values)
p = np.arange(1, n) / n # n-1 equal subdivisions of [0, 1]
# use quantile of the normal distribution to get equal portions for each value
thresholds = fld.mean + fld.model.sill * np.sqrt(2) * erfinv(2 * p - 1)
else:
if len(values) != len(thresholds) + 1:
raise ValueError(
"discrete transformation: len(values) != len(thresholds) + 1"
)
values = np.array(values)
thresholds = np.array(thresholds, dtype=float) EDITI made a mistake in the calculation, the term |
Okay, I've added the possibility to add more automatic threshold calculations, but your suggestion is still pretty buggy. I'll have a look at it next week. |
I just found a bug in the Zinn&Harvey transformation, where I had the same fallacy. \sigma is not the variance but the standard deviation. That's why we have to use the root. |
…e; th check in discrete; call discrete from binary; zinnharvey bugfix
Damn, such a minor thing I added to GSTools and so much worries with it :-) |
I added a transform function to the collection, which takes values and optionally thresholds and applies these to an existing field, in order to create a field made up of a number of discrete values.