
vario_estimate_structured returns only nan if nan points in the field #83

Closed
TobiasGlaubach opened this issue Apr 22, 2020 · 14 comments · Fixed by #106

@TobiasGlaubach
Contributor

TobiasGlaubach commented Apr 22, 2020

Hi,
I'm not sure if this is intended behaviour: I was trying to exclude invalid points from a structured grid by setting them to np.nan before applying vario_estimate_structured, and the result was purely np.nan.

Minimal code in Python:

import numpy as np
import gstools as gs
import matplotlib.pyplot as plt
from scipy import ndimage, misc

x = y = np.arange(100)
model = gs.Exponential(dim=2, var=1, len_scale=[12.0, 3.0], angles=np.pi / 8)
srf = gs.SRF(model, seed=20170519)
srf.structured([x, y])
val = srf.field
# mark a single point as invalid
val[0, 0] = np.nan
gamma_x = gs.vario_estimate_structured(val, direction="x")
print(gamma_x)

Thanks and cheers,
Tobi

@LSchueler
Member

It's definitely not intended behaviour. Even if it was, it should be documented.
I'll see how to fix it. Thanks for reporting!

Here is the way I intended to cope with missing / invalid data:

import numpy as np
import gstools as gs
import matplotlib.pyplot as plt
from scipy import ndimage, misc

x = y = np.arange(100)
model = gs.Exponential(dim=2, var=1, len_scale=[12.0, 3.0], angles=np.pi / 8)
srf = gs.SRF(model, seed=20170519)
srf.structured([x, y])
val = srf.field
val_mask = np.zeros_like(val)
# mark invalid points
val_mask[0,0] = 1
val = np.ma.masked_array(val, mask=val_mask)
gamma_x = gs.vario_estimate_structured(val, direction="x")
print(gamma_x)

@TobiasGlaubach
Contributor Author

Worked like a charm! Thanks for the workaround!

While we are at it: does GSTools have any functionality for estimating the rotation of a variogram? Currently I am rotating the grid before calculating the variogram.

from scipy import ndimage

# rotate the field and mask the zero padding introduced by the rotation
val_rot = ndimage.rotate(val, 71, reshape=True)
val_masked = np.ma.masked_array(val_rot, mask=val_rot == 0.0)
gs.vario_estimate_structured(val_masked, direction="x")

I then explore fits on the variograms calculated from those rotations.

@LSchueler
Member

What exactly do you mean by "estimating the rotation of a variogram"?
It seems that you're not interested in creating spatial random fields, otherwise have a look at this tutorial

Would you like to estimate the rotation angle?

@TobiasGlaubach
Contributor Author

Say all you get is field-measured data (not artificially created fields); the angle of the two main axes within this field is unknown. The only things available are the (X, Y) coordinates and the associated F matrices measured from an unknown field f(x, y).

I am looking for an automated way of finding the angles of the variogram associated with the unknown field f(x, y).

Currently I am rotating the whole measurement data and calculating the variogram for those rotated fields, in order to find a well-matching variogram (by checking the residuals of the fit).


To put it in an example, say the real field (unknown but measured) would be something like the example from your documentation:

x = y = np.arange(100)
model = gs.Exponential(dim=2, var=1, len_scale=[12.0, 3.0], angles=np.pi / 8)
srf = gs.SRF(model, seed=20170519)
srf.structured([x, y])

Does GSTools have some functionality to estimate 'model', including angles and anisotropy?

Currently I am doing this:

import numpy as np
import gstools as gs
import matplotlib.pyplot as plt
from scipy import ndimage

# val: the measured (structured) field from above
# rot_alpha: trial rotation angle in degrees
val_rot = ndimage.rotate(val, rot_alpha, reshape=True)
val_masked = np.ma.masked_array(val_rot, mask=val_rot == 0.0)

gamma_x = gs.vario_estimate_structured(val_masked, direction="x")
gamma_y = gs.vario_estimate_structured(val_masked, direction="y")

nmax = 40
dgrid = np.arange(nmax)
fit_model_x = gs.Exponential(dim=2)
fit_model_y = gs.Exponential(dim=2)

fit_model_x.fit_variogram(dgrid, gamma_x[:nmax], nugget=False)
fit_model_y.fit_variogram(dgrid, gamma_y[:nmax], nugget=False)

var_dx_estim = fit_model_x.variogram(dgrid)
var_dx_calc = gamma_x[:nmax]

var_dy_estim = fit_model_y.variogram(dgrid)
var_dy_calc = gamma_y[:nmax]

len_scale = fit_model_y.len_scale
anis = fit_model_x.len_scale / fit_model_y.len_scale

which yields OK results when used in a minimization scheme for the angle alpha:

Best fit:
   orientation: 68.36877873481575 degree CW
   len_scale:   12.856
   anisotropy: 0.203 (len_scale_x: 2.616, len_scale_y: 12.856)


I was wondering if GSTools has a feature I missed which is capable of doing this in a smarter way.

@LSchueler
Member

For the moment, GSTools only supports "directional" variograms for structured grids, as they naturally give preferential directions along the axes.
Recently, we have been discussing the possibility of introducing directional variograms for unstructured grids.

By using unstructured grids, we wouldn't need to interpolate the field, as you are doing right now with ndimage.rotate.
Then, with your field being defined on an unstructured grid, you could choose a circle (or sphere) segment by giving a direction and the segment angle. In this segment, the variogram is then estimated. This also lends itself better to sparser field measurements, as the segment can be widened when less data is available.

We have actually also been discussing the possibility of including the direction of the segment (which is the "angle" you're interested in) in an optimisation procedure, which would be pretty easy to implement once the directional estimation on unstructured grids is established.
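
For illustration, here is a minimal NumPy sketch of that idea (a hypothetical helper, not part of the current GSTools API), which keeps only point pairs whose connecting vector lies within a tolerance of a given direction:

import numpy as np

def directional_variogram_2d(x, y, field, direction, tol, bin_edges):
    # pairwise separation vectors and distances (O(N^2) memory, sketch only)
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    dist = np.sqrt(dx**2 + dy**2)
    # pair angles folded to [0, pi), compared with period pi
    ang = np.mod(np.arctan2(dy, dx), np.pi)
    diff = np.abs(ang - np.mod(direction, np.pi))
    in_segment = np.minimum(diff, np.pi - diff) <= tol
    # semivariance of every pair
    sq_diff = 0.5 * (field[:, None] - field[None, :]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for i in range(len(bin_edges) - 1):
        sel = in_segment & (dist > bin_edges[i]) & (dist <= bin_edges[i + 1])
        if sel.any():
            gamma[i] = sq_diff[sel].mean()
    return gamma

Widening tol uses more pairs per bin, which is what would make this usable for sparse measurements; a Cython version in estimator.pyx would of course avoid the O(N^2) temporaries.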

Are you interested in contributing to this and do you have spare time for it? - We are always grateful for helping hands!

@TobiasGlaubach
Contributor Author

TobiasGlaubach commented Apr 24, 2020

I will have to deal with unstructured data as well soonish. I checked your code for unstructured(...) and saw that it links to a C function. I'm not sure if that's through Cython or if you actually programmed the function in C, since I quickly stopped digging deeper after some quick glances.

I am happy to help, as I will have to work on it anyway. My idea was to just rotate the input data using a rotation matrix instead of programming rotated search masks within the unstructured(...) implementation. It might (not sure though) be computationally slower, but it would be easier for me to implement.

If adjusting the code in C is necessary, I will unfortunately probably not have the time to set up my environment for C programming and get back into it. I would very much prefer to stay at the Python level.


EDIT:

While writing this reply I was browsing through your repository source files instead of the compiled package files (as I did before), and I saw that unstructured actually leads to estimator.pyx, which looks much more manageable than the pure underlying C code :-D


OK, I am happy to help. Give me a day or so to check what you actually implemented. Where would you recommend going from here?

@TobiasGlaubach
Contributor Author

Regarding optimization, I already used the aforementioned code within scipy's minimize_scalar function, which is admittedly quite crude but yielded OK results for the simple test cases.

My code:

def calc_residuals(rot_alpha):
    # val: the measured field; theta: weighting factor in [0, 1] between
    # the x- and y-direction residuals (both defined outside this function)
    val_rot = ndimage.rotate(val, rot_alpha, reshape=True)
    val_masked = np.ma.masked_array(val_rot, mask=val_rot == 0.0)

    gamma_x = gs.vario_estimate_structured(val_masked, direction="x")
    gamma_y = gs.vario_estimate_structured(val_masked, direction="y")

    nmax = 40
    dgrid = np.arange(nmax)
    fit_model_x = gs.Exponential(dim=2)
    fit_model_y = gs.Exponential(dim=2)

    fit_model_x.fit_variogram(dgrid, gamma_x[:nmax], nugget=False)
    fit_model_y.fit_variogram(dgrid, gamma_y[:nmax], nugget=False)

    var_dx_estim = fit_model_x.variogram(dgrid)
    var_dx_calc = gamma_x[:nmax]

    var_dy_estim = fit_model_y.variogram(dgrid)
    var_dy_calc = gamma_y[:nmax]

    lstsq_x = np.linalg.norm(var_dx_estim - var_dx_calc)
    lstsq_y = np.linalg.norm(var_dy_estim - var_dy_calc)

    lstsq = theta * lstsq_x + (1 - theta) * lstsq_y

    return lstsq

from scipy.optimize import minimize_scalar
res = minimize_scalar(calc_residuals, bounds=(0, 90), method='bounded')
res

@TobiasGlaubach
Contributor Author

TobiasGlaubach commented Apr 27, 2020

OK, I checked your code for the unstructured(...) implementation, and I see two ways to go from here:

A Working directly with search masks in the unstructured(...) function in Cython

  • Add the arguments angles and tolerance to the function unstructured(...)
  • Adjust the implementation of the distance(x,y,z,k,j) function in Cython for 2 and 3 dimensions to also return the angle (for 2D) or angles (for 3D)
  • Adjust the masking done in line 162 to also check whether the calculated angle aligns with the requested angle (within tolerance)
  • Set all variogram values to NaN where the count is 0, indicating that there is no data at this distance (or leave them at zero if NaN is not preferred)

B Wrap a rotation and masking around the unstructured(...) function

  • Before the call to the unstructured(...) function, rotate the x, y, z data by the given angles so that the requested major axis aligns with the Cartesian coordinate system (can be done efficiently using scipy.spatial.transform).
  • Mask the field data into N slices along the major axis, and call unstructured(...) independently on each of them.
  • Average the results of all slices (excluding the bins without data).
  • Return the averaged results.

A would, I guess, be computationally less expensive and is the more proper implementation. B would be computationally more expensive but is probably easier to implement.

Some quick and dirty tests on my system (along the lines of the sketch below) showed OK results for approach B.
However, I think approach A is definitely the better way to go. @LSchueler What do you think?
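
For what it's worth, a rough sketch of approach B (helper and parameter names made up, per-slice estimation assumed to go through gs.vario_estimate_unstructured) could look like this:

import numpy as np
import gstools as gs

def directional_vario_by_slicing(x, y, field, angle, n_slices, bin_edges):
    # rotate the coordinates so the requested direction aligns with the x-axis
    c, s = np.cos(-angle), np.sin(-angle)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    # cut the points into thin bands perpendicular to that direction
    edges = np.linspace(y_rot.min(), y_rot.max(), n_slices + 1)
    gammas = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (y_rot >= lo) & (y_rot <= hi)
        if sel.sum() < 2:
            continue
        _, gamma = gs.vario_estimate_unstructured(
            (x_rot[sel], y_rot[sel]), field[sel], bin_edges
        )
        gammas.append(gamma)
    # average over the slices (a full version would also exclude empty bins)
    return np.nanmean(np.array(gammas), axis=0)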

@TobiasGlaubach
Contributor Author

I implemented a prototype in the Cython unstructured(...) function on my local branch (file see here). The code compiles into a C++ function when I use a minimal setup script. However, I am running into problems importing the function and testing it. I can prepare some unit tests here, but any help regarding a testing environment would be highly appreciated.

My minimal setup.py:

from setuptools import setup, Extension
from Cython.Build import cythonize
import numpy as np

FLAGS = []

CY_MODULES = []
CY_MODULES.append(
    Extension(
        "estimator",
        ["estimator.pyx"],
        language="c++",
        include_dirs=[np.get_include()],
        extra_compile_args=FLAGS,
        extra_link_args=FLAGS,
    )
)

EXT_MODULES = cythonize(CY_MODULES)  # annotate=True

# actually build the importable extension: python setup.py build_ext --inplace
setup(name="estimator", ext_modules=EXT_MODULES)

@LSchueler
Member

Sorry for taking some time to answer! - I'm pretty busy with another project at the moment.

First things first, these are the steps I do when I want to test Cython stuff locally. Everything in the main directory, i.e. gstools/.

Building the Cython code:
python setup.py build_ext --inplace
or with OpenMP support
python setup.py --openmp build_ext --inplace

Then you can perform the tests with
python tests/test_mytestcase.py
or a single test like this:
python tests/test_mycase.py MyCase.first_test

In order to get a feeling for how well the Cython code translates to pure C++ code, you can add the following at line 182 of the setup.py:

EXT_MODULES = cythonize(CY_MODULES, annotate=True)

This generates HTML files in which the lines are highlighted more yellow the more Python interaction they need, which is what makes things slow.

@LSchueler
Member

"since I quickly stopped digging deeper after some quick glances." That's a very wise thing to do! :-D

@MuellerSeb
Member

MuellerSeb commented Apr 28, 2020

Hey there! I'll jump into the conversation.
First of all, @TobiasGlaubach, thanks for your enthusiasm and the effort! Directional variogram estimation is a long-time goal of GSTools and it would be fantastic to finally include it.

I already came up with a proposal for the estimation of rotation and anisotropy in another issue: #55 (comment)

I just made a well-educated guess that one could estimate rotation and anisotropy by minimizing the variances within the lag classes of the variogram cloud, by de-rotating and re-stretching the field into an isotropic one.

I think this approach is similar to yours, but the advantage is that the whole field is always used for the estimation.

The optimization question is: how do we de-rotate and re-stretch a field so that the estimated isotropic variogram fits best?
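
To make that concrete, a minimal sketch of the de-rotation and re-stretching step for a 2D point cloud (angle and anis being the trial values of the optimization) could be:

import numpy as np

def derotate_restretch(x, y, angle, anis):
    # rotate the main axis back onto the x-axis
    c, s = np.cos(-angle), np.sin(-angle)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    # re-stretch the minor axis by the anisotropy ratio (anis <= 1)
    return x_rot, y_rot / anis

The optimization would then vary angle and anis so that the variances within the lag classes of the variogram cloud, estimated on the transformed coordinates, become minimal.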

I'll try to come up with a minimal working example to have a look at.

What do you think of that?

Cheers, Sebastian

@TobiasGlaubach
Contributor Author

@MuellerSeb Sounds good. Meanwhile I will implement your equations from #55 while I am working on unstructured, so we can use that as a basis.
@LSchueler Thanks for the help! I will probably implement and hopefully run some tests later this week. I will keep you posted.

@MuellerSeb
Member

All implemented with #106
