Skip to content

Add trapz to DataArray for mathematical integration #1288

Closed
@jeffbparker

Description

@jeffbparker

Since scientific data is often an approximation to a continuous function, when we write mean() or sum(), our underlying intention is often to approximate an integral. For example, if we have temperature of a rod T(t, x) as a function of time and space, the average value Tavg(x) is the integral of T(t,x) with respect to x, divided by the length.

I would guess that in practice, many uses of mean() and sum() are intending to approximate integrals of continuous functions. That is typically my use, at least. But simply adding up all values is a Riemann sum approximation to an integral which is not very accurate.

For approximating an integral, it seems to me that the trapezoidal rule (trapz() in numpy) should be preferred to sum() or mean() in essentially all cases, as the trapezoidal rule is more accurate while still being efficient.

It would be very useful to have trapz() as a method of DataArrays, so one could write, e.g., for an average value, Tavg = T.trapz(dim='time') / totalTime. Currently, I would have to use numpy's method and then rebuild the reduced-dimensional array myself:

TavgVal= np.trapz(T, T['time'], axis=0) / totalTime
Tavg= xr.DataArray(TavgVal, coords=T['space'], dims='space')

It could even be useful to have a function like mean_trapz() that calculates the mean value based on trapz. More generally, one could imagine having other integration methods too. E.g., data.integrate(dim='x', method='simpson'). But trapz is probably good enough for many cases and a big improvement over mean, and trapz is very simple even for unequally spaced data. And trapz shouldn't be much less efficient in principle, although in practice I find np.trapz() to be several times slower than np.mean().

Quick examples demonstrating sum/mean vs. trapz to convince you of the superiority of trapz:

x = np.linspace(0, 2, 200)
y = 1/3 * x**3
dx = x[1] - x[0]
integralRiemann =  dx * np.sum(y)  # 1.3467673375251465
integralTrapz = np.trapz(y, x)  # 1.3333670025167712
integralExact = 4/3  # 1.3333333333333333

This second example demonstrates the special advantages of trapz() for periodic functions because the trapezoidal rule happens to be extremely accurate for periodic functions integrated over their period.

x = np.linspace(0, 2*np.pi, 200)
y = cos(x)**2
meanRiemann = np.mean(y)  #  0.50249999999999995
meanTrapz = np.trapz(y, x) / (2*np.pi)  # 0.5
meanExact = 1/2  # 0.5

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions