Description
Is your feature request related to a problem?
Doing groupby().diff() with a big dataset and many groups is quite slow. The benchmark below (see "Additional context") shows how, in certain cases, optimizing it with numba can give a roughly 1000x speedup.
Describe the solution you'd like
Now, my question is: can this be optimized in pandas?
I realise the case is somewhat special, but I've had to work with many small groups and I'm running into speed issues.
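For reference, recent pandas versions expose a numba engine on GroupBy.transform that can express the same operation without a hand-rolled kernel. A minimal sketch, not from the original report: it assumes pandas >= 1.1 with numba installed, the (values, index) signature that engine requires, and a hypothetical toy frame just to make it runnable:

```python
import numpy as np
import pandas as pd

# hypothetical toy frame, only so the sketch is self-contained
df = pd.DataFrame({"groups": [0, 0, 1, 1, 1], "values": np.arange(5.0)})

def lag_diff(values, index):
    # engine="numba" passes each group's values and index as numpy arrays
    out = np.empty_like(values)
    out[0] = np.nan  # no previous row within the group
    for i in range(1, len(values)):
        out[i] = values[i] - values[i - 1]
    return out

# pandas JIT-compiles lag_diff with numba and applies it per group
df.groupby("groups")["values"].transform(lag_diff, engine="numba")
```

Whether this path closes the gap for very many small groups is exactly the open question.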
API breaking implications
[this should provide a description of how this feature will affect the API]
Describe alternatives you've considered
[this should provide a description of any alternative solutions or features you've considered]
Additional context
Here's the Python code in text format:
```python
import numpy as np
import pandas as pd
from numba import njit

# create a dataframe with many small groups
GROUPS = 100000
SIZE = 1000000

df = pd.DataFrame()
df["groups"] = np.random.choice(np.arange(GROUPS), size=SIZE)
df["values"] = np.random.random(size=SIZE)
df.sort_values("groups", inplace=True)

diff_pandas = df.groupby("groups")["values"].diff().values

@njit
def group_diff(groups: np.ndarray, values: np.ndarray, lag: int) -> np.ndarray:
    result = np.empty_like(values, dtype=np.float64)
    # the first `lag` rows have no prior row at all, let alone one in the same group
    result[:lag] = np.nan
    for i in range(lag, values.shape[0]):
        if groups[i] == groups[i - lag]:
            # lagged row belongs to the same group: take the difference
            result[i] = values[i] - values[i - lag]
        else:
            # group boundary: no lagged value within this group
            result[i] = np.nan
    return result

groups = df.groupby("groups").ngroup().values
values = df["values"].values
diff_numba = group_diff(groups, values, 1)

# check that the two results agree
np.isclose(diff_pandas, diff_numba, equal_nan=True).all()
```
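A rough way to time the two paths, assuming the snippet above has already been run (absolute numbers will vary by machine; the numba function is called once beforehand so JIT compilation is not counted):

```python
import time

# warm-up call so numba's compilation time is excluded from the measurement
group_diff(groups, values, 1)

t0 = time.perf_counter()
df.groupby("groups")["values"].diff()
t1 = time.perf_counter()
group_diff(groups, values, 1)
t2 = time.perf_counter()

print(f"pandas: {t1 - t0:.4f} s, numba: {t2 - t1:.6f} s")
```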