This repository was archived by the owner on Feb 2, 2024. It is now read-only.
[BUG] python and sdc-compiled functions generate different output with same input #996
Closed
Description
Reporting a bug
In [25]: import numpy as np
    ...: import pandas as pd
    ...: num_columns = 20
    ...: features = [f'col{i}' for i in range(num_columns)]
    ...: df = pd.DataFrame(np.random.rand(5, num_columns), columns=features)
    ...: target_col = 'col0'
In [26]: df
Out[26]:
       col0      col1      col2      col3      col4      col5  ...
0  0.847436  0.116855  0.782481  0.485027  0.027340  0.328801  ...
1  0.482504  0.845380  0.753603  0.535273  0.243581  0.861275  ...
2  0.190646  0.539439  0.901377  0.770925  0.908361  0.454777  ...
3  0.355888  0.451189  0.672876  0.745438  0.576982  0.907190  ...
4  0.535901  0.394481  0.118837  0.199040  0.557401  0.653302  ...
[5 rows x 20 columns]
In [27]: def _modified_pipeline(df, target_col):
    ...:     samples = df[df['col1'] >= 0.2]
    ...:     p_sum = (samples[target_col] >= 0.5).sum()
    ...:     r_sum = (samples[target_col] <= 0.5).sum()
    ...:     cnt = len(samples)
    ...:     return p_sum, r_sum, cnt
    ...:
In [28]: from numba import njit
    ...: @njit
    ...: def jit_modified_pipeline(df, target_col):
    ...:     samples = df[df['col1'] >= 0.2]
    ...:     p_sum = (samples[target_col] >= 0.5).sum()
    ...:     r_sum = (samples[target_col] <= 0.5).sum()
    ...:     cnt = len(samples)
    ...:     return p_sum, r_sum, cnt
    ...:
In [29]: _modified_pipeline(df, target_col)
Out[29]: (1, 3, 4)
In [30]: jit_modified_pipeline(df, target_col)
<ipython-input-28-bbc0261853d0>:5: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
.....
Out[30]: (1, 2, 3)
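As you can see, the plain Python function returns (1, 3, 4), while the SDC-compiled function returns (1, 2, 3) for the same input. In the DataFrame shown, four rows satisfy col1 >= 0.2, so the Python result is the expected one.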
Environment: Python 3.7.9, numba 0.52.0, sdc 0.38.0, pandas 1.2.0
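
To confirm which of the two results is the expected one, the filter and the two counts can be recomputed by hand from the values printed in Out[26]. The following is only a sketch using plain Python lists instead of pandas; `expected_counts` is a hypothetical helper that does not appear in the report, and the numbers are the six-decimal values as displayed, not the full-precision originals.

```python
# Values of col0 and col1 copied from the DataFrame printed in Out[26]
# (rounded to six decimals, exactly as displayed there).
col0 = [0.847436, 0.482504, 0.190646, 0.355888, 0.535901]
col1 = [0.116855, 0.845380, 0.539439, 0.451189, 0.394481]


def expected_counts(target, filter_col, threshold=0.2, cutoff=0.5):
    """Hypothetical helper mirroring _modified_pipeline with plain lists."""
    # Keep the target values of rows whose filter column is >= threshold,
    # like df[df['col1'] >= 0.2][target_col].
    samples = [t for t, f in zip(target, filter_col) if f >= threshold]
    p_sum = sum(1 for t in samples if t >= cutoff)  # (samples >= 0.5).sum()
    r_sum = sum(1 for t in samples if t <= cutoff)  # (samples <= 0.5).sum()
    return p_sum, r_sum, len(samples)


result = expected_counts(col0, col1)
print(result)  # (1, 3, 4)
```

This matches Out[29], which suggests that the compiled version loses one of the four filtered rows somewhere between the boolean indexing and the reductions, although the report itself does not identify the cause.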