This repository was archived by the owner on Feb 2, 2024. It is now read-only.

[BUG] python and sdc-compiled functions generate different output with same input #996

Closed
@dlee992

Reporting a bug

In [25]: num_columns = 20
    ...: features = [f'col{i}' for i in range(num_columns)]
    ...: df = pd.DataFrame(np.random.rand(5, num_columns), columns=features)
    ...: target_col = 'col0'

In [26]: df
Out[26]:
       col0      col1      col2      col3      col4      col5  ...    
0  0.847436  0.116855  0.782481  0.485027  0.027340  0.328801  ...  
1  0.482504  0.845380  0.753603  0.535273  0.243581  0.861275  ...  
2  0.190646  0.539439  0.901377  0.770925  0.908361  0.454777  ...  
3  0.355888  0.451189  0.672876  0.745438  0.576982  0.907190  ...  
4  0.535901  0.394481  0.118837  0.199040  0.557401  0.653302  ...  

[5 rows x 20 columns]

In [27]: def _modified_pipeline(df, target_col):
    ...:     samples = df[df['col1'] >= 0.2]
    ...:     p_sum = (samples[target_col] >= 0.5).sum()
    ...:     r_sum = (samples[target_col] <= 0.5).sum()
    ...:     cnt = len(samples)
    ...:     return p_sum, r_sum, cnt
    ...:

In [28]: from numba import njit
    ...: @njit
    ...: def jit_modified_pipeline(df, target_col):
    ...:     samples = df[df['col1'] >= 0.2]
    ...:     p_sum = (samples[target_col] >= 0.5).sum()
    ...:     r_sum = (samples[target_col] <= 0.5).sum()
    ...:     cnt = len(samples)
    ...:     return p_sum, r_sum, cnt
    ...:

In [29]: _modified_pipeline(df, target_col)
Out[29]: (1, 3, 4)

In [30]: jit_modified_pipeline(df, target_col)
<ipython-input-28-bbc0261853d0>:5: NumbaPerformanceWarning:
The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.
.....
Out[30]: (1, 2, 3)

As shown above, the plain Python function returns (1, 3, 4) while the SDC-compiled function returns (1, 2, 3) for the same input: both r_sum and cnt differ.
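
To check which of the two results is right, the counts can be recomputed by hand from the col0 and col1 values printed in Out[26]. The sketch below uses plain Python lists only (no pandas, numba, or sdc). The helper name reference_pipeline is hypothetical and not part of the original report, and the values are the six-decimal roundings shown in the display rather than the exact floats.

```python
# col0 and col1 values copied from Out[26] (as displayed, rounded to six decimals).
col0 = [0.847436, 0.482504, 0.190646, 0.355888, 0.535901]
col1 = [0.116855, 0.845380, 0.539439, 0.451189, 0.394481]


def reference_pipeline(target, filter_col):
    """Plain-Python equivalent of _modified_pipeline with target_col = 'col0'.

    Hypothetical helper for illustration; it mirrors the three steps of the
    reported function without any DataFrame machinery.
    """
    # Keep only rows where col1 >= 0.2 (same filter as df[df['col1'] >= 0.2]).
    samples = [t for t, f in zip(target, filter_col) if f >= 0.2]
    # Count rows at or above / at or below the 0.5 threshold.
    p_sum = sum(1 for t in samples if t >= 0.5)
    r_sum = sum(1 for t in samples if t <= 0.5)
    cnt = len(samples)
    return p_sum, r_sum, cnt


result = reference_pipeline(col0, col1)
print(result)  # (1, 3, 4)
```

Four of the five rows pass the col1 >= 0.2 filter, so this hand check agrees with the plain Python output in Out[29], which suggests the compiled version is dropping one row somewhere in the filter or the count.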

Environment: Python 3.7.9, numba 0.52.0, sdc 0.38.0, pandas 1.2.0
