deterministic and related flags don't guarantee same result on different machines #6683

Open

Description

I'm using LGBMRegressor through its scikit-learn API. Some of my models give different results from .predict() when run in a Docker environment on my local Mac and in the same Docker environment on an AWS EC2 instance. This happens even though the model is configured with deterministic=True, force_row_wise=True, and num_threads=1.
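For reference, the configuration described above can be written as a parameter dictionary. This is only a sketch of the three flags named in the report; the variable name `params` is an illustrative choice, and the commented-out lines assume lightgbm is installed and that the estimator accepts these names as keyword arguments.

```python
# The three flags named in the report, collected in one place.
params = {
    "deterministic": True,   # request deterministic behavior
    "force_row_wise": True,  # force row-wise histogram building
    "num_threads": 1,        # single-threaded execution
}

# Assumed usage, left commented so the snippet runs without lightgbm:
# from lightgbm import LGBMRegressor
# model = LGBMRegressor(**params)

print(params)
```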

First question: is it expected that results can differ across machines even with these flags set? Under the deterministic section of the docs, I see the following bullet point:

when you use the different seeds, different LightGBM versions, the binaries compiled by different compilers, or in different systems, the results are expected to be different

This suggests the behavior may be expected, although I had hoped that running in a Docker environment would make the results reproducible. The problem is that, as I write tests for my code base, I can't guarantee that tests which pass locally will also pass in CI/CD. If this is expected behavior, how do people include LightGBM code in test suites that don't run on the same hardware?
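One common way to make such tests less brittle, sketched here as an assumption rather than a LightGBM recommendation, is to compare predictions against stored reference values within a tolerance instead of requiring exact equality. The helper name `predictions_match`, the tolerance values, and the sample numbers are all hypothetical.

```python
import math

def predictions_match(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if both sequences have equal length and agree element-wise within tolerance."""
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )

# Hypothetical reference predictions stored with the test,
# and hypothetical values produced on another machine.
reference = [0.125, 3.5, -2.75]
observed = [0.125 + 1e-12, 3.5, -2.75]

print(predictions_match(reference, observed))
print(predictions_match(reference, [0.126, 3.5, -2.75]))
```

A tolerance only absorbs small floating-point drift. If a sample ends up in a different leaf on another machine, the prediction can change by far more than any reasonable tolerance, so this approach would not hide that kind of discrepancy.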

If this is not expected behavior, is there a data or model setup that these flags would not cover? Before the LGBMRegressor, I have a pipeline that applies various data transformations. Purely by trial and error, I found that removing a CyclicalFeatures (https://feature-engine.trainindata.com/en/1.7.x/api_doc/creation/CyclicalFeatures.html#feature_engine.creation.CyclicalFeatures) transformation from the pipeline gave me reproducible results between my local machine and the EC2 box. This transformation isn't doing anything stochastic; it simply transforms a feature into sine and cosine representations. Is there a reason why mapping a feature to the -1 to 1 range would introduce non-deterministic behavior?
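One hypothesis worth checking, not confirmed for this case, is that sine and cosine are computed by platform-specific math routines that may differ in the last bit, and that a one-bit difference in a feature value can land on the other side of a tree split. The sketch below illustrates only that mechanism. It assumes the usual cyclical encoding sin(2πx/max) and cos(2πx/max), simulates "another platform" with `math.nextafter` (Python 3.9+), and uses a made-up split threshold.

```python
import math

def cyclical_features(value, max_value):
    """Encode a periodic value as (sin, cos) of its angle on the unit circle."""
    angle = 2.0 * math.pi * value / max_value
    return math.sin(angle), math.cos(angle)

# Feature value as computed on the local machine (e.g. month 1 of 12).
sin_local, cos_local = cyclical_features(1, 12)

# Simulate a different math library returning the adjacent float.
sin_other = math.nextafter(sin_local, math.inf)

# Hypothetical split rule "x <= threshold goes left", with the threshold
# sitting exactly at the locally computed value.
threshold = sin_local
goes_left_local = sin_local <= threshold
goes_left_other = sin_other <= threshold

print(sin_local, sin_other)
print(goes_left_local, goes_left_other)
```

If this is the cause, comparing the transformed feature values from both machines bit for bit (before they reach the model) would show the difference directly, independent of LightGBM.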

I have a minimal example that includes data, a saved pipeline, and a driver script. If useful, I could relabel the data to remove any sensitive information and share it, along with a minimal working Docker environment, but I wanted to ask the above questions first.

Thanks.
