Description
In theory, the seed set in MLContext is intended to provide the global seed for all components and operations requiring randomness, such as sampling and permutation. In practice, this doesn't always hold true.
TrainTestSplit, CrossValidationSplit, and CrossValidate all accept a user-specified seed and call EnsureGroupPreservationColumn, which in turn uses GenerateNumberTransform and HashingEstimator.
When the user does not specify a seed, one is not derived from MLContext. Instead, GenerateNumberTransform and HashingEstimator fall back to their own defaults, so a user who doesn't pass a seed to TrainTestSplit, CrossValidationSplit, or CrossValidate will always get the same deterministic split regardless of the seed in MLContext.
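A minimal repro sketch of the behavior described above, assuming the Microsoft.ML NuGet package is referenced. The Row type, the SeedRepro class, and the TestValues helper are hypothetical names introduced only for illustration. It splits the same data under two MLContext instances with different seeds, without passing a seed to TrainTestSplit, and reports whether the resulting test sets match. On a version affected by this issue the two test sets would be expected to be identical; the result may differ on other versions.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical row type used only for this illustration.
public class Row
{
    public float Value { get; set; }
}

public static class SeedRepro
{
    // Splits the same data under an MLContext built with the given seed,
    // without passing a seed to TrainTestSplit, and returns the sorted
    // values that ended up in the test set.
    private static float[] TestValues(int contextSeed)
    {
        var mlContext = new MLContext(seed: contextSeed);
        var rows = Enumerable.Range(0, 100).Select(i => new Row { Value = i });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // No seed argument here: the question is whether contextSeed has any effect.
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        return mlContext.Data
            .CreateEnumerable<Row>(split.TestSet, reuseRowObject: false)
            .Select(r => r.Value)
            .OrderBy(v => v)
            .ToArray();
    }

    public static void Main()
    {
        float[] a = TestValues(contextSeed: 1);
        float[] b = TestValues(contextSeed: 2);

        // If the MLContext seed were honored, the two test sets would usually differ.
        Console.WriteLine($"Identical test sets: {a.SequenceEqual(b)}");
    }
}
```

The same comparison could be applied to CrossValidationSplit by comparing the per-fold test sets instead of a single split.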
machinelearning/src/Microsoft.ML.Data/DataLoadSave/DataOperationsCatalog.cs
Lines 496 to 505 in 24c8274
machinelearning/src/Microsoft.ML.Data/DataLoadSave/DataOperationsCatalog.cs
Lines 521 to 525 in 24c8274