Releases: dotnet/machinelearning

ML.NET 1.5.2

14 Sep 22:20
5929292

New Features

  • New API and algorithms for time series data. In this release ML.NET introduces new capabilities for working with time series data.
    • Detecting seasonality in time series (#5231)
    • Removing seasonality from time series prior to anomaly detection (#5202)
    • Threshold for root cause analysis (#5218)
    • RCA for anomaly detection can now return multiple dimensions (#5236)
  • Ranking experiments in AutoML.NET API. ML.NET now adds support for automating ranking experiments. (#5150, #5246) Corresponding support will soon be added to Model Builder in Visual Studio.
  • Cross validation support in ranking (#5263)
  • CountTargetEncodingEstimator. This transforms a categorical column into a set of features that includes the count of each label class, the log-odds for each label class and the back-off indicator (#4514)

Enhancements

  • Onnx Enhancements
    • Support more types for ONNX export of HashEstimator (#5104)
    • Added ONNX export support for NaiveCalibrator (#5289)
    • Added ONNX export support for StopWordsRemovingEstimator and CustomStopWordsRemovingEstimator (#5279)
    • Support onnx export with previous OpSet version (#5176)
    • Added a sample for Onnx conversion (#5195)
  • New features in old transformers
    • Robust Scaler now added to the Normalizer catalog (#5166)
    • ReplaceMissingValues now supports Mode as a replacement method. (#5205)
    • Added in standard conversions to convert types to string (#5106)
  • Output topic summary to model file for LDATransformer (#5260)
  • Use Channel Instead of BufferBlock (#5123, #5313). (Thanks @jwood803)
  • Support specifying command timeout while using the database loader (#5288)
  • Added cross entropy support to validation training, edited metric reporting (#5255)
  • Allow TextLoader to load empty float/double fields as NaN instead of 0 (#5198)
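The TextLoader change in the last item above can be sketched as follows. This is a minimal illustration, not taken from the release itself: it assumes the option is exposed as `TextLoader.Options.MissingRealsAsNaN`, and the temporary file, the column names `A` and `B`, and the class name are hypothetical.

```csharp
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class MissingRealsExample
{
    // Loads a tiny CSV whose second row has an empty first field and
    // returns the values of column "A".
    public static float[] LoadColumnA()
    {
        // Hypothetical two-column file written to a temp location.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "1.5,2.5\n,4.0\n");

        var mlContext = new MLContext();
        var options = new TextLoader.Options
        {
            Separators = new[] { ',' },
            HasHeader = false,
            // Assumed option name: empty float/double fields load as NaN instead of 0.
            MissingRealsAsNaN = true,
            Columns = new[]
            {
                new TextLoader.Column("A", DataKind.Single, 0),
                new TextLoader.Column("B", DataKind.Single, 1),
            },
        };

        IDataView data = mlContext.Data.LoadFromTextFile(path, options);
        return data.GetColumn<float>("A").ToArray();
    }
}
```

With the option left at its default, the empty field would be expected to load as 0, which makes missing values indistinguishable from real zeros.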

Bug Fixes

  • Changed default value of RowGroupColumnName from null to GroupId (#5290)
  • Updated AveragedPerceptron default iterations from 1 to 10 (#5258)
  • Properly normalize column names in Utils.GetSampleData() for duplicate cases (#5280)
  • Add two-variable scenario in Tensor shape inference for TensorflowTransform (#5257)
  • Fixed score column name and order bugs in CalibratorTransformer (#5261)
  • Fix for conditional error in root cause analysis additions (#5269)
  • Ensured Sanitized Column Names are Unique in AutoML CLI (#5177)
  • Ensure that the graph is set to be the current graph when scoring with multiple models (#5149)
  • Uniform onnx conversion method when using non-default column names (#5146)
  • Fixed multiple issues related to splitting data. (#5227)
  • Changed default NGram length from 1 to 2. (#5248)
  • Improve exception message by adding column name (#5232)
  • Use model schema type instead of class definition schema (#5228)
  • Use GetRandomFileName when creating random temp folder to avoid conflict (#5229)
  • Filter anomalies according to boundaries under AnomalyAndMargin mode (#5212)
  • Improve error message when defining custom type for variables (#5114)
  • Fixed OnnxTransformer output column mapping. (#5192)
  • Fixed version format of built packages (#5197)
  • Improvements to "Invalid TValue" error message (#5189)
  • Added IDisposable to OnnxTransformer and fixed memory leaks (#5348)
  • Fixes #4392. Added AddPredictionEnginePool overload for implementation factory (#4393)
  • Updated codegen to make it work with mlnet 1.5 (#5173)
  • Updated codegen to support object detection scenario. (#5216)
  • Fix issue #5350, check file lock before reload model (#5351)
  • Improve handling of infinity values in AutoML.NET when calculating average CV metrics (#5345)
  • Throw when PCA generates invalid eigenvectors (#5349)
  • RobustScalingNormalizer entrypoint added (#5310)
  • Replaced "whitelist" terminology with "allow list" (#5328) (Thanks @LetticiaNicoli)
  • Fixes #5352: issues caused by equality with non-string values for root cause localization (#5354)
  • Added catch in R^2 calculation for case with few samples (#5319)
  • Added support for RankingMetrics with CrossValSummaryRunner (#5386)

Test updates

  • Refactor of OnnxConversionTests.cs (#5185)
  • New code coverage (#5169)
  • Test fix using breast cancer dataset and test cleanup (#5292)

Documentation Updates

  • Updated ORT version info for OnnxScoringEstimator (#5175)
  • Updated OnnxTransformer docs (#5296)
  • Improve VectorTypeAttribute(dims) docs (#5301)

Breaking Changes

  • None

ML.NET 1.5.0

26 May 23:58
121a999

New Features

  • New anomaly detection algorithm (#5135). ML.NET has previously supported anomaly detection through DetectAnomalyBySrCnn. This function operates in a streaming manner by computing anomalies around each arriving point and examining a window around it. Now we introduce a new function DetectEntireAnomalyBySrCnn that computes anomalies by considering the entire dataset and also supports the ability to set sensitivity and output margin.
  • Root Cause Detection (#4925). ML.NET now also supports root cause detection for anomalies detected in time series data.
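A minimal sketch of the whole-dataset detector described in the first item above. It assumes the Microsoft.ML.TimeSeries package and the parameter-based overload of `DetectEntireAnomalyBySrCnn`; the synthetic series, the class names, and the parameter values are illustrative only. To my understanding, in `AnomalyOnly` mode the output vector starts with the anomaly flag, but treat that layout as an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.TimeSeries;

public class SeriesPoint
{
    public double Value { get; set; }
}

public class SrCnnResult
{
    // Assumed layout in AnomalyOnly mode: [IsAnomaly, RawScore, Mag].
    [VectorType]
    public double[] Prediction { get; set; }
}

public static class EntireSrCnnExample
{
    public static List<SrCnnResult> Run()
    {
        var mlContext = new MLContext();

        // Synthetic series: a flat signal with one spike (illustrative data only).
        var points = Enumerable.Range(0, 40)
            .Select(i => new SeriesPoint { Value = i == 30 ? 100.0 : 5.0 })
            .ToList();
        IDataView data = mlContext.Data.LoadFromEnumerable(points);

        // Unlike the streaming DetectAnomalyBySrCnn, this call scores the whole series at once.
        IDataView scored = mlContext.AnomalyDetection.DetectEntireAnomalyBySrCnn(
            data,
            nameof(SrCnnResult.Prediction),
            nameof(SeriesPoint.Value),
            threshold: 0.35,
            batchSize: 512,
            sensitivity: 90.0,
            detectMode: SrCnnDetectMode.AnomalyOnly);

        return mlContext.Data
            .CreateEnumerable<SrCnnResult>(scored, reuseRowObject: false)
            .ToList();
    }
}
```

Switching `detectMode` to `SrCnnDetectMode.AnomalyAndMargin` is where the sensitivity and margin outputs mentioned above come into play.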

Enhancements

  • Updates to TextLoader
    • Enable TextLoader to accept new lines in quoted fields (#5125)
    • Add escapeChar support to TextLoader (#5147)
    • Add public generic methods to TextLoader catalog that accept Options objects (#5134)
    • Added decimal marker option in TextLoader (#5145, #5154)
  • Onnxruntime updated to v1.3 (#5104). This brings support for additional data types for the HashingEstimator.
  • Onnx export for OneHotHashEncodingTransformer and HashingTransformer (#5013, #5152, #5138)
  • Support for Categorical features in CalculateFeatureContribution of LightGBM (#5018)

Bug Fixes

In this release we tracked down the bugs that occurred randomly and sporadically, and fixed many subtle ones. As a result, we have also re-enabled many tests, which are listed in the Test updates section below.

  • Fixed race condition for test MulticlassTreeFeaturizedLRTest (#4950)
  • Fix SsaForecast bug (#5023)
  • Fixed x86 crash (#5081)
  • Fixed and added unit tests for EnsureResourceAsync hanging issue (#4943)
  • Added IDisposable support for several classes (#4939)
  • Updated libmf and corresponding MatrixFactorizationSimpleTrainAndPredict() baselines per build (#5121)
  • Fix MatrixFactorization trainer's warning (#5071)
  • Update CodeGenerator's console project to netcoreapp3.1 (#5066)
  • Let ImageLoadingTransformer dispose the last image it loads (#5056)
  • [LightGBM] Fixed bug for empty categorical values (#5048)
  • Converted potentially large variables to type long (#5041)
  • Made resource downloading more robust (#4997)
  • Updated MultiFileSource.Load to fix inconsistent behavior with multiple files (#5003)
  • Removed WeakReference already cleaned up by GC (#4995)
  • Fixed Bitmap(file) locking the file. (#4994)
  • Remove WeakReference list in PredictionEnginePoolPolicy. (#4992)
  • Added the assembly name of the custom transform to the model file (#4989)
  • Updated constructor of ImageLoadingTransformer to accept empty imageFolder paths (#4976)

Onnx bug fixes

  • ColumnSelectingTransformer now infers ONNX shape (#5079)
  • Fixed KMeans scoring differences between ORT and OnnxRunner (#4942)
  • CountFeatureSelectingEstimator no selection support (#5000)
  • Fixes OneHotEncoding Issue (#4974)
  • Fixes multiclass logistic regression (#4963)
  • Adding vector tests for KeyToValue and ValueToKey (#5090)

AutoML fixes

  • Handle NaN optimization metric in AutoML (#5031)
  • Add projects capability in CodeGenerator (#5002)
  • Simplify CodeGen - phase 2 (#4972)
  • Support sweeping multiline option in AutoML (#5148)

Test updates

  • Fix libomp installation for macOS builds (#5143, #5141)
  • Address TF test download failures by using the resource manager with retry download (#5102)
  • Adding OneHotHashEncoding Test (#5098)
  • Changed Dictionary to ConcurrentDictionary (#5097)
  • Added SQLite database to test loading of datasets in non-Windows builds (#5080)
  • Added ability to compare configuration-specific baselines, updated baselines for many tests and re-enabled disabled tests (#5045, #5059, #5068, #5057, #5047, #5029, #5094, #5060)
  • Fixed TestCancellation hanging (#4999)
  • Fixed benchmark test hanging issue (#4985)
  • Added working version of checking whether file is available for access (#4938)

Documentation Updates

  • Update OnnxTransformer Doc XML (#5085)
  • Updated build docs for .NET Core 3.1 (#4967)
  • Updated OnnxScoringEstimator's documentation (#4966)
  • Fix xrefs in the LDSVM trainer docs (#4940)
  • Clarified parameters on time series (#5038)
  • Update ForecastBySsa function specifications and add seealso (#5027)
  • Add see also section to TensorFlowEstimator docs (#4941)

Breaking Changes

  • None

ML.NET 1.5.0-preview2

12 Mar 02:41
ed481b6

New Features (IN-PREVIEW, please provide feedback)

  • TimeSeriesImputer (#4623). This data transformer can be used to impute missing rows in time series data.
  • LDSVM Trainer (#4060). The "Local Deep SVM" uses trees as its SVM kernel to create a non-linear binary trainer. A sample can be found here.
  • Onnxruntime updated to v1.2. This also includes support for GPU execution of ONNX models.
  • Export-to-ONNX for below components:
    • SlotsDroppingTransformer (#4562)
    • ColumnSelectingTransformer (#4590)
    • VectorWhiteningTransformer (#4577)
    • NaiveBayesMulticlassTrainer (#4636)
    • PlattCalibratorTransformer (#4699)
    • TokenizingByCharactersTransformer (#4805)
    • TextNormalizingTransformer (#4781)

Bug Fixes

  • Fix issue in WaiterWaiter caused by race condition (#4829)
  • Onnx export change to allow for running inference on multiple rows in OnnxRuntime (#4783)
  • Data splits to default to MLContext seed when not specified (#4764)
  • Add Seed property to MLContext and use as default for data splits (#4775)
  • Onnx bug fixes
    • Updating onnxruntime version (#4882)
    • Calculate ReduceSum row by row in ONNX model from OneVsAllTrainer (#4904)
    • Several onnx export fixes related to KeyToValue and ValueToKey transformers (#4900, #4866, #4841, #4889, #4878, #4797)
    • Fixes to onnx export for text related transforms (#4891, #4813)
    • Fixed bugs in OptionalColumnTransform and ColumnSelecting (#4887, #4815)
    • Alternate solution for ColumnConcatenatingTransformer (#4875)
    • Added slot names support for OnnxTransformer (#4857)
    • Fixed output schema of OnnxTransformer (#4849)
    • Changed Binarizer node to be cast to the type of the predicted label … (#4818)
    • Fix for OneVersusAllTrainer (#4698)
    • Enable OnnxTransformer to accept KeyDataViewTypes as if they were UInt32 (#4824)
    • Fix off by 1 error with the cats_int64s attribute for the OneHotEncoder ONNX operator (#4827)
    • Updated handling of missing values with LightGBM, and added ability to use (0) as missing value (#4695)
    • Double cast to float for some onnx estimators (#4745)
    • Fix onnx output name for GcnTransform (#4786)
  • Added support to run PFI on uncalibrated binary classification models (#4587)
  • Fix bug in WordBagEstimator when training on empty data (#4696)
  • Added Cancellation mechanism to Image Classification (through the experimental nuget) (fixes #4632) (#4650)
  • Changed F1 score to return 0 instead of NaN when Precision + Recall is 0 (#4674)
  • TextLoader, BinaryLoader and SvmLightLoader now check the existence of the input file before training (#4665)
  • ImageLoadingTransformer now checks the existence of input folder before training (#4691)
  • Use random file name for AutoML experiment folder (#4657)
  • Use invariant culture when converting to string (#4635)
  • Fix NullReferenceException for the Recommendation task in AutoML and CodeGenerator (#4774)
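The F1 change listed above (#4674) follows from the definition F1 = 2PR / (P + R), which divides by zero when both precision and recall are 0. The sketch below shows the guarded computation in plain C#. It illustrates the rule only and is not ML.NET's actual implementation; the class and method names are hypothetical.

```csharp
public static class F1Example
{
    // F1 = 2 * P * R / (P + R), defined here as 0 when P + R is 0
    // so that callers never receive NaN from a 0/0 division.
    public static double F1(double precision, double recall)
    {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }
}
```

Returning 0 rather than NaN keeps downstream aggregation (for example, averaging across cross-validation folds) well defined.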

Enhancements

  • Added in support for System.DateTime type for the DateTimeTransformer (#4661)
  • Additional changes to ExpressionTransformer (#4614)
  • Optimize generic MethodInfo for Func (#4588)
  • Data splits to default to MLContext seed when not specified (#4764)
  • Added in DateTime type support for TimeSeriesImputer (#4812)
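The seed-related item above (#4764) can be sketched as follows: when no seed is passed to a split, the split is expected to fall back to the seed given to `MLContext`. The `Row` class, the data, and the method name are hypothetical, and the claim that two runs produce identical splits is an expectation under this change rather than a verified result.

```csharp
using System.Linq;
using Microsoft.ML;

public class Row
{
    public float X { get; set; }
}

public static class SeedExample
{
    // Returns the X values that land in the test set for a given MLContext seed.
    public static float[] TestRows(int seed)
    {
        var mlContext = new MLContext(seed: seed);
        IDataView data = mlContext.Data.LoadFromEnumerable(
            Enumerable.Range(0, 100).Select(i => new Row { X = i }));

        // No seed argument here, so the split should use the MLContext seed.
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
        return split.TestSet.GetColumn<float>(nameof(Row.X)).ToArray();
    }
}
```

Calling `SeedExample.TestRows(42)` twice should then yield the same test rows, which is what makes experiments reproducible without passing a seed to every split.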

Test updates

  • Code analysis updates
    • Update analyzer test library (#4740)
    • Enable the internal code analyzer for test projects (#4731)
    • Implement MSML_ExtendBaseTestClass (Test classes should be derived from BaseTestClass) (#4746)
    • Enable MSML_TypeParamName for the full solution (#4762)
    • Enable MSML_ParameterLocalVarName for the full solution (#4833)
    • Enable MSML_SingleVariableDeclaration for the full solution (#4765)
  • Better logging from tests
    • Ensure tests capture the full log (#4710)
    • Fix failure to capture test failures (#4716)
    • Collect crash dumps and upload dump and pdb to artifacts (#4666)
  • Enable Conditional Numerical Reproducibility for tests (#4569)
  • Changed all MLContext creation to include a fixed seed (#4736)
  • Fix incorrect SynchronizationContext use in TestSweeper (#4779)

Documentation Updates

Breaking Changes

  • None

ML.NET 1.5.0-preview

01 Jan 01:26
Pre-release

New Features (IN-PREVIEW, please provide feedback)

  • Export-to-ONNX for below components:

    • WordTokenizingTransformer (#4451)
    • NgramExtractingTransformer (#4451)
    • OptionalColumnTransform (#4454)
    • KeyToValueMappingTransformer (#4455)
    • LbfgsMaximumEntropyMulticlassTrainer (#4462)
    • LightGbmMulticlassTrainer (#4462)
    • LightGbmMulticlassTrainer with SoftMax (#4462)
    • OneVersusAllTrainer (#4462)
    • SdcaMaximumEntropyMulticlassTrainer (#4462)
    • SdcaNonCalibratedMulticlassTrainer (#4462)
    • CopyColumn Transform (#4486)
    • PriorTrainer (#4515)
  • DateTime Transformer (#4521)

  • Loader and Saver for SVMLight file format (#4190)
    Sample

  • Expression transformer (#4548)
    The expression transformer takes the expression in the form of text using syntax of a simple expression language, and performs the operation defined in the expression on the input columns in each row of the data. The transformer supports having a vector input column, in which case it applies the expression to each slot of the vector independently. The expression language is extendable to user defined operations.
    Sample
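A minimal sketch of the expression transformer described above, assuming the `Expression` extension on `mlContext.Transforms` takes an output column name, the expression text, and the input column names. The row classes, the column names, and the expression itself are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

public class InputRow
{
    public float X { get; set; }
    public float Y { get; set; }
}

public class OutputRow
{
    public float Sum { get; set; }
}

public static class ExpressionExample
{
    public static List<float> Run()
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new InputRow { X = 1f, Y = 2f },
            new InputRow { X = 3f, Y = 4f },
        });

        // The expression is plain text; its parameters bind to the input columns in order.
        var pipeline = mlContext.Transforms.Expression("Sum", "(x, y) => x + y", "X", "Y");
        IDataView transformed = pipeline.Fit(data).Transform(data);

        return mlContext.Data
            .CreateEnumerable<OutputRow>(transformed, reuseRowObject: false)
            .Select(row => row.Sum)
            .ToList();
    }
}
```

Because the expression is text rather than a compiled delegate, it can be saved with the model, which is the main difference from a custom mapping written in C#.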

Bug Fixes

  • Fix using permutation feature importance with Binary Prediction Transformer and CalibratedModelParametersBase loaded from disk. (#4306)
  • Fixed model saving and loading of OneVersusAllTrainer to include SoftMax. (#4472)
  • Ignore hidden columns in AutoML schema checks of validation data. (#4490)
  • Ensure BufferBlocks are completed and empty in RowShufflingTransformer. (#4479)
  • Create methods not being called when loading models from disk. (#4485)
  • Fixes onnx exports for binary classification trainers. (#4463)
  • Make PredictionEnginePool.GetPredictionEngine thread safe. (#4570)
  • Memory leak when using FeaturizeText transform. (#4576)
  • System.ArgumentOutOfRangeException issue in CustomStopWordsRemovingTransformer. (#4592)
  • Image Classification low accuracy on EuroSAT Dataset. (#4522)

Stability fixes by Sam Harwell

  • Prevent exceptions from escaping FileSystemWatcher events. (#4535)
  • Make local functions static where applicable. (#4530)
  • Disable CS0649 in OnnxConversionTest. (#4531)
  • Make test methods public. (#4532)
  • Conditionally compile helper code. (#4534)
  • Avoid running API Compat for design time builds. (#4529)
  • Pass by reference when null is not expected. (#4546)
  • Add Xunit.Combinatorial for test projects. (#4545)
  • Use Theory to break up tests in OnnxConversionTest. (#4533)
  • Update code coverage integration. (#4543)
  • Use std::unique_ptr for objects in LdaEngine. (#4547)
  • Enable VSTestBlame to show details for crashes. (#4537)
  • Use std::unique_ptr for samplers_ and likelihood_in_iter_. (#4551)
  • Add tests for IParameterValue implementations. (#4549)
  • Convert LdaEngine to a SafeHandle. (#4538)
  • Create SafeBoosterHandle and SafeDataSetHandle. (#4539)
  • Add IterationDataAttribute. (#4561)
  • Add tests for ParameterSet equality. (#4550)
  • Add a test handler for AppDomain.UnhandledException. (#4557)

Breaking Changes

None

Enhancements

  • Hash Transform API that takes in advanced options. (#4443)
  • Image classification performance improvements and option to create validation set from train set. (#4522)
  • Upgraded OnnxRuntime to v1.0 and Google Protobuf to 3.10.1. (#4416)

CLI and AutoML API

  • None.

Remarks

  • Thank you, Sam Harwell, for making a series of stability fixes that have substantially improved the stability of our build CI.

ML.NET 1.4.0

01 Jan 01:23
1480fda

New Features

  • General Availability of Image Classification API
    Introduces the Microsoft.ML.Vision package, which enables image classification by leveraging an existing pre-trained deep neural network model. The API trains the last classification layer with TensorFlow, using the C# bindings from TensorFlow.NET. This is a high-level API that is simple yet powerful. Below are some of the key features:

    • GPU training: Supported on Windows and Linux, more information here.
    • Early stopping: Saves time by stopping training automatically when the model has stabilized.
    • Learning rate scheduler: Learning rate is an integral and potentially difficult part of deep learning. By providing learning rate schedulers, we give users a way to optimize the learning rate with high initial values that decay over time. A high initial learning rate helps introduce randomness into the system, allowing the loss function to better find the global minimum, while the decayed learning rate helps stabilize the loss over time. We have implemented an exponential decay learning rate scheduler and a polynomial decay learning rate scheduler.
    • Pre-trained DNN Architectures: The supported DNN architectures used internally for transfer learning are below:
      • Inception V3.
      • ResNet V2 101.
      • ResNet V2 50.
      • MobileNet V2.
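To make the learning rate scheduler item above concrete, here is a small sketch of the two decay rules in plain C#. It uses the commonly cited formulas for exponential and polynomial decay and is an assumption about the general technique, not ML.NET's actual scheduler code; all names and parameters are hypothetical.

```csharp
using System;

public static class LearningRateSchedules
{
    // Exponential decay: rate = initialRate * decayRate^(step / decaySteps).
    public static double ExponentialDecay(
        double initialRate, double decayRate, int decaySteps, int step)
    {
        return initialRate * Math.Pow(decayRate, (double)step / decaySteps);
    }

    // Polynomial decay: interpolates from initialRate down to endRate over decaySteps,
    // then stays at endRate. power = 1 gives a linear ramp.
    public static double PolynomialDecay(
        double initialRate, double endRate, int decaySteps, double power, int step)
    {
        int clampedStep = Math.Min(step, decaySteps);
        double remaining = 1.0 - (double)clampedStep / decaySteps;
        return (initialRate - endRate) * Math.Pow(remaining, power) + endRate;
    }
}
```

Both rules start at the high initial rate and shrink it as training progresses. The polynomial rule additionally guarantees a floor (`endRate`) once `decaySteps` is reached.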

    Example code:

    var pipeline = mlContext.MulticlassClassification.Trainers.ImageClassification(
                    featureColumnName: "Image", labelColumnName: "Label");
    
    ITransformer trainedModel = pipeline.Fit(trainDataView);

    Samples

    Defaults

    Learning rate scheduling

    Early stopping

    ResNet V2 101 train-test split

    End-to-End

  • General Availability of Database Loader
    The database loader loads data from databases into an IDataView and therefore enables model training directly against relational databases. This loader supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning that you can use any RDBMS such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, etc.

    As when training from files, ML.NET also supports data streaming when training from a database: the whole database does not need to fit into memory, because data is read from the database as it is needed. This lets you handle very large databases (e.g., 50 GB, 100 GB, or larger).

    Example code:

    using System.Data.Common;
    using Microsoft.ML;
    using Microsoft.ML.Data;

    // Lines of code for loading data from a database into an IDataView for later model training
    //...
    string connectionString = @"Data Source=YOUR_SERVER;Initial Catalog=YOUR_DATABASE;Integrated Security=True";
    
    string commandText = "SELECT * from SentimentDataset";
    
    // Map the query's columns to the SentimentData class defined below
    DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<SentimentData>();
    DbProviderFactory providerFactory = DbProviderFactories.GetFactory("System.Data.SqlClient");
    DatabaseSource dbSource = new DatabaseSource(providerFactory, connectionString, commandText);
    
    IDataView trainingDataView = loader.Load(dbSource);
    
    // ML.NET model training code using the training IDataView
    //...
    
    public class SentimentData
    {
        public string FeedbackText;
        public string Label;
    }

    Design specification

    Sample

    How to doc

  • General Availability of PredictionEnginePool for scalable deployment
    When deploying an ML model into multi-threaded and scalable .NET Core web applications and services (such as ASP.NET Core web apps, Web APIs, or Azure Functions), it is recommended to use the PredictionEnginePool instead of creating a PredictionEngine object on every request, for performance and scalability reasons. For further background information on why the PredictionEnginePool is recommended, read this blog post.

    Sample
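    A minimal sketch of the registration pattern described above, assuming the Microsoft.Extensions.ML package. The model name "SentimentModel", the `modelPath` argument, and the `SentimentInput` and `SentimentPrediction` classes are hypothetical; in an ASP.NET Core app the registration would normally live in the service configuration rather than in a helper like this.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.ML;

public class SentimentInput
{
    public string FeedbackText { get; set; }
}

public class SentimentPrediction
{
    public bool PredictedLabel { get; set; }
    public float Score { get; set; }
}

public static class PredictionPoolExample
{
    // Registers a pool of PredictionEngine objects backed by a saved model file.
    public static ServiceProvider BuildServices(string modelPath)
    {
        var services = new ServiceCollection();
        services.AddPredictionEnginePool<SentimentInput, SentimentPrediction>()
            .FromFile(modelName: "SentimentModel", filePath: modelPath, watchForChanges: true);
        return services.BuildServiceProvider();
    }

    // Borrows an engine from the pool for one prediction, then returns it to the pool.
    public static SentimentPrediction Predict(ServiceProvider provider, string text)
    {
        var pool = provider
            .GetRequiredService<PredictionEnginePool<SentimentInput, SentimentPrediction>>();
        return pool.Predict(
            modelName: "SentimentModel",
            example: new SentimentInput { FeedbackText = text });
    }
}
```

    The pool exists because a single PredictionEngine is not thread-safe and is relatively costly to create, so sharing pooled instances across requests avoids both problems.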

  • General Availability of Enhanced Support for .NET Core 3.0
    This means ML.NET can take advantage of the new features when running in a .NET Core 3.0 application. The first new feature we are using is hardware intrinsics, which allows .NET code to accelerate math operations by using processor-specific instructions.

Bug Fixes

  • Adds a meaningful exception when the user tries to use the OnnxSequenceType attribute without specifying the sequence type. (#4272)
  • Image Classification API: Fix processing of incomplete batches (<batchSize) and images processed per epoch; enable EarlyStopping without a validation set. (#4289)
  • Exception is thrown if NDCG > 10 is used with LightGbm for evaluating ranking. (#4081)
  • DatabaseLoader error when using attributes (e.g., ColumnName). (#4308)
  • Recommendation experiment got SMAC local search exception during training. (#4358)
  • TensorFlow exception triggered: input ended unexpectedly in the middle of a field. (#4314)
  • PredictionEngine breaks after saving/loading a Model. (#4321)
  • Data file locked even after TextLoader goes out of context. (#4404)
  • ImageClassification API should save cache files/meta files in user temp directory or user provided workspace path. (#4410)

Breaking Changes

None

Enhancements

  • Publish latest nuget to public feed from master branch when commits are made. (#4406)
  • Defaults for ImageClassification API. (#4415)

CLI and AutoML API

  • Recommendation Task. (#4246, #4391)
  • Image Classification Task. (#4395)
  • Move AutoML CodeGen to master from feature branch. (#4365)

Remarks

  • None.

ML.NET 1.4.0-preview2

09 Oct 21:10
Pre-release

New Features

  • Deep Neural Networks Training (0.16.0-preview2)

    Improves the in-preview ImageClassification API further:

    • Early stopping feature stops the training when optimal accuracy is reached (#4237)
    • Enables inferencing on in-memory images (#4242)
    • PredictedLabel output column now contains actual class labels instead of uint32 class index values (#4228)
    • GPU support on Windows and Linux (#4270, #4277)
    • Upgraded TensorFlow .NET version to 0.11.3 (#4205)

    In-memory image inferencing sample
    Early stopping sample
    GPU samples

  • New ONNX Exporters (1.4.0-preview2)

    • LpNormNormalizing transformer (#4161)
    • PCA transformer (#4188)
    • TypeConverting transformer (#4155)
    • MissingValueIndicator transformer (#4194)

Bug Fixes

  • OnnxSequenceType and ColumnName attributes don't work together (#4187)
  • Fix memory leak in TensorflowTransformer (#4223)
  • Enable permutation feature importance to be used with model loaded from disk (#4262)
  • IsSavedModel returns true when loaded TensorFlow model is a frozen model (#4262)
  • Exception when using the OnnxSequenceType attribute directly without specifying the sequence type (#4272, #4297)

Samples

  • TensorFlow full model retrain sample (#4127)

Breaking Changes

None.

Obsolete API

  • OnnxSequenceType attribute that doesn't take a type (#4272, #4297)

Enhancements

  • Improve exception message in LightGBM (#4214)
  • FeaturizeText should allow only outputColumnName to be defined (#4211)
  • Fix NgramExtractingTransformer GetSlotNames to not allocate a new delegate on every invoke (#4247)
  • Resurrect broken code coverage build and re-enable code coverage for pull request (#4261)
  • NimbusML entrypoint for permutation feature importance (#4232)
  • Reuse memory when copying outputs from TensorFlow graph (#4260)
  • DateTime to DateTime standard conversion (#4273)
  • CodeCov version upgraded to 1.7.2 (#4291)

CLI and AutoML API

None.

Remarks

None.

ML.NET 1.4.0-preview

09 Oct 21:09
Pre-release

New Features

  • Deep Neural Networks Training (0.16.0-preview) (#4151)

    Improves the in-preview ImageClassification API further:

    • Increases DNN training speed by ~10x compared to the same API in 0.15.1 release.
    • Prevents repeated computations by caching featurized image values to disk from intermediate layers to train the final fully-connected layer.
    • Reduced and constant memory footprint.
    • Simplifies the API by not requiring the user to pre-process the image.
    • Introduces callback to provide metrics during training such as accuracy, cross-entropy.
    • Improved image classification sample.
          public static ImageClassificationEstimator ImageClassification(
              this ModelOperationsCatalog catalog,
              string featuresColumnName,
              string labelColumnName,
              string scoreColumnName = "Score",
              string predictedLabelColumnName = "PredictedLabel",
              Architecture arch = Architecture.InceptionV3,
              int epoch = 100,
              int batchSize = 10,
              float learningRate = 0.01f,
              ImageClassificationMetricsCallback metricsCallback = null,
              int statisticFrequency = 1,
              DnnFramework framework = DnnFramework.Tensorflow,
              string modelSavePath = null,
              string finalModelPrefix = "custom_retrained_model_based_on_",
              IDataView validationSet = null,
              bool testOnTrainSet = true,
              bool reuseTrainSetBottleneckCachedValues = false,
              bool reuseValidationSetBottleneckCachedValues = false,
              string trainSetBottleneckCachedValuesFilePath = "trainSetBottleneckFile.csv",
              string validationSetBottleneckCachedValuesFilePath = "validationSetBottleneckFile.csv"
              )

    Design specification

    Sample

  • Database Loader (0.16.0-preview) (#4070,#4091,#4138)

    Additional DatabaseLoader support:

    • Support DBNull.
    • Add CreateDatabaseLoader<TInput> to map columns from a .NET Type.
    • Read multiple columns into a single vector

    Design specification

    Sample

      string connectionString = "YOUR_RELATIONAL_DATABASE_CONNECTION_STRING";
    
      string commandText = "SELECT * from URLClicks";
    
      DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<UrlClick>();
                  
      DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance, 
                                                      connectionString, 
                                                      commandText);         
      IDataView dataView = loader.Load(dbSource);

  • Enhanced .NET Core 3.0 Support

    • Use C# hardware intrinsics detection to support AVX, SSE and software fallbacks
    • Allows for faster training on AVX-supported machines
    • Allows for scoring core ML.NET models on ARM processors. (Note: some components do not support ARM yet, e.g., FastTree, LightGBM, OnnxTransformer)

Bug Fixes

None.

Samples

  • DeepLearning Image Classification Training sample (DNN Transfer Learning) (#633)
  • DatabaseLoader sample loading an IDataView from SQL Server localdb (#611)

Breaking Changes

None

Enhancements

None.

CLI and AutoML API

  • AutoML codebase has moved from feature branch to master branch (#3882).

Remarks

None.

ML.NET 1.3.1

06 Aug 10:50
d1d5e1f

New Features

  • Deep Neural Networks Training (PREVIEW) (#4057)
    Introduces the in-preview 0.15.1 Microsoft.ML.DNN package, which enables full DNN model retraining and transfer learning in .NET using the C# bindings for TensorFlow provided by TensorFlow.NET. The goal of this package is to allow high-level DNN training and scoring tasks such as image classification, text classification, object detection, etc., using simple yet powerful APIs that are framework agnostic, although they currently use only TensorFlow as the backend. The APIs below are in early preview, and we hope to get customer feedback that we can incorporate in the next iteration.

    DNN stack

    public static DnnEstimator RetrainDnnModel(
              this ModelOperationsCatalog catalog,
              string[] outputColumnNames,
              string[] inputColumnNames,
              string labelColumnName,
              string tensorFlowLabel,
              string optimizationOperation,
              string modelPath,
              int epoch = 10,
              int batchSize = 20,
              string lossOperation = null,
              string metricOperation = null,
              string learningRateOperation = null,
              float learningRate = 0.01f,
              bool addBatchDimensionInput = false,
              DnnFramework dnnFramework = DnnFramework.Tensorflow)
    
    public static DnnEstimator ImageClassification(
              this ModelOperationsCatalog catalog,
              string featuresColumnName,
              string labelColumnName,
              string outputGraphPath = null,
              string scoreColumnName = "Score",
              string predictedLabelColumnName = "PredictedLabel",
              string checkpointName = "_retrain_checkpoint",
              Architecture arch = Architecture.InceptionV3,
              DnnFramework dnnFramework = DnnFramework.Tensorflow,
              int epoch = 10,
              int batchSize = 20,
              float learningRate = 0.01f,
              bool measureTrainAccuracy = false)

    Design specification

    Image classification (Inception V3) sample

    Image classification (Resnet V2 101) sample

  • Database Loader (PREVIEW) (#4035)
    Introduces Database loader that enables training on databases. This loader supports any relational database supported by System.Data in .NET Framework or .NET Core, meaning that you can use many RDBMS such as SQL Server, Azure SQL Database, Oracle, PostgreSQL, MySQL, etc. This feature is in early preview and can be accessed via Microsoft.ML.Experimental nuget.

    Design specification

    Sample

    public static DatabaseLoader CreateDatabaseLoader(this DataOperationsCatalog catalog,
              params DatabaseLoader.Column[] columns)

Bug Fixes

Serious

  • SaveOnnxCommand appears to ignore predictors when saving a model to ONNX format: This broke export to ONNX functionality. (#3974)

  • Unable to use fasterrcnn onnx model. (#3963)

  • PredictedLabel is always true for Anomaly Detection: This bug disabled scenarios like fraud detection using binary classification/PCA. (#4039)

  • Update build certifications: This bug broke the official builds because of outdated certificates that were being used. (#4059)

Other

  • Stop LightGbm Warning for Default Metric Input: Fixes the warning "LightGBM Warning Unknown parameter metric=" that was produced when the default metric is used. (#3965)

Samples

Breaking Changes

None

Enhancements

  • Farewell to the Static API (#4009)

  • AVX and FMA intrinsics in Factorization Machine (#3940)

CLI and AutoML API

  • Bug fixes.

Remarks

ML.NET v1.2.0

03 Jul 05:44
1c1d3a4

General Availability

  • Microsoft.ML.TimeSeries

    • Anomaly detection algorithms (Spike and Change Point):
      • Independent and identically distributed.
      • Singular spectrum analysis.
      • Spectral residual from Azure Anomaly Detector/Kensho team.
    • Forecasting models:
      • Singular spectrum analysis.
    • Prediction Engine for online learning
      • Enables updating a time series model with new observations at scoring time, so that the user does not have to retrain the model on old data each time.

    Samples
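
    As an illustration of the anomaly detection algorithms listed above, here is a minimal sketch of IID spike detection using DetectIidSpike from the Microsoft.ML.TimeSeries package. The Observation and SpikePrediction classes and the toy series are assumptions made for this example.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type: one numeric observation per row.
public class Observation
{
    public float Value { get; set; }
}

// Output vector layout: [alert, raw score, p-value].
public class SpikePrediction
{
    [VectorType(3)]
    public double[] Prediction { get; set; }
}

public static class IidSpikeSketch
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Toy series: a flat signal with a single spike at index 15.
        var series = Enumerable.Range(0, 20)
            .Select(i => new Observation { Value = i == 15 ? 50f : 5f })
            .ToList();
        IDataView data = mlContext.Data.LoadFromEnumerable(series);

        // Confidence is in the range [0, 100]; pvalueHistoryLength is the sliding window size.
        var pipeline = mlContext.Transforms.DetectIidSpike(
            outputColumnName: nameof(SpikePrediction.Prediction),
            inputColumnName: nameof(Observation.Value),
            confidence: 95,
            pvalueHistoryLength: 5);

        ITransformer model = pipeline.Fit(data);
        IDataView transformed = model.Transform(data);

        Console.WriteLine("Alert\tScore\tP-Value");
        foreach (var p in mlContext.Data.CreateEnumerable<SpikePrediction>(transformed, reuseRowObject: false))
        {
            Console.WriteLine($"{p.Prediction[0]}\t{p.Prediction[1]:F2}\t{p.Prediction[2]:F2}");
        }
    }
}
```

    A non-zero value in the first slot of the output vector marks the row as a detected spike.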

  • Microsoft.ML.OnnxTransformer
    Enables scoring of ONNX models in the learning pipeline. Uses ONNX Runtime v0.4.

    Sample

  • Microsoft.ML.TensorFlow
    Enables scoring of TensorFlow models in the learning pipeline. Uses TensorFlow v1.13. Very useful for image and text classification. Users can featurize images or text using DNN models and feed the result into a classical machine learning model like a decision tree or logistic regression trainer.

    Samples

New Features

  • Tree-based featurization (#3812)

    Generating features from tree structures has been a popular technique in data mining. It is useful for capturing feature interactions when creating a stacked model, for dimensionality reduction, or for featurizing towards an alternative label. ML.NET's tree featurization trains a tree-based model and then maps the input feature vector to several non-linear feature vectors. The generated feature vectors are:

    • The leaves the input falls into: a binary vector with ones at the indexes of the reached leaves,
    • The paths that the input vector passes before hitting the leaves, and
    • The values of the reached leaves.

    Here are two references.

    Samples
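
    To make the three generated vectors concrete, here is a conceptual sketch in plain C#. It does not use ML.NET's featurization API or reproduce its implementation: the single hand-built tree, its thresholds, and its leaf values are all made up for illustration.

```csharp
using System;

public static class TreeFeaturizationSketch
{
    // A hand-built tree with two internal nodes and three leaves.
    // At internal node n the input goes left when x[Feature[n]] <= Threshold[n].
    // A child index >= 0 is another internal node; a negative child c encodes leaf ~c.
    static readonly int[] Feature = { 0, 1 };
    static readonly float[] Threshold = { 0.5f, 2.0f };
    static readonly int[] Left = { 1, ~0 };    // node 0 -> node 1, node 1 -> leaf 0
    static readonly int[] Right = { ~2, ~1 };  // node 0 -> leaf 2, node 1 -> leaf 1
    static readonly float[] LeafValue = { -1.0f, 0.3f, 0.8f };

    public static (float[] Leaves, float[] Paths, float Value) Featurize(float[] x)
    {
        var leaves = new float[LeafValue.Length]; // one slot per leaf
        var paths = new float[Feature.Length];    // one slot per internal node
        int node = 0;
        while (node >= 0)
        {
            paths[node] = 1f; // mark every internal node visited on the way down
            node = x[Feature[node]] <= Threshold[node] ? Left[node] : Right[node];
        }
        int leaf = ~node;
        leaves[leaf] = 1f;    // one-hot indicator of the reached leaf
        return (leaves, paths, LeafValue[leaf]);
    }

    public static void Main()
    {
        // x[0] = 0.2 goes left at node 0; x[1] = 3.0 goes right at node 1, reaching leaf 1.
        var result = Featurize(new[] { 0.2f, 3.0f });
        Console.WriteLine("Leaves: " + string.Join(",", result.Leaves));
        Console.WriteLine("Paths:  " + string.Join(",", result.Paths));
        Console.WriteLine("Value:  " + result.Value);
    }
}
```

    With an ensemble of trees, the per-tree vectors would be concatenated, which is what makes the result usable as input features for a second model.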

  • Microsoft.Extensions.ML integration package. (#3827)

    This package makes it easier to use ML.NET with app models that support Microsoft.Extensions, such as ASP.NET and Azure Functions.

    Specifically it contains functionality for:

    • Dependency Injection
    • Pooling PredictionEngines
    • Reloading models when the file or URI has changed
    • Hooking ML.NET logging to Microsoft.Extensions.Logging
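
    A minimal sketch of pooled prediction engines with dependency injection, assuming the AddPredictionEnginePool, FromFile, and Predict APIs of the Microsoft.Extensions.ML package. The input and output classes, the model name, and the model file path are hypothetical; a trained model file with a matching schema must exist at that path for the example to run.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.ML;

// Hypothetical schema classes; they must match the schema of the saved model.
public class SentimentInput
{
    public string Text { get; set; }
}

public class SentimentOutput
{
    public bool PredictedLabel { get; set; }
    public float Score { get; set; }
}

public static class PredictionEnginePoolSketch
{
    public static void Main()
    {
        var services = new ServiceCollection();

        // Registering logging lets ML.NET messages flow into Microsoft.Extensions.Logging.
        services.AddLogging();

        // Register a pool of prediction engines backed by a model file.
        // watchForChanges: true reloads the model when the file changes on disk.
        services.AddPredictionEnginePool<SentimentInput, SentimentOutput>()
            .FromFile(modelName: "SentimentModel", filePath: "sentiment_model.zip", watchForChanges: true);

        using (ServiceProvider provider = services.BuildServiceProvider())
        {
            var pool = provider.GetRequiredService<PredictionEnginePool<SentimentInput, SentimentOutput>>();

            // The pool hands out an engine for the call and takes it back afterwards.
            SentimentOutput result = pool.Predict(
                modelName: "SentimentModel",
                example: new SentimentInput { Text = "This release looks great" });

            Console.WriteLine($"{result.PredictedLabel} ({result.Score})");
        }
    }
}
```

    In an ASP.NET app the same registration would typically live in the service configuration, with the pool injected into controllers instead of being resolved by hand.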

Bug Fixes

Serious

  • Time series Sequential Transform needs to have a binding mechanism: This bug made it impossible to use time series in NimbusML. (#3875)

  • Build errors resulting from upgrading to VS2019 compilers: The default CMAKE_C_FLAG for debug configuration sets /ZI to generate a PDB capable of edit and continue. In the new compilers, this is incompatible with /guard:cf which we set for security reasons. (#3894)

  • LightGBM Evaluation metric parameters: In LightGbm, if a user specified EvaluateMetricType.Default, the metric would not get added to the options Dictionary, and LightGbmWrappedTraining would throw because of that. (#3815)

  • Change default EvaluationMetric for LightGbm: In ML.NET, the default EvaluationMetric for LightGbm is set to EvaluateMetricType.Error for multiclass, EvaluateMetricType.LogLoss for binary, etc. This leads to inconsistent behavior from the user's perspective. (#3859)

Other

  • CustomGains should allow multiple values in argument attribute. (#3854)

Breaking Changes

None

Enhancements

  • Replaces the hardcoded sigmoid value of -0.5 with the value specified during training. (#3850)

  • Fix TextLoader constructor and add exception message. (#3788)

  • Introduce the FixZero argument to the LogMeanVariance normalizer. (#3916)

  • Ensemble trainers now work with ITrainerEstimators instead of ITrainers. (#3796)

  • LightGBM Unbalanced Data Argument. (#3925)

  • Tree based trainers implement ICanGetSummaryAsIDataView. (#3892)

  • CLI and AutoML API

    • Internationalization fixes to generate proper ML.NET C# code. (#3725)
    • Automatic Cross Validation for small datasets, and CV stability fixes. (#3794)
    • Code cleanup to match .NET style. (#3823)

Documentation and Samples

  • Samples for applying ONNX model to in-memory images. (#3851)
  • Reformatted all ~200 samples to 85 character width so the horizontal scrollbar does not appear on the docs webpage. (#3930, #3941, #3949, #3950, #3947, #3943, #3942, #3946, #3948)

Remarks

  • Roughly 200 GitHub issues were closed; the open issue count decreased from ~550 to 351. Most of the issues were resolved by the release of the stable API and the availability of samples.

ML.NET v1.1.0

04 Jun 22:41
d5c4e94

New Features

  • Image type support in IDataView
    PR#3263 added support for in-memory images as a type in IDataView. Previously it was not possible to use an image directly in IDataView; the user had to specify the file path as a string and load the image using a transform. The feature resolved the following issues: 3162, 3723, 3369, 3274, 445, 3460, 2121, 2495, 3784.

    Image type support in IDataView was a feature much requested by users.

    Sample to convert gray scale image in-Memory | Sample for custom mapping with in-memory using custom type

  • Spectral Residual (SR) based Anomaly Detector (preview, please provide feedback)
    PR#3693 adds a new anomaly detection algorithm to the Microsoft.ML.TimeSeries nuget. The algorithm is based on Spectral Residual (SR) combined with a Convolutional Neural Network (SR-CNN) and was accepted at the KDD 2019 conference as an oral presentation. One of its advantages is that it does not require any prior training. In benchmarks that used a grid parameter search to find upper bounds, it outperforms the Independent and identically distributed (IID) and Singular Spectrum Analysis (SSA) based anomaly detection algorithms in F1 score. This contribution comes from the Azure Anomaly Detector team.

    Algo | Precision | Recall | F1 | #TruePositive | #Positives | #Anomalies | Fine-tuned parameters
    SSA (requires training) | 0.582 | 0.585 | 0.583 | 2290 | 3936 | 3915 | Confidence=99, PValueHistoryLength=32, Season=11, and use half the data of each series to do the training.
    IID | 0.668 | 0.491 | 0.566 | 1924 | 2579 | 3915 | Confidence=99, PValueHistoryLength=56
    SR | 0.601 | 0.670 | 0.634 | 2625 | 4370 | 3915 | WindowSize=64, BackAddWindowSize=5, LookaheadWindowSize=5, AveragingWindowSize=3, JudgementWindowSize=64, Threshold=0.45
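
    The F1 values in the table above are consistent with the usual definition of F1 as the harmonic mean of precision and recall. A small sketch that recomputes them from the reported precision and recall (the class and method names are illustrative and not part of ML.NET):

```csharp
using System;
using System.Globalization;

public static class F1Check
{
    // F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
    public static double F1(double precision, double recall) =>
        2 * precision * recall / (precision + recall);

    public static void Main()
    {
        // Precision and recall as reported in the benchmark table.
        var rows = new (string Algo, double Precision, double Recall)[]
        {
            ("SSA", 0.582, 0.585),
            ("IID", 0.668, 0.491),
            ("SR", 0.601, 0.670),
        };

        foreach (var row in rows)
        {
            string f1 = F1(row.Precision, row.Recall).ToString("F3", CultureInfo.InvariantCulture);
            Console.WriteLine($"{row.Algo}: F1 = {f1}");
        }
        // Prints 0.583, 0.566, and 0.634, matching the reported F1 column.
    }
}
```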

    Sample for anomaly detection by SRCNN | Sample for anomaly detection by SRCNN using batch prediction

  • Time Series Forecasting (preview, please provide feedback)
    PR#1900 introduces a framework for time series forecasting models and exposes an API for a Singular Spectrum Analysis (SSA) based forecasting model in the Microsoft.ML.TimeSeries nuget. The framework supports forecasting with or without confidence intervals, updating the model with new observations, and saving/loading the model to/from persistent storage. This closes issues 929 and 3151, and has been a much requested feature from the GitHub community since September 2018. With this change, the Microsoft.ML.TimeSeries nuget is feature complete for RTM.

    Sample for forecasting | Sample for forecasting using confidence intervals
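
    A minimal sketch of SSA forecasting with confidence intervals, model updates, and checkpointing. It assumes the ForecastBySsa and CreateTimeSeriesEngine APIs as they appear in later stable releases of Microsoft.ML.TimeSeries; the API shape in this preview release may differ. The SeriesPoint and ForecastOutput classes, the toy series, and the output file name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.TimeSeries;

// Hypothetical input type: one observation per row.
public class SeriesPoint
{
    public float Value { get; set; }
}

// Hypothetical output type: one slot per forecast step.
public class ForecastOutput
{
    public float[] Forecast { get; set; }
    public float[] LowerBound { get; set; }
    public float[] UpperBound { get; set; }
}

public static class SsaForecastSketch
{
    public static void Main()
    {
        var mlContext = new MLContext();

        // Toy series: the pattern 0, 1, 2, 3, 4 repeated three times.
        var points = new List<SeriesPoint>();
        for (int i = 0; i < 15; i++)
            points.Add(new SeriesPoint { Value = i % 5 });
        IDataView data = mlContext.Data.LoadFromEnumerable(points);

        var pipeline = mlContext.Forecasting.ForecastBySsa(
            outputColumnName: nameof(ForecastOutput.Forecast),
            inputColumnName: nameof(SeriesPoint.Value),
            windowSize: 5,
            seriesLength: 11,
            trainSize: points.Count,
            horizon: 3,
            confidenceLevel: 0.95f,
            confidenceLowerBoundColumn: nameof(ForecastOutput.LowerBound),
            confidenceUpperBoundColumn: nameof(ForecastOutput.UpperBound));

        ITransformer model = pipeline.Fit(data);
        var engine = model.CreateTimeSeriesEngine<SeriesPoint, ForecastOutput>(mlContext);

        // Forecast the next 3 values together with their confidence bounds.
        ForecastOutput forecast = engine.Predict();
        for (int i = 0; i < forecast.Forecast.Length; i++)
            Console.WriteLine($"{forecast.Forecast[i]:F2} [{forecast.LowerBound[i]:F2}, {forecast.UpperBound[i]:F2}]");

        // Feed a new observation to update the model state without retraining.
        engine.Predict(new SeriesPoint { Value = 0 });

        // Persist the updated model so it can be reloaded later.
        engine.CheckPoint(mlContext, "forecast_model.zip");
    }
}
```

    Omitting the two confidence bound column arguments yields a forecast without confidence intervals.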

Bug Fixes

Serious

  • Math Kernel Library fails to load with latest libomp: Fixed by PR#3721. This bug made it impossible for anyone to check code into the master branch because it was causing build failures.

  • Transform Wrapper fails at deserialization: Fixed by PR#3700. This bug affected a first-party (1P) customer. A model trained using NimbusML (Python bindings for ML.NET) and then loaded for scoring/inferencing using ML.NET would hit this bug.

  • Index out of bounds exception in KeyToVector transformer: Fixed by PR#3763. This fix closes the following GitHub issues: 3757, 1751, 2678. The bug affected a first-party customer and also GitHub users.

Other

  • Download images only when not present on disk and print warning messages when converting unsupported pixel format by PR#3625
  • ML.NET source code does not build in VS2019 by PR#3742
  • Fix SoftMax precision by utilizing double in the internal calculations by PR#3676
  • Fix to the official build due to API Compat tool change by PR#3667
  • Check for number of input columns in concat transform by PR#3809

Breaking Changes

None

Enhancements

  • API Compat tool by PR#3623 ensures future changes to ML.NET will not break the stable API released in 1.0.0.
  • Upgrade the TensorFlow version from 1.12.0 to 1.13.1 by PR#3758
  • API for saving time series model to stream by PR#3805

Documentation and Samples

  • L1-norm and L2-norm regularization documentation by PR#3586
  • Sample for data save and load from text and binary files by PR#3745
  • Sample for LoadFromEnumerable with a SchemaDefinition by PR#3696
  • Sample for LogLossPerClass metric for multiclass trainers by PR#3724
  • Sample for WithOnFitDelegate by PR#3738
  • Sample for loading data using text loader using various techniques by PR#3793

Remarks