New Functionality:

* Export trained LightGBM models for evaluation outside of Spark.

* LightGBM on Spark supports multiple cores per executor.

* `CNTKModel` works with multi-input multi-output models of any CNTK datatype.

* Added Minibatching and Flattening transformers for adding flexible batching logic to pipelines, deep networks, and web clients.

* Added a `Benchmark` test API for tracking model performance across versions.

* Added a `PartitionConsolidator` function for aggregating streaming data onto one partition per executor (for use with connection-limited or rate-limited HTTP services).
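
The batching idea behind the Minibatching and Flattening transformers can be illustrated outside Spark. The following is a minimal plain-Python sketch, not the MMLSpark API: the function names `minibatch` and `flatten` are hypothetical, and the fixed batch size is an assumption made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def minibatch(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group consecutive rows into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def flatten(batches: Iterable[Iterable[T]]) -> Iterator[T]:
    """Undo minibatching: emit the rows of each batch one at a time."""
    for batch in batches:
        yield from batch


rows = list(range(7))
batches = list(minibatch(rows, 3))   # the last batch may be smaller
restored = list(flatten(batches))    # same rows, original order
```

Batching lets a downstream stage (a deep network or an HTTP client, for example) process several rows per call, and flattening restores the one-row-per-record shape afterwards.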

Updates and Improvements:

* Updated to Spark 2.3.0

* Added Databricks notebook tests to build system

* `CNTKModel` uses significantly less memory

* Simplified example notebooks

* Simplified APIs for MMLSpark Serving

* Simplified APIs for CNTK on Spark

* LightGBM stability improvements

* `ComputeModelStatistics` stability improvements


We would like to acknowledge the external contributors who helped create this version of MMLSpark (in order of commit history):

* 严伟, @terrytangyuan, @ywskycn, @dvanasseldonk, Jilong Liao, @chappers, @ekaterina-sereda-rf


v0.12

New functionality:

* MMLSpark Serving: a RESTful computation engine built on Spark
  streaming.  See `docs/` for details.

* New LightGBM Binary Classification and Regression learners and
  infrastructure with a Python notebook for examples.

* MMLSpark Clients: a general-purpose, distributed, and fault-tolerant
  HTTP library usable from Spark, PySpark, and SparklyR.  See

* Add `MinibatchTransformer` and `FlattenBatch` to enable efficient,
  buffered, minibatch processing in Spark.

* Added Python wrappers and a notebook example for the
  `TuneHyperparameters` module, demonstrating parallel distributed
  hyperparameter tuning through randomized grid search.

* Add a `MultiNGram` transformer for efficiently computing variable
  length n-grams.

* Added DataType parameter for building models that are parameterized by
  Spark data types.
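
To make the "variable length n-grams" idea concrete, here is a small plain-Python sketch. It is not the `MultiNGram` implementation: the function name `multi_ngrams` and the choice to join tokens with a space are assumptions made for illustration.

```python
from typing import Iterable, List, Sequence


def multi_ngrams(tokens: Sequence[str], lengths: Iterable[int]) -> List[str]:
    """Return the n-grams of every requested length, shortest request first."""
    out: List[str] = []
    for n in lengths:
        if n < 1:
            raise ValueError("n-gram lengths must be positive")
        # Slide a window of width n across the token sequence.
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


grams = multi_ngrams(["the", "quick", "fox"], [1, 2, 3])
```

Computing all lengths in one pass over the column avoids chaining one n-gram stage per length.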


* Update per-instance statistics module so it works for any Spark ML

* Update CNTK to version 2.4.

* Updated Spark to version 2.2.1 (the following release is likely to be
  based on Spark 2.3).

* Also updated SBT and JVM.

* Refactored readers directory into `io` directory


* Fix the Conda installation in our Docker image, resolving issues with
  importing `numpy`.

* Fix a regression in R wrappers with the latest SparklyR version.

* Additional bugfixes, stability, and notebook improvements.


v0.11

New functionality:

* `TuneHyperparameters`: parallel distributed randomized grid search for
  SparkML and `TrainClassifier`/`TrainRegressor` parameters.  A sample
  notebook and Python wrappers will be added in the near future.

* Added `PowerBIWriter` for writing and streaming data frames to

* Expanded image reading and writing capabilities, including using
  images with Spark Structured Streaming.  Images can be read from and
  written to paths specified in a dataframe.

* New functionality for convenient plotting in Python.

* UDF transformer and additional UDFs.

* Expanded pipeline support for arbitrary user code and libraries such
  as NLTK through UDFTransformer.

* Refactored fuzzing system and added test coverage.

* GPU training supports multiple VMs.
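
The randomized grid search mentioned for `TuneHyperparameters` above can be sketched in plain Python. This is an illustration of the technique only, not the MMLSpark API: the parameter names, the `toy_score` function (a stand-in for fitting and evaluating a model), and the use of a thread pool in place of a Spark cluster are all assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, float]


def sample_params(space: Dict[str, Sequence[float]], n_samples: int, seed: int) -> List[Params]:
    """Draw n_samples random points from the grid instead of enumerating it."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_samples)]


def toy_score(params: Params) -> float:
    """Hypothetical stand-in for training and evaluating a model (higher is better)."""
    return -(params["learning_rate"] - 0.1) ** 2 - 0.001 * abs(params["num_leaves"] - 31)


def tune(space: Dict[str, Sequence[float]], n_samples: int,
         seed: int = 0, workers: int = 4) -> Tuple[Params, float]:
    """Score the sampled candidates in parallel and return the best one."""
    candidates = sample_params(space, n_samples, seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(toy_score, candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]


space = {"learning_rate": [0.01, 0.05, 0.1, 0.3], "num_leaves": [15, 31, 63]}
best_params, best_score = tune(space, n_samples=8)
```

Sampling a fixed number of candidates keeps the cost bounded as the grid grows, and each candidate is independent, which is what makes the search easy to parallelize.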


* Updated to Conda 4.3.31, which comes with Python 3.6.3.

* Also updated SBT and JVM.


* Additional bugfixes, stability, and notebook improvements.


v0.10.9

Same as v0.11, but using an older Spark v2.1.0 installation.


v0.10

New functionality:

* We now provide initial support for training on a GPU VM, and an ARM
  template to deploy an HDI Cluster with an associated GPU machine.  See
  `docs/` for instructions on setting this up.

* New auto-generated R wrappers for estimators and transformers.  To
  import them into R, you can use devtools to import from the uploaded
  zip file.  Tests and sample notebooks to come.

* A new `RenameColumn` transformer for renaming columns within a

New notebooks:

* Notebook 104: An experiment to demonstrate regression models to
  predict automobile prices.  This notebook demonstrates the use of
  `Pipeline` stages, `CleanMissingData`, and

* Notebook 105: Demonstrates `DataConversion` to make some columns Categorical.

* There is a 401 notebook in `notebooks/gpu` which demonstrates CNTK
  training when using a GPU VM.  (It is not shown with the rest of the
  notebooks yet.)


* Updated to use CNTK 2.2.  Note that this version of CNTK depends on
  libpng12 and libjasper1 -- which are included in our docker images.
  (This should get resolved in the upcoming CNTK 2.3 release.)


* Local builds will always use a "0.0" version instead of a version
  based on the git repository.  This should simplify the build process
  for developers and avoid hard-to-resolve update issues.

* The `TextPreprocessor` transformer can be used to find and replace all
  key value pairs in an input map.

* Fixed a regression in the image reader where zip files with images no
  longer displayed the full path to the image inside a zip file.

* Additional minor bug and stability fixes.
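
The map-based find-and-replace described for `TextPreprocessor` above can be illustrated with a short plain-Python sketch. This is not the transformer's implementation: the function name `replace_all` is hypothetical, and matching longer keys before shorter ones is an assumption, since the notes do not say how overlapping keys are resolved.

```python
import re
from typing import Mapping


def replace_all(text: str, mapping: Mapping[str, str]) -> str:
    """Replace every occurrence of each key in mapping with its value."""
    if not mapping:
        return text
    # Assumption: try longer keys first so "new york" wins over "new".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass means replaced text is never re-scanned.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


cleaned = replace_all("new york is not new", {"new": "old", "new york": "NYC"})
```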


v0.9.9

Same as v0.10, but using an older Conda installation with Python 3.5.2.


v0.9

New functionality:

* Refactor `ImageReader` and `BinaryFileReader` to support streaming
  images, including a Python API.  Also improved performance of the
  readers.  Check the 302 notebook for a usage example.

* Add `ClassBalancer` estimator for improving classification performance
  on highly imbalanced datasets.

* Create an infrastructure for automated fuzzing, serialization, and
  Python wrapper tests.

* Added a `DropColumns` pipeline stage.
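
One common way to compensate for class imbalance, which the `ClassBalancer` entry above refers to, is to give each row a weight inversely proportional to its class frequency. The sketch below shows that scheme in plain Python. It is an illustration under that assumption, not the estimator's actual formula, and the function name `class_weights` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by (largest class count) / (its own count)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    largest = max(counts.values())
    return {label: largest / count for label, count in counts.items()}


labels = ["neg"] * 8 + ["pos"] * 2
weights = class_weights(labels)
per_row = [weights[label] for label in labels]  # one weight per training row
```

With these weights every class contributes the same total weight to the loss, so a learner that accepts per-row weights no longer favors the majority class by default.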

New notebooks:

* 305: A Flowers sample notebook demonstrating deep transfer learning
  with `ImageFeaturizer`.


* Our main build is now based on Spark 2.2.


* Enable streaming through the `EnsembleByKey` transformer.

* ImageReader, HDFS issue, etc.


v0.8.9

Same as v0.9, but using an older Conda installation with Python 3.5.2.


v0.8

New functionality:

* We are now uploading MMLSpark as an "Azure/mmlspark" Spark package.
  Use `--packages Azure:mmlspark:0.8` with the Spark command-line tools.

* Add a bi-directional LSTM medical entity extractor to the
  `ModelDownloader`, and a new Jupyter notebook for medical entity
  extraction using NLTK, PubMed Word embeddings, and the Bi-LSTM.

* Add `ImageSetAugmenter` for easy dataset augmentation within image
  processing pipelines.


* Optimize the performance of `CNTKModel`.  It now broadcasts a loaded
  model to workers and shares model weights between partitions on the
  same worker.  Minibatch padding (an internal workaround for a CNTK bug)
  is no longer used, eliminating excess computations when there is a
  mismatch between the partition size and minibatch size.

* Bugfix: `CNTKModel` now works with models that have unnamed outputs.

Docker image improvements:

* Environment variables are now part of the Docker image (in addition to
  being set in bash).

* New Docker images:
  - `microsoft/mmlspark:latest`: plain image, as always.
  - `microsoft/mmlspark:gpu`: GPU variant based on an `nvidia/cuda` image.
  - `microsoft/mmlspark:plus` and `microsoft/mmlspark:plus-gpu`: these
    images contain additional packages for internal use; they will
    probably be based on an older Conda version too in future releases.


* The Conda environment now includes NLTK.

* Updated Java and SBT versions.


v0.7.91

Same as v0.8, but using an older Conda installation with Python 3.5.2.