
v0.8

Tagged by @mmlspark-bot on 14 Sep 18:41
New functionality:

* We now publish MMLSpark as the "Azure/mmlspark" Spark package.
  Use `--packages Azure:mmlspark:0.8` with the Spark command-line tools
  (`spark-shell`, `pyspark`, `spark-submit`).

* Add a bi-directional LSTM (Bi-LSTM) medical entity extractor to the
  `ModelDownloader`, and a new Jupyter notebook for medical entity
  extraction using NLTK, PubMed word embeddings, and the Bi-LSTM.

* Add `ImageSetAugmenter` for easy dataset augmentation within image
  processing pipelines.
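The idea behind dataset augmentation can be sketched in plain Python. This is not the `ImageSetAugmenter` API. It assumes, for illustration only, that an image is a small 2-D list of pixel values and that augmentation means appending left-right and/or up-down flipped copies of every input image. The helper names (`flip_left_right`, `flip_up_down`, `augment`) are hypothetical.

```python
from typing import List

# Hypothetical representation: an image as a list of rows of pixel values.
Image = List[List[int]]


def flip_left_right(img: Image) -> Image:
    """Mirror each row horizontally."""
    return [list(reversed(row)) for row in img]


def flip_up_down(img: Image) -> Image:
    """Reverse the order of the rows (vertical mirror)."""
    return [list(row) for row in reversed(img)]


def augment(images: List[Image], left_right: bool = True, up_down: bool = False) -> List[Image]:
    """Return copies of the originals followed by the requested flipped copies.

    Hypothetical helper: it only illustrates growing a dataset with
    transformed copies, not MMLSpark's implementation.
    """
    out = [[list(row) for row in img] for img in images]
    if left_right:
        out.extend(flip_left_right(img) for img in images)
    if up_down:
        out.extend(flip_up_down(img) for img in images)
    return out


if __name__ == "__main__":
    tiny = [[1, 2], [3, 4]]
    for img in augment([tiny], left_right=True, up_down=True):
        print(img)
```

With both flips enabled, one input image yields three training images (the original plus two mirrored copies), so the dataset grows without collecting new data.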

Improvements:

* Optimize the performance of `CNTKModel`.  It now broadcasts a loaded
  model to workers and shares model weights between partitions on the
  same worker.  Minibatch padding (an internal workaround for a CNTK bug)
  is no longer used, which eliminates excess computation when the
  partition size is not a multiple of the minibatch size.

* Bugfix: `CNTKModel` now works with models that have unnamed outputs.
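To see why dropping minibatch padding saves work, consider a partition whose row count is not a multiple of the minibatch size. The sketch below assumes, for illustration only, that padding grew the last partial minibatch to the full minibatch size; it counts the filler rows that would have been evaluated for nothing. The function name `minibatch_layout` is hypothetical.

```python
def minibatch_layout(partition_size: int, minibatch_size: int) -> tuple:
    """Return (full_batches, last_batch_rows, padded_rows).

    padded_rows is the number of filler rows needed to grow a final
    partial minibatch to minibatch_size (0 when the sizes divide evenly).
    """
    if partition_size < 0 or minibatch_size <= 0:
        raise ValueError("partition_size must be >= 0 and minibatch_size > 0")
    full_batches, last_batch_rows = divmod(partition_size, minibatch_size)
    # Rows that padding would have added to fill the last minibatch.
    padded_rows = (minibatch_size - last_batch_rows) % minibatch_size
    return full_batches, last_batch_rows, padded_rows
```

For example, `minibatch_layout(1000, 64)` gives `(15, 40, 24)`: under the assumed padding scheme, 24 filler rows would pass through the model, whereas without padding the last minibatch simply holds 40 rows. The waste is proportionally largest for small partitions: `minibatch_layout(10, 64)` gives `(0, 10, 54)`.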

Docker image improvements:

* Environment variables are now set in the Docker image itself (in
  addition to being set in bash).

* New Docker images:
  - `microsoft/mmlspark:latest`: the plain image, as before.
  - `microsoft/mmlspark:gpu`: a GPU variant based on an `nvidia/cuda` image.
  - `microsoft/mmlspark:plus` and `microsoft/mmlspark:plus-gpu`: these
    images contain additional packages for internal use; in future
    releases they will probably also be based on an older Conda version.

Updates:

* The Conda environment now includes NLTK.

* Updated Java and SBT versions.