v0.8

elibarzilay released this 18 Jul 02:18

New functionality:

We are now uploading MMLSpark as an Azure/mmlspark Spark package.
Use --packages Azure:mmlspark:0.8 with the Spark command-line tools.
Add a bi-directional LSTM (Bi-LSTM) medical entity extractor to the
ModelDownloader, and a new Jupyter notebook for medical entity
extraction using NLTK, PubMed word embeddings, and the Bi-LSTM.
Add ImageSetAugmenter for easy dataset augmentation within image
processing pipelines.
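
As a rough illustration of the --packages usage mentioned above, the sketch below assembles the command lines for a few Spark tools and prints them rather than running them, so it needs no Spark installation. The helper name `spark_command` and the script name `my_job.py` are hypothetical; only the coordinate Azure:mmlspark:0.8 comes from this release.

```python
import shlex

# Package coordinate quoted in the release notes above.
MMLSPARK_COORDINATE = "Azure:mmlspark:0.8"


def spark_command(tool, *tool_args, packages=(MMLSPARK_COORDINATE,)):
    """Build the argument list for a Spark command-line tool.

    Hypothetical helper, not part of MMLSpark or Spark.
    --packages accepts a comma-separated list of coordinates.
    """
    return [tool, "--packages", ",".join(packages), *tool_args]


# Print the commands instead of executing them.
for tool in ("spark-shell", "pyspark"):
    print(shlex.join(spark_command(tool)))

# "my_job.py" is a placeholder script name.
print(shlex.join(spark_command("spark-submit", "my_job.py")))
```

To actually launch a tool, the same list could be handed to `subprocess.run`, assuming the Spark binaries are on the PATH.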

Improvements:

Optimize the performance of CNTKModel. It now broadcasts a loaded
model to workers and shares model weights between partitions on the
same worker. Minibatch padding (an internal workaround for a CNTK bug)
is no longer used, which eliminates excess computation when the
partition size and the minibatch size do not match.
Bugfix: CNTKModel now works with models that have unnamed outputs.
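
To make the cost of minibatch padding concrete, here is a small arithmetic sketch. It assumes that padding meant filling the last, short minibatch of a partition up to the full minibatch size; that is an assumption for illustration, not a description of the CNTKModel internals. The function names and the sizes 10 and 4 are made up for the example.

```python
def batch_sizes(partition_size, minibatch_size):
    """Split a partition into minibatches; the last one may be short."""
    full, remainder = divmod(partition_size, minibatch_size)
    return [minibatch_size] * full + ([remainder] if remainder else [])


def rows_evaluated(partition_size, minibatch_size, pad):
    """Count the rows a model evaluates for one partition.

    With pad=True every minibatch is filled up to minibatch_size
    (the assumed behaviour of the old workaround). With pad=False
    the final short minibatch is evaluated as-is.
    """
    sizes = batch_sizes(partition_size, minibatch_size)
    if pad:
        return len(sizes) * minibatch_size
    return sum(sizes)


# Illustrative sizes: a 10-row partition and minibatches of 4 rows.
partition_size, minibatch_size = 10, 4
padded = rows_evaluated(partition_size, minibatch_size, pad=True)
unpadded = rows_evaluated(partition_size, minibatch_size, pad=False)

print(batch_sizes(partition_size, minibatch_size))  # [4, 4, 2]
print(padded - unpadded)  # 2 rows evaluated only because of padding
```

When the partition size is an exact multiple of the minibatch size the two counts agree, so under this assumption the saving only shows up when the sizes do not match.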

Docker image improvements:

Environment variables are now part of the Docker image (in addition to
being set in bash).
New Docker images:
- microsoft/mmlspark:latest: plain image, as always.
- microsoft/mmlspark:gpu: GPU variant based on an nvidia/cuda image.
- microsoft/mmlspark:plus and microsoft/mmlspark:plus-gpu: these
  images contain additional packages for internal use; in future
  releases they will probably also be based on an older Conda version.

Updates:

The Conda environment now includes NLTK.
Updated Java and SBT versions.
