v0.8
New functionality:
- We are now uploading MMLSpark as an `Azure/mmlspark` Spark package.
  Use `--packages Azure:mmlspark:0.8` with the Spark command-line tools.
- Add a bi-directional LSTM medical entity extractor to the
  `ModelDownloader`, and a new Jupyter notebook for medical entity
  extraction using NLTK, PubMed word embeddings, and the Bi-LSTM.
- Add `ImageSetAugmenter` for easy dataset augmentation within image
  processing pipelines.
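
For PySpark users who create the session from code rather than from the command-line tools, the same package coordinate can be passed through the standard `spark.jars.packages` Spark property, which mirrors the `--packages` flag. The following is a minimal sketch, not an official snippet: the helper names `spark_packages_conf` and `create_session` and the application name are illustrative, and `create_session` assumes `pyspark` is installed and that the package is resolvable from your environment.

```python
# Spark package coordinate taken from these release notes.
MMLSPARK_PACKAGE = "Azure:mmlspark:0.8"


def spark_packages_conf(package=MMLSPARK_PACKAGE):
    """Return the Spark property that mirrors the --packages flag.

    Hypothetical helper for illustration; not part of MMLSpark.
    """
    return {"spark.jars.packages": package}


def create_session(app_name="mmlspark-example"):
    """Build a SparkSession that resolves the MMLSpark package at startup.

    Hypothetical helper; requires pyspark, which is imported lazily so the
    rest of this module can be used without it.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in spark_packages_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `create_session()` should be roughly equivalent to starting `pyspark` with `--packages Azure:mmlspark:0.8`; the property has to be set before the session is first created.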
Improvements:
- Optimize the performance of `CNTKModel`. It now broadcasts a loaded
  model to workers and shares model weights between partitions on the
  same worker. Minibatch padding (an internal workaround for a CNTK bug)
  is no longer used, eliminating excess computations when there is a
  mismatch between the partition size and minibatch size.
- Bugfix: `CNTKModel` can work with models with unnamed outputs.
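
To give a sense of the excess computation that minibatch padding could cause, here is a small arithmetic sketch. It assumes the padding filled the last minibatch of each partition up to the full minibatch size; the notes do not describe the internal implementation, so treat the function `evaluated_rows` and the numbers as purely illustrative.

```python
import math


def evaluated_rows(partition_size, minibatch_size, pad_last_minibatch):
    """Count the rows a model evaluates for a single partition.

    Illustrative model only: with padding, the final (partial) minibatch
    is assumed to be filled up to the full minibatch size.
    """
    if pad_last_minibatch:
        num_minibatches = math.ceil(partition_size / minibatch_size)
        return num_minibatches * minibatch_size
    return partition_size


# Example: a partition of 250 rows evaluated in minibatches of 100.
with_padding = evaluated_rows(250, 100, pad_last_minibatch=True)
without_padding = evaluated_rows(250, 100, pad_last_minibatch=False)

# Rows evaluated only because of padding.
wasted = with_padding - without_padding
print(with_padding, without_padding, wasted)  # prints "300 250 50"
```

Under this assumption, the waste is zero only when the partition size is an exact multiple of the minibatch size, and it grows with the number of partitions.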
Docker image improvements:
- Environment variables are now part of the docker image (in addition to
  being set in bash).
- New docker images:
  - `microsoft/mmlspark:latest`: plain image, as always.
  - `microsoft/mmlspark:gpu`: GPU variant based on an `nvidia/cuda`
    image.
  - `microsoft/mmlspark:plus` and `microsoft/mmlspark:plus-gpu`: these
    images contain additional packages for internal use; they will
    probably be based on an older Conda version too in future releases.
Updates:
- The Conda environment now includes NLTK.
- Updated Java and SBT versions.