v0.8

@elibarzilay elibarzilay released this 18 Jul 02:18
· 1557 commits to master since this release

New functionality:

  • We are now uploading MMLSpark as an Azure/mmlspark Spark package.
    Use --packages Azure:mmlspark:0.8 with the Spark command-line tools.

  • Add a bi-directional LSTM medical entity extractor to the
    ModelDownloader, and a new Jupyter notebook for medical entity
    extraction using NLTK, PubMed word embeddings, and the Bi-LSTM.

  • Add ImageSetAugmenter for easy dataset augmentation within image
    processing pipelines.

Improvements:

  • Optimize the performance of CNTKModel. It now broadcasts a loaded
    model to workers and shares model weights between partitions on the
    same worker. Minibatch padding (an internal workaround for a CNTK bug)
    is no longer used, eliminating excess computation when the partition
    size and the minibatch size do not match.

  • Bugfix: CNTKModel now works with models that have unnamed outputs.
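
To illustrate why dropping minibatch padding saves work, here is a minimal
sketch in plain Python. It is not CNTKModel's implementation: the function
names `minibatches` and `rows_evaluated` are hypothetical, and the sketch
assumes that padding means filling the last, short minibatch of a partition
up to the full minibatch size.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def minibatches(partition: Sequence[T], batch_size: int) -> List[Sequence[T]]:
    """Split a partition into minibatches; the last one may be short (no padding).

    Hypothetical helper for illustration only, not part of MMLSpark.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [partition[i:i + batch_size]
            for i in range(0, len(partition), batch_size)]


def rows_evaluated(partition_size: int, batch_size: int, pad: bool) -> int:
    """Count the rows a model evaluates for one partition.

    With padding (assumed behaviour), every minibatch is filled to batch_size,
    so the count is rounded up to a multiple of batch_size.
    Without padding, only the real rows are evaluated.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not pad:
        return partition_size
    num_batches = -(-partition_size // batch_size)  # ceiling division
    return num_batches * batch_size


if __name__ == "__main__":
    partition = list(range(10))
    print([len(b) for b in minibatches(partition, 4)])
    print(rows_evaluated(10, 4, pad=True))
    print(rows_evaluated(10, 4, pad=False))
```

Under these assumptions, a partition of 10 rows with a minibatch size of 4
splits into minibatches of 4, 4, and 2 rows. Padding the last minibatch would
evaluate 12 rows, while the unpadded version evaluates only the 10 real rows.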

Docker image improvements:

  • Environment variables are now part of the docker image (in addition to
    being set in bash).

  • New docker images:

    • microsoft/mmlspark:latest: plain image, as always.
    • microsoft/mmlspark:gpu: GPU variant based on an nvidia/cuda image.
    • microsoft/mmlspark:plus and microsoft/mmlspark:plus-gpu: these
      images contain additional packages for internal use; in future
      releases they will probably also be based on an older Conda version.

Updates:

  • The Conda environment now includes NLTK.

  • Updated Java and SBT versions.