This repository was archived by the owner on Nov 16, 2023. It is now read-only.

Update aml branch. #415

Merged
merged 152 commits into from
Jan 24, 2020
152 commits
ebf03e9
Draft, adding CategoryImputer, ToKeyImputer, ToString transformers
ganik Aug 23, 2019
600f5a5
Merge branch 'master' into ganik/aml
ganik Aug 23, 2019
ac12b2f
add tests
ganik Aug 23, 2019
62d3874
prelim commit
ganik Aug 24, 2019
2dec203
update manifest, fix unit tests/examples
ganik Aug 28, 2019
bb9dd73
upgrade version
ganik Aug 30, 2019
4034c8a
fix tests
ganik Aug 30, 2019
deecf05
temp hack fix for native libs
ganik Aug 30, 2019
e2ef393
copy libFeaturizers.so
ganik Aug 30, 2019
6c85423
fix version
ganik Aug 30, 2019
7fc9e08
fix cp
ganik Aug 30, 2019
2dff122
fix version
ganik Aug 30, 2019
ab32ad1
Update ML.Net version number.
Sep 4, 2019
96904c4
Update the examples and unit tests.
Sep 4, 2019
74df1e7
Update to latest version of the Featurizers library.
Sep 4, 2019
b310fc2
Fix test_tostring unit test.
Sep 4, 2019
f10ac43
Temporarily skip the estimator checks unit tests.
Sep 4, 2019
e13263a
Upgrade pip to the latest version when installing the Python
Sep 5, 2019
fe37274
Update test_estimator_checks for the three new transformers.
Sep 6, 2019
c928293
Remove extra comma from test_estimator_checks.
Sep 6, 2019
4d6807e
Update the ML.Net version.
Sep 6, 2019
f3e8417
Merge pull request #4 from pieths/aml
ganik Sep 6, 2019
7f20259
Merge branch 'master' into ganik/aml
ganik Sep 6, 2019
9c23cc4
Merge branch 'ganik/aml' of https://github.com/ganik/NimbusML into ga…
ganik Sep 11, 2019
46b5716
Merge branch 'master' into ganik/aml
ganik Sep 11, 2019
2c4630c
Add TimeSeriesImputer
ganik Sep 11, 2019
03a218c
Add country param to DateTimeSplitter
ganik Sep 17, 2019
1454256
Upgrade TensorFlow.NET version. Required by latest version of Microso…
Sep 18, 2019
e0bf89e
Update ML.Net version and import new AutoMLFeaturizers package.
Sep 18, 2019
a69cf6e
Add back in the accidentally removed tests from test_data_with_missin…
Sep 18, 2019
b1a8073
Update the DateTimeSplitter examples.
Sep 18, 2019
2d89f49
Update the ToKeyImputer examples.
Sep 18, 2019
d434492
Update the ToString examples.
Sep 18, 2019
a726b23
Merge pull request #5 from pieths/aml
ganik Sep 19, 2019
b420f79
Merge branch 'master' into ganik/aml
ganik Sep 19, 2019
b3e78dd
Merge branch 'master' into ganik/aml
ganik Sep 19, 2019
088d437
Update build to support latest nuget packages and updates.
Sep 20, 2019
feef418
Remove copy of libFeaturizers from linux build script.
Sep 20, 2019
6b8c5c5
Merge pull request #6 from pieths/aml
ganik Sep 20, 2019
ecb5f3b
Add TimeSeriesImputer to the NimbusML project.
Sep 20, 2019
b1f473f
Merge pull request #7 from pieths/aml
ganik Sep 20, 2019
7f1917f
Add initial DataFrame based example for TimeSeriesImputer.
Sep 23, 2019
d4a78ae
Update to the latest version of manifest.json.
Sep 23, 2019
ed9ec73
Add missing project include for the TimeSeriesImputer example.
Sep 23, 2019
eb36294
Update the DateTimeSplitter examples.
Sep 23, 2019
15d026c
Update build files to copy over the Data folder which is required for…
Sep 23, 2019
4904acf
Add a unit test for testing the holiday name return value for DateTim…
Sep 23, 2019
5edf0d8
Add unit test for ToKeyImputer.
Sep 23, 2019
a775a92
Update to latest version of manifest.json. Makes grain input required…
Sep 23, 2019
34f6ba6
Update TimeSeriesImputer_df example.
Sep 23, 2019
18ee975
Remove TimeSeriesImputer from test_estimator_checks.
Sep 23, 2019
7b05312
Update nuget.config to point to relative directory for ml.net packages.
Sep 24, 2019
b0fb48d
Add unit test for TimeSeriesImputer.
Sep 24, 2019
f62c17a
Use environmental variable to specify the local ml.net nuget package …
Sep 24, 2019
a4cf299
Update to the latest version of ml.net.
Sep 24, 2019
db36336
Add latest version of nuget packages for building.
Sep 24, 2019
9ef9baf
Merge pull request #8 from pieths/aml
ganik Sep 24, 2019
913e785
Merge branch 'master' into ganik/aml
ganik Sep 24, 2019
a697082
Update to the latest windows ml.net binaries.
Sep 24, 2019
5c8fbc4
Add linux ml.net binaries.
Sep 24, 2019
e3b5473
Merge pull request #9 from pieths/aml
ganik Sep 24, 2019
13cb163
adding correct nuget packages/location
michaelgsharp Sep 24, 2019
5b2fc0c
adding correct ML.NET signed packages
michaelgsharp Sep 24, 2019
765164f
adding correct ML.NET signed packages
michaelgsharp Sep 24, 2019
b9da669
Merge pull request #10 from michaelgsharp/ganik/aml
ganik Sep 24, 2019
3d5a973
Merge branch 'master' into ganik/aml
ganik Sep 24, 2019
6b24db6
Merge branch 'master' into ganik/aml
ganik Sep 26, 2019
afcdda1
Update the referenced ML.Net versions.
Oct 4, 2019
28c8e16
Update to the latest version of the manifest.
Oct 4, 2019
7910a8e
Add RobustScaler to the public API.
Oct 4, 2019
678ce5b
Fix spacing bug in RobustScalar in manifest.json.
Oct 4, 2019
a79ba3e
Merge branch 'upstream-master' into aml
Oct 4, 2019
b3e61be
Merge pull request #13 from pieths/aml
ganik Oct 4, 2019
3c08c41
Merge branch 'master' into ganik/aml
ganik Oct 4, 2019
cbae53a
Update to the latest version of manifest.json which contains naming f…
Oct 7, 2019
76b00c3
Merge pull request #14 from pieths/aml
ganik Oct 7, 2019
3cc1c75
Update to latest unsigned nuget packages for testing RobustScaler and…
Oct 8, 2019
907f6b5
Add RobustScaler unit tests and examples.
Oct 8, 2019
f0a3c95
Merge branch 'upstream-master' into aml
Oct 8, 2019
ebb1c7f
Merge pull request #15 from pieths/aml
ganik Oct 8, 2019
f8d1d9e
Update to the latest signed ML.Net nugets.
Oct 8, 2019
5e839d3
Merge pull request #16 from pieths/aml
ganik Oct 8, 2019
aeb8e6e
Merge branch 'master' into ganik/aml
ganik Oct 9, 2019
0952d52
Fix RobustScaler checks in test_estimator_checks.
Oct 9, 2019
90ff473
Merge pull request #17 from pieths/aml
ganik Oct 9, 2019
31416f3
Merge branch 'master' into ganik/aml
ganik Oct 9, 2019
932ae12
up version
ganik Oct 9, 2019
15eddb4
Turn off shuffling for FactorizationMachineBinaryClassifier. (#316)
pieths Oct 9, 2019
d9a194c
Initial implementation of NGramExtractor. (#320)
pieths Oct 10, 2019
96a19d7
Disable check which prevents artifacts from being generated by pull r…
pieths Oct 15, 2019
a4fc413
Update ManifestGenerator. (#329)
pieths Oct 15, 2019
a405b8c
Added "# -- coding: utf-8 --" to preserve the character `␂` while gua…
mstfbl Oct 15, 2019
46a14e6
Added utf-8 encoding to NGramExtractor.py (#339)
mstfbl Oct 16, 2019
fd40a4a
Image.py and Image_df.py extended testing examples are now supported …
mstfbl Oct 17, 2019
c286df1
Add third party notices to package description on PyPI (#341)
najeeb-kazmi Oct 18, 2019
310af48
Add 1.5 (#344)
najeeb-kazmi Oct 18, 2019
0ca53df
Add info to README.md (#342)
najeeb-kazmi Oct 18, 2019
64383c1
Fix DbgWinPy2.7 build which was failing when building NativeBridge. (…
pieths Oct 20, 2019
59c16d4
Fix error level when exiting build.cmd. (#345)
pieths Oct 22, 2019
b6f1a88
Added HTTP URLs to HTTPS URLs finder & converter Python scripts, and …
mstfbl Oct 23, 2019
ea0ab8a
Renamed URL scripts and reflected this change inside these files (#348)
mstfbl Oct 24, 2019
9572489
Merge branch 'upstream-master' into aml
Oct 28, 2019
6410f26
Merge pull request #18 from pieths/aml
ganik Oct 28, 2019
8120d0e
Execute unit tests in parallel (#331)
pieths Oct 28, 2019
c387908
Decrease the size of the images in the Image and Image_df examples. (…
pieths Oct 29, 2019
ce8217b
Update package references to work with the latest versions from nuget…
pieths Nov 7, 2019
331551b
Update release notes. (#354)
pieths Nov 7, 2019
fb292b3
Added Google.Protobuf.dll to Mac and Linux builds (#358)
mstfbl Nov 12, 2019
04a2082
Modifications to support scripted temp/docs merging. (#361)
pieths Nov 13, 2019
28b68c2
Set size variable to -1 in GetUnicodeTX to fix Python 2.7 encoding/de…
mstfbl Nov 13, 2019
be0ab53
Initial implementation of the temp_docs_updater script. (#363)
pieths Nov 14, 2019
28dcc8b
Update README.md
ganik Nov 15, 2019
5b97afe
Generate PrefixColumnConcatenator with entry point compiler instead o…
pieths Nov 19, 2019
4abeb47
Fix broken docs (#369)
najeeb-kazmi Nov 21, 2019
56bbda6
Removed all references to DSSM in NimbusML (except for in test_wordem…
mstfbl Nov 25, 2019
452dfb2
Added catch for predictors that do not support summary() (#375)
mstfbl Dec 2, 2019
0b9889e
Update Microsoft.DataPrep to the latest version. (#379)
pieths Dec 3, 2019
a6d4e2e
Create release notes for the 1.6.0 release. (#382)
pieths Dec 4, 2019
7587a8f
Bump version to 1.6.1 to fix dprep issue. (#385)
pieths Dec 6, 2019
ae4f4de
Removed "TODO: Replace with CV" comments (#389)
mstfbl Dec 19, 2019
e6decdd
Disabled tests that only fail on Mac Py2.7 due to string encoding/dec…
mstfbl Dec 24, 2019
e8b92f0
Add as_csr documentation to the inline docstrings for transform() and…
pieths Dec 24, 2019
962113e
Update to the latest version of ML.Net.
Dec 27, 2019
55ab4eb
Whitespace change to start a new CI run to see if the mac build is wo…
Dec 30, 2019
0bacf0e
Update to the latest version of ML.Net. (#401)
pieths Dec 30, 2019
6269b27
Typo fixed on paragraph 15 (#399)
kant Dec 30, 2019
9387651
Typo fixed on paragraph 10 (#398)
kant Dec 30, 2019
c883590
Initial implementation of DateTimeSplitter. Ported from the aml branch.
Dec 30, 2019
ee41bad
Merge branch 'master' into datetime-featurizer
Dec 30, 2019
5f1a6f9
Update the transform output formats documentation. (#395)
pieths Jan 2, 2020
4d66882
Fixed broken brew command (#402)
mstfbl Jan 4, 2020
77c36c9
Merge branch 'master' into datetime-featurizer
ganik Jan 4, 2020
b57cc25
Re-activated NGramFeaturizer2.py (#381)
mstfbl Jan 5, 2020
5f5b464
Merge branch 'master' into datetime-featurizer
Jan 6, 2020
92d47f6
Merge branch 'datetime-featurizer' of https://github.com/pieths/Nimbu…
Jan 6, 2020
cf16a1e
Fix missing import in test_datetimesplitter.
Jan 6, 2020
c135be2
Fix issue with ColumnSelector when dropping columns after DateTimeSpl…
Jan 6, 2020
7ada90b
Contributing: Fix a typo (#406)
MaherJendoubi Jan 8, 2020
ec36b19
Re-run failed unit tests on Ubuntu/Mac to fix intermittent crashes. (…
pieths Jan 8, 2020
d5c7c82
Fix issue when specifying split_start='after_transforms' with CV.fit(…
pieths Jan 14, 2020
284fcd7
Use latest ML.Net dev packages from MachineLearning feed.
Jan 17, 2020
ad00b70
Re-enable the default nuget.org feed. It does not appear to cause
Jan 17, 2020
258a799
Add whitespace change to restart CI build. Linux timed out.
Jan 21, 2020
c542c1d
Fix build issue when using pip version >= 20.0.0
Jan 21, 2020
8bee51b
Fix build issue caused by latest version of pip (>=20.0.0) (#412)
pieths Jan 21, 2020
4c5bac1
Merge branch 'master' into nuget_update
Jan 21, 2020
5e8da52
Merge branch 'ganik-aml' into aml
Jan 23, 2020
73496e2
Merge branch 'datetime-featurizer' into aml
Jan 23, 2020
5411a93
Merge branch 'nightly' into aml
Jan 23, 2020
29d7188
Remove local-nuget-packages, fix build and test_estimator_checks fail…
Jan 23, 2020
d30cc47
Remove DateTimeSplitter duplicates in nimbusml.pyproj
Jan 23, 2020
c84412b
Remove duplicate ML.Featurizers import.
Jan 23, 2020
10 changes: 7 additions & 3 deletions README.md
@@ -2,9 +2,13 @@

`nimbusml` is a Python module that provides Python bindings for [ML.NET](https://github.com/dotnet/machinelearning).

ML.NET was originally developed in Microsoft Research and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel and others. `nimbusml` was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance.
ML.NET was originally developed in Microsoft Research and is used across many product groups in Microsoft like Windows, Bing, PowerPoint, Excel, and others. `nimbusml` was built to enable data science teams that are more familiar with Python to take advantage of ML.NET's functionality and performance.

This package enables training ML.NET pipelines or integrating ML.NET components directly into [scikit-learn](https://scikit-learn.org/stable/) pipelines (it supports `numpy.ndarray`, `scipy.sparse_cst`, and `pandas.DataFrame` as inputs).
`nimbusml` enables training ML.NET pipelines or integrating ML.NET components directly into [scikit-learn](https://scikit-learn.org/stable/) pipelines. It adheres to existing `scikit-learn` conventions, allowing simple interoperability between `nimbusml` and `scikit-learn` components, while adding a suite of fast, highly optimized, and scalable algorithms, transforms, and components written in C++ and C\#.

See examples below showing interoperability with `scikit-learn`. A more detailed example in the [documentation](https://docs.microsoft.com/en-us/nimbusml/tutorials/b_c-sentiment-analysis-3-combining-nimbusml-and-scikit-learn) shows how to use a `nimbusml` component in a `scikit-learn` pipeline, and create a pipeline using only `nimbusml` components.

`nimbusml` supports `numpy.ndarray`, `scipy.sparse_cst`, and `pandas.DataFrame` as inputs. In addition, `nimbusml` also supports streaming from files without loading the dataset into memory with `FileDataStream`, which allows training on data significantly exceeding memory.

Documentation can be found [here](https://docs.microsoft.com/en-us/NimbusML/overview) and additional notebook samples can be found [here](https://github.com/Microsoft/NimbusML-Samples).

@@ -84,7 +88,7 @@ To build `nimbusml` from source please visit our [developer guide](docs/develope

## Contributing

The contributions guide can be found [here](CONTRIBUTING.md). Given the experimental nature of this project, support will be provided on a best-effort basis. We suggest opening an issue for discussion before starting a PR with big changes.
The contributions guide can be found [here](CONTRIBUTING.md).

## Support

91 changes: 77 additions & 14 deletions build.cmd
@@ -26,6 +26,9 @@ set RunExtendedTests=False
set BuildDotNetBridgeOnly=False
set SkipDotNetBridge=False
set AzureBuild=False
set BuildManifestGenerator=False
set UpdateManifest=False
set VerifyManifest=False

:Arg_Loop
if [%1] == [] goto :Build
@@ -53,6 +56,10 @@ if /i [%1] == [--skipDotNetBridge] (
set SkipDotNetBridge=True
shift && goto :Arg_Loop
)
if /i [%1] == [--updateManifest] (
set UpdateManifest=True
shift && goto :Arg_Loop
)
if /i [%1] == [--azureBuild] (
set AzureBuild=True
shift && goto :Arg_Loop
@@ -68,6 +75,7 @@ echo " --installPythonPackages Install python packages after build"
echo " --includeExtendedTests Include the extended tests if the tests are run"
echo " --buildDotNetBridgeOnly Build only DotNetBridge"
echo " --skipDotNetBridge Build everything except DotNetBridge"
echo " --updateManifest Update manifest.json"
echo " --azureBuild Building in azure devops (adds dotnet CLI to the path)"
goto :Exit_Success

@@ -173,8 +181,6 @@ if "%AzureBuild%" == "True" (
echo ##vso[task.prependpath]%_dotnetRoot%
)

set LOCAL_NUGET_PACKAGES_DIR=.\local-nuget-packages

:: Build managed code
echo ""
echo "#################################"
@@ -191,6 +197,37 @@ if "%BuildDotNetBridgeOnly%" == "True" (
call "%_dotnet%" build -c %Configuration% --force "%__currentScriptDir%src\Platforms\build.csproj"
call "%_dotnet%" publish "%__currentScriptDir%src\Platforms\build.csproj" --force --self-contained -r win-x64 -c %Configuration%


if "%Configuration:~-5%" == "Py3.7" set VerifyManifest=True
if "%VerifyManifest%" == "True" set BuildManifestGenerator=True
if "%UpdateManifest%" == "True" set BuildManifestGenerator=True

if "%BuildManifestGenerator%" == "True" (
echo ""
echo "#################################"
echo "Building Manifest Generator... "
echo "#################################"
call "%_dotnet%" build -c %Configuration% -o "%BuildOutputDir%%Configuration%" --force "%__currentScriptDir%src\ManifestGenerator\ManifestGenerator.csproj"
)

if "%UpdateManifest%" == "True" (
echo Updating manifest.json ...
call "%_dotnet%" "%BuildOutputDir%%Configuration%\ManifestGenerator.dll" create %__currentScriptDir%\src\python\tools\manifest.json
echo manifest.json updated.
echo Run entrypoint_compiler.py --generate_api --generate_entrypoints to generate entry points and api files.
goto :Exit_Success
)

if "%VerifyManifest%" == "True" (
echo Verifying manifest.json ...
call "%_dotnet%" "%BuildOutputDir%%Configuration%\ManifestGenerator.dll" verify %__currentScriptDir%\src\python\tools\manifest.json
if errorlevel 1 (
echo manifest.json is invalid.
echo Run build --updateManifest to update manifest.json.
goto :Exit_Error
)
)

echo ""
echo "#################################"
echo "Downloading Dependencies "
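The manifest handling added in the hunk above is driven by three flags. As a rough illustration only, the decision logic can be restated in Python. The function name `manifest_steps` and the step labels are hypothetical and do not exist in the repository; the sketch assumes verification succeeds when it runs.

```python
def manifest_steps(configuration, update_manifest=False):
    """Restate the build.cmd flag logic as a list of steps (illustrative only)."""
    # build.cmd compares the last five characters of the configuration
    # name with "Py3.7" to decide whether manifest.json is verified.
    verify_manifest = configuration.endswith("Py3.7")
    # The generator has to be built for either action.
    build_generator = verify_manifest or update_manifest

    steps = []
    if build_generator:
        steps.append("build ManifestGenerator")
    if update_manifest:
        # build.cmd jumps to :Exit_Success right after updating,
        # so the rest of the build does not run.
        steps.append("create manifest.json")
        return steps
    if verify_manifest:
        # A failed verification would jump to :Exit_Error instead.
        steps.append("verify manifest.json")
    steps.append("continue build")
    return steps
```

Under these assumptions, `manifest_steps("RlsWinPy3.7")` includes the verify step, while a Python 2.7 configuration skips the generator entirely unless `--updateManifest` is passed.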
@@ -352,13 +389,13 @@ if "%InstallPythonPackages%" == "True" (
echo "#################################"
echo "Installing python packages ... "
echo "#################################"
call "%PythonExe%" -m pip install --upgrade pip
call "%PythonExe%" -m pip install --upgrade nose pytest graphviz imageio pytest-cov "jupyter_client>=4.4.0" "nbconvert>=4.2.0"
call "%PythonExe%" -m pip install --upgrade "pip==19.3.1"
call "%PythonExe%" -m pip install --upgrade nose pytest pytest-xdist graphviz imageio pytest-cov "jupyter_client>=4.4.0" "nbconvert>=4.2.0"

if %PythonVersion% == 2.7 (
call "%PythonExe%" -m pip install --upgrade pyzmq
) else (
call "%PythonExe%" -m pip install --upgrade "azureml-dataprep>=1.1.12"
call "%PythonExe%" -m pip install --upgrade "azureml-dataprep>=1.1.33"
)

call "%PythonExe%" -m pip install --upgrade "%__currentScriptDir%target\%WheelFile%"
@@ -379,27 +416,53 @@ set TestsPath1=%PackagePath%\tests
set TestsPath2=%__currentScriptDir%src\python\tests
set TestsPath3=%__currentScriptDir%src\python\tests_extended
set ReportPath=%__currentScriptDir%build\TestCoverageReport
call "%PythonExe%" -m pytest --verbose --maxfail=1000 --capture=sys "%TestsPath1%" --cov="%PackagePath%" --cov-report term-missing --cov-report html:"%ReportPath%"
if errorlevel 1 (
goto :Exit_Error
)
call "%PythonExe%" -m pytest --verbose --maxfail=1000 --capture=sys "%TestsPath2%" --cov="%PackagePath%" --cov-report term-missing --cov-report html:"%ReportPath%"
set NumConcurrentTests=%NUMBER_OF_PROCESSORS%

call "%PythonExe%" -m pytest -n %NumConcurrentTests% --verbose --maxfail=1000 --capture=sys "%TestsPath2%" "%TestsPath1%" --cov="%PackagePath%" --cov-report term-missing --cov-report html:"%ReportPath%"
if errorlevel 1 (
goto :Exit_Error
:: Rerun any failed tests to give them one more
:: chance in case the errors were intermittent.
call "%PythonExe%" -m pytest -n %NumConcurrentTests% --last-failed --verbose --maxfail=1000 --capture=sys "%TestsPath2%" "%TestsPath1%" --cov="%PackagePath%" --cov-report term-missing --cov-report html:"%ReportPath%"
if errorlevel 1 (
goto :Exit_Error
)
)

if "%RunExtendedTests%" == "True" (
call "%PythonExe%" -m pytest --verbose --maxfail=1000 --capture=sys "%TestsPath3%" --cov="%PackagePath%" --cov-report term-missing --cov-report html:"%ReportPath%"
call "%PythonExe%" -m pytest -n %NumConcurrentTests% --verbose --maxfail=1000 --capture=sys "%TestsPath3%" --cov="%PackagePath%" --cov-report term-missing --cov-report html:"%ReportPath%"
if errorlevel 1 (
goto :Exit_Error
:: Rerun any failed tests to give them one more
:: chance in case the errors were intermittent.
call "%PythonExe%" -m pytest -n %NumConcurrentTests% --last-failed --verbose --maxfail=1000 --capture=sys "%TestsPath3%" --cov="%PackagePath%" --cov-report term-missing --cov-report html:"%ReportPath%"
if errorlevel 1 (
goto :Exit_Error
)
)
)

:Exit_Success
call :CleanUpDotnet
endlocal
exit /b %ERRORLEVEL%

:Exit_Error
call :CleanUpDotnet
endlocal
echo Failed with error %ERRORLEVEL%
exit /b %ERRORLEVEL%
exit /b %ERRORLEVEL%

:CleanUpDotnet
:: Save the error level so it can be
:: restored when exiting the function
set PrevErrorLevel=%ERRORLEVEL%

:: Shut down all dotnet persistent servers so that the
:: dotnet executable is not left open in the background.
:: As of dotnet 2.1.3 three servers are left running in
:: the background. This will shut them all down.
:: See here for more info: https://github.com/dotnet/cli/issues/9458
:: This fixes an issue when re-running the build script because
:: the build script was trying to replace the existing dotnet
:: binaries which were sometimes still in use.
call "%_dotnet%" build-server shutdown
exit /b %PrevErrorLevel%
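The `:CleanUpDotnet` routine above saves the error level before running the shutdown command and restores it afterwards, so the cleanup cannot overwrite the build result. A minimal Python sketch of the same pattern follows. The name `exit_with_cleanup` is hypothetical and `cleanup` stands in for the `build-server shutdown` call.

```python
def exit_with_cleanup(build_exit_code, cleanup):
    """Run cleanup, then report the exit code captured before cleanup ran."""
    saved = build_exit_code   # like: set PrevErrorLevel=%ERRORLEVEL%
    cleanup()                 # like: dotnet build-server shutdown (result ignored)
    return saved              # like: exit /b %PrevErrorLevel%
```

Whatever status the cleanup step produces is ignored here. Only the code captured beforehand is returned to the caller.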
27 changes: 19 additions & 8 deletions build.sh
@@ -175,8 +175,6 @@ then
echo "Installing dotnet SDK ... "
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin -Version 2.1.701 -InstallDir ./cli

export LOCAL_NUGET_PACKAGES_DIR=./local-nuget-packages

# Build managed code
echo "Building managed code ... "
_dotnet="${__currentScriptDir}/cli/dotnet"
@@ -284,7 +282,7 @@ then
exit 1
fi
# Review: Adding "--upgrade" to pip install will cause problems when using Anaconda as the python distro because of Anaconda's quirks with pytest.
"${PythonExe}" -m pip install nose "pytest>=4.4.0" graphviz "pytest-cov>=2.6.1" "jupyter_client>=4.4.0" "nbconvert>=4.2.0"
"${PythonExe}" -m pip install nose "pytest>=4.4.0" pytest-xdist graphviz "pytest-cov>=2.6.1" "jupyter_client>=4.4.0" "nbconvert>=4.2.0"
if [ ${PythonVersion} = 2.7 ]
then
"${PythonExe}" -m pip install --upgrade pyzmq
@@ -294,7 +292,7 @@ then
"${PythonExe}" -m pip install --upgrade pytest-remotedata
fi

"${PythonExe}" -m pip install --upgrade "azureml-dataprep>=1.1.12"
"${PythonExe}" -m pip install --upgrade "azureml-dataprep>=1.1.33"
fi
"${PythonExe}" -m pip install --upgrade "${Wheel}"
"${PythonExe}" -m pip install "scikit-learn==0.19.2"
@@ -311,12 +309,25 @@ then
TestsPath2=${__currentScriptDir}/src/python/tests
TestsPath3=${__currentScriptDir}/src/python/tests_extended
ReportPath=${__currentScriptDir}/build/TestCoverageReport
"${PythonExe}" -m pytest --verbose --maxfail=1000 --capture=sys "${TestsPath1}"
"${PythonExe}" -m pytest --verbose --maxfail=1000 --capture=sys "${TestsPath2}"
"${PythonExe}" -m pytest -n 4 --verbose --maxfail=1000 --capture=sys "${TestsPath2}" "${TestsPath1}" || \
"${PythonExe}" -m pytest -n 4 --last-failed --verbose --maxfail=1000 --capture=sys "${TestsPath2}" "${TestsPath1}"

if [ ${__runExtendedTests} = true ]
then
"${PythonExe}" -m pytest --verbose --maxfail=1000 --capture=sys "${TestsPath3}"
then
echo "Running extended tests ... "
if [ ! "$(uname -s)" = "Darwin" ]
then
# Required for Image.py and Image_df.py to run successfully on Ubuntu.
{
apt-get update
apt-get install libc6-dev -y
apt-get install libgdiplus -y
} || {
# Required for Image.py and Image_df.py to run successfully on CentOS.
yum install glibc-devel -y
}
fi
"${PythonExe}" -m pytest -n 4 --verbose --maxfail=1000 --capture=sys "${TestsPath3}"
fi
fi

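Both build scripts now run the tests in parallel and, if the first run fails, rerun only the failed tests once. The following Python sketch shows that control flow under stated assumptions: `run_tests_with_retry` and its `runner` parameter are hypothetical helpers for illustration, and the `-n` option requires `pytest-xdist`, which the scripts above install.

```python
import subprocess
import sys


def run_tests_with_retry(test_paths, workers=4, runner=subprocess.call):
    """Run pytest in parallel; on failure, rerun only the failed tests once."""
    base_cmd = [
        sys.executable, "-m", "pytest",
        "-n", str(workers),        # parallel workers (pytest-xdist)
        "--verbose", "--maxfail=1000", "--capture=sys",
    ]
    exit_code = runner(base_cmd + list(test_paths))
    if exit_code != 0:
        # Give intermittent failures one more chance, as the scripts do.
        exit_code = runner(base_cmd + ["--last-failed"] + list(test_paths))
    return exit_code
```

Passing a fake `runner` makes the retry behaviour easy to check without launching pytest. With the default `subprocess.call`, the function returns the exit code of the last pytest invocation.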
4 changes: 2 additions & 2 deletions build/ci/phase-template.yml
@@ -26,7 +26,8 @@ phases:
- script: $(_buildScript) --configuration $(_configuration) --runTests $(_testOptions)
# Mac phases
- ${{ if eq(parameters.name, 'Mac') }}:
- script: brew update && brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f5b1ac99a7fba27c19cee0bc4f036775c889b359/Formula/libomp.rb mono-libgdiplus gettext && brew link gettext --force
# Note: Manual defining of the libomp URL below is needed to avoid error at runtime. Installing using 'brew install libomp' results in "Intel MKL FATAL ERROR: Cannot load libmkl_intel_thread.dylib."
- script: brew update && brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f5b1ac99a7fba27c19cee0bc4f036775c889b359/Formula/libomp.rb gettext && brew link gettext --force && brew unlink python@2 && brew install mono-libgdiplus
- ${{ if eq(parameters.testDistro, 'noTests') }}:
- script: chmod 777 $(_buildScript) && $(_buildScript) --configuration $(_configuration)
- ${{ if eq(parameters.testDistro, '') }}:
@@ -50,7 +51,6 @@ phases:
# Publish build artifacts
- ${{ if or(eq(parameters.name, 'Linux_Ubuntu16'), eq(parameters.name, 'Windows'), eq(parameters.name, 'Mac')) }}:
- task: PublishBuildArtifacts@1
condition: and(always(), ne(variables['Build.Reason'], 'PullRequest'))
displayName: Publish wheel file to VSTS artifacts
inputs:
pathToPublish: $(Build.SourcesDirectory)/target
1 change: 1 addition & 0 deletions build/libs_linux.txt
@@ -1,3 +1,4 @@
Google.Protobuf.dll
Newtonsoft.Json.dll
libCpuMathNative.so
libFastTreeNative.so
1 change: 1 addition & 0 deletions build/libs_mac.txt
@@ -1,3 +1,4 @@
Google.Protobuf.dll
Newtonsoft.Json.dll
libCpuMathNative.dylib
libFastTreeNative.dylib
4 changes: 2 additions & 2 deletions docs/developers/linux-build.md
@@ -12,9 +12,9 @@ Building NimbusML from source on Linux
## Build
Run `./build.sh`

This downloads dependencies (.NET SDK, specific versions of Python and Boost), builds native code and managed code, and packages NimbusML into a pip-installable wheel. This produces debug binaries by default, and release versions can be specified by `./build.sh --configuration RlsLinPy3.7` for examle.
This downloads dependencies (.NET SDK, specific versions of Python and Boost), builds native code and managed code, and packages NimbusML into a pip-installable wheel. This produces debug binaries by default, and release versions can be specified by `./build.sh --configuration RlsLinPy3.7` for example.

For additional options including running tests and building components independently, see `./build.sh -h`.

### Known Issues
The LightGBM estimator fails on Linux when building from source. The official NimbusML Linux wheel package on Pypi.org has a working version of LightGBM.
The LightGBM estimator fails on Linux when building from source. The official NimbusML Linux wheel package on Pypi.org has a working version of LightGBM.
2 changes: 1 addition & 1 deletion docs/developers/windows-build.md
@@ -7,6 +7,6 @@ Building NimbusML from source on Windows
## Build
Run `build.cmd`

This downloads dependencies (.NET SDK, specific versions of Python and Boost), builds native code and managed code, and packages NimbusML into a pip-installable wheel. This produces debug binaries by default, and release versions can be specified by `build.cmd --configuration RlsWinPy3.7` for examle.
This downloads dependencies (.NET SDK, specific versions of Python and Boost), builds native code and managed code, and packages NimbusML into a pip-installable wheel. This produces debug binaries by default, and release versions can be specified by `build.cmd --configuration RlsWinPy3.7` for example.

For additional options including running tests and building components independently, see `build.cmd -?`.
101 changes: 101 additions & 0 deletions docs/release-notes/release-1.5.0.md
@@ -0,0 +1,101 @@
# [NimbusML](https://docs.microsoft.com/en-us/nimbusml/overview) 1.5.0

## **New Features**

- **Initial implementation of `csr_matrix` output support.**

[PR#250](https://github.com/microsoft/NimbusML/pull/250)
Add support for data output in `scipy.sparse.csr_matrix` format.

```python
xf = OneHotVectorizer(columns={'c0':'c0', 'c1':'c1'})
xf.fit(train_df)
result = xf.transform(train_df, as_csr=True)
```

- **Permutation Feature Importance for model interpretability.**

[PR#279](https://github.com/microsoft/NimbusML/pull/279)
Adds `permutation_feature_importance()` method to `Pipeline` and
predictor estimators, enabling evaluation of model-wide feature
importances on any dataset with the same schema as the dataset used
to fit the `Pipeline`.

```python
pipe = Pipeline([
LogisticRegressionBinaryClassifier(label='label', feature=['feature'])
])
pipe.fit(data)
pipe.permutation_feature_importance(data)
```

- **Initial implementation of DateTime input and output column support.**

[PR#290](https://github.com/microsoft/NimbusML/pull/290)
Add initial support for input and output of Pandas DateTime columns.

- **Initial implementation of LpScaler.**

[PR#253](https://github.com/microsoft/NimbusML/pull/253)
Normalize vectors (rows) individually by rescaling them to unit norm (L2, L1 or LInf).
Performs the following operation on a vector X: Y = (X - M) / D, where M is mean and D
is either L2 norm, L1 norm or LInf norm.

- **Add support for variable length vector output.**

[PR#267](https://github.com/microsoft/NimbusML/pull/267)
Support output of columns returned from ML.Net which contain variable length vectors.

- **Save `predictor_model` when pickling a `Pipeline`.**

[PR#295](https://github.com/microsoft/NimbusML/pull/295)

- **Initial implementation of the WordTokenizer transform.**

[PR#296](https://github.com/microsoft/NimbusML/pull/296)

- **Add support for summary output from tree based predictors.**

[PR#298](https://github.com/microsoft/NimbusML/pull/298)
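The LpScaler entry above describes the operation Y = (X - M) / D applied to each row. As a worked illustration only, the pure-Python sketch below applies that formula to a single row. It is not the `nimbusml` API: the function `lp_scale` and its `subtract_mean` flag are hypothetical, and the sketch assumes the norm D is computed after the mean has been subtracted.

```python
import math


def lp_scale(row, norm="L2", subtract_mean=False):
    """Rescale one row to unit norm: Y = (X - M) / D (illustrative sketch)."""
    values = [float(v) for v in row]
    if subtract_mean:
        mean = sum(values) / len(values)          # M in the formula
        values = [v - mean for v in values]

    if norm == "L2":
        d = math.sqrt(sum(v * v for v in values))
    elif norm == "L1":
        d = sum(abs(v) for v in values)
    elif norm == "LInf":
        d = max(abs(v) for v in values)
    else:
        raise ValueError("norm must be 'L2', 'L1' or 'LInf'")

    if d == 0:
        return values                              # avoid dividing by zero
    return [v / d for v in values]
```

For the row `[3.0, 4.0]`, the L2 norm is 5, so the scaled row is `[0.6, 0.8]`. With `norm="LInf"` the same row becomes `[0.75, 1.0]`.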

## **Bug Fixes**

- **Fixed `Pipeline.transform()` failing in a transform-only `Pipeline` when the y column is provided.**

[PR#294](https://github.com/microsoft/NimbusML/pull/294)
Enable calling `.transform()` on a `Pipeline` containing only transforms when the y column is provided.

- **Fix issue when using `predict_proba` or `decision_function` with combined models.**

[PR#272](https://github.com/microsoft/NimbusML/pull/272)

- **Fix `Pipeline._extract_classes_from_headers` was not checking for valid steps.**

[PR#292](https://github.com/microsoft/NimbusML/pull/292)

- **Fix BinaryDataStream was not valid as input for transformer.**

[PR#307](https://github.com/microsoft/NimbusML/pull/307)

- **Fix casing for the installPythonPackages build.sh argument.**

[PR#256](https://github.com/microsoft/NimbusML/pull/256)

## **Breaking Changes**

- **Removed `y` parameter from `Pipeline.transform()`**

[PR#294](https://github.com/microsoft/NimbusML/pull/294)
Removed `y` parameter from `Pipeline.transform()` as it is neither needed nor used for transforming data with a fitted `Pipeline`.

## **Enhancements**

None.

## **Documentation and Samples**

None.

## **Remarks**

None.