This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Native featurizers for AutoML (#317)
* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers

* add tests

* prelim commit

* update manifest, fix unit tests/examples

* upgrade version

* fix tests

* temp hack fix for native libs

* copy libFeaturizers.so

* fix version

* fix cp

* fix version

* Update ML.Net version number.

* Update the examples and unit tests.

* Update to latest version of the Featurizers library.

* Fix test_tostring unit test.

* Temporarily skip the estimator checks unit tests.

* Upgrade pip to the latest version when installing the Python
packages on Windows. This fixes an issue I had where scikit-learn
would not install when building NimbusML with the RlsWinPy3.6
configuration because it could not find one of the test data sets.

* Update test_estimator_checks for the three new transformers.

* Remove extra comma from test_estimator_checks.

* Update the ML.Net version.

* Add TimeSeriesImputer

* Add country param to DateTimeSplitter

* Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn.

* Update ML.Net version and import new AutoMLFeaturizers package.

* Add back in the accidentally removed tests from test_data_with_missing.py.

* Update the DateTimeSplitter examples.

* Update the ToKeyImputer examples.

* Update the ToString examples.

* Update build to support latest nuget packages and updates.

* Remove copy of libFeaturizers from linux build script.

* Add TimeSeriesImputer to the NimbusML project.

* Add initial DataFrame based example for TimeSeriesImputer.

* Update to the latest version of manifest.json.

* Add missing project include for the TimeSeriesImputer example.

* Update the DateTimeSplitter examples.

* Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform.

* Add a unit test for testing the holiday name return value for DateTimeSplitter.

* Add unit test for ToKeyImputer.

* Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer.

* Update TimeSeriesImputer_df example.

* Remove TimeSeriesImputer from test_estimator_checks.

* Update nuget.config to point to relative directory for ml.net packages.

* Add unit test for TimeSeriesImputer.

* Use an environment variable to specify the local ml.net nuget package directory.

* Update to the latest version of ml.net.

* Add latest version of nuget packages for building.

* Update to the latest windows ml.net binaries.

* Add linux ml.net binaries.

* adding correct nuget packages/location

* adding correct ML.NET signed packages

* adding correct ML.NET signed packages

* Update the referenced ML.Net versions.

* Update to the latest version of the manifest.

* Add RobustScaler to the public API.

* Fix spacing bug in RobustScaler in manifest.json.

* Update to the latest version of manifest.json which contains naming fix for RobustScaler.

* Update to latest unsigned nuget packages for testing RobustScaler and latest master features.

* Add RobustScaler unit tests and examples.

* Update to the latest signed ML.Net nugets.

* Fix RobustScaler checks in test_estimator_checks.

* up version
ganik authored Oct 9, 2019
1 parent f6711ad commit c64936d
Showing 95 changed files with 2,380 additions and 40 deletions.
build.cmd: 4 changes (4 additions, 0 deletions)
@@ -173,6 +173,8 @@ if "%AzureBuild%" == "True" (
echo ##vso[task.prependpath]%_dotnetRoot%
)

+set LOCAL_NUGET_PACKAGES_DIR=.\local-nuget-packages
+
:: Build managed code
echo ""
echo "#################################"
@@ -311,6 +313,7 @@ copy "%BuildOutputDir%%Configuration%\pybridge.pyd" "%__currentScriptDir%src\py

if %PythonVersion% == 2.7 (
copy "%BuildOutputDir%%Configuration%\Platform\win-x64\publish\*.dll" "%__currentScriptDir%src\python\nimbusml\internal\libs\"
+xcopy /S /E /I "%BuildOutputDir%%Configuration%\Platform\win-x64\publish\Data" "%__currentScriptDir%src\python\nimbusml\internal\libs\Data"
:: remove dataprep dlls as its not supported in python 2.7
del "%__currentScriptDir%src\python\nimbusml\internal\libs\Microsoft.DPrep.*"
del "%__currentScriptDir%src\python\nimbusml\internal\libs\Microsoft.Data.*"
@@ -321,6 +324,7 @@ if %PythonVersion% == 2.7 (
del "%__currentScriptDir%src\python\nimbusml\internal\libs\Microsoft.Workbench.Messaging.SDK.dll"
) else (
for /F "tokens=*" %%A in (build/libs_win.txt) do copy "%BuildOutputDir%%Configuration%\Platform\win-x64\publish\%%A" "%__currentScriptDir%src\python\nimbusml\internal\libs\"
+xcopy /S /E /I "%BuildOutputDir%%Configuration%\Platform\win-x64\publish\Data" "%__currentScriptDir%src\python\nimbusml\internal\libs\Data"
)

if "%DebugBuild%" == "True" (
build.sh: 4 changes (4 additions, 0 deletions)
@@ -175,6 +175,8 @@ then
echo "Installing dotnet SDK ... "
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin -Version 2.1.701 -InstallDir ./cli

+export LOCAL_NUGET_PACKAGES_DIR=./local-nuget-packages
+
# Build managed code
echo "Building managed code ... "
_dotnet="${__currentScriptDir}/cli/dotnet"
@@ -213,6 +215,7 @@ then
cp "${BuildOutputDir}/${__configuration}/Platform/${PublishDir}"/publish/System.Native.a "${__currentScriptDir}/src/python/nimbusml/internal/libs/"
cp "${BuildOutputDir}/${__configuration}/Platform/${PublishDir}"/publish/createdump "${__currentScriptDir}/src/python/nimbusml/internal/libs/" || :
cp "${BuildOutputDir}/${__configuration}/Platform/${PublishDir}"/publish/sosdocsunix.txt "${__currentScriptDir}/src/python/nimbusml/internal/libs/"
+cp -r "${BuildOutputDir}/${__configuration}/Platform/${PublishDir}"/publish/Data "${__currentScriptDir}/src/python/nimbusml/internal/libs/."
ext=*.so
if [ "$(uname -s)" = "Darwin" ]
then
@@ -241,6 +244,7 @@ then
cat build/${libs_txt} | while read i; do
cp "${BuildOutputDir}/${__configuration}/Platform/${PublishDir}"/publish/$i "${__currentScriptDir}/src/python/nimbusml/internal/libs/"
done
+cp -r "${BuildOutputDir}/${__configuration}/Platform/${PublishDir}"/publish/Data "${__currentScriptDir}/src/python/nimbusml/internal/libs/."
fi

if [[ $__configuration = Dbg* ]]
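The build.sh hunks above copy the published Data folder next to the native libraries because, per the commit message, the country support in the DateTimeSplitter transform needs it. A minimal bash sketch of that step, assuming a hypothetical helper named `copy_data_folder` (not part of the repository) that also fails early when the folder is missing:

```shell
#!/usr/bin/env bash
# Minimal sketch, not the NimbusML build itself: copy_data_folder is a
# hypothetical helper mirroring the `cp -r .../publish/Data .../libs/.`
# lines added to build.sh, plus an explicit check that Data exists.
set -euo pipefail

copy_data_folder() {
    local publish_dir="$1"   # directory produced by `dotnet publish`
    local libs_dir="$2"      # e.g. src/python/nimbusml/internal/libs

    # Fail loudly instead of silently packaging without the Data folder.
    if [ ! -d "${publish_dir}/Data" ]; then
        echo "error: ${publish_dir}/Data not found" >&2
        return 1
    fi

    mkdir -p "${libs_dir}"
    # Copying into "${libs_dir}/." places the folder at ${libs_dir}/Data.
    cp -r "${publish_dir}/Data" "${libs_dir}/."
}
```

In build.sh terms the call would pass the `.../Platform/${PublishDir}/publish` directory and `${__currentScriptDir}/src/python/nimbusml/internal/libs`; the existence check is a design choice of this sketch, not something the commit adds.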
build/libs_linux.txt: 1 change (1 addition, 0 deletions)
@@ -1,6 +1,7 @@
Newtonsoft.Json.dll
libCpuMathNative.so
libFastTreeNative.so
+libFeaturizers.so
libLdaNative.so
libMklImports.so
libMklProxyNative.so
build/libs_mac.txt: 1 change (1 addition, 0 deletions)
@@ -9,6 +9,7 @@ lib_lightgbm.dylib
libtensorflow.dylib
libonnxruntime.dylib
libtensorflow_framework.1.dylib
+Featurizers.dll
System.Drawing.Common.dll
TensorFlow.NET.dll
NumSharp.Core.dll
build/libs_win.txt: 1 change (1 addition, 0 deletions)
@@ -8,6 +8,7 @@ libiomp5md.dll
MklImports.dll
MklProxyNative.dll
SymSgdNative.dll
+Featurizers.dll
tensorflow.dll
TensorFlow.NET.dll
NumSharp.Core.dll
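The build/libs_*.txt files changed above are plain lists of file names; build.sh copies each listed binary with `cat build/${libs_txt} | while read i; do cp ... done`. A sketch of the same list-driven copy, assuming a hypothetical `copy_listed_libs` helper that reads names verbatim and tolerates blank lines and a missing final newline:

```shell
#!/usr/bin/env bash
# Minimal sketch of the list-driven copy in build.sh. copy_listed_libs is a
# hypothetical name used only for illustration.
set -euo pipefail

copy_listed_libs() {
    local list_file="$1"    # e.g. build/libs_linux.txt
    local publish_dir="$2"  # directory containing the built binaries
    local libs_dir="$3"     # destination inside the Python package
    local name=""

    mkdir -p "${libs_dir}"
    # IFS= and -r keep each file name verbatim; the "|| [ -n ... ]" clause
    # still processes a last line that has no trailing newline.
    while IFS= read -r name || [ -n "${name}" ]; do
        [ -z "${name}" ] && continue   # skip blank lines
        cp "${publish_dir}/${name}" "${libs_dir}/"
    done < "${list_file}"
}
```

Redirecting the list into the loop, rather than piping `cat` into it, keeps the loop in the current shell; otherwise the behavior matches the original one-file-per-line contract.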
45 binary files not shown.
nuget.config: 3 changes (2 additions, 1 deletion)
@@ -5,6 +5,7 @@
</config>
<packageSources>
<add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
-<add key="MlNet_Daily" value="https://dotnet.myget.org/F/dotnet-core/api/v3/index.json" />
+<!--add key="MlNet_Daily" value="https://dotnet.myget.org/F/dotnet-core/api/v3/index.json" /-->
+<add key="local_packages" value="%LOCAL_NUGET_PACKAGES_DIR%" />
</packageSources>
</configuration>
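The new nuget.config source takes its location from `%LOCAL_NUGET_PACKAGES_DIR%`, which build.cmd and build.sh set to local-nuget-packages before building managed code, so restore depends on NuGet expanding that environment variable. A hedged sketch of a pre-restore check in bash, assuming a hypothetical `resolve_local_feed` helper that exports an absolute path only when the folder exists and holds at least one .nupkg:

```shell
#!/usr/bin/env bash
# Minimal sketch: validate the local NuGet feed before restore.
# resolve_local_feed is a hypothetical helper, not part of the NimbusML build.
set -euo pipefail

resolve_local_feed() {
    local dir="$1"
    if [ ! -d "${dir}" ]; then
        echo "error: local NuGet feed '${dir}' does not exist" >&2
        return 1
    fi
    # Resolve to an absolute path so the exported value does not depend on
    # the current working directory.
    dir="$(cd "${dir}" && pwd)"
    # Require at least one package file in the feed.
    if ! ls "${dir}"/*.nupkg >/dev/null 2>&1; then
        echo "error: no .nupkg files in '${dir}'" >&2
        return 1
    fi
    export LOCAL_NUGET_PACKAGES_DIR="${dir}"
}
```

For example, `resolve_local_feed ./local-nuget-packages` could stand in for the plain `export` in build.sh; whether an absolute path is needed depends on how NuGet resolves relative source values, which this sketch does not rely on.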
src/DotNetBridge/Bridge.cs: 3 changes (3 additions, 0 deletions)
@@ -7,6 +7,8 @@
using System.Runtime.InteropServices;
using System.Text;
using System.Threading;
+using Microsoft.ML;
+using Microsoft.ML.Featurizers;
using Microsoft.ML.Data;
using Microsoft.ML.EntryPoints;
using Microsoft.ML.Runtime;
@@ -300,6 +302,7 @@ private static unsafe int GenericExec(EnvironmentBlock* penv, sbyte* psz, int cd
//env.ComponentCatalog.RegisterAssembly(typeof(TimeSeriesProcessingEntryPoints).Assembly);
//env.ComponentCatalog.RegisterAssembly(typeof(ParquetLoader).Assembly);
env.ComponentCatalog.RegisterAssembly(typeof(SsaChangePointDetector).Assembly);
+env.ComponentCatalog.RegisterAssembly(typeof(CategoryImputerTransformer).Assembly);
env.ComponentCatalog.RegisterAssembly(typeof(DotNetBridgeEntrypoints).Assembly);

using (var ch = host.Start("Executing"))
src/DotNetBridge/DotNetBridge.csproj: 24 changes (13 additions, 11 deletions)
@@ -32,17 +32,19 @@
<PrivateAssets>all</PrivateAssets>
<IncludeAssets>runtime; build; native; contentfiles; analyzers</IncludeAssets>
</PackageReference>
-<PackageReference Include="Microsoft.ML" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.CpuMath" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.EntryPoints" Version="0.16.0-preview2" />
-<PackageReference Include="Microsoft.ML.Mkl.Components" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.ImageAnalytics" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.LightGBM" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.OnnxTransformer" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.TensorFlow" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.Dnn" Version="0.16.0-preview2" />
-<PackageReference Include="Microsoft.ML.Ensemble" Version="0.16.0-preview2" />
-<PackageReference Include="Microsoft.ML.TimeSeries" Version="1.4.0-preview2" />
+<PackageReference Include="Microsoft.ML" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.CpuMath" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.EntryPoints" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Mkl.Components" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.ImageAnalytics" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.LightGBM" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.OnnxTransformer" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.TensorFlow" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Dnn" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Ensemble" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.TimeSeries" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Featurizers" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="MicrosoftMLFeaturizers" Version="0.1.0" />
<PackageReference Include="Microsoft.DataPrep" Version="0.0.1.12-preview" />
<PackageReference Include="TensorFlow.NET" Version="0.11.3" />
<PackageReference Include="SciSharp.TensorFlow.Redist" Version="1.14.0" />
src/Platforms/build.csproj: 24 changes (13 additions, 11 deletions)
@@ -11,17 +11,19 @@
</PropertyGroup>

<ItemGroup>
-<PackageReference Include="Microsoft.ML" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.CpuMath" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.EntryPoints" Version="0.16.0-preview2" />
-<PackageReference Include="Microsoft.ML.Mkl.Components" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.ImageAnalytics" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.LightGBM" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.OnnxTransformer" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.TensorFlow" Version="1.4.0-preview2" />
-<PackageReference Include="Microsoft.ML.Dnn" Version="0.16.0-preview2" />
-<PackageReference Include="Microsoft.ML.Ensemble" Version="0.16.0-preview2" />
-<PackageReference Include="Microsoft.ML.TimeSeries" Version="1.4.0-preview2" />
+<PackageReference Include="Microsoft.ML" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.CpuMath" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.EntryPoints" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Mkl.Components" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.ImageAnalytics" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.LightGBM" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.OnnxTransformer" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.TensorFlow" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Dnn" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Ensemble" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.TimeSeries" Version="1.6.2-preview2-28208-8" />
+<PackageReference Include="Microsoft.ML.Featurizers" Version="0.18.2-preview2-28208-8" />
+<PackageReference Include="MicrosoftMLFeaturizers" Version="0.1.0" />
<PackageReference Include="Microsoft.DataPrep" Version="0.0.1.12-preview" />
<PackageReference Include="TensorFlow.NET" Version="0.11.3" />
<PackageReference Include="SciSharp.TensorFlow.Redist" Version="1.14.0" />
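DotNetBridge.csproj and src/Platforms/build.csproj receive the same version bump and the same two new Featurizers packages, so the two package lists appear intended to stay in sync. A sketch of a consistency check, assuming each `<PackageReference Include=... Version=...` element opens on a single line with Include before Version, as in these diffs; `package_versions` and `check_same_packages` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Minimal sketch: compare the PackageReference name/version pairs of two
# project files. Assumes each PackageReference opens on one line with
# Include before Version. Both helper names are hypothetical.
set -euo pipefail

package_versions() {
    # Print "Name Version" for every matching PackageReference, sorted with
    # a fixed collation so the output does not depend on the locale.
    sed -n 's/.*<PackageReference Include="\([^"]*\)" Version="\([^"]*\)".*/\1 \2/p' "$1" \
        | LC_ALL=C sort
}

check_same_packages() {
    # Exits 0 when both lists match; otherwise diff prints the differing
    # pairs ("<" from the first file, ">" from the second) and exits 1.
    diff <(package_versions "$1") <(package_versions "$2")
}
```

Any line that `check_same_packages src/DotNetBridge/DotNetBridge.csproj src/Platforms/build.csproj` prints marks a package whose presence or version differs between the two files; references outside the hunks shown here may differ on purpose.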