grab latest from master #2

Merged
merged 39 commits into from
Jul 24, 2019
de250ba
Add example for applying ONNX model to in-memory images (#3851)
wschin Jun 21, 2019
e66e19e
Fixed build errors resulting from upgrading to VS2019 compilers (#3894)
harishsk Jun 21, 2019
9cd0b8e
Checked in a better fix based on code review (#3896)
harishsk Jun 24, 2019
9d29111
Tree-based featurization (#3812)
wschin Jun 26, 2019
e7c5858
Move from phases to jobs and use bring your own cloud pool (#3908)
safern Jun 27, 2019
00225c0
Bump ONNXRuntime version (#3837)
wschin Jun 27, 2019
7aa6513
Fix #3898 (remove crefs to internal methods) (#3899)
natke Jun 27, 2019
17c9155
Reformatting ModelOperations and DataOperations samples to width 85 (…
sierralee51 Jun 28, 2019
e0c4caa
Reformatting TensorFlow and AnomalyDetection samples to width 85 (#3922)
sayanshaw24 Jun 29, 2019
3401d7c
Create forecasting prediction engine and conform time series forecast…
codemzs Jul 1, 2019
a1ede65
Fixed #3207. Added an exception to the parameterless constructor for …
harishsk Jul 1, 2019
4b0fa53
Change ensembles trainer to work with ITrainerEstimators instead of I…
yaeldMS Jul 1, 2019
3a35a82
Change default EvaluationMetric for LightGbm trainers to conform to d…
najeeb-kazmi Jul 1, 2019
0c0789f
ONNXTransform Upgrade to Enable Non-tensor Types (#3881)
wschin Jul 1, 2019
c5a18ef
LightGBM Unbalanced Data Argument [Issue #3688 Fix] (#3925)
Jul 1, 2019
f78de3e
Internalize tensorflow API and fix #3863 (#3936)
codemzs Jul 1, 2019
9a32f54
Move Time Series, TensorFlow and OnnxTransform nugets to stable proje…
codemzs Jul 1, 2019
f67aab5
Add FixZero for LogMeanVariance normalizer (#3916)
artidoro Jul 1, 2019
7f34287
Fix typo in time series forecasting API. (#3944)
codemzs Jul 1, 2019
cea7f7b
Reformatted Ranking samples to width 85 (#3930)
sayanshaw24 Jul 2, 2019
c239fc3
Rename forecasting API argument to a better name. (#3945)
codemzs Jul 2, 2019
d0b3f86
Tree based trainers implement ICanGetSummaryAsIDataView (#3892)
artidoro Jul 2, 2019
3139697
Fix assignment of a variable to itself (#3912)
flash2048 Jul 2, 2019
01b3ec7
Reformatted Recommendation samples to width 85 (#3941)
sayanshaw24 Jul 2, 2019
1c1d3a4
reformatted samples not in specific folder (#3949)
sierralee51 Jul 2, 2019
541d268
Increment build version to 1.3 for release and 0.15 for preview. (#3956)
codemzs Jul 2, 2019
0153754
Reformatting Featurization of Transform and Misc files in Transform t…
Jul 2, 2019
1288d1d
Reformatting Test, Projection and TimeSeries of Transform to Width 85…
Jul 3, 2019
882a6d9
Reformatting Conversion, FeatureSelection and Image Analytics of Tran…
Jul 3, 2019
4fecfb2
Reformatting MulticlassClassification samples to width 85 (#3942)
sierralee51 Jul 3, 2019
ce9b38b
Reformatting BinaryClassification samples to width 85 (#3946)
sierralee51 Jul 3, 2019
2da72cb
Reformatted Regression samples to width 85 (#3948)
sayanshaw24 Jul 3, 2019
79813b9
Release notes for 1.2.0 release. (#3951)
codemzs Jul 3, 2019
78bfecb
Updated the redistributed version of Tensorflow to 1.14 (#3929)
harishsk Jul 8, 2019
e5d0546
Increment version for application compatibility. (#3957)
codemzs Jul 9, 2019
c3bdaaa
TF package size fix (#3983)
harishsk Jul 10, 2019
fb80f72
Early Draft specs doc for DatabaseLoader in ML.NET (#3857)
CESARDELATORRE Jul 17, 2019
fed45bb
DatabaseLoader specs: Update on NuGet and Class library design (#4021)
CESARDELATORRE Jul 18, 2019
b31b7ca
Minor typo fix in regularization documentation (#4012)
SnakyBeaky Jul 24, 2019
37 changes: 19 additions & 18 deletions .vsts-dotnet-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,12 @@ resources:
- container: UbuntuContainer
image: mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-16.04-mlnet-207e097-20190312152303

phases:
- template: /build/ci/phase-template.yml
jobs:
- template: /build/ci/job-template.yml
parameters:
name: Centos_x64_NetCoreApp30
buildScript: ./build.sh
container: CentosContainer
customMatrixes:
Debug_Build:
_configuration: Debug-Intrinsics
Expand All @@ -24,26 +25,25 @@ phases:
_configuration: Release-Intrinsics
_config_short: RI
_includeBenchmarkData: true
queue:
pool:
name: Hosted Ubuntu 1604
container: CentosContainer

- template: /build/ci/phase-template.yml
- template: /build/ci/job-template.yml
parameters:
name: Ubuntu_x64_NetCoreApp21
buildScript: ./build.sh
queue:
container: UbuntuContainer
pool:
name: Hosted Ubuntu 1604
container: UbuntuContainer

- template: /build/ci/phase-template.yml
- template: /build/ci/job-template.yml
parameters:
name: MacOS_x64_NetCoreApp21
buildScript: ./build.sh
queue:
pool:
name: Hosted macOS

- template: /build/ci/phase-template.yml
- template: /build/ci/job-template.yml
parameters:
name: Windows_x64_NetCoreApp30
buildScript: build.cmd
Expand All @@ -56,17 +56,18 @@ phases:
_configuration: Release-Intrinsics
_config_short: RI
_includeBenchmarkData: true
queue:
name: Hosted VS2017
pool:
name: NetCorePublic-Pool
queue: buildpool.windows.10.amd64.vs2017.open

- template: /build/ci/phase-template.yml
- template: /build/ci/job-template.yml
parameters:
name: Windows_x64_NetCoreApp21
buildScript: build.cmd
queue:
pool:
name: Hosted VS2017

- template: /build/ci/phase-template.yml
- template: /build/ci/job-template.yml
parameters:
name: Windows_x64_NetFx461
buildScript: build.cmd
Expand All @@ -79,13 +80,13 @@ phases:
_configuration: Release-netfx
_config_short: RFX
_includeBenchmarkData: false
queue:
pool:
name: Hosted VS2017

- template: /build/ci/phase-template.yml
- template: /build/ci/job-template.yml
parameters:
name: Windows_x86_NetCoreApp21
architecture: x86
buildScript: build.cmd
queue:
pool:
name: Hosted VS2017
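The mechanical part of the phases-to-jobs change in this file can be sketched as a small transformation on each template's `parameters` mapping: the `queue` key becomes `pool`, and a nested `container` entry moves up to a top-level parameter. The function name `migrate_parameters` is hypothetical and the sketch does not cover hand-made changes such as switching one Windows job to `NetCorePublic-Pool`.

```python
from copy import deepcopy
from typing import Any, Dict


def migrate_parameters(old: Dict[str, Any]) -> Dict[str, Any]:
    """Rename `queue` to `pool` and hoist `container` to the top level.

    Hypothetical helper; it only mirrors the renames visible in the diff.
    """
    new = deepcopy(old)
    queue = new.pop("queue", None)
    if queue is None:
        return new  # nothing to migrate

    pool = dict(queue)
    container = pool.pop("container", None)
    if container is not None:
        # In the new layout the container is a sibling of `pool`.
        new["container"] = container
    new["pool"] = pool
    return new


old_ubuntu = {
    "name": "Ubuntu_x64_NetCoreApp21",
    "buildScript": "./build.sh",
    "queue": {"name": "Hosted Ubuntu 1604", "container": "UbuntuContainer"},
}
new_ubuntu = migrate_parameters(old_ubuntu)
print(new_ubuntu)
```

The input mapping is left unmodified, so the old and new shapes can be compared side by side.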
22 changes: 21 additions & 1 deletion Directory.Build.targets
Expand Up @@ -21,6 +21,14 @@
</NativeAssemblyReference>
</ItemGroup>

<ItemGroup>
<NativeAssemblyReferenceWithMajorVersion>
<!-- Tensorflow has a different naming scheme for v1.14.0. Those binaries need to be copied along with the standard names -->
<AssemblyPathWithMajorVersion Condition="'$(OS)' != 'Windows_NT'">$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReferenceWithMajorVersion.Identity)$(LibExtension).%(NativeAssemblyReferenceWithMajorVersion.MajorVersion)</AssemblyPathWithMajorVersion>
<AssemblyPathWithMajorVersion Condition="$([MSBuild]::IsOSPlatform('osx'))">$(NativeOutputPath)$(LibPrefix)%(NativeAssemblyReferenceWithMajorVersion.Identity).%(NativeAssemblyReferenceWithMajorVersion.MajorVersion)$(LibExtension)</AssemblyPathWithMajorVersion>
</NativeAssemblyReferenceWithMajorVersion>
</ItemGroup>

<Copy SourceFiles = "@(NativeAssemblyReference->'%(FullAssemblyPath)')"
DestinationFolder="$(OutputPath)"
OverwriteReadOnlyFiles="$(OverwriteReadOnlyFiles)"
Expand All @@ -30,7 +38,19 @@
UseSymboliclinksIfPossible="$(CreateSymbolicLinksForPublishFilesIfPossible)">
<Output TaskParameter="DestinationFiles" ItemName="FileWrites"/>
</Copy>


<!-- Optionally copy the native binaries that have a version number appended (only TensorFlow right now) -->
<Copy Condition="'$(OS)' != 'Windows_NT'"
SourceFiles = "@(NativeAssemblyReferenceWithMajorVersion->'%(AssemblyPathWithMajorVersion)')"
DestinationFolder="$(OutputPath)"
OverwriteReadOnlyFiles="$(OverwriteReadOnlyFiles)"
Retries="$(CopyRetryCount)"
RetryDelayMilliseconds="$(CopyRetryDelayMilliseconds)"
UseHardlinksIfPossible="$(CreateHardLinksForPublishFilesIfPossible)"
UseSymboliclinksIfPossible="$(CreateSymbolicLinksForPublishFilesIfPossible)">
<Output TaskParameter="DestinationFiles" ItemName="FileWrites"/>
</Copy>

</Target>

<Import Project="$(ToolsDir)/versioning.targets" Condition="Exists('$(ToolsDir)/versioning.targets')" />
Expand Down
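As a rough illustration of the two `AssemblyPathWithMajorVersion` conditions above, the sketch below composes the versioned file name per platform. The first condition matches every non-Windows OS and the macOS condition then overrides it, so the version lands after the extension on Linux and before it on macOS. The values `lib`, `.so` and `.dylib` for `$(LibPrefix)` and `$(LibExtension)`, and the identity `tensorflow`, are assumptions that do not appear in this diff.

```python
from typing import Optional


def versioned_library_name(identity: str, major_version: int, os_name: str,
                           lib_prefix: str = "lib") -> Optional[str]:
    """Mimic the two conditional AssemblyPathWithMajorVersion definitions.

    Assumed extensions: ".so" on Linux, ".dylib" on macOS. Returns None on
    Windows (where the extra Copy task is skipped) and on unknown platforms.
    """
    lib_extension = {"linux": ".so", "osx": ".dylib"}.get(os_name)
    if lib_extension is None:
        return None

    # Condition 1: '$(OS)' != 'Windows_NT' (also true on macOS).
    name = f"{lib_prefix}{identity}{lib_extension}.{major_version}"
    # Condition 2: IsOSPlatform('osx') overrides the value set above.
    if os_name == "osx":
        name = f"{lib_prefix}{identity}.{major_version}{lib_extension}"
    return name


for os_name in ("linux", "osx", "windows"):
    print(os_name, versioned_library_name("tensorflow", 1, os_name))
```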
7 changes: 5 additions & 2 deletions build/BranchInfo.props
Expand Up @@ -15,20 +15,23 @@
Microsoft.ML.LightGbm;
Microsoft.ML.Mkl.Components;
Microsoft.ML.Mkl.Redist;
Microsoft.ML.TimeSeries;
Microsoft.ML.TensorFlow;
Microsoft.ML.OnnxTransformer;
</StableProjects>
<IsStableProject Condition="'$(MSBuildProjectName.Contains(.symbols))' == 'false'">$(StableProjects.Contains($(MSBuildProjectName)))</IsStableProject>
<IsStableProject Condition="'$(MSBuildProjectName.Contains(.symbols))' == 'true'">$(StableProjects.Contains($(MSBuildProjectName.Substring(0, $(MSBuildProjectName.IndexOf(.symbols))))))</IsStableProject>
<IsStableProject Condition="'$(UseStableVersionForNativeAssets)' == 'true'">true</IsStableProject>
</PropertyGroup>
<PropertyGroup Condition="'$(IsStableProject)' == 'true'">
<MajorVersion>1</MajorVersion>
<MinorVersion>2</MinorVersion>
<MinorVersion>3</MinorVersion>
<PatchVersion>0</PatchVersion>
<PreReleaseLabel>preview</PreReleaseLabel>
</PropertyGroup>
<PropertyGroup Condition="'$(IsStableProject)' != 'true'">
<MajorVersion>0</MajorVersion>
<MinorVersion>14</MinorVersion>
<MinorVersion>15</MinorVersion>
<PatchVersion>0</PatchVersion>
<PreReleaseLabel>preview</PreReleaseLabel>
</PropertyGroup>
Expand Down
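A minimal sketch of how the three `IsStableProject` definitions and the two version groups above appear to interact. It assumes the `Contains` property function is a plain substring test on the semicolon-separated list, and it only includes the project names visible in this hunk (earlier entries of `StableProjects` are cut off). The names `is_stable_project` and `version_for` are hypothetical.

```python
from typing import Tuple

# Only the entries visible in this hunk; the real list has more entries above.
VISIBLE_STABLE_PROJECTS = (
    "Microsoft.ML.LightGbm;Microsoft.ML.Mkl.Components;Microsoft.ML.Mkl.Redist;"
    "Microsoft.ML.TimeSeries;Microsoft.ML.TensorFlow;Microsoft.ML.OnnxTransformer;"
)


def is_stable_project(project_name: str,
                      stable_projects: str = VISIBLE_STABLE_PROJECTS,
                      use_stable_version_for_native_assets: bool = False) -> bool:
    # The last definition wins: native assets can force the stable version.
    if use_stable_version_for_native_assets:
        return True
    # For "<name>.symbols" projects, test the part before ".symbols".
    suffix_at = project_name.find(".symbols")
    base = project_name if suffix_at < 0 else project_name[:suffix_at]
    # Assumption: substring containment, not an exact list-membership test.
    return base in stable_projects


def version_for(project_name: str, **kwargs) -> Tuple[int, int, int, str]:
    """Return (major, minor, patch, label) as set after this change."""
    if is_stable_project(project_name, **kwargs):
        return (1, 3, 0, "preview")
    return (0, 15, 0, "preview")


print(version_for("Microsoft.ML.TensorFlow"))
print(version_for("Microsoft.ML.TensorFlow.symbols"))
print(version_for("Some.Other.Project"))  # hypothetical non-stable project
```

With this change, `Microsoft.ML.TimeSeries`, `Microsoft.ML.TensorFlow` and `Microsoft.ML.OnnxTransformer` move from the 0.x group to the 1.x group.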
9 changes: 5 additions & 4 deletions build/Dependencies.props
Expand Up @@ -15,13 +15,14 @@
<GoogleProtobufPackageVersion>3.5.1</GoogleProtobufPackageVersion>
<LightGBMPackageVersion>2.2.3</LightGBMPackageVersion>
<MicrosoftExtensionsPackageVersion>2.1.0</MicrosoftExtensionsPackageVersion>
<MicrosoftMLOnnxRuntimePackageVersion>0.3.0</MicrosoftMLOnnxRuntimePackageVersion>
<MicrosoftMLOnnxRuntimePackageVersion>0.4.0</MicrosoftMLOnnxRuntimePackageVersion>
<MlNetMklDepsPackageVersion>0.0.0.9</MlNetMklDepsPackageVersion>
<ParquetDotNetPackageVersion>2.1.3</ParquetDotNetPackageVersion>
<SystemDrawingCommonPackageVersion>4.5.0</SystemDrawingCommonPackageVersion>
<SystemIOFileSystemAccessControl>4.5.0</SystemIOFileSystemAccessControl>
<SystemSecurityPrincipalWindows>4.5.0</SystemSecurityPrincipalWindows>
<TensorFlowVersion>1.13.1</TensorFlowVersion>
<TensorFlowVersion>1.14.0</TensorFlowVersion>
<TensorFlowMajorVersion>1</TensorFlowMajorVersion>
</PropertyGroup>

<!-- Code Analyzer Dependencies -->
Expand All @@ -44,9 +45,9 @@
<PropertyGroup>
<BenchmarkDotNetVersion>0.11.3</BenchmarkDotNetVersion>
<MicrosoftCodeAnalysisTestingVersion>1.0.0-beta1-63812-02</MicrosoftCodeAnalysisTestingVersion>
<MicrosoftMLTestModelsPackageVersion>0.0.4-test</MicrosoftMLTestModelsPackageVersion>
<MicrosoftMLTestModelsPackageVersion>0.0.5-test</MicrosoftMLTestModelsPackageVersion>
<MicrosoftMLTensorFlowTestModelsVersion>0.0.11-test</MicrosoftMLTensorFlowTestModelsVersion>
<MicrosoftMLOnnxTestModelsVersion>0.0.4-test</MicrosoftMLOnnxTestModelsVersion>
<MicrosoftMLOnnxTestModelsVersion>0.0.5-test</MicrosoftMLOnnxTestModelsVersion>
</PropertyGroup>

</Project>
43 changes: 21 additions & 22 deletions build/ci/phase-template.yml → build/ci/job-template.yml
Expand Up @@ -2,23 +2,18 @@ parameters:
name: ''
architecture: x64
buildScript: ''
queue: {}
pool: {}
customMatrixes: ''
codeCoverage: false
container: ''

phases:
- phase: ${{ parameters.name }}
variables:
_buildScript: ${{ parameters.buildScript }}
_phaseName: ${{ parameters.name }}
_arch: ${{ parameters.architecture }}
_codeCoverage: ${{ parameters.codeCoverage }}
queue:
${{ if eq(variables._codeCoverage, 'false') }}:
timeoutInMinutes: 30
${{ if eq(variables._codeCoverage, 'true') }}:
timeoutInMinutes: 60
parallel: 99
jobs:
- job: ${{ parameters.name }}
${{ if eq(parameters.codeCoverage, 'false') }}:
timeoutInMinutes: 40
${{ if eq(parameters.codeCoverage, 'true') }}:
timeoutInMinutes: 60
strategy:
matrix:
${{ if eq(parameters.customMatrixes, '') }}:
Debug_Build:
Expand All @@ -31,28 +26,32 @@ phases:
_includeBenchmarkData: true
${{ if ne(parameters.customMatrixes, '') }}:
${{ insert }}: ${{ parameters.customMatrixes }}
${{ insert }}: ${{ parameters.queue }}

pool: ${{ parameters.pool }}
${{ if ne(parameters.container, '') }}:
container: ${{ parameters.container }}

steps:
- ${{ if eq(parameters.queue.name, 'Hosted macOS') }}:
- ${{ if eq(parameters.pool.name, 'Hosted macOS') }}:
- script: brew update && brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f5b1ac99a7fba27c19cee0bc4f036775c889b359/Formula/libomp.rb && brew install mono-libgdiplus gettext && brew link gettext --force && brew link libomp --force
displayName: Install build dependencies
- script: $(_buildScript) -$(_configuration) -buildArch=$(_arch)
- script: ${{ parameters.buildScript }} -$(_configuration) -buildArch=${{ parameters.architecture }}
displayName: Build
- script: $(_buildScript) -- /t:DownloadExternalTestFiles /p:IncludeBenchmarkData=$(_includeBenchmarkData)
- script: ${{ parameters.buildScript }} -- /t:DownloadExternalTestFiles /p:IncludeBenchmarkData=$(_includeBenchmarkData)
displayName: Download Benchmark Data
- script: $(_buildScript) -$(_configuration) -runtests -coverage=$(_codeCoverage)
- script: ${{ parameters.buildScript }} -$(_configuration) -runtests -coverage=${{ parameters.codeCoverage }}
displayName: Run Tests.
- script: $(Build.SourcesDirectory)/Tools/dotnetcli/dotnet msbuild build/Codecoverage.proj /p:CodeCovToken=$(CODECOV_TOKEN)
displayName: Upload coverage to codecov.io
condition: and(succeeded(), eq(variables._codeCoverage, 'true'))
condition: and(succeeded(), eq(${{ parameters.codeCoverage }}, True))
- task: PublishTestResults@2
displayName: Publish Test Results
condition: succeededOrFailed()
inputs:
testRunner: 'vSTest'
searchFolder: '$(System.DefaultWorkingDirectory)/bin'
testResultsFiles: '**/*.trx'
testRunTitle: Machinelearning_Tests_$(_phaseName)_$(_configuration)_$(Build.BuildNumber)
testRunTitle: Machinelearning_Tests_${{ parameters.name }}_$(_configuration)_$(Build.BuildNumber)
configuration: $(_configuration)
mergeTestResults: true
- task: CopyFiles@2
Expand All @@ -78,5 +77,5 @@ phases:
pathToPublish: $(Build.ArtifactStagingDirectory)
artifactName: ${{ parameters.name }} $(_config_short)
artifactType: container
- script: $(_buildScript) -buildPackages
- script: ${{ parameters.buildScript }} -buildPackages
displayName: Build Packages
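To make the template's compile-time conditionals easier to follow, here is a small sketch of what the new job header resolves to for a given set of parameters: the timeout depends on `codeCoverage`, `pool` is passed through, and `container` is only emitted when it is non-empty. The function `resolve_job_header` is hypothetical and ignores the matrix and the steps.

```python
from typing import Any, Dict, Optional


def resolve_job_header(name: str, pool: Optional[Dict[str, Any]] = None,
                       code_coverage: bool = False,
                       container: str = "") -> Dict[str, Any]:
    """Approximate the expanded job header of build/ci/job-template.yml."""
    job: Dict[str, Any] = {
        "job": name,
        # 40 minutes by default, 60 when code coverage is enabled.
        "timeoutInMinutes": 60 if code_coverage else 40,
        "pool": dict(pool or {}),
    }
    # Mirrors `${{ if ne(parameters.container, '') }}`.
    if container != "":
        job["container"] = container
    return job


centos = resolve_job_header(
    "Centos_x64_NetCoreApp30",
    pool={"name": "Hosted Ubuntu 1604"},
    container="CentosContainer",
)
macos = resolve_job_header("MacOS_x64_NetCoreApp21", pool={"name": "Hosted macOS"})
print(centos)
print(macos)
```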
2 changes: 1 addition & 1 deletion build/codecoverage-ci.yml
Expand Up @@ -3,7 +3,7 @@
################################################################################

phases:
- template: /build/ci/phase-template.yml
- template: /build/ci/job-template.yml
parameters:
name: Windows_x64
buildScript: build.cmd
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
### Input and Output Columns
The input label column data must be <xref:System.Boolean>.
The input features column data must be a known-sized vector of <xref:System.Single>.

This estimator outputs the following columns:

| Output Column Name | Column Type | Description|
| -- | -- | -- |
| `Trees` | Known-sized vector of <xref:System.Single> | The output values of all trees. Its size is identical to the total number of trees in the tree ensemble model. |
| `Leaves` | Known-sized vector of <xref:System.Single> | 0-1 vector representation of the IDs of the leaves that the input feature vector falls into. Its size is the total number of leaves in the tree ensemble model. |
| `Paths` | Known-sized vector of <xref:System.Single> | 0-1 vector representation of the paths the input feature vector passes through to reach the leaves. Its size is the number of non-leaf nodes in the tree ensemble model. |

These output columns are all optional, and the user can change their names.
To skip a column, set its name to null so that it is not produced.
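The size rules in the table can be checked with a short sketch. It describes each tree only by its number of leaves and non-leaf nodes and returns the size of every column that would be produced, dropping any column whose name is `None` (standing in for null). The function `output_column_sizes` and the tuple layout are assumptions for illustration, not part of the ML.NET API.

```python
from typing import Dict, List, Optional, Tuple


def output_column_sizes(
    tree_shapes: List[Tuple[int, int]],  # (num_leaves, num_non_leaf_nodes) per tree
    trees_name: Optional[str] = "Trees",
    leaves_name: Optional[str] = "Leaves",
    paths_name: Optional[str] = "Paths",
) -> Dict[str, int]:
    """Return {column name: vector size} for the columns that are not skipped."""
    candidates = [
        (trees_name, len(tree_shapes)),                        # one value per tree
        (leaves_name, sum(leaves for leaves, _ in tree_shapes)),
        (paths_name, sum(nodes for _, nodes in tree_shapes)),
    ]
    # A None name means the column is skipped entirely.
    return {name: size for name, size in candidates if name is not None}


# Two hypothetical trees: one with 5 leaves / 4 nodes, one with 3 leaves / 2 nodes.
shapes = [(5, 4), (3, 2)]
print(output_column_sizes(shapes))
print(output_column_sizes(shapes, paths_name=None))
```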
20 changes: 20 additions & 0 deletions docs/api-reference/io-columns-tree-featurization-ranking.md
@@ -0,0 +1,20 @@
### Input and Output Columns
The input label data type must be [key](xref:Microsoft.ML.Data.KeyDataViewType)
type or <xref:System.Single>. The value of the label determines relevance, where
higher values indicate higher relevance. If the label is a
[key](xref:Microsoft.ML.Data.KeyDataViewType) type, then the key index is the
relevance value, where the smallest index is the least relevant. If the label is a
<xref:System.Single>, larger values indicate higher relevance. The feature
column must be a known-sized vector of <xref:System.Single>, and the input row group
column must be of [key](xref:Microsoft.ML.Data.KeyDataViewType) type.

This estimator outputs the following columns:

| Output Column Name | Column Type | Description|
| -- | -- | -- |
| `Trees` | Known-sized vector of <xref:System.Single> | The output values of all trees. Its size is identical to the total number of trees in the tree ensemble model. |
| `Leaves` | Known-sized vector of <xref:System.Single> | 0-1 vector representation of the IDs of the leaves that the input feature vector falls into. Its size is the total number of leaves in the tree ensemble model. |
| `Paths` | Known-sized vector of <xref:System.Single> | 0-1 vector representation of the paths the input feature vector passes through to reach the leaves. Its size is the number of non-leaf nodes in the tree ensemble model. |

These output columns are all optional, and the user can change their names.
To skip a column, set its name to null so that it is not produced.
14 changes: 14 additions & 0 deletions docs/api-reference/io-columns-tree-featurization-regression.md
@@ -0,0 +1,14 @@
### Input and Output Columns
The input label column data must be <xref:System.Single>.
The input features column data must be a known-sized vector of <xref:System.Single>.

This estimator outputs the following columns:

| Output Column Name | Column Type | Description|
| -- | -- | -- |
| `Trees` | Known-sized vector of <xref:System.Single> | The output values of all trees. Its size is identical to the total number of trees in the tree ensemble model. |
| `Leaves` | Known-sized vector of <xref:System.Single> | 0-1 vector representation of the IDs of the leaves that the input feature vector falls into. Its size is the total number of leaves in the tree ensemble model. |
| `Paths` | Known-sized vector of <xref:System.Single> | 0-1 vector representation of the paths the input feature vector passes through to reach the leaves. Its size is the number of non-leaf nodes in the tree ensemble model. |

These output columns are all optional, and the user can change their names.
To skip a column, set its name to null so that it is not produced.
5 changes: 5 additions & 0 deletions docs/api-reference/io-time-series-ssa-forecast.md
@@ -0,0 +1,5 @@
### Input and Output Columns
There is only one input column.
The input column must be <xref:System.Single> where a <xref:System.Single> value indicates a value at a timestamp in the time series.

It produces either just one vector of forecasted values or three vectors: a vector of forecasted values, a vector of confidence lower bounds and a vector of confidence upper bounds.
4 changes: 2 additions & 2 deletions docs/api-reference/regularization-l1-l2.md
@@ -1,6 +1,6 @@
This class uses [empricial risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization) (i.e., ERM)
This class uses [empirical risk minimization](https://en.wikipedia.org/wiki/Empirical_risk_minimization) (i.e., ERM)
to formulate the optimization problem built upon collected data.
Note that empricial risk is usually measured by applying a loss function on the model's predictions on collected data points.
Note that empirical risk is usually measured by applying a loss function on the model's predictions on collected data points.
If the training data does not contain enough data points
(for example, to train a linear model in $n$-dimensional space, we need at least $n$ data points),
[overfitting](https://en.wikipedia.org/wiki/Overfitting) may happen so that
Expand Down
25 changes: 25 additions & 0 deletions docs/api-reference/tree-featurization-prediction.md
@@ -0,0 +1,25 @@
### Prediction Details
This estimator produces several output columns from a tree ensemble model. Assume that the model contains only one decision tree:

                   Node 0
                   /    \
                 /        \
               /            \
             /                \
           Node 1            Node 2
           /    \            /    \
         /        \        /        \
       /            \    Leaf -3  Node 3
     Leaf -1      Leaf -2         /    \
                                /        \
                             Leaf -4  Leaf -5

Assume that the input feature vector falls into `Leaf -1`. The output `Trees` would then be a 1-element vector whose
only value is the decision value carried by `Leaf -1`. The output `Leaves` is a 0-1 vector. If the reached
leaf is the $i$-th leaf in the tree (leaves are labeled $-(i+1)$, so the first leaf is `Leaf -1`), the $i$-th value in `Leaves`
is 1 and all other values are 0. The output `Paths` is a 0-1 representation of the nodes passed
through before reaching the leaf. The $i$-th element in `Paths` indicates whether the $i$-th node (indexed by $i$) is visited.
For example, reaching `Leaf -1` leads to $[1, 1, 0, 0]$ as the `Paths`. If there are multiple trees, this estimator
simply concatenates the `Trees`, `Leaves`, and `Paths` vectors of all trees (the first tree's information comes first in the concatenated vectors).

Check the See Also section for links to usage examples.
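The description above can be traced with a small sketch that encodes the example tree and computes `Trees`, `Leaves` and `Paths` for one input. The split features, thresholds and leaf values below are made up for illustration (the document does not specify them), and `featurize` is a hypothetical helper, not the ML.NET implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    feature: int      # hypothetical: index of the feature tested at this node
    threshold: float  # hypothetical rule: go left when x[feature] <= threshold
    left: int         # child id: >= 0 is a non-leaf node, < 0 is a leaf
    right: int


# The single tree from the diagram: Node 0..3 and Leaf -1..-5.
NODES: List[Node] = [
    Node(0, 0.5, 1, 2),    # Node 0
    Node(1, 0.5, -1, -2),  # Node 1
    Node(2, 0.5, -3, 3),   # Node 2
    Node(1, 0.5, -4, -5),  # Node 3
]
# Made-up decision values carried by the leaves.
LEAF_VALUES: Dict[int, float] = {-1: 0.1, -2: 0.2, -3: 0.3, -4: 0.4, -5: 0.5}


def featurize(x: List[float]) -> Tuple[List[float], List[float], List[float]]:
    paths = [0.0] * len(NODES)
    leaves = [0.0] * len(LEAF_VALUES)
    node_id = 0
    while node_id >= 0:
        paths[node_id] = 1.0  # mark every non-leaf node on the way down
        node = NODES[node_id]
        node_id = node.left if x[node.feature] <= node.threshold else node.right
    leaves[-node_id - 1] = 1.0  # leaf labeled -(i+1) sets position i
    trees = [LEAF_VALUES[node_id]]  # one value per tree
    return trees, leaves, paths


trees, leaves, paths = featurize([0.0, 0.0, 0.0])  # falls into Leaf -1
print(trees, leaves, paths)
```

For an ensemble, the same three vectors would be computed per tree and concatenated in tree order.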