Update to origin #3

justinormont · 2018-07-29T09:10:40Z

No description provided.

fix namespace issue and refactoring

… readable. (#324)

…or (#338) `CalibratorUtils.TrainCalibrator` and `TrainCalibratorIfNeeded` now create `CalibratedPredictor` instead of `SchemaBindableCalibratedPredictor` whenever the predictor implements `IValueMapper`.

…356) * Use HideEnumValueAttribute for both manifest and C# API generation. * Unhide NAReplaceTransform.ReplacementKind.SpecifiedValue. This may require some other PR to resolve the corresponding issues.

When Microsoft.ML is installed on an unsupported framework (like net452), the installation currently succeeds. Users should instead get an error stating that net452 is not supported by this package. The cause is that the build files exist for any TFM, which NuGet interprets as the package supporting any TFM. This change moves the build files to be consistent with the 'lib' folder support. Fix #357

…osal. (#369) * Subclasses of `Stream` now have `Close` call `base.Close` to ensure disposal. * Add DeleteOnClose to File opening. * Remove explicit delete of file. * Remove explicit close of substream. * Since no longer deleting explicitly, no longer need `_overflowPath` member.

* Changed List to HashSet to ensure that there are no duplicates

* Update fast tree argument help text * Update wording * Update API to fix test * Update core manifest JSON to update help text

* Add a way to create a single tree ensemble model from multiple tree ensemble models. * Address PR comments, and fix bugs in serializing/deserializing RegressionTrees. * Address PR comments.

add pipelineitem for Ova

…ryPoints.md and GraphRunner.md (#295) * Adding EntryPoints.md and GraphRunner.md * addressing PR feedback * Updating the title of the GraphRunner.md file * addressing Tom's feedback * addressing feedback * code formatting for class names * Addressing Gal's comments * Adding an example of an entry point. Fixing casing on ML.NET * fixing link

Corrects an unintentional "typo" in FastTreeRanking.cs where there was mistakenly a USE_FASTTREENATIVE2 instead of USE_FASTTREENATIVE. This resulted in some obscure hidden ranking options (distance weighting, normalize query lambdas, and a few others) being unavailable. These are important for some applications.

* LightGBM and test. * add test baselines and nuget source for lightGBM binaries. * Add entrypoint for lightGBM. * add unsafe flag for release build. * update nuget version. * make lightgbm test single threaded. * install gcc on OS machines to resolve dependencies on openmp that is needed by lightgbm native code. * PR comments. Leave BREW and GCC in bash script to verify macOS tests work. * remove brew and gcc from build script. * PR feedback. * disable test on macOS. * disable test on macOS. * PR feedback.

* Adding Factorization Machines

* ONNX API documentation.

Introduce Ensemble codebase

Create a shorter temp file name for model loading, as well as remove the potential for a race condition among multiple openings by using the creation of a lock file.

… unnecessary cmake version requirement (#425)

…lues (#394) * Fix EvaluatorUtils to handle label column of type key without text key values.

* removing extraneous character that broke the linux build, and with it unnecessary cmake version requirement * Removing the BOM from the file


* Update error messages to point to GitHub issues instead of support group

…, in xml files. (#510) * Moving from xml strings to having the documentation details in xml files. For the summary text that is common between several learners, the examples will be added on a separate node. An example of how that will look is in the LogisticRegressionBinaryClassifier and LogisticRegressionClassifier. * fixing the aftermath of renaming the XML file. * removing the Desc from the EntryPoint attribute is a bad idea. * removing the XML docs from the doc folder, and added them under the respective projects. * Some OS get picky about casing. * file name should be vanilla * Fixing comment


…#428) * Fix iris.txt dataset and modify Iris Classification tests accordingly * Modify baseline test files for multiclass classification with Iris dataset and LightGBM

The test `TrainAndPredictIrisModelUsingDirectInstantiationTest` now has analogous changes to the `TrainAndPredictIrisModelTest` test

@Ivanidzo4ka

* First attempt at removing extra code comments * Round #2 * Removing Microsoft.ML.InternalStreams per comment on #513 * Address notes from @Ivanidzo4ka * Remove TreeOrderedCandidatesSearch * Remove whitespace and reinstate commented out tests

* Adding arguments to PipelineSweep Macro * updating the unit tests * taking care of review comments; adding validations for Label, Weight, GroupId and Name columns * taking care of some review comments * some code cleanup * addressing PR comments * API changes to use RoleMappedData * taking care of review comments * using pipeline.UniqueId * taking care of review comments. update ColumnPurpose 'Group' so it is consistent with Role 'Group'

This fixes a couple of dangling `cref` in the XML Docs. This commit doesn't contain functional changes to the code. Issue: This closes #434

…ithout files. (#472) * Save Schema to context to support loading the model without files. * Use the input file's schema if the file is available.

…#522) * Conversion of ITrainer.Train returns predictor, accepts TrainContext * `ITrainer.Train` returns a predictor. There is no `CreatePredictor` method on the interface. * `ITrainer.Train` always accepts a `TrainContext`. Dataset type is no longer a generic parameter. This context object replaces the functionality previously offered by the combination of `ITrainer`, `IValidatingTrainer`, `IIncrementalTrainer`, and `IIncrementalValidatingTrainer`, which is now captured in one `ITrainer.Train` method with differently configured contexts. * All trainers updated to these two new idioms. Many trainers correspondingly improved to no longer be stateful objects. (The exceptions are those that are just too far gone to be done with less than herculean effort at refactoring them to no longer use instance fields for their computation. Most notably, LBFGS and FastTree based trainers.) * Utility code meant to deal with the complexity of the aforementioned `IT/IVT/IIT/IIVT` idiom reduced considerably. * Opportunistic improvements to `ITrainer` implementors where observed. * TrainerInfo introduction, ITrainerEx destruction * Remove `IMetaLinearTrainer`

…ting the CSharpAPI (#529) * Moving from xml strings to having the documentation details in xml files. For the summary text that is common between several learners, the examples will be added on a separate node. An example of how that will look is in the LogisticRegressionBinaryClassifier and LogisticRegressionClassifier. * fixing the aftermath of renaming the XML file. * removing the Desc from the EntryPoint attribute is a bad idea. * removing the XML docs from the doc folder, and added them under the respective projects. * Some OS get picky about casing. * file name should be vanilla * Adding documentation for the first group of transforms * adding more documentation. changing the root of the XML documents from docs -> doc, since it's only one. Switching all <see href /> to the valid <see cref /> * formatting tweaks, and addressing most of the code comments. * Extracted the examples outside of the member nodes in the xml, so that they only appear in the CSharpApi classes, and not on the runtime classes. * small fixes * addressing code comments * addressing Pete's comments. * Fixing language around the CharTokenizer description. Closes #389

* Allow CpuMath to reference C# Hardware Intrinsics APIs. Need to multi-target CpuMath for netstandard and netcoreapp3.0. Also, since we are going to move CpuMath into its own NuGet package, remove the dependency from CpuMath to the ML.Core project. Add a build parameter to enable building against .NET Core 3.0's Runtime Intrinsics APIs. Fix #534 * Respond to PR feedback.

* failing test case for multiclass * Refactored PipelineSweeperSupportedMetrics Class; added unit test for MultiClassClassification; refactored out unit tests for the PipelineSweeper * take care of review comments; display transforms/learners + metrics in pipeline * taking care of PR comments + refactor PipelineSweeperRunSummary * taking care of review comments

* remove domain from onnx operators for non-ML types. * Make ONNX compatible with Windows RS5 and add more tests. * PR feedback. * PR feedback. * fix build.

* Remove Windows and Linux configurations from netci.groovy * Add end of line to yml files * Add badges and change leg name to Linux * Not merge test results * Add searchFolder to publish test results task

…upport for type conversion (#555) * Don't fail in case of const field in Collection source. Extend support for basic C# types for DataView<->collection conversion.

* Added a doc for schema comprehension

* Initial code analyzer for Microsoft.ML Adds code analysis initially for correct usage of common Contracts.Except/Check patterns, naming conventions, variable usage and initializations, and other idioms used throughout the Microsoft.ML codebase. Enables analysis on Microsoft.ML projects. Also added StyleCop, with most rules currently disabled, but with checks for whitespace, declaration modifier ordering, explicit access modifiers, and interface naming. * Analyzer is .NET Standard 1.3 to avoid problems with IDE1003. dotnet/roslyn#22368 * Projects in `src` can opt out of the analyzer by setting the `UseMLCodeAnalyzer` project property to false.

* Changed range of L2RegularizerWeight parameter in AveragedPerceptron

…erently when using with/without word tokenizer. (#548)

RowTag in metrics

… name attribute for some examples (#592) * Fixes issue 591: typos, adding the type to lists, and fixing the name attribute in OGD and Poisson * getting just the content under the member node, not the member itself. * merging from master

#587) `DataViewConstructionUtils`'s methods to create dataviews over .NET types will now have correctly inferred "getters" in the case of sparse vectors.

* Added placeholder * Cleaned up Infos (replaced with ColumnPairs) * Added ColumnInfo * Added all the Create() methods. * Added Mapper * Commented out the EntryPoint * Added PcaEstimator2 * PcaWorkout test passes * Added pigsty api * Fixed EntryPoint * Fixed the arguments * Fixed tests and added pigsty test * Deleted Wrapped PCA transform * Float -> float * Cleaned docstrings * Removed some unnecessary checks * Simplified unnecessary code * Moved some fields to ColumnInfo for simplifications * Simplified weight columns * Address PR comments #1 * Addressed PR comments #2 * Moved the static test * PR comments #3 * Moved schema related information out of ColumnInfo and into Mapper.ColumnSchemaInfo. * PR comments * PR comments * Updated manifest for entrypoint PcaCalculator * Fixed schema exceptions

Ivanidzo4ka and others added 30 commits June 12, 2018 10:20

fix namespace issue in CSharpGenerator and some refactoring (#339)

1fc3069

fix namespace issue and refactoring

Using named-tuple in OneToOneTransforms' constructor to make API more…

81d40a9

… readable. (#324)

Minor formatting in CollectionDataSourceTests.cs (#348)

f6c6f5b

Create CalibratedPredictor instead of SchemaBindableCalibratedPredict…

f2888be

…or (#338) `CalibratorUtils.TrainCalibrator` and `TrainCalibratorIfNeeded` now create `CalibratedPredictor` instead of `SchemaBindableCalibratedPredictor` whenever the predictor implements `IValueMapper`.

Remove reference and dependency on System.ValueTuple (#351)

d91392f

Add link to samples (#355)

9cf7460

Use HideEnumValueAttribute for both manifest and C# API generation. (#…

8435ce9

…356) * Use HideEnumValueAttribute for both manifest and C# API generation. * Unhide NAReplaceTransform.ReplacementKind.SpecifiedValue. This may require some other PR to resolve the corresponding issues.

Return distinct array of ParameterSet when ProposeSweep is called (#368)

7f8caf7

* Changed List to HashSet to ensure that there are no duplicates
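The effect of this change can be illustrated with a minimal sketch. The `ProposeDistinct` helper and the use of strings in place of `ParameterSet` are hypothetical stand-ins, not the actual sweeper code; the sketch only shows that collecting candidates in a `HashSet` drops repeats that a `List` would keep.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SweepSketch
{
    // Hypothetical helper: collects up to maxCount distinct candidates.
    // Strings stand in for ParameterSet here; a custom type would need
    // Equals/GetHashCode overrides for HashSet to detect duplicates.
    public static string[] ProposeDistinct(IEnumerable<string> candidates, int maxCount)
    {
        var seen = new HashSet<string>();
        foreach (var candidate in candidates)
        {
            if (seen.Count >= maxCount)
                break;
            // Add returns false and leaves the set unchanged for a repeat.
            seen.Add(candidate);
        }
        return seen.ToArray();
    }

    public static void Main()
    {
        var proposals = ProposeDistinct(
            new[] { "lr=0.1", "lr=0.2", "lr=0.1", "lr=0.3" }, 3);
        Console.WriteLine(string.Join(", ", proposals));
    }
}
```

With a `List`, the repeated `lr=0.1` would have consumed one of the three requested slots; with a `HashSet`, the caller gets three distinct proposals when enough candidates exist.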

Update fast tree argument help text (#372)

09f7c66

* Update fast tree argument help text * Update wording * Update API to fix test * Update core manifest JSON to update help text

Combine multiple tree ensemble models into a single tree ensemble (#364)

8b01fc5

* Add a way to create a single tree ensemble model from multiple tree ensemble models. * Address PR comments, and fix bugs in serializing/deserializing RegressionTrees. * Address PR comments.

add pipelineitem for Ova (#363)

e5de547

add pipelineitem for Ova

Fix CV macro to output the warnings data view properly. (#385)

ead943e

Link to an example of converting an ML.NET model to ONNX. (#386)

496d3b9

Adding LDA Transform (#377)

0d5e317

Adding Factorization Machines (#383)

31ae678

* Adding Factorization Machines

ONNX API documentation. (#419)

17f944c

* ONNX API documentation.

Bring ensembles into codebase (#379)

dbbc69e

Introduce Ensemble codebase

enable macOS tests for LightGBM. (#422)

f94203e

Create a shorter temp file name for model loading. (#397)

211c043

Create a shorter temp file name for model loading, as well as remove the potential for a race condition among multiple openings by using the creation of a lock file.
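One way to read the lock-file idea is sketched below. This is an assumption about the general technique, not the code from this commit: the `ReservePath` helper, the `tmp_` prefix, and the retry limit are all hypothetical. The sketch relies on `FileMode.CreateNew`, which fails with an `IOException` if the file already exists, so two openers cannot both claim the same short name.

```csharp
using System;
using System.IO;

public static class TempPathSketch
{
    // Hypothetical helper: reserves a short temp path by atomically creating
    // "<path>.lock". The lock file is removed when lockStream is disposed,
    // because it is opened with FileOptions.DeleteOnClose.
    public static string ReservePath(string directory, out FileStream lockStream)
    {
        if (!Directory.Exists(directory))
            throw new DirectoryNotFoundException(directory);

        for (int i = 0; i < 1000; i++)
        {
            string path = Path.Combine(directory, "tmp_" + i.ToString("X"));
            try
            {
                lockStream = new FileStream(path + ".lock", FileMode.CreateNew,
                    FileAccess.Write, FileShare.None, 4096, FileOptions.DeleteOnClose);
                return path;
            }
            catch (IOException)
            {
                // Name already claimed by another opener; try the next one.
            }
        }
        throw new IOException("Could not reserve a temp path.");
    }

    public static void Main()
    {
        string path = ReservePath(Path.GetTempPath(), out FileStream lockStream);
        using (lockStream)
        {
            Console.WriteLine(path);
        }
    }
}
```

Holding the lock stream open for as long as the temp path is in use keeps the name reserved; disposing it releases the name for the next opener.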

removing extraneous character that broke the linux build, and with it…

6c4470f

… unnecessary cmake version requirement (#425)

EvaluatorUtils to handle label column of type key without text key va…

bca008b

…lues (#394) * Fix EvaluatorUtils to handle label column of type key without text key values.

Removing non source files from solution (#362)

36b5bb1

Bump master to v0.4 (#427)

98aaeb5

Build fix - removing the BOM from the CMakeLists.txt file (#430)

2501049

* removing extraneous character that broke the linux build, and with it unnecessary cmake version requirement * Removing the BOM from the file

Remove MML.DLL from Microsoft.ML nuget. (#439)

fb8cf0b

sharwell and others added 28 commits July 9, 2018 13:52

Fix failure to validate XML comments in test sources during builds (#499)

f7a5526

Fix quotes on json (#516)

fbc00db

[Part 2] Added convenience constructors for set of transforms. (#491)

268ebbc

Update error messages after another reset (#359)

9d9d74e

* Update error messages to point to GitHub issues instead of support group

Update Onnx Convert documentation, limited to ONNX-ML target platforms (#505)

6503167

Fix iris.txt dataset and modify Iris Classification tests accordingly (…

54596ac

…#428) * Fix iris.txt dataset and modify Iris Classification tests accordingly * Modify baseline test files for multiclass classification with Iris dataset and LightGBM

Fix TrainAndPredictIrisModelUsingDirectInstantiationTest (#527)

ceac01f

The test `TrainAndPredictIrisModelUsingDirectInstantiationTest` now has analogous changes to the `TrainAndPredictIrisModelTest` test

[Part 3] Added convenience constructors for set of transforms. (#520)

c491651

Issue 434: Fixed imprecise crefs in XML Docs (#485)

ef169b2

This fixes a couple of dangling `cref` in the XML Docs. This commit doesn't contain functional changes to the code. Issue: This closes #434

ParquetLoader - Save Schema to context to support loading the model w…

5e0a40e

…ithout files. (#472) * Save Schema to context to support loading the model without files. * Use the input file's schema if the file is available.

Ensure ONNX export is compatible with Windows RS5 (#550)

8c11759

* remove domain from onnx operators for non-ML types. * Make ONNX compatible with Windows RS5 and add more tests. * PR feedback. * PR feedback. * fix build.

Move Windows and Linux CI to VSTS (#566)

a862ccc

* Remove Windows and Linux configurations from netci.groovy * Add end of line to yml files * Add badges and change leg name to Linux * Not merge test results * Add searchFolder to publish test results task

Fix Linux CI to actually run inside a docker container (#574)

e885b73

Don't fail in case of const field in Collection source and extended s…

015a15e

…upport for type conversion (#555) * Don't fail in case of const field in Collection source. Extend support for basic C# types for DataView<->collection conversion.

Schema comprehension doc (#572)

8cfa2ed

* Added a doc for schema comprehension

Sweep Range of L2RegularizerWeight in AveragedPerceptron (#579)

7fea0af

* Changed range of L2RegularizerWeight parameter in AveragedPerceptron

Fixed the TextTransform bug where chargrams were being computed diff…

0e0f702

…erently when using with/without word tokenizer. (#548)

Pass fold index to cross validation metrics. (#575)

0f94a3b

RowTag in metrics

Fix creation of dataviews inferred with .NET types with sparse vectors (#587)

2107b82

`DataViewConstructionUtils`'s methods to create dataviews over .NET types will now have correctly inferred "getters" in the case of sparse vectors.

justinormont merged commit 24d939a into justinormont:sdca-l2-fix Jul 29, 2018
