Generate PrefixColumnConcatenator with entry point compiler instead of manually. #364
* ColumnConcatenator() << {'features': ['age', 'parity',
  'induced']}

For more details see `Columns </nimbusml/concepts/columns>`_.
we need this doc #Resolved
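The docstring under review shows the `<<` operator attaching a `{output: [inputs]}` mapping to a transform. Below is a hypothetical, stdlib-only sketch of how such an operator can be wired up through Python's `__lshift__` hook. `ToyColumnConcatenator` and its `transform` method are illustrative assumptions and are not nimbusml's API or implementation.

```python
# Hypothetical toy class (not nimbusml's implementation) showing how a "<<"
# operator can attach an {output_column: [input_columns]} mapping to a
# transform object, in the spirit of the docstring example.


class ToyColumnConcatenator:
    def __init__(self, columns=None):
        # Maps exactly one output column name to a list of input column names
        self.columns = columns

    def __lshift__(self, columns):
        # "xf << {...}" calls this; a new object is returned, self is untouched
        if not isinstance(columns, dict) or len(columns) != 1:
            raise ValueError("expected a dict with exactly one output column")
        return ToyColumnConcatenator(columns=dict(columns))

    def transform(self, rows):
        # rows: list of dicts; each output row gains one list-valued column
        output, inputs = next(iter(self.columns.items()))
        result = []
        for row in rows:
            new_row = dict(row)
            new_row[output] = [row[name] for name in inputs]
            result.append(new_row)
        return result


xf = ToyColumnConcatenator() << {'features': ['age', 'parity', 'induced']}
print(xf.transform([{'age': 26, 'parity': 6, 'induced': 1}]))
```

Returning a new object from `__lshift__` is a design choice of this sketch only; it keeps the unconfigured transform reusable.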
"""

Combines several columns into a single vector-valued column by prefix
Combines several columns into a single vector-valued column by prefix
need this doc #Resolved
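The docstring line above describes combining columns "by prefix". As a rough illustration only, here is a stdlib sketch of that idea, assuming a table stored as a dict of equal-length column lists: every column whose name starts with the prefix is gathered, in column order, into one vector-valued output column. The function name `concat_by_prefix` and the table layout are hypothetical and do not reflect how nimbusml or ML.NET implement the transform.

```python
# Hypothetical sketch of prefix-based concatenation (not nimbusml's code).
# table: dict mapping column name -> list of values, all of equal length.


def concat_by_prefix(table, output, prefix):
    # Select matching columns, preserving the table's column order
    selected = [name for name in table if name.startswith(prefix)]
    if not selected:
        raise ValueError("no column starts with prefix %r" % prefix)
    n_rows = len(table[selected[0]])
    # Build one vector (a list) per row from the selected columns
    vectors = [[table[name][i] for name in selected] for i in range(n_rows)]
    # Copy so the caller's table is left unchanged
    result = dict(table)
    result[output] = vectors
    return result


table = {'Data.a': [1, 2], 'Data.b': [3, 4], 'label': [0, 1]}
print(concat_by_prefix(table, 'features', 'Data.'))
```

Here `features` holds one two-element vector per row, built from `Data.a` and `Data.b`, while `label` is left alone because it does not match the prefix.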
@@ -82,8 +85,7 @@ def _get_node(self, **all_args):

        # validate output
        if output_columns is None:
            raise ValueError(
                "'None' output passed when it cannot be none.")
This one is needed. Please see the example of how to use this transform; it should make this clear. #Resolved
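The reviewer asks to keep the "validate output" guard from the diff hunk. A minimal standalone sketch of the same guard follows; the helper name `validate_output_columns` and the string-to-list normalization are assumptions for illustration and are not taken from nimbusml's generated `_get_node`.

```python
# Hypothetical helper mirroring the "validate output" check in the diff:
# a transform that must produce an output column rejects a None output.


def validate_output_columns(output_columns):
    # Same condition and message as the check in the diff hunk
    if output_columns is None:
        raise ValueError("'None' output passed when it cannot be none.")
    # Convenience (assumption): accept a bare string and wrap it in a list
    if isinstance(output_columns, str):
        return [output_columns]
    return list(output_columns)


print(validate_output_columns('features'))
```

Failing early with a clear message is usually preferable to letting a missing output name surface later as a less obvious error.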
* Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. 
* Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Turn off shuffling for FactorizationMachineBinaryClassifier. (#316) * Initial implementation of NGramExtractor. (#320) * Disable check which prevents artifacts from being generated by pull requests. (#330) * Update ManifestGenerator. (#329) * Added "# -- coding: utf-8 --" to preserve the character `␂` while guaranteeing successful builds with Python 2.7 (#328) * Replaced the non-ASCII characters * Revert "Replaced the non-ASCII characters" This reverts commit 4adb28c. 
* Update NGramExtractor_df.py * Updating coding of Schema.py to preserve the character "␂" * To re-run build tests * To re-run build tests * Edited encoding * Rerun build tests * Rerun build tests * Added utf-8 encoding to NGramExtractor.py (#339) * Image.py and Image_df.py extended testing examples are now supported on Ubuntu and CentOS (#338) * Remove skipping of Image.py and Image_df.py * Add libraries required for running Image.py and Image_df.py in Linux machines * Update build.sh * Add third party notices to package description on PyPI (#341) * Add third party notices to package description on PyPI * update * update * Add 1.5 (#344) * Add info to README.md (#342) * Add info to README.md * update * Fix DbgWinPy2.7 build which was failing when building NativeBridge. (#340) * Fix DbgWinPy2.7 build which was failing when building NativeBridge. Here is one of the error messages: libboost_numpy-vc140-mt-gd-1_64.lib(ndarray.obj) : error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MDd_DynamicDebug' doesn't match value 'MTd_StaticDebug' in DataViewInterop.obj * Add whitespace change to start new CI run. UbuntuPy36 crashed * Fix error level when exiting build.cmd. (#345) * Added HTTP URLs to HTTPS URLs finder & converter Python scripts, and processed HTTP-->HTTPS URL changes (#346) * Added utf-8 encoding to NGramExtractor.py * Added HTTP to HTTPS finder and converter * Changes made by ChangeHttpURLsToHttps.py * Added copyright statements * Updated FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add reports of alterable, nonalterable and invalid URLs * Revert "Changes made by ChangeHttpURLsToHttps.py" This reverts commit afa5f35. * Add URL changes made by ChangeHttpURLsToHttps.py * Revert "Add URL changes made by ChangeHttpURLsToHttps.py" This reverts commit b6a2f7f. * Revert "Add reports of alterable, nonalterable and invalid URLs" This reverts commit 9121123. 
* Update FindHttpURLs.py and ChangHttpURLsToHttps.py * Add HTTP to HTTPS URL reports * Changes made by ChangeHttpToHttpsURLs.py * Revert "Changes made by ChangeHttpToHttpsURLs.py" This reverts commit 72c85d9. * Revert "Add HTTP to HTTPS URL reports" This reverts commit 81c5a96. * Revert "Update FindHttpURLs.py and ChangHttpURLsToHttps.py" This reverts commit 038262f. * Update FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add URL reports * Add Http-->Https URL changes through ChangeHttpURLsToHttpsURLs.py * Removed if __name__ and main() statements * Revert "Removed if __name__ and main() statements" This reverts commit ba2742f. * Update nimbusml.pyproj * Manually converted two alterable HTTP links to HTTPS. * Rename ChangeHttpURLsToHttps.py to changeHttpURLsToHttps.py * Rename FindHttpURLs.py to findHttpURLs.py * URL in SigmoidKernel.txt is fixed for findHttpURLs.py to recognize it as an alterable URL * Changed outdated URL as original URL redirected to current URL * Update Report_InvalidUrls_FindHttpURLs.csv * Fixing reachable HTTP URLs * Update findHttpURLs.py * Updated URL reports, cleared invalid URLs * Update of report for alterable HTTP URLs after running findHttpURLs.py after running changeHttpURLsToHttps.py * Removing URL reports for merge * Renamed URL scripts and reflected this change inside these files (#348) * Renamed URL scripts and reflected this change inside these files * Fix small type in change_http_urls_to_https.py * Updated file names and naming conventions inside files * Update nimbusml.pyproj * Updated usage infos of find_http_urls.py and change_to_https.py * Updated find_http_urls.py and change_to_https.py * Execute unit tests in parallel (#331) * Wrap test estimator checks in a python unit test. * Combine the non-extended test runs together to make them more parallelizable. * Reverse the tests path args order to try and have test_estimator_checks run earlier in the test run. * Dynamically generate the test_estimator_checks unit tests. 
* Create the test_docs_example unit tests dynamically so they can be parallelized. * Fix KMeansPlusPlus does not work with a cluster size of 1 when using a debug version of ml.net * Fix OLS divide by 0 when given a particular set of inputs to fit. This is hidden in release versions of ml.net * Fix issue when ranking where the output of TextToKeyConverter was trying to overwrite the $scoredVectorData variable set by DatasetScorerEx. See test_metrics_evaluate_ranking_group_id_from_existing_column_in_X for a test which demonstrates the issue. It throws an exception from EntryPointNode.cs:837 when trying to get the outputs. The exception was hidden when using release builds of ML.Net. * Remove a test_estimator_check for OrdinaryLeastSquaresRegressor since it is causing invalid float values and throwing an exception which was hidden in release versions of ML.Net but visible in debug. * Update test_permutation_feature_importance tests to support parallel execution. * Rerun unit tests one extra time if any failed to check for intermittent failures. * Decrease the size of the images in the Image and Image_df examples. (#350) * Update package references to work with the latest versions from nuget.org. (#353) * Update ML.Net package references to work with RC1 * Update to ML.Net 1.4.0 * Update Microsoft.DataPrep to version 0.0.2.19-preview. * Downgrade Microsoft.DataPrep to version 0.0.2.3-preview due to issue with missing SqlJdbc package. * Update nimbusml version to 1.6.0. * Update release notes. (#354) * Added Google.Protobuf.dll to Mac and Linux builds (#358) * Modifications to support scripted temp/docs merging. 
(#361) * Set size variable to -1 in GetUnicodeTX to fix Python 2.7 encoding/decoding issue (#359) * Modified size variable in GetUnicodeTX to -1 * Update DataViewInterop.h * Fixed spacing in DataViewInterop.h * Re-enabled skipped test due to Py2.7 encoding/decoding issue * Removed unnecessary invoking of .sum() * Revert "Removed unnecessary invoking of .sum()" This reverts commit e51a64b. * Initial implementation of the temp_docs_updater script. (#363) * Update README.md * Generate PrefixColumnConcatenator with entry point compiler instead of manually. (#364) * Fix broken docs (#369) * Fix whitespaces and typos * tabs and whitespaces * Removed all references to DSSM in NimbusML (except for in test_wordembedding.py) (#374) * Added catch for predictors that do not support summary() (#375) * Added catch for summary() with FactorizationMachineBinaryClassifier * Updated test for model summary * Revert "Updated test for model summary" This reverts commit 59656fe. * Update pipeline.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Changed wording of error message * Update Microsoft.DataPrep to the latest version. (#379) * Create release notes for the 1.6.0 release. (#382) * Create release notes for version 1.6.0. * Update 1.6.0 release notes. * Bump version to 1.6.1 to fix dprep issue. (#385) * Update to latest version of DataPrep. * Bump version to 1.6.1 to fix dprep issue. * Removed "TODO: Replace with CV" comments (#389) * Disabled tests that only fail on Mac Py2.7 due to string encoding/dec… (#391) * Disabled tests that only fail on Mac Py2.7 due to string encoding/decoding bug * Update test_ngramfeaturizer.py * Add as_csr documentation to the inline docstrings for transform() and fit_transform(). (#392) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. 
* Update to the latest version of ML.Net. (#401) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Typo fixed on paragraph 15 (#399) * Typo fixed on paragraph 10 (#398) * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Update the transform output formats documentation. (#395) * Update the transform output formats documentation. * Add whitespace change to restart CI run. The mac build did not start correctly. * Add whitespace change to restart CI run. The mac build did not start correctly. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fixed broken brew command (#402) * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Checking for extended tests * Update phase-template.yml * Final touches * Re-activated NGramFeaturizer2.py (#381) * Update test_docs_example.py * Temporary change so that extended tests can be run by PRs * Revert "Temporary change so that extended tests can be run by PRs" This reverts commit 3f2b8a3. * Temporary change to be able to view extended tests' status with manual PRs * Update .vsts-ci.yml * Update .vsts-ci.yml * Update .vsts-ci.yml Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. 
* Contributing: Fix a typo (#406) * Re-run failed unit tests on Ubuntu/Mac to fix intermittent crashes. (#407) Note, this modification only handles intermittent crashes on Ubuntu/Mac unit test runs. It does not handle situations where the build hangs and never returns control to the build script. * Fix issue when specifying split_start='after_transforms' with CV.fit() (#410) * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the the MachineLearning feed to be retrieved (ie. Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Fix build issue caused by latest version of pip (>=20.0.0) (#412) * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <balmustafa117@gmail.com> Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com> Co-authored-by: Darío Hereñú <magallania@gmail.com> Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com>
* Native featurizers for AutoML (#317) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. 
Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update aml branch. (#415) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. 
* Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. 
* Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Turn off shuffling for FactorizationMachineBinaryClassifier. (#316) * Initial implementation of NGramExtractor. (#320) * Disable check which prevents artifacts from being generated by pull requests. (#330) * Update ManifestGenerator. (#329) * Added "# -- coding: utf-8 --" to preserve the character `␂` while guaranteeing successful builds with Python 2.7 (#328) * Replaced the non-ASCII characters * Revert "Replaced the non-ASCII characters" This reverts commit 4adb28c. * Update NGramExtractor_df.py * Updating coding of Schema.py to preserve the character "␂" * To re-run build tests * To re-run build tests * Edited encoding * Rerun build tests * Rerun build tests * Added utf-8 encoding to NGramExtractor.py (#339) * Image.py and Image_df.py extended testing examples are now supported on Ubuntu and CentOS (#338) * Remove skipping of Image.py and Image_df.py * Add libraries required for running Image.py and Image_df.py in Linux machines * Update build.sh * Add third party notices to package description on PyPI (#341) * Add third party notices to package description on PyPI * update * update * Add 1.5 (#344) * Add info to README.md (#342) * Add info to README.md * update * Fix DbgWinPy2.7 build which was failing when building NativeBridge. (#340) * Fix DbgWinPy2.7 build which was failing when building NativeBridge. Here is one of the error messages: libboost_numpy-vc140-mt-gd-1_64.lib(ndarray.obj) : error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MDd_DynamicDebug' doesn't match value 'MTd_StaticDebug' in DataViewInterop.obj * Add whitespace change to start new CI run. 
UbuntuPy36 crashed * Fix error level when exiting build.cmd. (#345) * Added HTTP URLs to HTTPS URLs finder & converter Python scripts, and processed HTTP-->HTTPS URL changes (#346) * Added utf-8 encoding to NGramExtractor.py * Added HTTP to HTTPS finder and converter * Changes made by ChangeHttpURLsToHttps.py * Added copyright statements * Updated FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add reports of alterable, nonalterable and invalid URLs * Revert "Changes made by ChangeHttpURLsToHttps.py" This reverts commit afa5f35. * Add URL changes made by ChangeHttpURLsToHttps.py * Revert "Add URL changes made by ChangeHttpURLsToHttps.py" This reverts commit b6a2f7f. * Revert "Add reports of alterable, nonalterable and invalid URLs" This reverts commit 9121123. * Update FindHttpURLs.py and ChangHttpURLsToHttps.py * Add HTTP to HTTPS URL reports * Changes made by ChangeHttpToHttpsURLs.py * Revert "Changes made by ChangeHttpToHttpsURLs.py" This reverts commit 72c85d9. * Revert "Add HTTP to HTTPS URL reports" This reverts commit 81c5a96. * Revert "Update FindHttpURLs.py and ChangHttpURLsToHttps.py" This reverts commit 038262f. * Update FindHttpURLs.py and ChangeHttpURLsToHttps.py * Add URL reports * Add Http-->Https URL changes through ChangeHttpURLsToHttpsURLs.py * Removed if __name__ and main() statements * Revert "Removed if __name__ and main() statements" This reverts commit ba2742f. * Update nimbusml.pyproj * Manually converted two alterable HTTP links to HTTPS. 
* Rename ChangeHttpURLsToHttps.py to changeHttpURLsToHttps.py * Rename FindHttpURLs.py to findHttpURLs.py * URL in SigmoidKernel.txt is fixed for findHttpURLs.py to recognize it as an alterable URL * Changed outdated URL as original URL redirected to current URL * Update Report_InvalidUrls_FindHttpURLs.csv * Fixing reachable HTTP URLs * Update findHttpURLs.py * Updated URL reports, cleared invalid URLs * Update of report for alterable HTTP URLs after running findHttpURLs.py after running changeHttpURLsToHttps.py * Removing URL reports for merge * Renamed URL scripts and reflected this change inside these files (#348) * Renamed URL scripts and reflected this change inside these files * Fix small type in change_http_urls_to_https.py * Updated file names and naming conventions inside files * Update nimbusml.pyproj * Updated usage infos of find_http_urls.py and change_to_https.py * Updated find_http_urls.py and change_to_https.py * Execute unit tests in parallel (#331) * Wrap test estimator checks in a python unit test. * Combine the non-extended test runs together to make them more parallelizable. * Reverse the tests path args order to try and have test_estimator_checks run earlier in the test run. * Dynamically generate the test_estimator_checks unit tests. * Create the test_docs_example unit tests dynamically so they can be parallelized. * Fix KMeansPlusPlus does not work with a cluster size of 1 when using a debug version of ml.net * Fix OLS divide by 0 when given a particular set of inputs to fit. This is hidden in release versions of ml.net * Fix issue when ranking where the output of TextToKeyConverter was trying to overwrite the $scoredVectorData variable set by DatasetScorerEx. See test_metrics_evaluate_ranking_group_id_from_existing_column_in_X for a test which demonstrates the issue. It throws an exception from EntryPointNode.cs:837 when trying to get the outputs. The exception was hidden when using release builds of ML.Net. 
* Remove a test_estimator_check for OrdinaryLeastSquaresRegressor since it is causing invalid float values and throwing an exception which was hidden in release versions of ML.Net but visible in debug. * Update test_permutation_feature_importance tests to support parallel execution. * Rerun unit tests one extra time if any failed to check for intermittent failures. * Decrease the size of the images in the Image and Image_df examples. (#350) * Update package references to work with the latest versions from nuget.org. (#353) * Update ML.Net package references to work with RC1 * Update to ML.Net 1.4.0 * Update Microsoft.DataPrep to version 0.0.2.19-preview. * Downgrade Microsoft.DataPrep to version 0.0.2.3-preview due to issue with missing SqlJdbc package. * Update nimbusml version to 1.6.0. * Update release notes. (#354) * Added Google.Protobuf.dll to Mac and Linux builds (#358) * Modifications to support scripted temp/docs merging. (#361) * Set size variable to -1 in GetUnicodeTX to fix Python 2.7 encoding/decoding issue (#359) * Modified size variable in GetUnicodeTX to -1 * Update DataViewInterop.h * Fixed spacing in DataViewInterop.h * Re-enabled skipped test due to Py2.7 encoding/decoding issue * Removed unnecessary invoking of .sum() * Revert "Removed unnecessary invoking of .sum()" This reverts commit e51a64b. * Initial implementation of the temp_docs_updater script. (#363) * Update README.md * Generate PrefixColumnConcatenator with entry point compiler instead of manually. (#364) * Fix broken docs (#369) * Fix whitespaces and typos * tabs and whitespaces * Removed all references to DSSM in NimbusML (except for in test_wordembedding.py) (#374) * Added catch for predictors that do not support summary() (#375) * Added catch for summary() with FactorizationMachineBinaryClassifier * Updated test for model summary * Revert "Updated test for model summary" This reverts commit 59656fe. 
* Update pipeline.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Update test_model_summary.py * Changed wording of error message * Update Microsoft.DataPrep to the latest version. (#379) * Create release notes for the 1.6.0 release. (#382) * Create release notes for version 1.6.0. * Update 1.6.0 release notes. * Bump version to 1.6.1 to fix dprep issue. (#385) * Update to latest version of DataPrep. * Bump version to 1.6.1 to fix dprep issue. * Removed "TODO: Replace with CV" comments (#389) * Disabled tests that only fail on Mac Py2.7 due to string encoding/dec… (#391) * Disabled tests that only fail on Mac Py2.7 due to string encoding/decoding bug * Update test_ngramfeaturizer.py * Add as_csr documentation to the inline docstrings for transform() and fit_transform(). (#392) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Update to the latest version of ML.Net. (#401) * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Typo fixed on paragraph 15 (#399) * Typo fixed on paragraph 10 (#398) * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Update the transform output formats documentation. (#395) * Update the transform output formats documentation. * Add whitespace change to restart CI run. The mac build did not start correctly. * Add whitespace change to restart CI run. The mac build did not start correctly. 
Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fixed broken brew command (#402) * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Update phase-template.yml * Checking for extended tests * Update phase-template.yml * Final touches * Re-activated NGramFeaturizer2.py (#381) * Update test_docs_example.py * Temporary change so that extended tests can be run by PRs * Revert "Temporary change so that extended tests can be run by PRs" This reverts commit 3f2b8a3. * Temporary change to be able to view extended tests' status with manual PRs * Update .vsts-ci.yml * Update .vsts-ci.yml * Update .vsts-ci.yml Co-authored-by: Gani Nazirov <ganinz@hotmail.com> * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Contributing: Fix a typo (#406) * Re-run failed unit tests on Ubuntu/Mac to fix intermittent crashes. (#407) Note, this modification only handles intermittent crashes on Ubuntu/Mac unit test runs. It does not handle situations where the build hangs and never returns control to the build script. * Fix issue when specifying split_start='after_transforms' with CV.fit() (#410) * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. 
It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e., Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Fix build issue caused by latest version of pip (>=20.0.0) (#412) * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <balmustafa117@gmail.com> Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com> Co-authored-by: Darío Hereñú <magallania@gmail.com> Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com> * Fix build and test failures in the aml branch. (#418) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. 
* Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. 
* Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e., Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> * Fix build issues with aml branch (#419) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. 
This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. * Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. 
* adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e., Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. * Fix build issue when using pip version >= 20.0.0 * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. * Fix incorrect featurizers library on Mac builds. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> * Fix issues with centos unit tests related to featurizers. 
(#420) * Draft, adding CategoryImputer, ToKeyImputer, ToString transformers * add tests * prelim commit * update manifest, fix unit tests/examples * upgrade version * fix tests * temp hack fix for native libs * copy libFeaturizers.so * fix version * fix cp * fix version * Update ML.Net version number. * Update the examples and unit tests. * Update to latest version of the Featurizers library. * Fix test_tostring unit test. * Temporarily skip the estimator checks unit tests. * Upgrade pip to the latest version when installing the Python packages on Windows. This fixes an issue I had where scikit-learn would not install when building NimbusML with the RlsWinPy3.6 configuration because it could not find one of the test data sets. * Update test_estimator_checks for the three new transformers. * Remove extra comma from test_estimator_checks. * Update the ML.Net version. * Add TimeSeriesImputer * Add country param to DateTimeSplitter * Upgrade TensorFlow.NET version. Required by latest version of Microsoft.ML.Dnn. * Update ML.Net version and import new AutoMLFeaturizers package. * Add back in the accidentally removed tests from test_data_with_missing.py. * Update the DateTimeSplitter examples. * Update the ToKeyImputer examples. * Update the ToString examples. * Update build to support latest nuget packages and updates. * Remove copy of libFeaturizers from linux build script. * Add TimeSeriesImputer to the NimbusML project. * Add initial DataFrame based example for TimeSeriesImputer. * Update to the latest version of manifest.json. * Add missing project include for the TimeSeriesImputer example. * Update the DateTimeSplitter examples. * Update build files to copy over the Data folder which is required for the country support in the DateTimeSplitter transform. * Add a unit test for testing the holiday name return value for DateTimeSplitter. * Add unit test for ToKeyImputer. * Update to latest version of manifest.json. Makes grain input required for TimeSeriesImputer. 
* Update TimeSeriesImputer_df example. * Remove TimeSeriesImputer from test_estimator_checks. * Update nuget.config to point to relative directory for ml.net packages. * Add unit test for TimeSeriesImputer. * Use environmental variable to specify the local ml.net nuget package directory. * Update to the latest version of ml.net. * Add latest version of nuget packages for building. * Update to the latest windows ml.net binaries. * Add linux ml.net binaries. * adding correct nuget packages/location * adding correct ML.NET signed packages * adding correct ML.NET signed packages * Update the referenced ML.Net versions. * Update to the latest version of the manifest. * Add RobustScaler to the public API. * Fix spacing bug in RobustScalar in manifest.json. * Update to the latest version of manifest.json which contains naming fix for RobustScaler. * Update to latest unsigned nuget packages for testing RobustScaler and latest master features. * Add RobustScaler unit tests and examples. * Update to the latest signed ML.Net nugets. * Fix RobustScaler checks in test_estimator_checks. * up version * Update to the latest version of ML.Net. * Whitespace change to start a new CI run to see if the mac build is working again. * Initial implementation of DateTimeSplitter. Ported from the aml branch. * Fix missing import in test_datetimesplitter. * Fix issue with ColumnSelector when dropping columns after DateTimeSplitter. * Use latest ML.Net dev packages from MachineLearning feed. * Re-enable the default nuget.org feed. It does not appear to cause any conflicts with getting the latest packages so long as the * is used in the PackageReference Version attributes. Keeping this enabled will allow other packages which are not part of the MachineLearning feed to be retrieved (i.e., Microsoft.MLFeaturizers). * Add whitespace change to restart CI build. Linux timed out. 
* Fix build issue when using pip version >= 20.0.0 * Remove local-nuget-packages, fix build and test_estimator_checks failures. * Remove DateTimeSplitter duplicates in nimbusml.pyproj * Remove duplicate ML.Featurizers import. * Fix incorrect featurizers library on Mac builds. * Fix centos unit test issues with featurizers. Co-authored-by: Gani Nazirov <ganinz@hotmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> * Add support for ONNX model export and execution. Merge to AML branch (#421) * Add initial implementation of the export to ONNX functionality. * Update the Microsoft.ML.OnnxConverter version in Platforms/build.csproj * Add test for verifying onnx export support. * Update the onnx conversion to be compatible with the latest changes in pull request dotnet/machinelearning#3986. * Fix a few of the issues with test_export_to_onnx. * Add onnxruntime.dll to the NimbusML python package. It is already included in the Linux and Mac builds. * Initial implementation of the OnnxRunner transform. * Fix missing reference to models_onnxconverter in nimbusml.pyproj. * Exclude OnnxRunner from the test_export_to_onnx tests. * Remove OnnxRunner from test_estimator_checks. * Add back in OnnxConverter reference which was accidentally removed in merge. * Update onnx export test. TypeConverter, MeanVarianceScaler, MinMaxScaler no longer require the experimental flag. * Pretty print the output of test_export_to_onnx. * Update to the latest version of ML.Net. * Update supported estimators in test_export_to_onnx. * Use the latest nightly builds for the ML.Net packages. * fix tests * fix test * Add example for OnnxRunner. 
(#422) * Build fix for rolling ML.NET 1.5.0-preview* and update to Pandas 1.0 (#437) * Updates for mlnet rolling build 1.5.0-preview2-28612-3 * Update pyproj * Update tests for pandas 1.0.1 * Skip check_dtype_object in TestEstimatorChecks due to pandas 1.0.0 removing Series.itemsize * Re-enable check_dtype_object and fix underlying issue causing it to fail * Remove label column from features when no Y is specified and predictor supports labels. (#439) * Fix breaking unit tests. (#440) * Update test_export_to_onnx test. (#443) * Update test_export_to_onnx test. (#444) * Fix NGramFeaturizer test * fix .0 (#445) * Add OneVsRest support to export to onnx tests and increase test coverage. (#446) * Automatically convert Categorical columns to their values before comparison in ONNX export tests. (#447) * add ORT results * Add ORT & vinod script (#449) * Add ORT validation to the export to onnx tests. (#451) * Remove unnecessary import. (#452) * Update data_frame_tool.py (#454) * Fixes for dataframe tool (#455) * add ORT results * fixes to dataframe tool and vinod * typo fixes * rollback * Fixed data_frame_tool to handle category columns correctly (#456) * A few fixes for IDV and DF formats * rollback * Regenerate entrypoint & api * Up version and fix test * Added Async suffix to RunOnBackgroundThread (#459) Added Async suffix to RunOnBackgroundThread * Update entrypoints and MarshallInvoke call (#461) * Update manifest.json * Update VariableColumnTransform.cs * Updated entrypoints * Update to use OnnxRuntime 1.2 (#462) * Updated ORT dependencies * Updated ORT Feed * Updated ORT tests for GPU * Revert "Updated ORT Feed" This reverts commit 76680f1. * Revert "Updated ORT tests for GPU" This reverts commit ae55b45. 
* Upgrade CI build to use latest onnxruntime and automl scenario based … (#463) * Upgrade CI build to use latest onnxruntime and automl scenario based test * simplify Co-authored-by: Gani Nazirov <ganaziro@microsoft.com> * don't run onnxruntime for python2.7 * fix automl test * Remove py2.7 Windows from CI build as the latest pytest and pip no longer support Python 2.7 * fix typo * remove daily build location * use only nuget.org Co-authored-by: pieths <pieths.dev@gmail.com> Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com> Co-authored-by: Mustafa Bal <balmustafa117@gmail.com> Co-authored-by: Najeeb Kazmi <najeeb.kazmi@gmail.com> Co-authored-by: Darío Hereñú <magallania@gmail.com> Co-authored-by: Maher Jendoubi <maher.jendoubi@gmail.com> Co-authored-by: Gani Nazirov <ganaziro@microsoft.com> Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>