Move DataFrame to machinelearning #5641

Merged Mar 11, 2021 (54 commits)

pgovind commented Mar 3, 2021

Move MDA, MDAI, MDA.Tests and MDAI.Tests to machinelearning.

For an easier time reviewing, the real changes are after the Merge branch 'port' of ../corefxlab into DataFrame_1 commit.

Just for posterity, I used git-filter-repo to pare down corefxlab locally to just the MDA and MDAI parts that I was interested in and then moved them over to the machinelearning repo. git-filter-repo was way faster than git filter-branch on my machine.

Prashanth Govindarajan and others added 30 commits November 6, 2019 14:54
* Update namespace to Microsoft.Data.Analysis

* Remove "DataFrame" from the test project name
* Support reverse binary operators

* Fix file left behind in a rebase

* Fix whitespace
* Throw if inPlace is set and types mismatch

* Unit test

* Better error message

* Remove empty lines
* Version, Tags and Description for Nuget

* sq
* Publish packages to artifacts

* Flags for release
* Fix the Description method to not crash
Adds an Info method

* sq

* Address feedback

* Last round of feedback
* Fix LoadCsv to use dataType if it passed in

* sq

* Don't read the full file after guessRows lines have been read

* Address feedback

* Last round of feedback
* Rows collection, similar to Columns

* Doc

* Some minor clean up

* Make DataFrameRow a view into the DataFrame

* sq

* Address feedback

* Remove DataFrame.RowCount

* More row count changes

* sq

* Address feedback

* Merge upstream
…3.0 (dotnet#2797)

Fixing by passing in an encoding and a default buffer size.

Also, get our tests running on .NET Framework.

Fix dotnet#2783
* Params constructor on DataFrame

* Delete redundant constructors
…ns (dotnet#2801)

* Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations

* Address feedback

* Rename the value version of the APIs

* sq

* Fix build

* Address feedback

* Remove Value from the APIs

* sq

* Address feedback
* Add Apply method to PrimitiveDataFrameColumn and its container

* Add TestApply test

* Remove unused df variable in DataFrameTests

* Add xml doc comments to Apply method
* Add additional tests for ReadCsv

* Update asserts

* Add empty row and skip test pending another fix

* Remove test for another issue
* Added static factory methods to DataFrameColumn where they make sense (for the overloads where its possible to infer the column's type).

* Remove regions

* Update some parts of the unit tests to use static factory methods to create DataFrameColumns.

* Remove errant {T} on StringDataFrameColumn.

* PR feedback

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
* Append rows to a DataFrame

* Unit test

* Update unit tests and doc

* Need to perform a type check every time

* sq

* Update unit test

* Address comments
* Add eng folder

* First cut of moving corefxlab to arcade

* Move arcade symbol validation inside official build

* Move base yml file to root

* Arcade will build, publish packages and symbols

* UpdateXlf. Review this

* Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel

* Remove property that was causing the build to fail

* Moving global properties to the main Yaml instead of step in order to unblock publishing

* Committing xlfs and changing the build script to not update Xlf on build

* clean up corefxlab-base.yml

* sq

* Delete unused files and scripts

* Get rid of all the xlf stuff

* Remove UpdateXlfOnBuild for non-NT builds

* Minor cleanup

* More cleanup

* update eng\build.sh permission

* Rename to Nuget.config

* sq

* Remove the runtime spec from global.json

* Don't publish test projs

* Typo

* Move version prefix to versions.props
Change prereleaselabel to alpha

* Increment version number to list as the latest package
Increment version number of Microsoft.Experimental.Collections to list as the latest package
Turn off graph generation

* Update the Readme

* Test removing the scripts folder

* Touch readme to force a change

* Address Jose's comments

* Typo

* Move versions to eng/versions.props

* Benchmark.proj needs to refer to xunit

* Clean up dependencies.props

* Remove dependencies.props

Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
* Rename sort to orderby and add orderbydescending method

* Add doc strings

* Update benchmark test

* Update tests

* Update DataFrameColumn to use orderby

* Update doc comment

* Additions to sortby

* Revert "Additions to sortby"

This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35.

* Revert "Update doc comment"

This reverts commit 192f7797fe2b77625486637badf77046162fedbf.

* Revert "Update DataFrameColumn to use orderby"

This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4.
* Explode column types and generate converters

* Clean this

* sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change
…2861)

* Move string indexer to Columns

* API changes from the 2nd API review

* Unit tests

* Address comments
)

* Generate combinations of binary operations and Add

* Numeric Converters and CloneAsNumericColumns

* Binary, Comparison and Shift operations

* Clean up and bug fix

* Fix the binary op apis to not be overridden

* Internal constructors for exploded types

* Proper return types for exploded types

* Update unit tests

* Update csproj

* Revert "Fix the binary op apis to not be overridden"

This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1.

* Bug fix and unit test

* Constructor that takes in a container

* Unit tests

* Call the implementation where possible

* Review sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Rename to the system namespace column types

* Address comments

* Push to pull locally

* Mimic C#'s arithmetic grammar in DataFrame

* Address feedback

* Reduce the number of partial column definitions

* Address feedback
* Enable xml docs for Data.Analysis

* Fix /// summary around inheritdoc

* Minor doc changes

* sq

* sq

* Address feedback
…2885)

* Support for Exploded columns types in Arrow and IO scenarios

* Unit tests

* Address feedback
* Fix versioning to allow for individual stable packages

* sq
* Bump Microsoft.Data.Analysis version to 0.4.0
* Fix dotnet/corefxlab#2906

* Improvements and unit tests

* sq

* Better fix

* sq
…otnet#2916)

* Unit test to repro

* Fix dotnet/corefxlab#2915

Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn

* Update src/Microsoft.Data.Analysis/DataFrame.IO.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Update src/Microsoft.Data.Analysis/DataFrame.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Feedback

Co-authored-by: Günther Foidl <gue@korporal.at>
@pgovind added the Microsoft.Data.Analysis label (All DataFrame related issues and PRs) Mar 3, 2021
@pgovind changed the title from "[WIP]Move DataFrame to machinelearning" to "Move DataFrame to machinelearning" Mar 3, 2021
src/Microsoft.Data.Analysis/Microsoft.Data.Analysis.csproj (outdated)
<LangVersion>7.3</LangVersion>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
<SuppressFinalPackageVersion>false</SuppressFinalPackageVersion>
<VersionPrefix>0.5.0</VersionPrefix>
Member

We should try to decide how we want to describe package version over in the new repo.

Author

What do you mean? (I see that Versions.props is setting a default version of 1.5.5 for all the packages in the repo)

Member

@eerhardt, Mar 5, 2021

You can remove VersionPrefix here altogether, and Microsoft.Data.Analysis will just become an "unstable" project in machinelearning. That means the next time we release it, it will be versioned 0.17.x.

See

<StableProjects>
Microsoft.Extensions.ML;
Microsoft.ML.DataView;
Microsoft.ML.CpuMath;
Microsoft.ML;
Microsoft.ML.Core;
Microsoft.ML.Data;
Microsoft.ML.KMeansClustering;
Microsoft.ML.PCA;
Microsoft.ML.StandardTrainers;
Microsoft.ML.Transforms;
Microsoft.ML.FastTree;
Microsoft.ML.ImageAnalytics;
Microsoft.ML.LightGbm;
Microsoft.ML.Mkl.Components;
Microsoft.ML.Mkl.Redist;
Microsoft.ML.TimeSeries;
Microsoft.ML.TensorFlow;
Microsoft.ML.OnnxTransformer;
Microsoft.ML.Vision;
</StableProjects>

<PropertyGroup Condition="'$(IsStableProject)' != 'true'">
<MajorVersion>0</MajorVersion>
<MinorVersion>17</MinorVersion>
<PatchVersion>6</PatchVersion>
</PropertyGroup>

Author

Hmmm, so 1.5.5 exists in both eng/Versions.props and eng/BranchInfo.props. Anyway, it's not important for our purposes here.

For now, using the version in BranchInfo.props is ok (we'd jump from 0.4.0 -> 0.17.6). But, if we wanted to release a new version of DataFrame, I don't think we'd want to increase the version for all the other projects right?

Member

But, if we wanted to release a new version of DataFrame, I don't think we'd want to increase the version for all the other projects right?

My thoughts were that we would no longer "release a new version of DataFrame" by itself. It would now start shipping along with the rest of the ML.NET libraries on the same cycle. So when 0.17.7 ships next, we would ship Microsoft.Data.Analysis v0.17.7, same for 0.17.8, etc.

Member

so 1.5.5 exists in both eng/Versions.props and eng/BranchInfo.props

It looks like arcade will overwrite VersionPrefix if MajorVersion and MinorVersion are set:

https://github.com/dotnet/arcade/blob/ca7fab569267ed3bc73360882d652d119aae5653/src/Microsoft.DotNet.Arcade.Sdk/tools/Version.BeforeCommonTargets.targets#L76

@michaelgsharp - I think we can delete the VersionPrefix line in eng/Versions.props.

Author

So when 0.17.7 ships next, we would ship Microsoft.Data.Analysis v0.17.7, same for 0.17.8, etc.

Perfect. I was thinking we should do a quick MDA 0.5.0 release this month and then get on to the ML releases, but it looks like we can release preview packages whenever we want in ML, so this is not a problem.

Member

@eerhardt I'll run a test build and see. It would be much better to only have to update in 1 place. My local build is having issues though, so I'll test it when I am able to.
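
The versioning behavior discussed in this thread can be sketched as the fragment below. This is only an illustration under assumptions: the first property group is the one quoted earlier in the thread, while the second is an assumed shape for the Arcade override described above (VersionPrefix recomputed when MajorVersion and MinorVersion are set), not a copy of the actual Arcade target.

```xml
<Project>
  <!-- Quoted earlier in this thread: projects not marked stable get 0.17.6. -->
  <PropertyGroup Condition="'$(IsStableProject)' != 'true'">
    <MajorVersion>0</MajorVersion>
    <MinorVersion>17</MinorVersion>
    <PatchVersion>6</PatchVersion>
  </PropertyGroup>

  <!-- Assumed shape of the override (not copied from Arcade): once both
       MajorVersion and MinorVersion are set, VersionPrefix is recomputed,
       so a VersionPrefix set earlier (such as 0.5.0 in the csproj) would
       no longer take effect. PatchVersion falls back to 0 when unset. -->
  <PropertyGroup Condition="'$(MajorVersion)' != '' and '$(MinorVersion)' != ''">
    <VersionPrefix>$(MajorVersion).$(MinorVersion).$([MSBuild]::ValueOrDefault('$(PatchVersion)', '0'))</VersionPrefix>
  </PropertyGroup>
</Project>
```

If the override really works this way, removing VersionPrefix from the csproj changes nothing for an unstable project, which matches the suggestion to delete it.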

src/Microsoft.Data.Analysis/Microsoft.Data.Analysis.csproj (outdated)
<MSBuild Projects="./../Microsoft.Data.Analysis.Interactive/Microsoft.Data.Analysis.Interactive.csproj"
Targets="_GetBuildOutputFilesWithTfm"
Properties="TargetFramework=netcoreapp3.1">
<!-- Manually hardcoding the TargetFramework to netcoreapp3.1 as that is the one that MDAI targets -->
Member

This is pretty hacky. @Anipik has been looking at defining patterns for dotnet/runtime pkgproj -> csproj transition. I wonder if he has an idea for a better pattern for this case.

I see you're taking the one configuration of MS.D.A.Interactive, putting that assembly in interactive-extensions/dotnet and ignoring any of its dependencies. Do you have any more info to share about how the interactive-extensions/dotnet folder works? What is the expectation of components that use that convention?

Member

@eerhardt, Mar 5, 2021

Do you have any more info to share about how the interactive-extensions/dotnet folder works?

There isn't great documentation on this yet.

You can see some docs at https://github.com/dotnet/interactive/tree/main/docs#extending-net-interactive. Note there is an unlinked "Publishing your extension using NuGet" article that looks like it hasn't been written yet.

I think the best "doc" is the sample at:

https://github.com/dotnet/interactive/blob/main/samples/extensions/Library.nuget/Library.nuget.csproj

cc @jonsequitur @colombod @brettfo @LadyNaggaga

Member

What's the minimum TargetFramework required for dotnet interactive extensions? Are dependencies allowed? Should the extensions deps file be included?

I think the way this is included now would add the file multiple times if M.D.A had multiple TargetFrameworks. I think there might be a different hook other than TfmSpecificPackageFile to use for this scenario. I'll see if I can come up with a sample.

Member

@eerhardt, Mar 6, 2021

What's the minimum TargetFramework required for dotnet interactive extensions?

You need to reference: https://www.nuget.org/packages/Microsoft.DotNet.Interactive/1.0.0-beta.21155.3, which is a netstandard2.1 library. So I guess that's the minimum? dotnet/interactive has already gone to net5.0, so I think we could bump this from netcoreapp3.1 to net5.0.

Are dependencies allowed?

I assume any dependencies you have outside of your NuGet package that aren't already dependencies of dotnet/interactive need to be included. But as it is now, Microsoft.Data.Analysis.Interactive doesn't have any external dependencies.

Should the extensions deps file be included?

@colombod @brettfo would know.

Member

I poked around the source and it looks like it just does a LoadFrom: https://github.com/dotnet/interactive/blob/6fd92eb9d72c119b6e7822ca87d96427399a3307/src/Microsoft.DotNet.Interactive/Extensions/AssemblyBasedExtensionLoader.cs#L85

I wonder how that plugin then finds the right copy of M.D.A 🤔 Anyhow, that's not what this comment was about. That's a thought for another day.

Member

@ericstj, Mar 9, 2021

I was hacking around a bit with this and found that you could do this like the following which is marginally better:

  <PropertyGroup>
    <BeforePack>$(BeforePack);IncludeInteractiveExtension</BeforePack>
  </PropertyGroup>

  <Target Name="IncludeInteractiveExtension">
    <MSBuild Projects="../ext/ext.csproj" Targets="GetTargetPath">
      <Output TaskParameter="TargetOutputs" ItemName="_interactiveExtension" />
    </MSBuild>
    <ItemGroup>
      <Content Include="@(_interactiveExtension)" Pack="True" PackagePath="interactive-extensions/dotnet" />
    </ItemGroup>
  </Target>

This works so long as the interactive extension doesn't cross-target, but it seems like that's a requirement already. If you ever wanted to cross-target you could define a target in the extension project that ran in outer build and invoked the appropriate inner build framework to select the one that goes in the package (and avoid hardcoding the TFM in a different project).

Finally: what actually builds this project during pack? I see that M.D.A can't take a project reference since M.D.A.I itself references M.D.A. If you wanted the build to happen during pack you could change the target above to Build instead of GetTargetPath.
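
A rough sketch of the "Build instead of GetTargetPath" variant mentioned in this comment, for illustration only. It assumes the extension project targets a single framework (so the Build target returns one output assembly) and reuses the Microsoft.Data.Analysis.Interactive project path from the diff hunk at the top of this thread. The `<Project>` wrapper just makes the fragment well-formed; in practice the contents would sit inside the csproj that is being packed.

```xml
<Project>
  <PropertyGroup>
    <!-- Run the custom target before NuGet's Pack target. -->
    <BeforePack>$(BeforePack);IncludeInteractiveExtension</BeforePack>
  </PropertyGroup>

  <Target Name="IncludeInteractiveExtension">
    <!-- Assumption: single-TFM extension project. Build compiles it and
         returns the path of the produced assembly through TargetOutputs. -->
    <MSBuild Projects="../Microsoft.Data.Analysis.Interactive/Microsoft.Data.Analysis.Interactive.csproj"
             Targets="Build">
      <Output TaskParameter="TargetOutputs" ItemName="_interactiveExtension" />
    </MSBuild>
    <ItemGroup>
      <!-- Place the extension assembly in the folder dotnet/interactive probes. -->
      <Content Include="@(_interactiveExtension)" Pack="True" PackagePath="interactive-extensions/dotnet" />
    </ItemGroup>
  </Target>
</Project>
```

With this shape, packing would trigger the extension build itself, so nothing else has to build Microsoft.Data.Analysis.Interactive beforehand.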

Make MDA.test use the props defined TFM
Comment out 2 unit tests
codecov bot commented Mar 4, 2021

Codecov Report

Merging #5641 (d325692) into master (f93fa09) will decrease coverage by 6.21%.
The diff coverage is 65.35%.

@@            Coverage Diff             @@
##           master    #5641      +/-   ##
==========================================
- Coverage   74.45%   68.24%   -6.22%     
==========================================
  Files        1072     1130      +58     
  Lines      195996   240121   +44125     
  Branches    21547    24920    +3373     
==========================================
+ Hits       145933   163862   +17929     
- Misses      44270    69787   +25517     
- Partials     5793     6472     +679     
Flag Coverage Δ
Debug 68.24% <65.35%> (-6.22%) ⬇️
production 62.89% <64.01%> (-7.72%) ⬇️
test 89.13% <90.35%> (+1.20%) ⬆️

Impacted Files Coverage Δ
....Data.Analysis/DataFrameColumn.BinaryOperations.cs 0.00% <0.00%> (ø)
...FrameColumn.BinaryOperationAPIs.ExplodedColumns.cs 7.57% <ø> (ø)
...lysis/PrimitiveDataFrameColumn.BinaryOperations.cs 39.97% <ø> (ø)
...alysis/PrimitiveDataFrameColumn.BinaryOperators.cs 9.50% <ø> (ø)
...ata.Analysis/PrimitiveDataFrameColumnArithmetic.cs 49.49% <ø> (ø)
...a.Analysis/PrimitiveDataFrameColumnComputations.cs 45.70% <ø> (ø)
...Microsoft.Data.Analysis.Tests/DataFrame.IOTests.cs 98.38% <ø> (ø)
...ysis.Tests/DataFrameColumn.BinaryOperationTests.cs 100.00% <ø> (ø)
...ft.Data.Analysis.Tests/DataFrameTests.IDataView.cs 100.00% <ø> (ø)
...st/Microsoft.Data.Analysis.Tests/DataFrameTests.cs 99.94% <ø> (ø)
... and 128 more

Author

pgovind commented Mar 5, 2021

Alright, I addressed this round of feedback. The only open one left is the Version for DataFrame.

@@ -0,0 +1,18 @@
<Project Sdk="Microsoft.NET.Sdk">

Member

Tabs hurt my eyes.

Author

ugh, this setting must be mis-matched locally on my machine between VSCode and VS :\ Will fix in a bit

Author

@pgovind, Mar 8, 2021

Out of curiosity, how did you see tabs in this file in GH? I changed all tabs to spaces (no idea how tabs occurred in the first place) in 4 csprojs in the PR, but unfortunately there is no "Change all tabs to spaces in all files" in VS I think. For now, a solution wide regex search yields no tabs in source code

Member

It looked too indented then I saw it while trying to select white space.

Member

I use CodeFlow to review PRs, and you can tell it to show whitespace. Arrows are tabs, dots are spaces.

Member

Has codeflow been fixed to work better with Github comment threads? That was a deal breaker in the past.

Member

Yeah, it still has some issues with comment threads. Typically I only create new conversations from CodeFlow. And respond to threads in GH. I usually have them both open on 2 different windows. I just like seeing the full file, searching across the change, seeing the tree view of files, etc in CodeFlow. It's a much better experience for me.

<PropertyGroup>
<TargetFramework>netcoreapp3.1</TargetFramework>
<IsPackable>false</IsPackable>
<NoWarn>$(NoWarn);MSML_ParameterLocalVarName;SA1028</NoWarn>
Member

SA1028 is about code comments not formatting correctly. That seems like something we should be able to fix (either here or in a follow up PR)

Author

Yup. My plan is to do it in a follow up PR. SA1028 and a bunch of the other warnings are easily fixable

Author

pgovind commented Mar 9, 2021

Let me know if someone would like something else addressed in this PR. Just waiting for a sign off at this point. I think this is good to go in now.

</ItemGroup>

<ItemGroup>
<ProjectReference Include="..\Microsoft.ML.DataView\Microsoft.ML.DataView.csproj" />
Member

Just putting this out there for thought:

We didn't depend on the latest Microsoft.ML.DataView before. We depended on 1.0.0. I wonder if it would be better if we didn't force this dependency to the latest version?

Member

Is there any unification or breaking change issues? In general we've taken an aggressive update cadence with all packages in dotnet. It helps to ensure folks are patched by default, since Nuget doesn't automatically lift transitive dependencies.

I think someone can still use an old Microsoft.ML with a new DataView right?

Member

In general, ML.NET makes heavy usage of InternalsVisibleTo.

However, I looked and I don't think that is true for Microsoft.ML.DataView because at one point we wanted to split it completely apart from Microsoft.ML (we even named it Microsoft.Data.DataView for a time). So I think we should be OK with using a new Microsoft.ML.DataView with an older Microsoft.ML, but I don't think anyone tests it.
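
The two dependency styles weighed in this thread could look roughly like the fragment below. It is a hypothetical sketch: the `UsePublishedDataView` property is invented here purely to show both options side by side and does not exist in the repo; only the ProjectReference line comes from the PR.

```xml
<Project>
  <!-- What the PR does: build against the in-repo Microsoft.ML.DataView,
       so the package depends on whatever version ships from this repo. -->
  <ItemGroup Condition="'$(UsePublishedDataView)' != 'true'">
    <ProjectReference Include="..\Microsoft.ML.DataView\Microsoft.ML.DataView.csproj" />
  </ItemGroup>

  <!-- Hypothetical alternative: keep the previous minimum of 1.0.0 by
       referencing the published package instead of the project. -->
  <ItemGroup Condition="'$(UsePublishedDataView)' == 'true'">
    <PackageReference Include="Microsoft.ML.DataView" Version="1.0.0" />
  </ItemGroup>
</Project>
```

The second option keeps the dependency floor low for consumers, at the cost of Microsoft.Data.Analysis no longer being built against the DataView sources sitting next to it.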

Member

@eerhardt left a comment

Just one comment on a dependency version. Other than that, I think we need to wait for green CI. But everything else looks good.

Author

pgovind commented Mar 9, 2021

I think we need to wait for green CI

Wait I don't expect CI to be green here. Or is CI fixed in the repo already and I just haven't rebased yet? From a glance at the CI logs, it looks like the URLs to download some onnx images are wrong/have moved?

Member

eerhardt commented Mar 9, 2021

Wait I don't expect CI to be green here

We need to wait for the repo itself to be working before merging any more changes into it. If you merged this, there is a large potential that your change will break the branch even more.

Author

pgovind commented Mar 11, 2021

Alright, CI is green! Thanks for the reviews everyone and @michaelgsharp for fixing CI

@pgovind merged commit b916d37 into dotnet:master Mar 11, 2021
@michaelgsharp mentioned this pull request Mar 11, 2021
jwood803 added a commit to jwood803/machinelearning that referenced this pull request Apr 21, 2021
* update tensorflow.net to 0.20.0 (dotnet#5404)

* upgrade to 3.1

* write inline data using invariantCulture

* update tensorflow

* update Microsoft.ML.Vision

* fix test && comment

* update tensorflow.net to 0.20.1

* update tf major version

* downgrade tf runtime to 1.14.1

* Update Dependencies.props

* Update Dependencies.props

* update tffact to stop running test on linux with glibc < 2.3)

* fix TensorFlowTransformInputShapeTest

* use tf.v1 api

* fix comment:

* fix building error

* fix test

* fix nit

* remove linq

Co-authored-by: BigBigMiao <BigBigMiao@github.com>

* ProduceWordBags Onnx Export Fix  (dotnet#5435)

* fix for issue

* fix documentation

* aligning test

* adding back line

* aligning fix

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* [SrCnnEntireAnomalyDetector] Upgrade boundary calculation and expected value calculation (dotnet#5436)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Update OnnxRuntime to 1.5.2 (dotnet#5439)

* Added prerelease feed and updated to 1.5.2

* Remove prerelease feed

* Updated docs

* Update doc

* Fixed MacOS CI Pipeline builds (dotnet#5457)

* Added MacOS Homebrew bug fix

* nit fix

* Improving error message  (dotnet#5444)

* better error fix

* revisions

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Fixed MacOS daily & nightly builds due to Homebrew bug (dotnet#5467)

* Fixed MacOS nightly builds due to Homebrew bug

* Edit workaround

* Remove untapping of python2

* Nit edit

* Remove installation of mono-libgdiplus

* try installing mono-libgdiplus

* unlink python 3.8

* Auto.ML: Fix issue when parsing float string fails on pl-PL culture set using Regression Experiment (dotnet#5163)

* Fix issue when parsing float string fails on pl-PL culture set

* Added InvariantCulture float parsing as per CodeReview request

* Update src/Microsoft.ML.AutoML/Sweepers/SweeperProbabilityUtils.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Update Parameters.cs

* Added PL test

* Added multiple cultures

* debugging CI failure

* Debug runSpecific

* Revert "Debug runSpecific"

This reverts commit 95b7280.

* Removed LightGBM and addressed comments

* Increased time

* Increase time

* Increased time

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* handle exception during GetNextPipeline for AutoML (dotnet#5455)

* handle exception during GetNextPipeline for AutoML

* take comments

* Changing LoadRawImages Sample (dotnet#5460)

replacing example

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Use Timer and ctx.CancelExecution() to fix AutoML max-time experiment bug (dotnet#5445)

* Use ctx.CancelExecution() to fix AutoML max-time experiment bug

* Added unit test for checking canceled experiment

* Nit fix

* Different run time on Linux

* Review

* Testing four output

* Used reflection to test for contexts being canceled

* Reviews

* Reviews

* Added main MLContext listener-timer

* Added PRNG on _context, held onto timers for avoiding GC

* Addressed reviews

* Unit test edits

* Increase run time of experiment to guarantee probabilities

* Edited unit test to check produced schema of next run model's predictions

* Remove scheme check as different CI builds result in varying schemas

* Decrease max experiment time unit test time

* Added Timers

* Increase second timer time, edit unit test

* Added try catch for OperationCanceledException in Execute()

* Add AggregateException try catch to slow unit tests for parallel testing

* Reviews

* Final reviews

* Added LightGBMFact to binary classification test

* Removed extra Operation Stopped exception try catch

* Add back OperationCanceledException to Experiment.cs

* fix issue 5020, allow ML.NET to load tf model with primitive input and output column (dotnet#5468)

* handle exception during GetNextPipeline for AutoML

* take comments

* Enable TensorFlowTransformer to take primitive type as input column

* undo unnecessary changes

* add test

* update on test

* remove unnecessary line

* take comments

* maxModels instead of time for AutoML unit test (dotnet#5471)

Uses the internal `maxModels` parameter instead of `MaxExperimentTimeInSeconds` for the exit criteria of AutoML. 

This is to increase the test stability in case the test is run on a slower machine.

* Disabling AutoFitMaxExperimentTimeTest

Disabling AutoFitMaxExperimentTimeTest

* Fix AutoFitMaxExperimentTimeTest (dotnet#5506)

* Fixed test

Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* Fix SR anomaly score calculation at beginning (dotnet#5502)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

* adjust score calculation for first & second points

* fix sr do not report anomaly at beginning

* fix an issue in batch process

* remove an unused parameter

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Merge arcade to master (dotnet#5525)

* Initial commit for Arcade migration

* Added omitted files

* Changed strong name signing to use the same key for shipping and test assemblies

* arcade linux build (dotnet#5423)

* arcade linux build

* put file execution permission change into source control

* The `-test` command for windows. Nuget packages (dotnet#5464)

* working on testing

* testing updates

* tests almost working

* build changes

* all tests should be working

* changes from PR comments

* fixes for .net 3.1

* Fixed extension check. Removed <PackageId> where not needed

* Removed pkg folder and updated paths.

* Added test key. (dotnet#5475)

* Added test key.

* Update PublicKey.cs

Removed extra newline.

* Update ComponentCatalog.cs

Fixed 3 spaces to 4.

* Windows CI working (dotnet#5477)

* ci testing changes

* comments from pr

* Added Linux & Mac changes for Arcade (dotnet#5479)

* Initial Windows, Linux, Macos builds test

* Add Linux/MacOS specific CI requirements

* Run Arcade CI tests on MacOS/Linux

* Fix final package building

* Add benchmark download to benchmarks .csproj file

* Print detailed status of each unit test

* Install CentOS & Ubuntu build dependencies

* Use container names to differentiate between Ubuntu & CentOS

* Remove sudo usage in CentOS

* Fix Linux build dependencies

* Add -y param to apt install

* Remove installation of Linux dependencies

* Minor additions

* Rename Benchmarks to PerformanceTests for Arcade

* Changes

* Added benchmark doc changes

* Pre-merge changes

* Fixing failing Arcade Windows Builds (dotnet#5482)

* Try Windows build single quote fix

* Remove %20

* Added variable space value

* Using variables for spacing

* Added space values as job parameters

* Try conditional variables again

* fix official builds

* Revert "fix official builds"

This reverts commit 7dbbdc7.

* fixing tensorflow rebase issue

* Fixes for many of the CI builds. (dotnet#5496)

* yml log changes

* Fix NetFX builds by ensuring assembly version is set correctly and not to Arcade default of 42.42.42.42 (dotnet#5503)

* Fixed official builds for Arcade SDK (dotnet#5512)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

Co-authored-by: Frank Dong <frdong@microsoft.com>

* fix code generator tests failure (dotnet#5520)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

* fix code generate test fails

* only add necessary dependency

Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>

* Fixed memory leaks from OnnxTransformer (dotnet#5518)

* Fixed memory leak from OnnxTransformer and related x86 build fixes

* Reverting x86 build related fixes to focus only on the memory leaks

* Updated docs

* Reverted OnnxRuntimeOutputCatcher to private class

* Addressed code review comments

* Refactored OnnxTransform back to using MapperBase based on code review comments

* Handle integration tests and nightly build testing (dotnet#5509)

* Make -integrationTests work

* Update .yml file

* Added the TargetArchitecture properties

* Try out -integrationTest

* Missed -integrationTest flag

* Renamed FunctionalTestBaseClass to IntegrationTestBaseClass

* Missed rename

* Modified tests to make them more stable

* Fixed leak in object pool (dotnet#5521)

Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>
Co-authored-by: Frank Dong <frdong@microsoft.com>
Co-authored-by: Michael Sharp <misharp@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>

* fix benchmark test timeout issue (dotnet#5530)

* removed old build stuff (dotnet#5531)

* Fixes Code Coverage in Arcade (dotnet#5528)

* arcade code coverage changes

* adding Michael's changes

* updating path

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Removed CODEOWNERS file to unify review process (dotnet#5535)

* Fix publishing problems (dotnet#5538)

* Removed our dependency to BuildTools by using the NugetCommand Azure Task.
* We should publish a nuget named "SampleUtils", but we were publishing it with the name "SamplesUtils"
* The naming conventions of our published nugets didn't match the ones described on arcade's docs: Versioning.md. I've also added the option so that when queuing the publishing build, we can pass the VERSIONKIND variable with value "release", so that it produces the nugets with arcade's conventions for "Release official build" nugets (as opposed to the "Daily official build" naming convention that's going to be used now by our CI that publishes nightly nugets).

* Updated prerelease label (dotnet#5540)

* Fix warnings from CI Build (dotnet#5541)

* fix warnings

* also add conditional copy asset to native.proj

* test fix warnings

* suppress nuget warning 5118

* suppress other warning

* remove unnecessary change

* put skip warning at Directory.Build.props

* Updated build instructions (dotnet#5534)

* Updated build instructions

* Addressed reviews

* Reviews

* removed the rest of the old pkg references: (dotnet#5537)

* Perf improvement for TopK Accuracy and return all topK in Classification Evaluator (dotnet#5395)

* Fix for issue 744

* cleanup

* fixing report output

* fixedTestReferenceOutputs

* Fixed test reference outputs for NetCore31

* change top k acc output string format

* Ranking algorithm now uses first appearance in dataset rather than worstCase

* fixed benchmark

* various minor changes from code review

* limit TopK to OutputTopKAcc parameter

* top k output name changes

* make old TopK readOnly

* restored old baselineOutputs since respecting outputTopK param means no topK in most test output

* fix test fails, re-add names parameter

* Clean up commented code

* that'll teach me to edit from the github webpage

* use existing method, fix nits

* Slight comment change

* Comment change / Touch to kick off build pipeline

* fix whitespace

* Added new test

* Code formatting nits

* Code formatting nit

* Fixed undefined rankofCorrectLabel and trailing whitespace warning

* Removed _numUnknownClassInstances and added test for unknown labels

* Add weight to seenRanks

* Nits

* Removed FastTree import

Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Fixed Spelling on stopwords (dotnet#5524)

* Changes to onnx export. (dotnet#5544)

* Add back missing test project from running on arcade (dotnet#5545)

* add back test result upload and add missing test project from running

* fix identification

* filter out performance test result files to avoid warnings

* [CodeGenerator] Fix MLNet.CLI build error. (dotnet#5546)

* upgrade to 3.1

* write inline data using invariantCulture

* fix mlnet build error

* Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (dotnet#5548)

* Fixed bug

* Tensorflow fix (dotnet#5547)

* fix tensorflow issue on sample repo

* add comments

* Update to OnnxRuntime 1.6.0 and fixed bug with sequences outputs (dotnet#5529)

* Use onnx prerelease

* Upgrade to onnx 1.6.0

* Updated docs

* Fixed problem with sequences

* added in DcgTruncationLevel to AutoML api (dotnet#5433)

* added in DcgTruncationLevel to automl api

* changed default to 10

* updated baseline output

* fixed failing tests and baselines

* Changes from PR comments.

* Update src/Microsoft.ML.AutoML/Experiment/MetricsAgents/RankingMetricsAgent.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Changes based on PR comments.

* Fix ranking test.

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Created release notes for v1.5.3 (dotnet#5543)

* Created release notes for v1.5.3

* Updated with review comments

* Updated with review comments

* Updated release notes with latest PRs

* Fixed typo

* Forward logs of Experiment's sub MLContexts to main MLContext (dotnet#5554)

* Forward logs of Experiment's sub MLContexts to main MLContext

* Addressed reviews

* Update Stale docs (dotnet#5550)

* Updated OnnxMl.md

* Updated MlNetMklDeps docs

* Typo

* typo

* continueOnError on Brew Workaround (dotnet#5555)

* continueOnError:true

* Fix publishing symbols (dotnet#5556)

* Disable Portable PDB conversion

* Push packages to artifacts

* Fix symbols issues

* Added note about Microsoft.ML.dll

* try out just packing

* Return Build=false, but actually use configuration

* Added missing TargetArchitecture

* add back tests

* Added missing flags

* Updated version to 1.5.4 (dotnet#5557)

* Fixed version numbers in the right place (dotnet#5558)

* Updated version to 1.5.4

* Updated version to 1.5.4

* eng (dotnet#5560)

* Renamed release notes file (dotnet#5561)

* Renamed release notes file

* Updated version number in release notes

* Add SymSgdNative reference to AutoML.Tests.csproj (dotnet#5559)

* runSpecific in YAML

* RunSpecific in test

* Add SymSgdNative reference

* Revert "RunSpecific in test"

This reverts commit fed12b2.

* Revert "runSpecific in YAML"

This reverts commit f9f328d.

* Nuget.config url fix for roslyn compilers (dotnet#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* added in note that PredictionEngine is not thread safe (dotnet#5583)

* Onnx Export for ValueMapping estimator (dotnet#5577)

* Fixed Averaged Perceptron default value (dotnet#5586)

* fixed missed averaged perceptron default value

* fixed extension api

* fixed test baselines

* fixing official build (dotnet#5596)

* Release/1.5.4 fix (dotnet#5599)

* Nuget.config url fix for roslyn compilers (dotnet#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* fixing official build (dotnet#5596)

* Remove references to Microsoft.ML.Scoring (dotnet#5602)

This was the very first ONNX .NET bindings, it was replaced with Microsoft.ML.OnnxRuntime
then Microsoft.ML.OnnxRuntime.Managed.

* Make ColumnInference serializable (dotnet#5611)

* upgrade to 3.1

* write inline data using invariantCulture

* make column inference serializable

* add test json

* add approvaltests

* fixed nuget.config (dotnet#5614)

* Fix issue in SRCnnEntireAnomalyDetector (dotnet#5579)

* update

* refine codes

* update comments

* update for nit

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Offer suggestions for possibly mistyped label column names in AutoML (dotnet#5574) (dotnet#5624)

* Offer suggestions for possibly mistyped label column names

* review changes

* TimeSeries - fix confidence parameter type for some detectors (dotnet#4058) (dotnet#5623)

* TimeSeries - fix confidence parameter type for some detectors.

- The public API exposed confidence parameters as int even though it's internally implemented as double
- There was no workaround since all classes where double is used are internal
- This caused major issues for software requiring high precision predictions
- This change to API should be backwards compatible since int can be passed to parameter of type double

* TimeSeries - reintroduce original methods with confidence parameter of type int (to not break the API).

* TimeSeries - make catalog API methods with int confidence parameter deprecated.

- Tests adjusted to not use the deprecated methods

* Update Conversion.cs (dotnet#5627)

* Documentation updates (dotnet#5635)

* documentation updates

* fixed spelling error

* Update docs/building/unix-instructions.md

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

* AutoML aggregate exception (dotnet#5631)

* added check for aggregate exception

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* pulled message out to private variable so it's not duplicated

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Treat TensorFlow output as non-batched. (dotnet#5634)

* Can now not treat output as batched.

* updated comments based on PR comments.

* Fixing saving/loading with new parameter.

* Updates based on PR comments

* Update src/Microsoft.ML.TensorFlow/TensorflowUtils.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* reverted accidental test changes

* fixes based on PR comments

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Added in release notes for 1.5.5 (dotnet#5639)

* added in release notes

* Update release-1.5.5.md

Removed incorrect PR.

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* updating version after release (dotnet#5642)

* Move DataFrame to machinelearning (dotnet#5641)

* Change namespace to Microsoft.Data.Analysis (dotnet#2773)

* Update namespace to Microsoft.Data.Analysis

* Remove "DataFrame" from the test project name

* APIs for reversed binary operators (dotnet#2769)

* Support reverse binary operators

* Fix file left behind in a rebase

* Fix whitespace

* Throw for incompatible inPlace (dotnet#2778)

* Throw if inPlace is set and types mismatch

* Unit test

* Better error message

* Remove empty lines

* Version, Tags and Description for Nuget (dotnet#2779)

* Version, Tags and Description for Nuget

* sq

* Flags for release  (dotnet#2781)

* Publish packages to artifacts

* Flags for release

* Fix the Description method to not throw (dotnet#2786)

* Fix the Description method to not crash
Adds an Info method

* sq

* Address feedback

* Last round of feedback

* Use dataTypes if it is passed in to LoadCsv (dotnet#2791)

* Fix LoadCsv to use dataType if it is passed in

* sq

* Don't read the full file after guessRows lines have been read

* Address feedback

* Last round of feedback

* Creating a `Rows` property, similar to `Columns` (dotnet#2794)

* Rows collection, similar to Columns

* Doc

* Some minor clean up

* Make DataFrameRow a view into the DataFrame

* sq

* Address feedback

* Remove DataFrame.RowCount

* More row count changes

* sq

* Address feedback

* Merge upstream

* DataFrame.LoadCsv throws an exception on projects targeting < netcore3.0 (dotnet#2797)

Fixing by passing in an encoding and a default buffer size.

Also, get our tests running on .NET Framework.

Fix dotnet#2783

* Params constructor on DataFrame (dotnet#2800)

* Params constructor on DataFrame

* Delete redundant constructors

* Remove `T : unmanaged` constraint from DataFrameColumn.BinaryOperations (dotnet#2801)

* Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations

* Address feedback

* Rename the value version of the APIs

* sq

* Fix build

* Address feedback

* Remove Value from the APIs

* sq

* Address feedback

* Bump version to 0.2.0 (dotnet#2803)

* Add Apply<TResult>method to PrimitiveDataFrameColumn (dotnet#2807)

* Add Apply method to PrimitiveDataFrameColumn and its container

* Add TestApply test

* Remove unused df variable in DataFrameTests

* Add xml doc comments to Apply method

* Add additional tests for ReadCsv (dotnet#2811)

* Add additional tests for ReadCsv

* Update asserts

* Add empty row and skip test pending another fix

* Remove test for another issue

* Added static factory methods to DataFrameColumn  (dotnet#2808)

* Added static factory methods to DataFrameColumn where they make sense (for the overloads where it's possible to infer the column's type).

* Remove regions

* Update some parts of the unit tests to use static factory methods to create DataFrameColumns.

* Remove errant {T} on StringDataFrameColumn.

* PR feedback

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Append rows to a DataFrame (dotnet#2823)

* Append rows to a DataFrame

* Unit test

* Update unit tests and doc

* Need to perform a type check every time

* sq

* Update unit test

* Address comments

* Move corefxlab to arcade (dotnet#2795)

* Add eng folder

* First cut of moving corefxlab to arcade

* Move arcade symbol validation inside official build

* Move base yml file to root

* Arcade will build, publish packages and symbols

* UpdateXlf. Review this

* Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel

* Remove property that was causing the build to fail

* Moving global properties to the main Yaml instead of step in order to unblock publishing

* Committing xlfs and changing the build script to not update Xlf on build

* clean up corefxlab-base.yml

* sq

* Delete unused files and scripts

* Get rid of all the xlf stuff

* Remove UpdateXlfOnBuild for non-NT builds

* Minor cleanup

* More cleanup

* update eng\build.sh permission

* Rename to Nuget.config

* sq

* Remove the runtime spec from global.json

* Don't publish test projs

* Typo

* Move version prefix to versions.props
Change prereleaselabel to alpha

* Increment version number to list as the latest package
Increment version number of Microsoft.Experimental.Collections to list as the latest package
Turn off graph generation

* Update the Readme

* Test removing the scripts folder

* Touch readme to force a change

* Address Jose's comments

* Typo

* Move versions to eng/versions.props

* Benchmark.proj needs to refer to xunit

* Clean up dependencies.props

* Remove dependencies.props

Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>

* Rename Sort to OrderBy (dotnet#2814)

* Rename sort to orderby and add orderbydescending method

* Add doc strings

* Update benchmark test

* Update tests

* Update DataFrameColumn to use orderby

* Update doc comment

* Additions to sortby

* Revert "Additions to sortby"

This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35.

* Revert "Update doc comment"

This reverts commit 192f7797fe2b77625486637badf77046162fedbf.

* Revert "Update DataFrameColumn to use orderby"

This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4.

* Explode column types and generate converters (dotnet#2857)

* Explode column types and generate converters

* Clean this

* sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Address remaining concerns from the 2nd DataFrame API Review  (dotnet#2861)

* Move string indexer to Columns

* API changes from the 2nd API review

* Unit tests

* Address comments

* Add binary operations and operators on the exploded columns (dotnet#2867)

* Generate combinations of binary operations and Add

* Numeric Converters and CloneAsNumericColumns

* Binary, Comparison and Shift operations

* Clean up and bug fix

* Fix the binary op apis to not be overridden

* Internal constructors for exploded types

* Proper return types for exploded types

* Update unit tests

* Update csproj

* Revert "Fix the binary op apis to not be overridden"

This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1.

* Bug fix and unit test

* Constructor that takes in a container

* Unit tests

* Call the implementation where possible

* Review sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Rename to the system namespace column types

* Address comments

* Push to pull locally

* Mimic C#'s arithmetic grammar in DataFrame

* Address feedback

* Reduce the number of partial column definitions

* Address feedback

* Add APIs to get the strongly typed columns from a DataFrame (dotnet#2878)

* CP

* sq

* sq

* Improve docs

* Enable xml docs for Data.Analysis (dotnet#2882)

* Enable xml docs for Data.Analysis

* Fix /// summary around inheritdoc

* Minor doc changes

* sq

* sq

* Address feedback

* Add Apply to ArrowStringDataFrameColumn (dotnet#2889)

* Support for Exploded columns types in Arrow and IO scenarios (dotnet#2885)

* Support for Exploded columns types in Arrow and IO scenarios

* Unit tests

* Address feedback

* Bump version (dotnet#2890)

* Fix versioning to allow for individual stable packages (dotnet#2891)

* Fix versioning to allow for individual stable packages

* sq

* Bump Microsoft.Data.Analysis version to 0.4.0 (dotnet#2892)

* Bump Microsoft.Data.Analysis version to 0.4.0

* Fix dotnet/corefxlab#2906 (dotnet#2907)

* Fix dotnet/corefxlab#2906

* Improvements and unit tests

* sq

* Better fix

* sq

* Improve LoadCsv to handle null values when deducing the column types (dotnet#2916)

* Unit test to repro

* Fix dotnet/corefxlab#2915

Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn

* Update src/Microsoft.Data.Analysis/DataFrame.IO.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Update src/Microsoft.Data.Analysis/DataFrame.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Feedback

Co-authored-by: Günther Foidl <gue@korporal.at>

* Create a 0.4.0 package (dotnet#2918)

* Revert "Create a 0.4.0 package (dotnet#2918)" (dotnet#2919)

This reverts commit 0bef531.

* Produce a 0.4.0 build (dotnet#2920)

* Default Length for StringDataFrameColumn (dotnet#2921) (dotnet#2923)

* Increment version and stop producing stable packages (dotnet#2922)

* Increment version and stop producing stable packages

* Add DataFrame object formatter. (dotnet#2931)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Fix a bug in InsertColumn

* Add Microsoft.Data.Analysis.nuget project (dotnet#2933)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Remove the ReferenceOutputAssembly that was added to Microsoft.Data.Analysis.csproj.

* Add Microsoft.Data.Analysis.nuget project.

* Move project to src. Fix nuget project settings.

* Remove NoBuild property from project.

* Remove IncludeBuildOutput and IncludeSymbols from project.

* Add VersionPrefix to project.

* Add IncludeBuildOutput property.

* Add unit tests.

* Downgrade from netcoreapp3.1 to netcoreapp3.0

* Upgrade from netcoreapp3.0 to netcoreapp3.1 (dotnet interactive is not compatible with 3.0)

* Add netcoreapp3.1 to global settings

* Add dotnet 3.1.5 runtime to global settings

* Build fixes

* Moving MDAI into interactive-extensions folder of the package

* Minor refactoring

* Respond to PR feedback

Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* ColumnName indexer on DataFrame (dotnet#2959)

* ColumnName indexer on DataFrame

Fixes dotnet/corefxlab#2934

* Unit tests

* Null column name

* Implement FillNulls() for ArrowStringDataFrameColumn with inPlace: false (dotnet#2956)

* implement FillNulls method for ArrowStringDataFrameColumn

* additional asserts for testcase

* Prevent DataFrame.Sample() method from returning duplicated rows (dotnet#2939)

* resolves dotnet#2806

* replace forloop with ArraySegment<T>

* reduce shuffle loop operations from O(Rows.Count) to O(numberOfRows)

* Add WriteCsv plus unit tests. (dotnet#2947)

* Add WriteCsv plus unit tests.

* Add CultureInfo to WriteCsv. Remove index column param. Update unit tests.

* Add CR changes. CultureInfo. Separator.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Missing values default to a `StringDataFrameColumn` (dotnet#2982)

* Make LoadCsv more robust

* Test empty string column

* Retain prev guess where possible

* Update FromArrowRecordBatches for dotnet-spark (dotnet#2978)

* Support for RecordBatches with StructArrays

* Sq

* Address comments

* Nits

* Nits

* Implement DataFrame.LoadCsvFromString (dotnet#2988)

* Implement DataFrame.LoadCsvFromString

* Address comments

* Part 1 of porting the csv reader (dotnet#2997)

* Move to the test folder

* Suppress warnings

* Move extensions reference out of props

Make MDA.test use the props defined TFM
Comment out 2 unit tests

* Address feedback

* Address feedback

* Default to preview version

* Update nuget.config

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com>
Co-authored-by: Jon Wood <jwood803@users.noreply.github.com>
Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Günther Foidl <gue@korporal.at>
Co-authored-by: Rhys Parry <rhys@i-think22.net>
Co-authored-by: daniel costea <dcostea@users.noreply.github.com>
Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com>

* Update to the latest Microsoft.DotNet.Interactive (dotnet#5710)

* Update to the latest Microsoft.DotNet.Interactive

* Add System.CommandLine nuget feed

* Fix Data.Analysis.Interactive test

* added main branch to yml files (dotnet#5715)

* Renamed master to main (dotnet#5717)

* renamed master to main

* Update vsts-ci.yml

* updated urls

* renamed master to main (dotnet#5719)

* IDataView to DataFrame (dotnet#5712)

* IDataView -> DataFrame

Implement the virtual function

* More APIs and unit tests

* Another unit test

* Address feedback

* Last bit of feedback

* Fix some stuff and unit tests

* sq

* Move RowCursor back

* Remove unused param

Docs
maxRows
More unit tests
Fixed ArrowStringDataFrameColumn construction in the unit test

* Improve csv parsing (dotnet#5711)

* Part 2 of TextFieldParser.

Next up is hooking up ReadCsv to use TextFieldParser

* Make LoadCsv use TextFieldParser

* More unit tests

* cleanup

* Address feedback

* Last bit of feedback

* Remove extra var

* Remove duplicate file

* Rename strings.resx to Strings.resx

* rename the designer.cs file too

* Fix doc markdown (dotnet#5732)

Fixed documentation markdown remarks for
* MulticlassClassificationMetrics.LogLoss
* MulticlassClassificationMetrics.LogLossReduction

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Use Official package for SharpZipLib (dotnet#5735)

Co-authored-by: Xiaoyun Zhang <bigmiao.zhang@gmail.com>
Co-authored-by: BigBigMiao <BigBigMiao@github.com>
Co-authored-by: Keren Fuentes <dkeren@seas.upenn.edu>
Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>
Co-authored-by: Yuanxiang Ying <yingyuanxiang34@sina.com>
Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>
Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>
Co-authored-by: Piotr Telman <ptelman@users.noreply.github.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>
Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com>
Co-authored-by: Harish Kulkarni <harishsk@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Frank Dong <frdong@microsoft.com>
Co-authored-by: Michael Sharp <misharp@microsoft.com>
Co-authored-by: Jason DeBoever <github@deboever.us>
Co-authored-by: Leo Gaunt <36968548+LeoGaunt@users.noreply.github.com>
Co-authored-by: Keren Fuentes <kerenfuentes313@gmail.com>
Co-authored-by: Eric StJohn <ericstj@microsoft.com>
Co-authored-by: Ivan Agarský <agarskyivan@gmail.com>
Co-authored-by: Andrej Kmetík <akmetik@gmail.com>
Co-authored-by: Phan Tấn Tài <37982283+4201104140@users.noreply.github.com>
Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>
Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com>
Co-authored-by: Jon Wood <jwood803@users.noreply.github.com>
Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Günther Foidl <gue@korporal.at>
Co-authored-by: Rhys Parry <rhys@i-think22.net>
Co-authored-by: daniel costea <dcostea@users.noreply.github.com>
Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com>
Co-authored-by: Robin Windey <ro.windey@gmail.com>
jwood803 added a commit to jwood803/machinelearning that referenced this pull request Apr 26, 2021
* update tensorflow.net to 0.20.0 (dotnet#5404)

* upgrade to 3.1

* write inline data using invariantCulture

* update tensorflow

* update Microsoft.ML.Vision

* fix test && comment

* update tensorflow.net to 0.20.1

* update tf major version

* downgrade tf runtime to 1.14.1

* Update Dependencies.props

* Update Dependencies.props

* update tffact to stop running test on linux with glibc < 2.3

* fix TensorFlowTransformInputShapeTest

* use tf.v1 api

* fix comment:

* fix building error

* fix test

* fix nit

* remove linq

Co-authored-by: BigBigMiao <BigBigMiao@github.com>

* ProduceWordBags Onnx Export Fix  (dotnet#5435)

* fix for issue

* fix documentation

* aligning test

* adding back line

* aligning fix

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* [SrCnnEntireAnomalyDetector] Upgrade boundary calculation and expected value calculation (dotnet#5436)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Update OnnxRuntime to 1.5.2 (dotnet#5439)

* Added prerelease feed and updated to 1.5.2

* Remove prerelease feed

* Updated docs

* Update doc

* Fixed MacOS CI Pipeline builds (dotnet#5457)

* Added MacOS Homebrew bug fix

* nit fix

* Improving error message  (dotnet#5444)

* better error fix

* revisions

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Fixed MacOS daily & nightly builds due to Homebrew bug (dotnet#5467)

* Fixed MacOS nightly builds due to Homebrew bug

* Edit workaround

* Remove untapping of python2

* Nit edit

* Remove installation of mono-libgdiplus

* try installing mono-libgdiplus

* unlink python 3.8

* Auto.ML: Fix issue when parsing float string fails on pl-PL culture set using Regression Experiment (dotnet#5163)

* Fix issue when parsing float string fails on pl-PL culture set

* Added InvariantCulture float parsing as per CodeReview request

* Update src/Microsoft.ML.AutoML/Sweepers/SweeperProbabilityUtils.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Update Parameters.cs

* Added PL test

* Added multiple cultures

* debugging CI failure

* Debug runSpecific

* Revert "Debug runSpecific"

This reverts commit 95b7280.

* Removed LightGBM and addressed comments

* Increased time

* Increase time

* Increased time

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* handle exception during GetNextPipeline for AutoML (dotnet#5455)

* handle exception during GetNextPipeline for AutoML

* take comments

* Changing LoadRawImages Sample (dotnet#5460)

replacing example

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Use Timer and ctx.CancelExecution() to fix AutoML max-time experiment bug (dotnet#5445)

* Use ctx.CancelExecution() to fix AutoML max-time experiment bug

* Added unit test for checking canceled experiment

* Nit fix

* Different run time on Linux

* Review

* Testing four output

* Used reflection to test for contexts being canceled

* Reviews

* Reviews

* Added main MLContext listener-timer

* Added PRNG on _context, held onto timers for avoiding GC

* Addressed reviews

* Unit test edits

* Increase run time of experiment to guarantee probabilities

* Edited unit test to check produced schema of next run model's predictions

* Remove scheme check as different CI builds result in varying schemas

* Decrease max experiment time unit test time

* Added Timers

* Increase second timer time, edit unit test

* Added try catch for OperationCanceledException in Execute()

* Add AggregateException try catch to slow unit tests for parallel testing

* Reviews

* Final reviews

* Added LightGBMFact to binary classification test

* Removed extra Operation Stopped exception try catch

* Add back OperationCanceledException to Experiment.cs

* fix issue 5020, allow ML.NET to load tf model with primitive input and output column (dotnet#5468)

* handle exception during GetNextPipeline for AutoML

* take comments

* Enable TensorFlowTransformer to take primitive type as input column

* undo unnecessary changes

* add test

* update on test

* remove unnecessary line

* take comments

* maxModels instead of time for AutoML unit test (dotnet#5471)

Uses the internal `maxModels` parameter instead of `MaxExperimentTimeInSeconds` for the exit criteria of AutoML. 

This is to increase the test stability in case the test is run on a slower machine.

* Disabling AutoFitMaxExperimentTimeTest

Disabling AutoFitMaxExperimentTimeTest

* Fix AutoFitMaxExperimentTimeTest (dotnet#5506)

* Fixed test

Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* Fix SR anomaly score calculation at beginning (dotnet#5502)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

* adjust score calculation for first & second points

* fix sr do not report anomaly at beginning

* fix an issue in batch process

* remove an unused parameter

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Merge arcade to master (dotnet#5525)

* Initial commit for Arcade migration

* Added omitted files

* Changed strong name signing to use the same key for shipping and test assemblies

* arcade linux build (dotnet#5423)

* arcade linux build

* put file execution permission change into source control

* The `-test` command for windows. Nuget packages (dotnet#5464)

* working on testing

* testing updates

* tests almost working

* build changes

* all tests should be working

* changes from PR comments

* fixes for .net 3.1

* Fixed extension check. Removed <PackageId> where not needed

* Removed pkg folder and updated paths.

* Added test key. (dotnet#5475)

* Added test key.

* Update PublicKey.cs

Removed extra newline.

* Update ComponentCatalog.cs

Fixed 3 spaces to 4.

* Windows CI working (dotnet#5477)

* ci testing changes

* comments from pr

* Added Linux & Mac changes for Arcade (dotnet#5479)

* Initial Windows, Linux, Macos builds test

* Add Linux/MacOS specific CI requirements

* Run Arcade CI tests on MacOS/Linux

* Fix final package building

* Add benchmark download to benchmarks .csproj file

* Print detailed status of each unit test

* Install CentOS & Ubuntu build dependencies

* Use container names to differentiate between Ubuntu & CentOS

* Remove sudo usage in CentOS

* Fix Linux build dependencies

* Add -y param to apt install

* Remove installation of Linux dependencies

* Minor additions

* Rename Benchmarks to PerformanceTests for Arcade

* Changes

* Added benchmark doc changes

* Pre-merge changes

* Fixing failing Arcade Windows Builds (dotnet#5482)

* Try Windows build single quote fix

* Remove %20

* Added variable space value

* Using variables for spacing

* Added space values as job parameters

* Try conditional variables again

* fix official builds

* Revert "fix official builds"

This reverts commit 7dbbdc7.

* fixing tensorflow rebase issue

* Fixes for many of the CI builds. (dotnet#5496)

* yml log changes

* Fix NetFX builds by ensuring assembly version is set correctly and not to Arcade default of 42.42.42.42 (dotnet#5503)

* Fixed official builds for Arcade SDK (dotnet#5512)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

Co-authored-by: Frank Dong <frdong@microsoft.com>

* fix code generator tests failure (dotnet#5520)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

* fix code generate test fails

* only add necessary dependency

Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>

* Fixed memory leaks from OnnxTransformer (dotnet#5518)

* Fixed memory leak from OnnxTransformer and related x86 build fixes

* Reverting x86 build related fixes to focus only on the memory leaks

* Updated docs

* Reverted OnnxRuntimeOutputCatcher to private class

* Addressed code review comments

* Refactored OnnxTransform back to using MapperBase based on code review comments

* Handle integration tests and nightly build testing (dotnet#5509)

* Make -integrationTests work

* Update .yml file

* Added the TargetArchitecture properties

* Try out -integrationTest

* Missed -integrationTest flag

* Renamed FunctionalTestBaseClass to IntegrationTestBaseClass

* Missed rename

* Modified tests to make them more stable

* Fixed leak in object pool (dotnet#5521)

Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>
Co-authored-by: Frank Dong <frdong@microsoft.com>
Co-authored-by: Michael Sharp <misharp@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>

* fix benchmark test timeout issue (dotnet#5530)

* removed old build stuff (dotnet#5531)

* Fixes Code Coverage in Arcade (dotnet#5528)

* arcade code coverage changes

* adding Michael's changes

* updating path

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Removed CODEOWNERS file to unify review process (dotnet#5535)

* Fix publishing problems (dotnet#5538)

* Removed our dependency to BuildTools by using the NugetCommand Azure Task.
* We should publish a nuget named "SampleUtils", but we were publishing it with the name "SamplesUtils"
* The naming conventions of our published nugets didn't match the ones described on arcade's docs: Versioning.md. I've also added the option so that when queuing the publishing build, we can pass the VERSIONKIND variable with value "release", so that it produces the nugets with arcade's conventions for "Release official build" nugets (as opposed to the "Daily official build" naming convention that's going to be used now by our CI that publishes nightly nugets).

* Updated prerelease label (dotnet#5540)

* Fix warnings from CI Build (dotnet#5541)

* fix warnings

* also add conditional copy asset to native.proj

* test fix warnings

* suppress nuget warning 5118

* suppress other warning

* remove unnecessary change

* put skip warning at Directory.Build.props

* Updated build instructions (dotnet#5534)

* Updated build instructions

* Addressed reviews

* Reviews

* removed the rest of the old pkg references: (dotnet#5537)

* Perf improvement for TopK Accuracy and return all topK in Classification Evaluator (dotnet#5395)

* Fix for issue 744

* cleanup

* fixing report output

* fixedTestReferenceOutputs

* Fixed test reference outputs for NetCore31

* change top k acc output string format

* Ranking algorithm now uses first appearance in dataset rather than worstCase

* fixed benchmark

* various minor changes from code review

* limit TopK to OutputTopKAcc parameter

* top k output name changes

* make old TopK readOnly

* restored old baselineOutputs since respecting outputTopK param means no topK in most test output

* fix test fails, re-add names parameter

* Clean up commented code

* that'll teach me to edit from the github webpage

* use existing method, fix nits

* Slight comment change

* Comment change / Touch to kick off build pipeline

* fix whitespace

* Added new test

* Code formatting nits

* Code formatting nit

* Fixed undefined rankofCorrectLabel and trailing whitespace warning

* Removed _numUnknownClassInstances and added test for unknown labels

* Add weight to seenRanks

* Nits

* Removed FastTree import

Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Fixed Spelling on stopwords (dotnet#5524)

* Changes to onnx export. (dotnet#5544)

* Add back missing test project from running on arcade (dotnet#5545)

* add back test result upload and add missing test project from running

* fix identification

* filter out performance test result files to avoid warnings

* [CodeGenerator] Fix MLNet.CLI build error. (dotnet#5546)

* upgrade to 3.1

* write inline data using invariantCulture

* fix mlnet build error

* Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (dotnet#5548)

* Fixed bug

* Tensorflow fix (dotnet#5547)

* fix tensorflow issue on sample repo

* add comments

* Update to OnnxRuntime 1.6.0 and fixed bug with sequences outputs (dotnet#5529)

* Use onnx prerelease

* Upgrade to onnx 1.6.0

* Updated docs

* Fixed problem with sequences

* added in DcgTruncationLevel to AutoML api (dotnet#5433)

* added in DcgTruncationLevel to automl api

* changed default to 10

* updated baseline output

* fixed failing tests and baselines

* Changes from PR comments.

* Update src/Microsoft.ML.AutoML/Experiment/MetricsAgents/RankingMetricsAgent.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Changes based on PR comments.

* Fix ranking test.

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Created release notes for v1.5.3 (dotnet#5543)

* Created release notes for v1.5.3

* Updated with review comments

* Updated with review comments

* Updated release notes with latest PRs

* Fixed typo

* Forward logs of Experiment's sub MLContexts to main MLContext (dotnet#5554)

* Forward logs of Experiment's sub MLContexts to main MLContext

* Addressed reviews

* Update Stale docs (dotnet#5550)

* Updated OnnxMl.md

* Updated MlNetMklDeps docs

* Typo

* typo

* continueOnError on Brew Workaround (dotnet#5555)

* continueOnError:true

* Fix publishing symbols (dotnet#5556)

* Disable Portable PDB conversion

* Push packages to artifacts

* Fix symbols issues

* Added note about Microsoft.ML.dll

* try out just packing

* Return Build=false, but actually use configuration

* Added missing TargetArchitecture

* add back tests

* Added missing flags

* Updated version to 1.5.4 (dotnet#5557)

* Fixed version numbers in the right place (dotnet#5558)

* Updated version to 1.5.4

* Updated version to 1.5.4

* eng (dotnet#5560)

* Renamed release notes file (dotnet#5561)

* Renamed release notes file

* Updated version number in release notes

* Add SymSgdNative reference to AutoML.Tests.csproj (dotnet#5559)

* runSpecific in YAML

* RunSpecific in test

* Add SymSgdNative reference

* Revert "RunSpecific in test"

This reverts commit fed12b2.

* Revert "runSpecific in YAML"

This reverts commit f9f328d.

* Nuget.config url fix for roslyn compilers (dotnet#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* added in note that PredictionEngine is not thread safe (dotnet#5583)

* Onnx Export for ValueMapping estimator (dotnet#5577)

* Fixed Averaged Perceptron default value (dotnet#5586)

* fixed missed averaged perceptron default value

* fixed extension api

* fixed test baselines

* fixing official build (dotnet#5596)

* Release/1.5.4 fix (dotnet#5599)

* Nuget.config url fix for roslyn compilers (dotnet#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* fixing official build (dotnet#5596)

* Remove references to Microsoft.ML.Scoring (dotnet#5602)

These were the very first ONNX .NET bindings; they were replaced with Microsoft.ML.OnnxRuntime
and then Microsoft.ML.OnnxRuntime.Managed.

* Make ColumnInference serializable (dotnet#5611)

* upgrade to 3.1

* write inline data using invariantCulture

* make column inference serializable

* add test json

* add approvaltests

* fixed nuget.config (dotnet#5614)

* Fix issue in SRCnnEntireAnomalyDetector (dotnet#5579)

* update

* refine codes

* update comments

* update for nit

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Offer suggestions for possibly mistyped label column names in AutoML (dotnet#5574) (dotnet#5624)

* Offer suggestions for possibly mistyped label column names

* review changes

* TimeSeries - fix confidence parameter type for some detectors (dotnet#4058) (dotnet#5623)

* TimeSeries - fix confidence parameter type for some detectors.

- The public API exposed confidence parameters as int even though they are internally implemented as double
- There was no workaround since all classes where double is used are internal
- This caused major issues for software requiring high precision predictions
- This change to API should be backwards compatible since int can be passed to parameter of type double

* TimeSeries - reintroduce original methods with confidence parameter of type int (to not break the API).

* TimeSeries - make catalog API methods with int confidence parameter deprecated.

- Tests adjusted to not use the deprecated methods

* Update Conversion.cs (dotnet#5627)

* Documentation updates (dotnet#5635)

* documentation updates

* fixed spelling error

* Update docs/building/unix-instructions.md

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

* AutoML aggregate exception (dotnet#5631)

* added check for aggregate exception

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* pulled message out to private variable so it's not duplicated

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Treat TensorFlow output as non-batched. (dotnet#5634)

* Output can now be treated as non-batched.

* updated comments based on PR comments.

* Fixing saving/loading with new parameter.

* Updates based on PR comments

* Update src/Microsoft.ML.TensorFlow/TensorflowUtils.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* reverted accidental test changes

* fixes based on PR comments

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Added in release notes for 1.5.5 (dotnet#5639)

* added in release notes

* Update release-1.5.5.md

Removed incorrect PR.

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* updating version after release (dotnet#5642)

* Move DataFrame to machinelearning (dotnet#5641)

* Change namespace to Microsoft.Data.Analysis (dotnet#2773)

* Update namespace to Microsoft.Data.Analysis

* Remove "DataFrame" from the test project name

* APIs for reversed binary operators (dotnet#2769)

* Support reverse binary operators

* Fix file left behind in a rebase

* Fix whitespace

* Throw for incompatible inPlace (dotnet#2778)

* Throw if inPlace is set and types mismatch

* Unit test

* Better error message

* Remove empty lines

* Version, Tags and Description for Nuget (dotnet#2779)

* Version, Tags and Description for Nuget

* sq

* Flags for release  (dotnet#2781)

* Publish packages to artifacts

* Flags for release

* Fix the Description method to not throw (dotnet#2786)

* Fix the Description method to not crash
Adds an Info method

* sq

* Address feedback

* Last round of feedback

* Use dataTypes if it is passed in to LoadCsv (dotnet#2791)

* Fix LoadCsv to use dataType if it is passed in

* sq

* Don't read the full file after guessRows lines have been read

* Address feedback

* Last round of feedback

* Creating a `Rows` property, similar to `Columns` (dotnet#2794)

* Rows collection, similar to Columns

* Doc

* Some minor clean up

* Make DataFrameRow a view into the DataFrame

* sq

* Address feedback

* Remove DataFrame.RowCount

* More row count changes

* sq

* Address feedback

* Merge upstream

* DataFrame.LoadCsv throws an exception on projects targeting < netcore3.0 (dotnet#2797)

Fixing by passing in an encoding and a default buffer size.

Also, get our tests running on .NET Framework.

Fix dotnet#2783

* Params constructor on DataFrame (dotnet#2800)

* Params constructor on DataFrame

* Delete redundant constructors

* Remove `T : unmanaged` constraint from DataFrameColumn.BinaryOperations (dotnet#2801)

* Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations

* Address feedback

* Rename the value version of the APIs

* sq

* Fix build

* Address feedback

* Remove Value from the APIs

* sq

* Address feedback

* Bump version to 0.2.0 (dotnet#2803)

* Add Apply<TResult>method to PrimitiveDataFrameColumn (dotnet#2807)

* Add Apply method to PrimitiveDataFrameColumn and its container

* Add TestApply test

* Remove unused df variable in DataFrameTests

* Add xml doc comments to Apply method

* Add additional tests for ReadCsv (dotnet#2811)

* Add additional tests for ReadCsv

* Update asserts

* Add empty row and skip test pending another fix

* Remove test for another issue

* Added static factory methods to DataFrameColumn  (dotnet#2808)

* Added static factory methods to DataFrameColumn where they make sense (for the overloads where it's possible to infer the column's type).

* Remove regions

* Update some parts of the unit tests to use static factory methods to create DataFrameColumns.

* Remove errant {T} on StringDataFrameColumn.

* PR feedback

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Append rows to a DataFrame (dotnet#2823)

* Append rows to a DataFrame

* Unit test

* Update unit tests and doc

* Need to perform a type check every time

* sq

* Update unit test

* Address comments

* Move corefxlab to arcade (dotnet#2795)

* Add eng folder

* First cut of moving corefxlab to arcade

* Move arcade symbol validation inside official build

* Move base yml file to root

* Arcade will build, publish packages and symbols

* UpdateXlf. Review this

* Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel

* Remove property that was causing the build to fail

* Moving global properties to the main Yaml instead of step in order to unblock publishing

* Committing xlfs and changing the build script to not update Xlf on build

* clean up corefxlab-base.yml

* sq

* Delete unused files and scripts

* Get rid of all the xlf stuff

* Remove UpdateXlfOnBuild for non-NT builds

* Minor cleanup

* More cleanup

* update eng\build.sh permission

* Rename to Nuget.config

* sq

* Remove the runtime spec from global.json

* Don't publish test projs

* Typo

* Move version prefix to versions.props
Change prereleaselabel to alpha

* Increment version number to list as the latest package
Increment version number of Microsoft.Experimental.Collections to list as the latest package
Turn off graph generation

* Update the Readme

* Test removing the scripts folder

* Touch readme to force a change

* Address Jose's comments

* Typo

* Move versions to eng/versions.props

* Benchmark.proj needs to refer to xunit

* Clean up dependencies.props

* Remove dependencies.props

Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>

* Rename Sort to OrderBy (dotnet#2814)

* Rename sort to orderby and add orderbydescending method

* Add doc strings

* Update benchmark test

* Update tests

* Update DataFrameColumn to use orderby

* Update doc comment

* Additions to sortby

* Revert "Additions to sortby"

This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35.

* Revert "Update doc comment"

This reverts commit 192f7797fe2b77625486637badf77046162fedbf.

* Revert "Update DataFrameColumn to use orderby"

This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4.

* Explode column types and generate converters (dotnet#2857)

* Explode column types and generate converters

* Clean this

* sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Address remaining concerns from the 2nd DataFrame API Review  (dotnet#2861)

* Move string indexer to Columns

* API changes from the 2nd API review

* Unit tests

* Address comments

* Add binary operations and operators on the exploded columns (dotnet#2867)

* Generate combinations of binary operations and Add

* Numeric Converters and CloneAsNumericColumns

* Binary, Comparison and Shift operations

* Clean up and bug fix

* Fix the binary op apis to not be overridden

* Internal constructors for exploded types

* Proper return types for exploded types

* Update unit tests

* Update csproj

* Revert "Fix the binary op apis to not be overridden"

This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1.

* Bug fix and unit test

* Constructor that takes in a container

* Unit tests

* Call the implementation where possible

* Review sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Rename to the system namespace column types

* Address comments

* Push to pull locally

* Mimic C#'s arithmetic grammar in DataFrame

* Address feedback

* Reduce the number of partial column definitions

* Address feedback

* Add APIs to get the strongly typed columns from a DataFrame (dotnet#2878)

* CP

* sq

* sq

* Improve docs

* Enable xml docs for Data.Analysis (dotnet#2882)

* Enable xml docs for Data.Analysis

* Fix /// summary around inheritdoc

* Minor doc changes

* sq

* sq

* Address feedback

* Add Apply to ArrowStringDataFrameColumn (dotnet#2889)

* Support for Exploded columns types in Arrow and IO scenarios (dotnet#2885)

* Support for Exploded columns types in Arrow and IO scenarios

* Unit tests

* Address feedback

* Bump version (dotnet#2890)

* Fix versioning to allow for individual stable packages (dotnet#2891)

* Fix versioning to allow for individual stable packages

* sq

* Bump Microsoft.Data.Analysis version to 0.4.0 (dotnet#2892)

* Bump Microsoft.Data.Analysis version to 0.4.0

* Fix dotnet/corefxlab#2906 (dotnet#2907)

* Fix dotnet/corefxlab#2906

* Improvements and unit tests

* sq

* Better fix

* sq

* Improve LoadCsv to handle null values when deducing the column types (dotnet#2916)

* Unit test to repro

* Fix dotnet/corefxlab#2915

Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn

* Update src/Microsoft.Data.Analysis/DataFrame.IO.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Update src/Microsoft.Data.Analysis/DataFrame.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Feedback

Co-authored-by: Günther Foidl <gue@korporal.at>

* Create a 0.4.0 package (dotnet#2918)

* Revert "Create a 0.4.0 package (dotnet#2918)" (dotnet#2919)

This reverts commit 0bef531.

* Produce a 0.4.0 build (dotnet#2920)

* Default Length for StringDataFrameColumn (dotnet#2921) (dotnet#2923)

* Increment version and stop producing stable packages (dotnet#2922)

* Increment version and stop producing stable packages

* Add DataFrame object formatter. (dotnet#2931)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Fix a bug in InsertColumn

* Add Microsoft.Data.Analysis.nuget project (dotnet#2933)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Remove ReferenceOutputAssembly added to from Microsoft.Data.Analysis.csproj.

* Add Microsoft.Data.Analysis.nuget project.

* Move project to src. Fix nuget project settings.

* Remove NoBuild property from project.

* Remove IncludeBuildOutput and IncludeSymbols from project.

* Add VersionPrefix to project.

* Add IncludeBuildOutput property.

* Add unit tests.

* Downgrade from netcoreapp3.1 to netcoreapp3.0

* Upgrade from netcoreapp3.0 to netcoreapp3.1 (dotnet interactive is not compatible with 3.0)

* Add netcoreapp3.1 to global settings

* Add dotnet 3.1.5 runtime to global settings

* Build fixes

* Moving MDAI into interactive-extensions folder of the package

* Minor refactoring

* Respond to PR feedback

Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* ColumnName indexer on DataFrame (dotnet#2959)

* ColumnName indexer on DataFrame

Fixes dotnet/corefxlab#2934

* Unit tests

* Null column name

* Implement FillNulls() for ArrowStringDataFrameColumn with inPlace: false (dotnet#2956)

* implement FillNulls method for ArrowStringDataFrameColumn

* additional asserts for testcase

* Prevent DataFrame.Sample() method from returning duplicated rows (dotnet#2939)

* resolves dotnet#2806

* replace forloop with ArraySegment<T>

* reduce shuffle loop operations from O(Rows.Count) to O(numberOfRows)

* Add WriteCsv plus unit tests. (dotnet#2947)

* Add WriteCsv plus unit tests.

* Add CultureInfo to WriteCsv. Remove index column param. Update unit tests.

* Add CR changes. CultureInfo. Separator.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Missing values default to a `StringDataFrameColumn` (dotnet#2982)

* Make LoadCsv more robust

* Test empty string column

* Retain prev guess where possible

* Update FromArrowRecordBatches for dotnet-spark (dotnet#2978)

* Support for RecordBatches with StructArrays

* Sq

* Address comments

* Nits

* Nits

* Implement DataFrame.LoadCsvFromString (dotnet#2988)

* Implement DataFrame.LoadCsvFromString

* Address comments

* Part 1 of porting the csv reader (dotnet#2997)

* Move to the test folder

* Suppress warnings

* Move extensions reference out of props

Make MDA.test use the props defined TFM
Comment out 2 unit tests

* Address feedback

* Address feedback

* Default to preview version

* Update nuget.config

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com>
Co-authored-by: Jon Wood <jwood803@users.noreply.github.com>
Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Günther Foidl <gue@korporal.at>
Co-authored-by: Rhys Parry <rhys@i-think22.net>
Co-authored-by: daniel costea <dcostea@users.noreply.github.com>
Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com>

* Update to the latest Microsoft.DotNet.Interactive (dotnet#5710)

* Update to the latest Microsoft.DotNet.Interactive

* Add System.CommandLine nuget feed

* Fix Data.Analysis.Interactive test

* added main branch to yml files (dotnet#5715)

* Renamed master to main (dotnet#5717)

* renamed master to main

* Update vsts-ci.yml

* updated urls

* renamed master to main (dotnet#5719)

* IDataView to DataFrame (dotnet#5712)

* IDataView -> DataFrame

Implement the virtual function

* More APIs and unit tests

* Another unit test

* Address feedback

* Last bit of feedback

* Fix some stuff and unit tests

* sq

* Move RowCursor back

* Remove unused param

Docs
maxRows
More unit tests
Fixed ArrowStringDataFrameColumn construction in the unit test

* Improve csv parsing (dotnet#5711)

* Part 2 of TextFieldParser.

Next up is hooking up ReadCsv to use TextFieldParser

* Make LoadCsv use TextFieldParser

* More unit tests

* cleanup

* Address feedback

* Last bit of feedback

* Remove extra var

* Remove duplicate file

* Rename strings.resx to Strings.resx

* rename the designer.cs file too

* Fix doc markdown (dotnet#5732)

Fixed documentation markdown remarks for
* MulticlassClassificationMetrics.LogLoss
* MulticlassClassificationMetrics.LogLossReduction

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Use Official package for SharpZipLib (dotnet#5735)

Co-authored-by: Xiaoyun Zhang <bigmiao.zhang@gmail.com>
Co-authored-by: BigBigMiao <BigBigMiao@github.com>
Co-authored-by: Keren Fuentes <dkeren@seas.upenn.edu>
Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>
Co-authored-by: Yuanxiang Ying <yingyuanxiang34@sina.com>
Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>
Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>
Co-authored-by: Piotr Telman <ptelman@users.noreply.github.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>
Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com>
Co-authored-by: Harish Kulkarni <harishsk@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Frank Dong <frdong@microsoft.com>
Co-authored-by: Michael Sharp <misharp@microsoft.com>
Co-authored-by: Jason DeBoever <github@deboever.us>
Co-authored-by: Leo Gaunt <36968548+LeoGaunt@users.noreply.github.com>
Co-authored-by: Keren Fuentes <kerenfuentes313@gmail.com>
Co-authored-by: Eric StJohn <ericstj@microsoft.com>
Co-authored-by: Ivan Agarský <agarskyivan@gmail.com>
Co-authored-by: Andrej Kmetík <akmetik@gmail.com>
Co-authored-by: Phan Tấn Tài <37982283+4201104140@users.noreply.github.com>
Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>
Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com>
Co-authored-by: Jon Wood <jwood803@users.noreply.github.com>
Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Günther Foidl <gue@korporal.at>
Co-authored-by: Rhys Parry <rhys@i-think22.net>
Co-authored-by: daniel costea <dcostea@users.noreply.github.com>
Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com>
Co-authored-by: Robin Windey <ro.windey@gmail.com>
michaelgsharp added a commit that referenced this pull request Aug 24, 2021
* Merge from main repository (#1)

* update tensorflow.net to 0.20.0 (#5404)

* upgrade to 3.1

* write inline data using invariantCulture

* update tensorflow

* update Microsoft.ML.Vision

* fix test && comment

* update tensorflow.net to 0.20.1

* update tf major version

* downgrade tf runtime to 1.14.1

* Update Dependencies.props

* Update Dependencies.props

* update tffact to stop running test on linux with glibc < 2.3

* fix TensorFlowTransformInputShapeTest

* use tf.v1 api

* fix comment:

* fix building error

* fix test

* fix nit

* remove linq

Co-authored-by: BigBigMiao <BigBigMiao@github.com>

* ProduceWordBags Onnx Export Fix  (#5435)

* fix for issue

* fix documentation

* aligning test

* adding back line

* aligning fix

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* [SrCnnEntireAnomalyDetector] Upgrade boundary calculation and expected value calculation (#5436)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Update OnnxRuntime to 1.5.2 (#5439)

* Added prerelease feed and updated to 1.5.2

* Remove prerelease feed

* Updated docs

* Update doc

* Fixed MacOS CI Pipeline builds (#5457)

* Added MacOS Homebrew bug fix

* nit fix

* Improving error message  (#5444)

* better error fix

* revisions

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Fixed MacOS daily & nightly builds due to Homebrew bug (#5467)

* Fixed MacOS nightly builds due to Homebrew bug

* Edit workaround

* Remove untapping of python2

* Nit edit

* Remove installation of mono-libgdiplus

* try installing mono-libgdiplus

* unlink python 3.8

* Auto.ML: Fix issue when parsing float string fails on pl-PL culture set using Regression Experiment (#5163)

* Fix issue when parsing float string fails on pl-PL culture set

* Added InvariantCulture float parsing as per CodeReview request

* Update src/Microsoft.ML.AutoML/Sweepers/SweeperProbabilityUtils.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Update Parameters.cs

* Added PL test

* Added multiple cultures

* debugging CI failure

* Debug runSpecific

* Revert "Debug runSpecific"

This reverts commit 95b728099415cacbe8cf3819ec51ce50cec94eb2.

* Removed LightGBM and addressed comments

* Increased time

* Increase time

* Increased time

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* handle exception during GetNextPipeline for AutoML (#5455)

* handle exception during GetNextPipeline for AutoML

* take comments

* Changing LoadRawImages Sample (#5460)

replacing example

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Use Timer and ctx.CancelExecution() to fix AutoML max-time experiment bug (#5445)

* Use ctx.CancelExecution() to fix AutoML max-time experiment bug

* Added unit test for checking canceled experiment

* Nit fix

* Different run time on Linux

* Review

* Testing four output

* Used reflection to test for contexts being canceled

* Reviews

* Reviews

* Added main MLContext listener-timer

* Added PRNG on _context, held onto timers for avoiding GC

* Addressed reviews

* Unit test edits

* Increase run time of experiment to guarantee probabilities

* Edited unit test to check produced schema of next run model's predictions

* Remove scheme check as different CI builds result in varying schemas

* Decrease max experiment time unit test time

* Added Timers

* Increase second timer time, edit unit test

* Added try catch for OperationCanceledException in Execute()

* Add AggregateException try catch to slow unit tests for parallel testing

* Reviews

* Final reviews

* Added LightGBMFact to binary classification test

* Removed extra Operation Stopped exception try catch

* Add back OperationCanceledException to Experiment.cs

* fix issue 5020, allow ML.NET to load tf model with primitive input and output column (#5468)

* handle exception during GetNextPipeline for AutoML

* take comments

* Enable TensorFlowTransformer to take primitive type as input column

* undo unnecessary changes

* add test

* update on test

* remove unnecessary line

* take comments

* maxModels instead of time for AutoML unit test (#5471)

Uses the internal `maxModels` parameter instead of `MaxExperimentTimeInSeconds` for the exit criteria of AutoML. 

This is to increase the test stability in case the test is run on a slower machine.

* Disabling AutoFitMaxExperimentTimeTest

Disabling AutoFitMaxExperimentTimeTest

* Fix AutoFitMaxExperimentTimeTest (#5506)

* Fixed test
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* Fix SR anomaly score calculation at beginning (#5502)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

* adjust score calculation for first & second points

* fix sr do not report anomaly at beginning

* fix an issue in batch process

* remove an unused parameter

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Merge arcade to master (#5525)

* Initial commit for Arcade migration

* Added omitted files

* Changed strong name signing to use the same key for shipping and test assemblies

* arcade linux build (#5423)

* arcade linux build

* put file execution permission change into source control

* The `-test` command for windows. Nuget packages (#5464)

* working on testing

* testing updates

* tests almost working

* build changes

* all tests should be working

* changes from PR comments

* fixes for .net 3.1

* Fixed extension check. Removed <PackageId> where not needed

* Removed pkg folder and updated paths.

* Added test key. (#5475)

* Added test key.

* Update PublicKey.cs

Removed extra newline.

* Update ComponentCatalog.cs

Fixed 3 spaces to 4.

* Windows CI working (#5477)

* ci testing changes

* comments from pr

* Added Linux & Mac changes for Arcade (#5479)

* Initial Windows, Linux, Macos builds test

* Add Linux/MacOS specific CI requirements

* Run Arcade CI tests on MacOS/Linux

* Fix final package building

* Add benchmark download to benchmarks .csproj file

* Print detailed status of each unit test

* Install CentOS & Ubuntu build dependencies

* Use container names to differentiate between Ubuntu & CentOS

* Remove sudo usage in CentOS

* Fix Linux build dependencies

* Add -y param to apt install

* Remove installation of Linux dependencies

* Minor additions

* Rename Benchmarks to PerformanceTests for Arcade

* Changes

* Added benchmark doc changes

* Pre-merge changes

* Fixing failing Arcade Windows Builds (#5482)

* Try Windows build single quote fix

* Remove %20

* Added variable space value

* Using variables for spacing

* Added space values as job parameters

* Try conditional variables again

* fix official builds

* Revert "fix official builds"

This reverts commit 7dbbdc7b946f4f48db5452887ad9bf53616a37e8.

* fixing tensorflow rebase issue

* Fixes for many of the CI builds. (#5496)

* yml log changes

* Fix NetFX builds by ensuring assembly version is set correctly and not to Arcade default of 42.42.42.42 (#5503)

* Fixed official builds for Arcade SDK (#5512)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

Co-authored-by: Frank Dong <frdong@microsoft.com>

* fix code generator tests failure (#5520)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

* fix code generate test fails

* only add necessary dependency

Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>

* Fixed memory leaks from OnnxTransformer (#5518)

* Fixed memory leak from OnnxTransformer and related x86 build fixes

* Reverting x86 build related fixes to focus only on the memory leaks

* Updated docs

* Reverted OnnxRuntimeOutputCatcher to private class

* Addressed code review comments

* Refactored OnnxTransform back to using MapperBase based on code review comments

* Handle integration tests and nightly build testing (#5509)

* Make -integrationTests work

* Update .yml file

* Added the TargetArchitecture properties

* Try out -integrationTest

* Missed -integrationTest flag

* Renamed FunctionalTestBaseClass to IntegrationTestBaseClass

* Missed rename

* Modified tests to make them more stable

* Fixed leak in object pool (#5521)

Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>
Co-authored-by: Frank Dong <frdong@microsoft.com>
Co-authored-by: Michael Sharp <misharp@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>

* fix benchmark test timeout issue (#5530)

* removed old build stuff (#5531)

* Fixes Code Coverage in Arcade (#5528)

* arcade code coverage changes

* adding Michael's changes

* updating path

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Removed CODEOWNERS file to unify review process (#5535)

* Fix publishing problems (#5538)

* Removed our dependency to BuildTools by using the NugetCommand Azure Task.
* We should publish a nuget named "SampleUtils", but we were publishing it with the name "SamplesUtils"
* The naming conventions of our published nugets didn't match the ones described on arcade's docs: Versioning.md. I've also added the option so that when queuing the publishing build, we can pass the VERSIONKIND variable with value "release", so that it produces the nugets with arcade's conventions for "Release official build" nugets (as opposed to the "Daily official build" naming convention that's going to be used now by our CI that publishes nightly nugets).

* Updated prerelease label (#5540)

* Fix warnings from CI Build (#5541)

* fix warnings

* also add conditional copy asset to native.proj

* test fix warnings

* suppress nuget warning 5118

* suppress other warning

* remove unnecessary change

* put skip warning at Directory.Build.props

* Updated build instructions (#5534)

* Updated build instructions

* Addressed reviews

* Reviews

* removed the rest of the old pkg references: (#5537)

* Perf improvement for TopK Accuracy and return all topK in Classification Evaluator (#5395)

* Fix for issue 744

* cleanup

* fixing report output

* fixedTestReferenceOutputs

* Fixed test reference outputs for NetCore31

* change top k acc output string format

* Ranking algorithm now uses first appearance in dataset rather than worstCase

* fixed benchmark

* various minor changes from code review

* limit TopK to OutputTopKAcc parameter

* top k output name changes

* make old TopK readOnly

* restored old baselineOutputs since respecting outputTopK param means no topK in most test output

* fix test fails, re-add names parameter

* Clean up commented code

* that'll teach me to edit from the github webpage

* use existing method, fix nits

* Slight comment change

* Comment change / Touch to kick off build pipeline

* fix whitespace

* Added new test

* Code formatting nits

* Code formatting nit

* Fixed undefined rankofCorrectLabel and trailing whitespace warning

* Removed _numUnknownClassInstances and added test for unknown labels

* Add weight to seenRanks

* Nits

* Removed FastTree import

Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Fixed Spelling on stopwords (#5524)

* Changes to onnx export. (#5544)

* Add back missing test project from running on arcade (#5545)

* add back test result upload and add missing test project from running

* fix identification

* filter out performance test result files to avoid warnings

* [CodeGenerator] Fix MLNet.CLI build error. (#5546)

* upgrade to 3.1

* write inline data using invariantCulture

* fix mlnet build error

* Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (#5548)

* Fixed bug

* Tensorflow fix (#5547)

* fix tensorflow issue on sample repo

* add comments

* Update to OnnxRuntime 1.6.0 and fixed bug with sequences outputs (#5529)

* Use onnx prerelease

* Upgrade to onnx 1.6.0

* Updated docs

* Fixed problem with sequences

* added in DcgTruncationLevel to AutoML api (#5433)

* added in DcgTruncationLevel to automl api

* changed default to 10

* updated baseline output

* fixed failing tests and baselines

* Changes from PR comments.

* Update src/Microsoft.ML.AutoML/Experiment/MetricsAgents/RankingMetricsAgent.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Changes based on PR comments.

* Fix ranking test.

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Created release notes for v1.5.3 (#5543)

* Created release notes for v1.5.3

* Updated with review comments

* Updated with review comments

* Updated release notes with latest PRs

* Fixed typo

* Forward logs of Experiment's sub MLContexts to main MLContext (#5554)

* Forward logs of Experiment's sub MLContexts to main MLContext

* Addressed reviews

* Update Stale docs (#5550)

* Updated OnnxMl.md

* Updated MlNetMklDeps docs

* Typo

* typo

* continueOnError on Brew Workaround (#5555)

* continueOnError:true

* Fix publishing symbols (#5556)

* Disable Portable PDB conversion

* Push packages to artifacts

* Fix symbols issues

* Added note about Microsoft.ML.dll

* try out just packing

* Return Build=false, but actually use configuration

* Added missing TargetArchitecture

* add back tests

* Added missing flags

* Updated version to 1.5.4 (#5557)

* Fixed version numbers in the right place (#5558)

* Updated version to 1.5.4

* Updated version to 1.5.4

* eng (#5560)

* Renamed release notes file (#5561)

* Renamed release notes file

* Updated version number in release notes

* Add SymSgdNative reference to AutoML.Tests.csproj (#5559)

* runSpecific in YAML

* RunSpecific in test

* Add SymSgdNative reference

* Revert "RunSpecific in test"

This reverts commit fed12b26ae71e7a95d2dd1f4703541138a780d75.

* Revert "runSpecific in YAML"

This reverts commit f9f328d52cd5b4281ad38b7a6af20c219dd0fd44.

* Nuget.config url fix for roslyn compilers (#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* added in note that PredictionEngine is not thread safe (#5583)

* Onnx Export for ValueMapping estimator (#5577)

* Fixed Averaged Perceptron default value (#5586)

* fixed missed averaged perceptron default value

* fixed extension api

* fixed test baselines

* fixing official build (#5596)

* Release/1.5.4 fix (#5599)

* Nuget.config url fix for roslyn compilers (#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* fixing official build (#5596)

* Remove references to Microsoft.ML.Scoring (#5602)

These were the very first ONNX .NET bindings; they were replaced with Microsoft.ML.OnnxRuntime,
then Microsoft.ML.OnnxRuntime.Managed.

* Make ColumnInference serializable (#5611)

* upgrade to 3.1

* write inline data using invariantCulture

* make column inference serializable

* add test json

* add approvaltests

* fixed nuget.config (#5614)

* Fix issue in SRCnnEntireAnomalyDetector (#5579)

* update

* refine codes

* update comments

* update for nit

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Offer suggestions for possibly mistyped label column names in AutoML (#5574) (#5624)

* Offer suggestions for possibly mistyped label column names

* review changes

* TimeSeries - fix confidence parameter type for some detectors (#4058) (#5623)

* TimeSeries - fix confidence parameter type for some detectors.

- The public API exposed confidence parameters as int even though they are internally implemented as double
- There was no workaround since all classes where double is used are internal
- This caused major issues for software requiring high precision predictions
- This change to the API should be backwards compatible since an int can be passed to a parameter of type double

* TimeSeries - reintroduce original methods with confidence parameter of type int (to not break the API).

* TimeSeries - make catalog API methods with int confidence parameter deprecated.

- Tests adjusted to not use the deprecated methods

* Update Conversion.cs (#5627)

* Documentation updates (#5635)

* documentation updates

* fixed spelling error

* Update docs/building/unix-instructions.md

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

* AutoML aggregate exception (#5631)

* added check for aggregate exception

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* pulled message out to private variable so it's not duplicated

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Treat TensorFlow output as non-batched. (#5634)

* Output can now be treated as non-batched.

* updated comments based on PR comments.

* Fixing saving/loading with new parameter.

* Updates based on PR comments

* Update src/Microsoft.ML.TensorFlow/TensorflowUtils.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* reverted accidental test changes

* fixes based on PR comments

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Added in release notes for 1.5.5 (#5639)

* added in release notes

* Update release-1.5.5.md

Removed incorrect PR.

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* updating version after release (#5642)

* Move DataFrame to machinelearning (#5641)

* Change namespace to Microsoft.Data.Analysis (#2773)

* Update namespace to Microsoft.Data.Analysis

* Remove "DataFrame" from the test project name

* APIs for reversed binary operators (#2769)

* Support reverse binary operators

* Fix file left behind in a rebase

* Fix whitespace

* Throw for incompatible inPlace (#2778)

* Throw if inPlace is set and types mismatch

* Unit test

* Better error message

* Remove empty lines

* Version, Tags and Description for Nuget (#2779)

* Version, Tags and Description for Nuget

* sq

* Flags for release  (#2781)

* Publish packages to artifacts

* Flags for release

* Fix the Description method to not throw (#2786)

* Fix the Description method to not crash
Adds an Info method

* sq

* Address feedback

* Last round of feedback

* Use dataTypes if it is passed in to LoadCsv (#2791)

* Fix LoadCsv to use dataType if it is passed in

* sq

* Don't read the full file after guessRows lines have been read

* Address feedback

* Last round of feedback

* Creating a `Rows` property, similar to `Columns` (#2794)

* Rows collection, similar to Columns

* Doc

* Some minor clean up

* Make DataFrameRow a view into the DataFrame

* sq

* Address feedback

* Remove DataFrame.RowCount

* More row count changes

* sq

* Address feedback

* Merge upstream

* DataFrame.LoadCsv throws an exception on projects targeting < netcore3.0 (#2797)

Fixing by passing in an encoding and a default buffer size.

Also, get our tests running on .NET Framework.

Fix #2783

* Params constructor on DataFrame (#2800)

* Params constructor on DataFrame

* Delete redundant constructors

* Remove `T : unmanaged` constraint from DataFrameColumn.BinaryOperations (#2801)

* Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations

* Address feedback

* Rename the value version of the APIs

* sq

* Fix build

* Address feedback

* Remove Value from the APIs

* sq

* Address feedback

* Bump version to 0.2.0 (#2803)

* Add Apply<TResult> method to PrimitiveDataFrameColumn (#2807)

* Add Apply method to PrimitiveDataFrameColumn and its container

* Add TestApply test

* Remove unused df variable in DataFrameTests

* Add xml doc comments to Apply method

* Add additional tests for ReadCsv (#2811)

* Add additional tests for ReadCsv

* Update asserts

* Add empty row and skip test pending another fix

* Remove test for another issue

* Added static factory methods to DataFrameColumn  (#2808)

* Added static factory methods to DataFrameColumn where they make sense (for the overloads where it's possible to infer the column's type).

* Remove regions

* Update some parts of the unit tests to use static factory methods to create DataFrameColumns.

* Remove errant {T} on StringDataFrameColumn.

* PR feedback

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Append rows to a DataFrame (#2823)

* Append rows to a DataFrame

* Unit test

* Update unit tests and doc

* Need to perform a type check every time

* sq

* Update unit test

* Address comments

* Move corefxlab to arcade (#2795)

* Add eng folder

* First cut of moving corefxlab to arcade

* Move arcade symbol validation inside official build

* Move base yml file to root

* Arcade will build, publish packages and symbols

* UpdateXlf. Review this

* Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel

* Remove property that was causing the build to fail

* Moving global properties to the main Yaml instead of step in order to unblock publishing

* Committing xlfs and changing the build script to not update Xlf on build

* clean up corefxlab-base.yml

* sq

* Delete unused files and scripts

* Get rid of all the xlf stuff

* Remove UpdateXlfOnBuild for non-NT builds

* Minor cleanup

* More cleanup

* update eng\build.sh permission

* Rename to Nuget.config

* sq

* Remove the runtime spec from global.json

* Don't publish test projs

* Typo

* Move version prefix to versions.props
Change prereleaselabel to alpha

* Increment version number to list as the latest package
Increment version number of Microsoft.Experimental.Collections to list as the latest package
Turn off graph generation

* Update the Readme

* Test removing the scripts folder

* Touch readme to force a change

* Address Jose's comments

* Typo

* Move versions to eng/versions.props

* Benchmark.proj needs to refer to xunit

* Clean up dependencies.props

* Remove dependencies.props

Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>

* Rename Sort to OrderBy (#2814)

* Rename sort to orderby and add orderbydescending method

* Add doc strings

* Update bench mark test

* Update tests

* Update DataFrameColumn to use orderby

* Update doc comment

* Additions to sortby

* Revert "Additions to sortby"

This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35.

* Revert "Update doc comment"

This reverts commit 192f7797fe2b77625486637badf77046162fedbf.

* Revert "Update DataFrameColumn to use orderby"

This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4.

* Explode column types and generate converters (#2857)

* Explode column types and generate converters

* Clean this

* sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Address remaining concerns from the 2nd DataFrame API Review  (#2861)

* Move string indexer to Columns

* API changes from the 2nd API review

* Unit tests

* Address comments

* Add binary operations and operators on the exploded columns (#2867)

* Generate combinations of binary operations and Add

* Numeric Converters and CloneAsNumericColumns

* Binary, Comparison and Shift operations

* Clean up and bug fix

* Fix the binary op apis to not be overridden

* Internal constructors for exploded types

* Proper return types for exploded types

* Update unit tests

* Update csproj

* Revert "Fix the binary op apis to not be overridden"

This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1.

* Bug fix and unit test

* Constructor that takes in a container

* Unit tests

* Call the implementation where possible

* Review sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Rename to the system namespace column types

* Address comments

* Push to pull locally

* Mimic C#'s arithmetic grammar in DataFrame

* Address feedback

* Reduce the number of partial column definitions

* Address feedback

* Add APIs to get the strongly typed columns from a DataFrame (#2878)

* CP

* sq

* sq

* Improve docs

* Enable xml docs for Data.Analysis (#2882)

* Enable xml docs for Data.Analysis

* Fix /// summary around inheritdoc

* Minor doc changes

* sq

* sq

* Address feedback

* Add Apply to ArrowStringDataFrameColumn (#2889)

* Support for Exploded columns types in Arrow and IO scenarios (#2885)

* Support for Exploded columns types in Arrow and IO scenarios

* Unit tests

* Address feedback

* Bump version (#2890)

* Fix versioning to allow for individual stable packages (#2891)

* Fix versioning to allow for individual stable packages

* sq

* Bump Microsoft.Data.Analysis version to 0.4.0 (#2892)

* Bump Microsoft.Data.Analysis version to 0.4.0

* Fix https://github.com/dotnet/corefxlab/issues/2906 (#2907)

* Fix https://github.com/dotnet/corefxlab/issues/2906

* Improvements and unit tests

* sq

* Better fix

* sq

* Improve LoadCsv to handle null values when deducing the column types (#2916)

* Unit test to repro

* Fix https://github.com/dotnet/corefxlab/issues/2915

Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn

* Update src/Microsoft.Data.Analysis/DataFrame.IO.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Update src/Microsoft.Data.Analysis/DataFrame.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Feedback

Co-authored-by: Günther Foidl <gue@korporal.at>

* Create a 0.4.0 package (#2918)

* Revert "Create a 0.4.0 package (#2918)" (#2919)

This reverts commit 0bef531289744274ab97e8bbb9e5694b0d855689.

* Produce a 0.4.0 build (#2920)

* Default Length for StringDataFrameColumn (#2921) (#2923)

* Increment version and stop producing stable packages (#2922)

* Increment version and stop producing stable packages

* Add DataFrame object formatter. (#2931)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Fix a bug in InsertColumn

* Add Microsoft.Data.Analysis.nuget project (#2933)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Remove ReferenceOutputAssembly added to Microsoft.Data.Analysis.csproj.

* Add Microsoft.Data.Analysis.nuget project.

* Move project to src. Fix nuget project settings.

* Remove NoBuild property from project.

* Remove IncludeBuildOutput and IncludeSymbols from project.

* Add VersionPrefix to project.

* Add IncludeBuildOutput property.

* Add unit tests.

* Downgrade from netcoreapp3.1 to netcoreapp3.0

* Upgrade from netcoreapp3.0 to netcoreapp3.1 (dotnet interactive is not compatible with 3.0)

* Add netcoreapp3.1 to global settings

* Add dotnet 3.1.5 runtime to global settings

* Build fixes

* Moving MDAI into interactive-extensions folder of the package

* Minor refactoring

* Respond to PR feedback

Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* ColumnName indexer on DataFrame (#2959)

* ColumnName indexer on DataFrame

Fixes https://github.com/dotnet/corefxlab/issues/2934

* Unit tests

* Null column name

* Implement FillNulls() for ArrowStringDataFrameColumn with inPlace: false (#2956)

* implement FillNulls method for ArrowStringDataFrameColumn

* additional asserts for testcase

* Prevent DataFrame.Sample() method from returning duplicated rows (#2939)

* resolves #2806

* replace forloop with ArraySegment<T>

* reduce shuffle loop operations from O(Rows.Count) to O(numberOfRows)

* Add WriteCsv plus unit tests. (#2947)

* Add WriteCsv plus unit tests.

* Add CultureInfo to WriteCsv. Remove index column param. Update unit tests.

* Add CR changes. CultureInfo. Separator.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Missing values default to a `StringDataFrameColumn` (#2982)

* Make LoadCsv more robust

* Test empty string column

* Retain prev guess where possible

* Update FromArrowRecordBatches for dotnet-spark (#2978)

* Support for RecordBatches with StructArrays

* Sq

* Address comments

* Nits

* Nits

* Implement DataFrame.LoadCsvFromString (#2988)

* Implement DataFrame.LoadCsvFromString

* Address comments

* Part 1 of porting the csv reader (#2997)

* Move to the test folder

* Suppress warnings

* Move extensions reference out of props

Make MDA.test use the props defined TFM
Comment out 2 unit tests

* Address feedback

* Address feedback

* Default to preview version

* Update nuget.config

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com>
Co-authored-by: Jon Wood <jwood803@users.noreply.github.com>
Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Günther Foidl <gue@korporal.at>
Co-authored-by: Rhys Parry <rhys@i-think22.net>
Co-authored-by: daniel costea <dcostea@users.noreply.github.com>
Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com>

* Update to the latest Microsoft.DotNet.Interactive (#5710)

* Update to the latest Microsoft.DotNet.Interactive

* Add System.CommandLine nuget feed

* Fix Data.Analysis.Interactive test

* added main branch to yml files (#5715)

* Renamed master to main (#5717)

* renamed master to main

* Update vsts-ci.yml

* updated urls

* renamed master to main (#5719)

* IDataView to DataFrame (#5712)

* IDataView -> DataFrame

Implement the virtual function

* More APIs and unit tests

* Another unit test

* Address feedback

* Last bit of feedback

* Fix some stuff and unit tests

* sq

* Move RowCursor back

* Remove unused param

Docs
maxRows
More unit tests
Fixed ArrowStringDataFrameColumn construction in the unit test

* Improve csv parsing (#5711)

* Part 2 of TextFieldParser.

Next up is hooking up ReadCsv to use TextFieldParser

* Make LoadCsv use TextFieldParser

* More unit tests

* cleanup

* Address feedback

* Last bit of feedback

* Remove extra var

* Remove duplicate file

* Rename strings.resx to Strings.resx

* rename the designer.cs file too

* Fix doc markdown (#5732)

Fixed documentation markdown remarks for
* MulticlassClassificationMetrics.LogLoss
* MulticlassClassificationMetrics.LogLossReduction

Signed-off-by: Robin Windey <ro.windey@gmail.com>

* Use Official package for SharpZipLib (#5735)

Co-authored-by: Xiaoyun Zhang <bigmiao.zhang@gmail.com>
Co-authored-by: BigBigMiao <BigBigMiao@github.com>
Co-authored-by: Keren Fuentes <dkeren@seas.upenn.edu>
Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>
Co-authored-by: Yuanxiang Ying <yingyuanxiang34@sina.com>
Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>
Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>
Co-authored-by: Piotr Telman <ptelman@users.noreply.github.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>
Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com>
Co-authored-by: Harish Kulkarni <harishsk@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Frank Dong <frdong@microsoft.com>
Co-authored-by: Michael Sharp <misharp@microsoft.com>
Co-authored-by: Jason DeBoever <github@deboever.us>
Co-authored-by: Leo Gaunt <36968548+LeoGaunt@users.noreply.github.com>
Co-authored-by: Keren Fuentes <kerenfuentes313@gmail.com>
Co-authored-by: Eric StJohn <ericstj@microsoft.com>
Co-authored-by: Ivan Agarský <agarskyivan@gmail.com>
Co-authored-by: Andrej Kmetík <akmetik@gmail.com>
Co-authored-by: Phan Tấn Tài <37982283+4201104140@users.noreply.github.com>
Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>
Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
Co-authored-by: Haytam Zanid <34218324+zHaytam@users.noreply.github.com>
Co-authored-by: Jon Wood <jwood803@users.noreply.github.com>
Co-authored-by: Sam <1965570+MgSam@users.noreply.github.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Günther Foidl <gue@korporal.at>
Co-authored-by: Rhys Parry <rhys@i-think22.net>
Co-authored-by: daniel costea <dcostea@users.noreply.github.com>
Co-authored-by: Ramon <56896136+RamonWill@users.noreply.github.com>
Co-authored-by: Robin Windey <ro.windey@gmail.com>

* Actually merge from main (#2)

* update tensorflow.net to 0.20.0 (#5404)

* upgrade to 3.1

* write inline data using invariantCulture

* update tensorflow

* update Microsoft.ML.Vision

* fix test && comment

* update tensorflow.net to 0.20.1

* update tf major version

* downgrade tf runtime to 1.14.1

* Update Dependencies.props

* Update Dependencies.props

* update tffact to stop running test on linux with glibc < 2.3

* fix TensorFlowTransformInputShapeTest

* use tf.v1 api

* fix comment:

* fix building error

* fix test

* fix nit

* remove linq

Co-authored-by: BigBigMiao <BigBigMiao@github.com>

* ProduceWordBags Onnx Export Fix  (#5435)

* fix for issue

* fix documentation

* aligning test

* adding back line

* aligning fix

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* [SrCnnEntireAnomalyDetector] Upgrade boundary calculation and expected value calculation (#5436)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Update OnnxRuntime to 1.5.2 (#5439)

* Added prerelease feed and updated to 1.5.2

* Remove prerelease feed

* Updated docs

* Update doc

* Fixed MacOS CI Pipeline builds (#5457)

* Added MacOS Homebrew bug fix

* nit fix

* Improving error message  (#5444)

* better error fix

* revisions

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Fixed MacOS daily & nightly builds due to Homebrew bug (#5467)

* Fixed MacOS nightly builds due to Homebrew bug

* Edit workaround

* Remove untapping of python2

* Nit edit

* Remove installation of mono-libgdiplus

* try installing mono-libgdiplus

* unlink python 3.8

* Auto.ML: Fix issue when parsing float string fails on pl-PL culture set using Regression Experiment (#5163)

* Fix issue when parsing float string fails on pl-PL culture set

* Added InvariantCulture float parsing as per CodeReview request

* Update src/Microsoft.ML.AutoML/Sweepers/SweeperProbabilityUtils.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Update Parameters.cs

* Added PL test

* Added multiple cultures

* debugging CI failure

* Debug runSpecific

* Revert "Debug runSpecific"

This reverts commit 95b728099415cacbe8cf3819ec51ce50cec94eb2.

* Removed LightGBM and addressed comments

* Increased time

* Increase time

* Increased time

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>
Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* handle exception during GetNextPipeline for AutoML (#5455)

* handle exception during GetNextPipeline for AutoML

* take comments

* Changing LoadRawImages Sample (#5460)

replacing example

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Use Timer and ctx.CancelExecution() to fix AutoML max-time experiment bug (#5445)

* Use ctx.CancelExecution() to fix AutoML max-time experiment bug

* Added unit test for checking canceled experiment

* Nit fix

* Different run time on Linux

* Review

* Testing four output

* Used reflection to test for contexts being canceled

* Reviews

* Reviews

* Added main MLContext listener-timer

* Added PRNG on _context, held onto timers for avoiding GC

* Addressed reviews

* Unit test edits

* Increase run time of experiment to guarantee probabilities

* Edited unit test to check produced schema of next run model's predictions

* Remove scheme check as different CI builds result in varying schemas

* Decrease max experiment time unit test time

* Added Timers

* Increase second timer time, edit unit test

* Added try catch for OperationCanceledException in Execute()

* Add AggregateException try catch to slow unit tests for parallel testing

* Reviews

* Final reviews

* Added LightGBMFact to binary classification test

* Removed extra Operation Stopped exception try catch

* Add back OperationCanceledException to Experiment.cs

* fix issue 5020, allow ML.NET to load tf model with primitive input and output column (#5468)

* handle exception during GetNextPipeline for AutoML

* take comments

* Enable TensorFlowTransformer to take primitive type as input column

* undo unnecessary changes

* add test

* update on test

* remove unnecessary line

* take comments

* maxModels instead of time for AutoML unit test (#5471)

Uses the internal `maxModels` parameter instead of `MaxExperimentTimeInSeconds` for the exit criteria of AutoML. 

This is to increase the test stability in case the test is run on a slower machine.

* Disabling AutoFitMaxExperimentTimeTest

Disabling AutoFitMaxExperimentTimeTest

* Fix AutoFitMaxExperimentTimeTest (#5506)

* Fixed test

Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>

* Fix SR anomaly score calculation at beginning (#5502)

* adjust expected value

* update boundary calculation

* fix boundary

* adjust default values

* fix percent case

* fix error in anomaly score calculation

* adjust score calculation for first & second points

* fix sr do not report anomaly at beginning

* fix an issue in batch process

* remove an unused parameter

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Merge arcade to master (#5525)

* Initial commit for Arcade migration

* Added omitted files

* Changed strong name signing to use the same key for shipping and test assemblies

* arcade linux build (#5423)

* arcade linux build

* put file execution permission change into source control

* The `-test` command for windows. Nuget packages (#5464)

* working on testing

* testing updates

* tests almost working

* build changes

* all tests should be working

* changes from PR comments

* fixes for .net 3.1

* Fixed extension check. Removed <PackageId> where not needed

* Removed pkg folder and updated paths.

* Added test key. (#5475)

* Added test key.

* Update PublicKey.cs

Removed extra newline.

* Update ComponentCatalog.cs

Fixed 3 spaces to 4.

* Windows CI working (#5477)

* ci testing changes

* comments from pr

* Added Linux & Mac changes for Arcade (#5479)

* Initial Windows, Linux, Macos builds test

* Add Linux/MacOS specific CI requirements

* Run Arcade CI tests on MacOS/Linux

* Fix final package building

* Add benchmark download to benchmarks .csproj file

* Print detailed status of each unit test

* Install CentOS & Ubuntu build dependencies

* Use container names to differentiate between Ubuntu & CentOS

* Remove sudo usage in CentOS

* Fix Linux build dependencies

* Add -y param to apt install

* Remove installation of Linux dependencies

* Minor additions

* Rename Benchmarks to PerformanceTests for Arcade

* Changes

* Added benchmark doc changes

* Pre-merge changes

* Fixing failing Arcade Windows Builds (#5482)

* Try Windows build single quote fix

* Remove %20

* Added variable space value

* Using variables for spacing

* Added space values as job parameters

* Try conditional variables again

* fix official builds

* Revert "fix official builds"

This reverts commit 7dbbdc7b946f4f48db5452887ad9bf53616a37e8.

* fixing tensorflow rebase issue

* Fixes for many of the CI builds. (#5496)

* yml log changes

* Fix NetFX builds by ensuring assembly version is set correctly and not to Arcade default of 42.42.42.42 (#5503)

* Fixed official builds for Arcade SDK (#5512)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

Co-authored-by: Frank Dong <frdong@microsoft.com>

* fix code generator tests failure (#5520)

* Added fixes for official builds

* Make .sh files executable

* fix mkl nuget issue

* fix code generate test fails

* only add necessary dependency

Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>

* Fixed memory leaks from OnnxTransformer (#5518)

* Fixed memory leak from OnnxTransformer and related x86 build fixes

* Reverting x86 build related fixes to focus only on the memory leaks

* Updated docs

* Reverted OnnxRuntimeOutputCatcher to private class

* Addressed code review comments

* Refactored OnnxTransform back to using MapperBase based on code review comments

* Handle integration tests and nightly build testing (#5509)

* Make -integrationTests work

* Update .yml file

* Added the TargetArchitecture properties

* Try out -integrationTest

* Missed -integrationTest flag

* Renamed FunctionalTestBaseClass to IntegrationTestBaseClass

* Missed rename

* Modified tests to make them more stable

* Fixed leak in object pool (#5521)

Co-authored-by: frank-dong-ms <55860649+frank-dong-ms@users.noreply.github.com>
Co-authored-by: Michael Sharp <51342856+michaelgsharp@users.noreply.github.com>
Co-authored-by: Mustafa Bal <5262061+mstfbl@users.noreply.github.com>
Co-authored-by: Frank Dong <frdong@microsoft.com>
Co-authored-by: Michael Sharp <misharp@microsoft.com>
Co-authored-by: Antonio Velázquez <38739674+antoniovs1029@users.noreply.github.com>

* fix benchmark test timeout issue (#5530)

* removed old build stuff (#5531)

* Fixes Code Coverage in Arcade (#5528)

* arcade code coverage changes

* adding Michael's changes

* updating path

Co-authored-by: Keren Fuentes <kedejesu@microsoft.com>

* Removed CODEOWNERS file to unify review process (#5535)

* Fix publishing problems (#5538)

* Removed our dependency to BuildTools by using the NugetCommand Azure Task.
* We should publish a nuget named "SampleUtils", but we were publishing it with the name "SamplesUtils"
* The naming conventions of our published nugets didn't match the ones described on arcade's docs: Versioning.md. I've also added the option so that when queuing the publishing build, we can pass the VERSIONKIND variable with value "release", so that it produces the nugets with arcade's conventions for "Release official build" nugets (as opposed to the "Daily official build" naming convention that's going to be used now by our CI that publishes nightly nugets).

* Updated prerelease label (#5540)

* Fix warnings from CI Build (#5541)

* fix warnings

* also add conditional copy asset to native.proj

* test fix warnings

* suppress nuget warning 5118

* suppress other warning

* remove unnecessary change

* put skip warning at Directory.Build.props

* Updated build instructions (#5534)

* Updated build instructions

* Addressed reviews

* Reviews

* removed the rest of the old pkg references: (#5537)

* Perf improvement for TopK Accuracy and return all topK in Classification Evaluator (#5395)

* Fix for issue 744

* cleanup

* fixing report output

* fixedTestReferenceOutputs

* Fixed test reference outputs for NetCore31

* change top k acc output string format

* Ranking algorithm now uses first appearance in dataset rather than worstCase

* fixed benchmark

* various minor changes from code review

* limit TopK to OutputTopKAcc parameter

* top k output name changes

* make old TopK readOnly

* restored old baselineOutputs since respecting outputTopK param means no topK in most test output

* fix test fails, re-add names parameter

* Clean up commented code

* that'll teach me to edit from the github webpage

* use existing method, fix nits

* Slight comment change

* Comment change / Touch to kick off build pipeline

* fix whitespace

* Added new test

* Code formatting nits

* Code formatting nit

* Fixed undefined rankofCorrectLabel and trailing whitespace warning

* Removed _numUnknownClassInstances and added test for unknown labels

* Add weight to seenRanks

* Nits

* Removed FastTree import

Co-authored-by: Antonio Velazquez <anvelazq@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Fixed Spelling on stopwords (#5524)

* Changes to onnx export. (#5544)

* Add back missing test project from running on arcade (#5545)

* add back test result upload and add missing test project from running

* fix identification

* filter out performance test result files to avoid warnings

* [CodeGenerator] Fix MLNet.CLI build error. (#5546)

* upgrade to 3.1

* write inline data using invariantCulture

* fix mlnet build error

* Fixed AutoML CrossValSummaryRunner for TopKAccuracyForAllK (#5548)

* Fixed bug

* Tensorflow fix (#5547)

* fix tensorflow issue on sample repo

* add comments

* Update to OnnxRuntime 1.6.0 and fixed bug with sequences outputs (#5529)

* Use onnx prerelease

* Upgrade to onnx 1.6.0

* Updated docs

* Fixed problem with sequences

* added in DcgTruncationLevel to AutoML api (#5433)

* added in DcgTruncationLevel to automl api

* changed default to 10

* updated baseline output

* fixed failing tests and baselines

* Changes from PR comments.

* Update src/Microsoft.ML.AutoML/Experiment/MetricsAgents/RankingMetricsAgent.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Changes based on PR comments.

* Fix ranking test.

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Created release notes for v1.5.3 (#5543)

* Created release notes for v1.5.3

* Updated with review comments

* Updated with review comments

* Updated release notes with latest PRs

* Fixed typo

* Forward logs of Experiment's sub MLContexts to main MLContext (#5554)

* Forward logs of Experiment's sub MLContexts to main MLContext

* Addressed reviews

* Update Stale docs (#5550)

* Updated OnnxMl.md

* Updated MlNetMklDeps docs

* Typo

* typo

* continueOnError on Brew Workaround (#5555)

* continueOnError:true

* Fix publishing symbols (#5556)

* Disable Portable PDB conversion

* Push packages to artifacts

* Fix symbols issues

* Added note about Microsoft.ML.dll

* try out just packing

* Return Build=false, but actually use configuration

* Added missing TargetArchitecture

* add back tests

* Added missing flags

* Updated version to 1.5.4 (#5557)

* Fixed version numbers in the right place (#5558)

* Updated version to 1.5.4

* Updated version to 1.5.4

* eng (#5560)

* Renamed release notes file (#5561)

* Renamed release notes file

* Updated version number in release notes

* Add SymSgdNative reference to AutoML.Tests.csproj (#5559)

* runSpecific in YAML

* RunSpecific in test

* Add SymSgdNative reference

* Revert "RunSpecific in test"

This reverts commit fed12b26ae71e7a95d2dd1f4703541138a780d75.

* Revert "runSpecific in YAML"

This reverts commit f9f328d52cd5b4281ad38b7a6af20c219dd0fd44.

* Nuget.config url fix for roslyn compilers (#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* added in note that PredictionEngine is not thread safe (#5583)

* Onnx Export for ValueMapping estimator (#5577)

* Fixed Averaged Perceptron default value (#5586)

* fixed missed averaged perceptron default value

* fixed extension api

* fixed test baselines

* fixing official build (#5596)

* Release/1.5.4 fix (#5599)

* Nuget.config url fix for roslyn compilers (#5584)

* fixed nuget url, versions, and failing tests

* changes from pr comments and MacOS changes

* MacOS homebrew bug workaround

* removed unused nuget url

* fixing official build (#5596)

* Remove references to Microsoft.ML.Scoring (#5602)

These were the very first ONNX .NET bindings; they were replaced with Microsoft.ML.OnnxRuntime
and then Microsoft.ML.OnnxRuntime.Managed.

* Make ColumnInference serializable (#5611)

* upgrade to 3.1

* write inline data using invariantCulture

* make column inference serializable

* add test json

* add approvaltests

* fixed nuget.config (#5614)

* Fix issue in SRCnnEntireAnomalyDetector (#5579)

* update

* refine code

* update comments

* update for nit

Co-authored-by: yuyi@microsoft.com <Yuanxiang.Ying@microsoft.com>

* Offer suggestions for possibly mistyped label column names in AutoML (#5574) (#5624)

* Offer suggestions for possibly mistyped label column names

* review changes

* TimeSeries - fix confidence parameter type for some detectors (#4058) (#5623)

* TimeSeries - fix confidence parameter type for some detectors.

- The public API exposed confidence parameters as int even though it's internally implemented as double
- There was no workaround since all classes where double is used are internal
- This caused major issues for software requiring high precision predictions
- This change to API should be backwards compatible since int can be passed to parameter of type double

* TimeSeries - reintroduce original methods with confidence parameter of type int (to not break the API).

* TimeSeries - make catalog API methods with int confidence parameter deprecated.

- Tests adjusted to not use the deprecated methods

* Update Conversion.cs (#5627)

* Documentation updates (#5635)

* documentation updates

* fixed spelling error

* Update docs/building/unix-instructions.md

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

Co-authored-by: Santiago Fernandez Madero <safern@microsoft.com>

* AutoML aggregate exception (#5631)

* added check for aggregate exception

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* pulled message out to private variable so it's not duplicated

* Update src/Microsoft.ML.AutoML/Experiment/Experiment.cs

Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
Co-authored-by: Justin Ormont <justinormont@users.noreply.github.com>

* Treat TensorFlow output as non-batched. (#5634)

* Can now not treat output as batched.

* updated comments based on PR comments.

* Fixing saving/loading with new parameter.

* Updates based on PR comments

* Update src/Microsoft.ML.TensorFlow/TensorflowUtils.cs

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* reverted accidental test changes

* fixes based on PR comments

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Added in release notes for 1.5.5 (#5639)

* added in release notes

* Update release-1.5.5.md

Removed incorrect PR.

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update docs/release-notes/1.5.5/release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* Update release-1.5.5.md

Co-authored-by: Eric StJohn <ericstj@microsoft.com>

* updating version after release (#5642)

* Move DataFrame to machinelearning (#5641)

* Change namespace to Microsoft.Data.Analysis (#2773)

* Update namespace to Microsoft.Data.Analysis

* Remove "DataFrame" from the test project name

* APIs for reversed binary operators (#2769)

* Support reverse binary operators

* Fix file left behind in a rebase

* Fix whitespace

* Throw for incompatible inPlace (#2778)

* Throw if inPlace is set and types mismatch

* Unit test

* Better error message

* Remove empty lines

* Version, Tags and Description for Nuget (#2779)

* Version, Tags and Description for Nuget

* sq

* Flags for release  (#2781)

* Publish packages to artifacts

* Flags for release

* Fix the Description method to not throw (#2786)

* Fix the Description method to not crash
Adds an Info method

* sq

* Address feedback

* Last round of feedback

* Use dataTypes if it is passed in to LoadCsv (#2791)

* Fix LoadCsv to use dataType if it is passed in

* sq

* Don't read the full file after guessRows lines have been read

* Address feedback

* Last round of feedback

* Creating a `Rows` property, similar to `Columns` (#2794)

* Rows collection, similar to Columns

* Doc

* Some minor clean up

* Make DataFrameRow a view into the DataFrame

* sq

* Address feedback

* Remove DataFrame.RowCount

* More row count changes

* sq

* Address feedback

* Merge upstream

* DataFrame.LoadCsv throws an exception on projects targeting < netcore3.0 (#2797)

Fixing by passing in an encoding and a default buffer size.

Also, get our tests running on .NET Framework.

Fix #2783

* Params constructor on DataFrame (#2800)

* Params constructor on DataFrame

* Delete redundant constructors

* Remove `T : unmanaged` constraint from DataFrameColumn.BinaryOperations (#2801)

* Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations

* Address feedback

* Rename the value version of the APIs

* sq

* Fix build

* Address feedback

* Remove Value from the APIs

* sq

* Address feedback

* Bump version to 0.2.0 (#2803)

* Add Apply<TResult> method to PrimitiveDataFrameColumn (#2807)

* Add Apply method to PrimitiveDataFrameColumn and its container

* Add TestApply test

* Remove unused df variable in DataFrameTests

* Add xml doc comments to Apply method

* Add additional tests for ReadCsv (#2811)

* Add additional tests for ReadCsv

* Update asserts

* Add empty row and skip test pending another fix

* Remove test for another issue

* Added static factory methods to DataFrameColumn  (#2808)

* Added static factory methods to DataFrameColumn where they make sense (for the overloads where it's possible to infer the column's type).

* Remove regions

* Update some parts of the unit tests to use static factory methods to create DataFrameColumns.

* Remove errant {T} on StringDataFrameColumn.

* PR feedback

Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* Append rows to a DataFrame (#2823)

* Append rows to a DataFrame

* Unit test

* Update unit tests and doc

* Need to perform a type check every time

* sq

* Update unit test

* Address comments

* Move corefxlab to arcade (#2795)

* Add eng folder

* First cut of moving corefxlab to arcade

* Move arcade symbol validation inside official build

* Move base yml file to root

* Arcade will build, publish packages and symbols

* UpdateXlf. Review this

* Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel

* Remove property that was causing the build to fail

* Moving global properties to the main Yaml instead of step in order to unblock publishing

* Committing xlfs and changing the build script to not update Xlf on build

* clean up corefxlab-base.yml

* sq

* Delete unused files and scripts

* Get rid of all the xlf stuff

* Remove UpdateXlfOnBuild for non-NT builds

* Minor cleanup

* More cleanup

* update eng\build.sh permission

* Rename to Nuget.config

* sq

* Remove the runtime spec from global.json

* Don't publish test projs

* Typo

* Move version prefix to versions.props
Change prereleaselabel to alpha

* Increment version number to list as the latest package
Increment version number of Microsoft.Experimental.Collections to list as the latest package
Turn off graph generation

* Update the Readme

* Test removing the scripts folder

* Touch readme to force a change

* Address Jose's comments

* Typo

* Move versions to eng/versions.props

* Benchmark.proj needs to refer to xunit

* Clean up dependencies.props

* Remove dependencies.props

Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>

* Rename Sort to OrderBy (#2814)

* Rename sort to orderby and add orderbydescending method

* Add doc strings

* Update bench mark test

* Update tests

* Update DataFrameColumn to use orderby

* Update doc comment

* Additions to sortby

* Revert "Additions to sortby"

This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35.

* Revert "Update doc comment"

This reverts commit 192f7797fe2b77625486637badf77046162fedbf.

* Revert "Update DataFrameColumn to use orderby"

This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4.

* Explode column types and generate converters (#2857)

* Explode column types and generate converters

* Clean this

* sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Address remaining concerns from the 2nd DataFrame API Review  (#2861)

* Move string indexer to Columns

* API changes from the 2nd API review

* Unit tests

* Address comments

* Add binary operations and operators on the exploded columns (#2867)

* Generate combinations of binary operations and Add

* Numeric Converters and CloneAsNumericColumns

* Binary, Comparison and Shift operations

* Clean up and bug fix

* Fix the binary op apis to not be overridden

* Internal constructors for exploded types

* Proper return types for exploded types

* Update unit tests

* Update csproj

* Revert "Fix the binary op apis to not be overridden"

This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1.

* Bug fix and unit test

* Constructor that takes in a container

* Unit tests

* Call the implementation where possible

* Review sq

* sq

* Cherry pick for next commit

* sq

* Undo unnecessary change

* Rename to the system namespace column types

* Address comments

* Push to pull locally

* Mimic C#'s arithmetic grammar in DataFrame

* Address feedback

* Reduce the number of partial column definitions

* Address feedback

* Add APIs to get the strongly typed columns from a DataFrame (#2878)

* CP

* sq

* sq

* Improve docs

* Enable xml docs for Data.Analysis (#2882)

* Enable xml docs for Data.Analysis

* Fix /// summary around inheritdoc

* Minor doc changes

* sq

* sq

* Address feedback

* Add Apply to ArrowStringDataFrameColumn (#2889)

* Support for Exploded columns types in Arrow and IO scenarios (#2885)

* Support for Exploded columns types in Arrow and IO scenarios

* Unit tests

* Address feedback

* Bump version (#2890)

* Fix versioning to allow for individual stable packages (#2891)

* Fix versioning to allow for individual stable packages

* sq

* Bump Microsoft.Data.Analysis version to 0.4.0 (#2892)

* Bump Microsoft.Data.Analysis version to 0.4.0

* Fix https://github.com/dotnet/corefxlab/issues/2906 (#2907)

* Fix https://github.com/dotnet/corefxlab/issues/2906

* Improvements and unit tests

* sq

* Better fix

* sq

* Improve LoadCsv to handle null values when deducing the column types (#2916)

* Unit test to repro

* Fix https://github.com/dotnet/corefxlab/issues/2915

Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn
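The behaviour described above can be illustrated with a small sketch. It is written in Python rather than C#, and it is not the code in `DataFrame.IO.cs`: the names `guess_column_type` and `load_column`, the set of null tokens, and the three-way `bool`/`float`/`string` classification are all assumptions made for illustration. The fallback to `string` for a column with no non-null samples is likewise an assumption of this sketch.

```python
def _kind_of(text):
    """Classify one non-null CSV cell (hypothetical helper)."""
    if text.lower() in ("true", "false"):
        return "bool"
    try:
        float(text)
        return "float"
    except ValueError:
        return "string"


def guess_column_type(samples, null_tokens=("", "null")):
    """Guess a column type while ignoring cells that look like nulls."""
    guess = None
    for raw in samples:
        text = raw.strip()
        if text.lower() in null_tokens:
            continue  # a missing value carries no type information
        kind = _kind_of(text)
        if guess is None or guess == kind:
            guess = kind
        else:
            return "string"  # two real values disagree, so keep the text
    return guess or "string"  # assumption: all-null columns become strings


def load_column(samples, null_tokens=("", "null")):
    """Convert cells to the guessed type, appending None for null cells."""
    kind = guess_column_type(samples, null_tokens)
    convert = {
        "bool": lambda s: s.lower() == "true",
        "float": float,
        "string": str,
    }[kind]
    values = [
        None if s.strip().lower() in null_tokens else convert(s.strip())
        for s in samples
    ]
    return kind, values
```

With this sketch, `load_column(["1", "null", "2.5"])` keeps the column numeric and stores `None` for the missing cell instead of demoting the whole column to strings.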

* Update src/Microsoft.Data.Analysis/DataFrame.IO.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Update src/Microsoft.Data.Analysis/DataFrame.cs

Co-authored-by: Günther Foidl <gue@korporal.at>

* Feedback

Co-authored-by: Günther Foidl <gue@korporal.at>

* Create a 0.4.0 package (#2918)

* Revert "Create a 0.4.0 package (#2918)" (#2919)

This reverts commit 0bef531289744274ab97e8bbb9e5694b0d855689.

* Produce a 0.4.0 build (#2920)

* Default Length for StringDataFrameColumn (#2921) (#2923)

* Increment version and stop producing stable packages (#2922)

* Increment version and stop producing stable packages

* Add DataFrame object formatter. (#2931)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Fix a bug in InsertColumn

* Add Microsoft.Data.Analysis.nuget project (#2933)

* Add DataFrame object formatter.

* Update nuget dependencies.

* Apply CR fixes.

* Remove ReferenceOutputAssembly from Microsoft.Data.Analysis.csproj.

* Add Microsoft.Data.Analysis.nuget project.

* Move project to src. Fix nuget project settings.

* Remove NoBuild property from project.

* Remove IncludeBuildOutput and IncludeSymbols from project.

* Add VersionPrefix to project.

* Add IncludeBuildOutput property.

* Add unit tests.

* Downgrade from netcoreapp3.1 to netcoreapp3.0

* Upgrade from netcoreapp3.0 to netcoreapp3.1 (dotnet interactive is not compatible with 3.0)

* Add netcoreapp3.1 to global settings

* Add dotnet 3.1.5 runtime to global settings

* Build fixes

* Moving MDAI into interactive-extensions folder of the package

* Minor refactoring

* Respond to PR feedback

Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>

* ColumnName indexer on DataFrame (#2959)

* ColumnName indexer on DataFrame

Fixes https://github.com/dotnet/corefxlab/issues/2934

* Unit tests

* Null column name

* Implement FillNulls() for ArrowStringDataFrameColumn with inPlace: false (#2956)

* implement FillNulls method for ArrowStringDataFrameColumn

* additional asserts for testcase

* Prevent DataFrame.Sample() method from returning duplicated rows (#2939)

* resolves #2806

* replace forloop with ArraySegment<T>

* reduce shuffle loop operations from O(Rows.Count) to O(numberOfRows)
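One common way to draw rows without duplicates while keeping the shuffle loop at O(numberOfRows) is a partial Fisher-Yates shuffle. The sketch below is in Python, not C#, and is only an illustration of that general technique; whether `DataFrame.Sample()` is implemented exactly this way is not stated in the commit messages, and the function and parameter names are hypothetical.

```python
import random


def sample_row_indices(row_count, number_of_rows, rng=None):
    """Return number_of_rows distinct indices drawn from range(row_count)."""
    if not 0 <= number_of_rows <= row_count:
        raise ValueError("number_of_rows must be between 0 and row_count")
    if rng is None:
        rng = random.Random()
    indices = list(range(row_count))
    # Partial Fisher-Yates: only the first number_of_rows slots are fixed,
    # so the loop runs number_of_rows times rather than row_count times.
    for i in range(number_of_rows):
        j = rng.randrange(i, row_count)  # choose from the not-yet-fixed tail
        indices[i], indices[j] = indices[j], indices[i]
    return indices[:number_of_rows]
```

Because each selected index is swapped out of the remaining tail before the next draw, the same row can never be picked twice.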

* Add WriteCsv plus unit tests. (#2947)

* Add WriteCsv plus unit tests.

* Add CultureInfo to WriteCsv. Remove index column param. Update unit tests.

* Add CR changes. CultureInfo. Separator.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Format decimal types individually. Fix culture info. Fix unit tests.

* Missing values default to a `StringDataFrameColumn` (#2982)

* Make LoadCsv more robust

* Test empty string column

* Retain prev guess where possible

* Update FromArrowRecordBatches for dotnet-spark (#2978)

* Support for RecordBatches with StructAr…
@ghost ghost locked as resolved and limited conversation to collaborators Mar 17, 2022
Labels
Microsoft.Data.Analysis All DataFrame related issues and PRs

10 participants