Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation to stop mentioning interfaces that no longer exist #4673

Merged
merged 30 commits into from
Jan 27, 2020

Conversation

antoniovs1029
Copy link
Member

@antoniovs1029 antoniovs1029 commented Jan 17, 2020

So some weeks ago I noticed that the documentation under docs/code mention several interfaces that currently don't exist in ML.NET's codebase. They existed back in TLC... and perhaps some of them, if not all of them, actually made it into ML.NET before version 1.0 release... but the case is that they no longer exist, so I just wanted to update them.

After looking into the TLC code, I decided to make the following replacements in the docs:
ISchema -> DataViewSchema
IRandom -> System.Random
IRow -> DataViewRow
IRowCursor and ICursor -> DataViewRowCursor
IDataReader -> IDataLoader

Side notes

  1. About IRandom: In TLC it seems that only SysRandom and TauswortheHybrid implemented IRandom. In ML.NET IRandom and SysRandom don't exist, but TauswortheHybrid inherits from System.Random. So I guess it's safe to replace IRandom for "System.Random" in the docs.

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner January 17, 2020 22:31
@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Jan 17, 2020

So, any ideas about what has replaced IDataReader? It's also mentioned in the docs, but I am not totally sure which would be the "equivalent" in ML.NET.

https://github.com/dotnet/machinelearning/blob/master/docs/code/MlNetHighLevelConcepts.md#data-reader #Resolved

@@ -159,7 +160,7 @@ method, and you use this delegate multiple times to fetch the actual column
values as you `MoveNext` through the cursor.

Some history to motivate this: In the first version of `IDataView` the
`IRowCursor` implementation did not actually have these "getters" but rather
`DataViewRowCursor` implementation (formerly known as `IRowCursor`) did not actually have these "getters" but rather
Copy link
Contributor

@justinormont justinormont Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see one usage of and various references to IRowCursor -- https://github.com/dotnet/machinelearning/search?utf8=%E2%9C%93&q=IRowCursor&type=

Example:

var cursors = new IRowCursor[prList.Count];
#Resolved

Copy link
Member Author

@antoniovs1029 antoniovs1029 Jan 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it seems it is the only place in "actual code" where IRowCursor appears.

It is wrapped inside a #if !CORECLR (link). So when building ML.NET nothing inside that block is compiled. If I remove the CORECLR if directive, then I get some errors, not only corresponding to the IRowCursor (which doesn't exist currently in ML.NET).

I don't know if I should change it, then. #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in BinaryClassifierEvaluator will not compile even if we change IRowCursor to DataViewRowCursor, since it uses a class that does not exist in our code: XYPlot. We had that class in an internal repo, and it used System.Windows.Forms which I believe is only for .Net Framework, so we didn't include it in ML.NET. I think that code can safely be deleted.


In reply to: 368201681 [](ancestors = 368201681)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will open another PR to remove that piece of code then, since I prefer that this PR remains only as changes in the documentation. Thanks for the clarification, @yaeldekel !


In reply to: 368281931 [](ancestors = 368281931,368201681)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened the PR in here: #4694


In reply to: 369266609 [](ancestors = 369266609,368281931,368201681)

@antoniovs1029 antoniovs1029 requested a review from a team as a code owner January 18, 2020 02:47
@antoniovs1029 antoniovs1029 changed the title Update docs/code documentation to stop mentioning interfaces that no longer exist Update documentation to stop mentioning interfaces that no longer exist Jan 18, 2020
@@ -254,7 +254,6 @@ public override bool IsColumnActive(DataViewSchema.Column column)
/// Base class for transforms that operate row by row with each destination column using one
/// source column. It provides an extension mechanism to allow a destination column to depend
/// on multiple input columns.
/// This class provides the implementation of ISchema and IRowCursor.
Copy link
Member Author

@antoniovs1029 antoniovs1029 Jan 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since OneToOneTransformBase currently doesn't inherit from DataViewSchema or DataViewRowCursor, I guess it's safe to remove this. #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is intended to explain that classes deriving from OneToOneTransformBase do not need to worry about creating a schema or a cursor, since everything necessary is already implemented in OneToOneTransformBase. This is still true even when we don't have the ISchema and IRowCursor interfaces:

  • It has the class deriving from DataViewRowCursor that is used by all classes deriving from OneToOneTransformBase.
  • It's true that it doesn't implement DataViewSchema (since there is only one schema class, it is not abstract), but it does have a Bindings class, that has an AsSchema property.

In reply to: 368201579 [](ancestors = 368201579)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So I've added some more lines to this comment to explain what you've just said in here. Please let me know if it's clear enough.


In reply to: 368283641 [](ancestors = 368283641,368201579)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. You can mention that by "don't need to worry about" we mean that the abstract property OutputSchema and the abstract method GetRowCursorCore of TransformBase are implemented in this class. Other than that looks clear enough.


In reply to: 369197668 [](ancestors = 369197668,368283641,368201579)

@yaeldekel
Copy link

IDataReader is the old name of IDataLoader.
Since the loaders don't actually load anything until you get a row cursor (not even when you call Load...), we initially thought we should make a distinction, but we ended up calling them loaders anyway.


In reply to: 575820928 [](ancestors = 575820928)

@antoniovs1029
Copy link
Member Author

I've just seen that PR #2731 actually explicitly states that IDataReader became IDataLoader. Some docs were updated on that PR, but I guess that they've missed updating the MlNetHighLevelConcepts.md file (which I've just updated in here)

There are also some mentions of IDataReader on the following spec:
https://github.com/dotnet/machinelearning/blob/master/docs/specs/mlnet-database-loader/mlnet-database-loader-specs.md

But after discussing it with @harishsk, it seems that it might be better to remove that file (although I would still need to check it with other people that worked on the database loader).


In reply to: 575986788 [](ancestors = 575986788,575820928)

@antoniovs1029
Copy link
Member Author

antoniovs1029 commented Jan 22, 2020

Ok, so after further discussion there's no need to remove the mlnet-database-loader-specs.md file.

Evenmore, it turns out that the IDataReader mentioned in there isn't ML.NET's old IDataReader, but actually System.Data.IDataReader, so I will only make a minor change in that file to make it clear they are referring to that interface.


In reply to: 576902800 [](ancestors = 576902800,575986788,575820928)

Copy link

@yaeldekel yaeldekel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Contributor

@justinormont justinormont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoniovs1029 antoniovs1029 merged commit af4ca7c into dotnet:master Jan 27, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants