Skip to content

Extension on IDataReader<IMultiStreamSource>, and DataReader<IMultiStreamSource, TShape> to read from one or several file paths, rather than requiring constructing an IMultiStreamSource #1281

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 19, 2018

Conversation

sfilipi
Copy link
Member

@sfilipi sfilipi commented Oct 17, 2018

Addresses part of #1090 in this first iteration, by adding a Read extension method taking a string as param, on the IDataReader.

Still looking at how to make so all the derived classes of DataReader<IMultiStreamSource, TShape> can have a Read that takes a string as param.

@sfilipi sfilipi self-assigned this Oct 17, 2018
@sfilipi
Copy link
Member Author

sfilipi commented Oct 17, 2018

@TomFinley looking at the second part of your suggestion on #1041, on creating an extension on IDataReaderEstimator<IMultiStreamSource, ...>, would that be only for the Fit method? #Closed

@sfilipi sfilipi added the API Issues pertaining the friendly API label Oct 17, 2018

public static class LocalPathReader
Copy link
Contributor

@Zruty0 Zruty0 Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalPathReader [](start = 24, length = 15)

I don't think it belongs to TextLoaderStatic. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... where does it belong :)


In reply to: 226074992 [](ancestors = 226074992)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I just added an instance method to TextLoader :)


In reply to: 226082986 [](ancestors = 226082986,226074992)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my MLContext PR


In reply to: 226084273 [](ancestors = 226084273,226082986,226074992)

Copy link
Contributor

@Zruty0 Zruty0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Zruty0
Copy link
Contributor

Zruty0 commented Oct 17, 2018

What do you think about updating the cookbook and associated test cases?


In reply to: 430700966 [](ancestors = 430700966)

@sfilipi
Copy link
Member Author

sfilipi commented Oct 17, 2018

The examples in the cookbook use Read in the DataReader type .
Not sure how to go about that, see the question to Tom above.

Can't make an extension for that because of the TShape generic.


In reply to: 430768483 [](ancestors = 430768483,430700966)

@Zruty0
Copy link
Contributor

Zruty0 commented Oct 17, 2018

Nevermind then.


In reply to: 430771669 [](ancestors = 430771669,430768483,430700966)

@sfilipi
Copy link
Member Author

sfilipi commented Oct 17, 2018

Nevermind == close this PR?


In reply to: 430778815 [](ancestors = 430778815,430771669,430768483,430700966)

@sfilipi sfilipi changed the title WIP: Extension on IDataReader<IMultiStreamSource> to read from a string Extension on IDataReader<IMultiStreamSource> to read from a string Oct 17, 2018
@markusweimer
Copy link
Member

markusweimer commented Oct 18, 2018

Minor nit: The wording of the PR title might be confusing. When I first saw it, I thought that we allow reading of data from a string, while the PR adds an overload to IDataReader to accept a filename. #Resolved

@eerhardt
Copy link
Member

eerhardt commented Oct 18, 2018

/// multiple paths separated by +.

Does this comment still apply? #Resolved


Refers to: src/Microsoft.ML.Data/DataLoadSave/MultiFileSource.cs:13 in 0425636. [](commit_id = 0425636, deletion_comment = False)

if (concatenated != null && concatenated.Length > 1)
{
if (paths.Length > 1)
throw Contracts.Except($"Pass a single string to the {nameof(MultiFileSource)} constructor, if you are using wildcards.");
Copy link
Member

@eerhardt eerhardt Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to put this in xml doc comments. #Resolved

@sfilipi
Copy link
Member Author

sfilipi commented Oct 18, 2018

/// multiple paths separated by +.

yes, due to MAML backwards compat we need to support multiple paths.


In reply to: 431087633 [](ancestors = 431087633)


Refers to: src/Microsoft.ML.Data/DataLoadSave/MultiFileSource.cs:13 in 0425636. [](commit_id = 0425636, deletion_comment = False)

adding comments
fixing tests that were using null files in the evaluator.
@sfilipi sfilipi changed the title Extension on IDataReader<IMultiStreamSource> to read from a string Extension on IDataReader<IMultiStreamSource> to read from one or several file paths, rather than requiring constructing an IMultiStreamSource Oct 18, 2018
@sfilipi sfilipi changed the title Extension on IDataReader<IMultiStreamSource> to read from one or several file paths, rather than requiring constructing an IMultiStreamSource Extension on IDataReader<IMultiStreamSource>, and DataReader<IMultiStreamSource, TShape> to read from one or several file paths, rather than requiring constructing an IMultiStreamSource Oct 18, 2018
Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @sfilipi looks great.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
API Issues pertaining the friendly API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants