-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding basic support for handling vectors in the database loader. #4138
Conversation
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoaderCursor.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoaderCursor.cs
Outdated
Show resolved
Hide resolved
… the TestModel folder
…le, and removing some dead code.
This should be ready for review now. |
|
||
var mockProviderFactory = new MockProviderFactory(mlContext, loader); | ||
var databaseSource = new DatabaseSource(mockProviderFactory, connectionString, commandText); | ||
var providerFactory = DbProviderFactories.GetFactory("System.Data.SqlClient"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use SqlClientFactory.Instance
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just using DbProviderFactories as that seemed to be the normal way to get registered factories and more closely matched what users wanted.
get => throw new NotImplementedException(); | ||
set => throw new NotImplementedException(); | ||
var databaseFile = GetTestDatabasePath(databaseName); | ||
return $@"Data Source=(LocalDB)\MSSQLLocalDB;AttachDbFilename={databaseFile};Database={databaseName};Integrated Security=True"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work on non-Windows machines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test, but I believe so.
Linux supports SQL as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your CI is failing with
System.PlatformNotSupportedException : LocalDB is not supported on this platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to disable for Linux for the time being and can log an issue for adding a mysql database as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also failing for MacOS. It would be unfortunate to not have any unix tests for the DatabaseLoader....
In reply to: 319267186 [](ancestors = 319267186)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but I'm not sure I'll get a MySQL database together before this needs to be in for the next preview.
When is that cutoff?
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoaderCursor.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
ctx.Writer.WriteBoolByte(info.SourceIndex.HasValue); | ||
if (info.SourceIndex.HasValue) | ||
ctx.Writer.Write(info.SourceIndex.GetValueOrDefault()); | ||
ctx.Writer.Write(info.Segments.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info.Segments
can be null
, right? (According to ColInfo's constructor, it can be. It has an AssertValueOrNull
line in it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same as TextLoader, so if there is a bug here, there is a bug there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a bug there as well.
That is not true.
machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs
Lines 576 to 579 in 429f8cc
private ColInfo(string name, DataViewType colType, Segment[] segs, int isegVar, int sizeBase) | |
{ | |
Contracts.AssertNonEmpty(name); | |
Contracts.AssertNonEmpty(segs); |
TextLoader guarantees that Segment[] segs
is a non empty array. DatabaseLoader allows it to be null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to be ctx.Writer.Write((info.Segments?.Length).GetValueOrDefault());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the very next line going to NullRef
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should have a test that saves and loads a DatabaseLoader into a model.
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
@@ -67,6 +67,12 @@ public static Type ToType(this DbType dbType) | |||
} | |||
} | |||
|
|||
/// <summary>Maps a <see cref="DataKind"/> to the associated <see cref="DbType"/>.</summary> | |||
public static DbType ToDbType(this DataKind dataKind) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't used anymore, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in CreateDatabaseLoader<TInput>
when processing the memberInfo kind.
src/Microsoft.ML.Experimental/DataLoadSave/Database/DatabaseLoader.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks for the nice work. It will make customers lives easier to consume many columns from a Database.
@tannergooding - half of the Windows legs are failing on your new tests with an exception of the form:
|
Added a connection timeout to try and give localdb a chance to initialize if it hadn't been previously |
Merging. The failing leg is failing due to an AutoML test, which is unrelated to this work. |
No description provided.