-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SQL command timeout option to database loader. #4494
Conversation
@@ -14,7 +14,8 @@ public sealed class DatabaseSource | |||
/// <param name="providerFactory">The factory used to create the <see cref="DbConnection"/>..</param> | |||
/// <param name="connectionString">The string used to open the connection.</param> | |||
/// <param name="commandText">The text command to run against the data source.</param> | |||
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText) | |||
/// <param name="commandTimeout">The time in seconds to wait for the command to execute. Default is 30 sec.</param> | |||
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText, int commandTimeout = 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change. You can't add a parameter (even an optional parameter) to an already shipped method.
You will need to create a new overload to add this new property. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -79,6 +79,63 @@ public void IrisLightGbm() | |||
}).PredictedLabel); | |||
} | |||
|
|||
[LightGBMFact] | |||
public void IrisLightGbmConnectionTimeout() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this testing the timeout? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just tests that a command timeout be specified, what would you recommend that the test look like to confirm that it is being exercised?
In reply to: 349252966 [](ancestors = 349252966)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure. But my initial thinking is doing something that would make the command timeout, and ensuring the correct behavior occurs. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added test where SQL query has delay added, and command timeout is exercised.
In reply to: 349721564 [](ancestors = 349721564)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you also test for user initiated cancellation? I wonder if this way of testing introduces test failures on the infra or problems with other databases than SQL Server?
As for an example, here is one another way: https://github.com/dotnet/orleans/blob/585680f3c06cb82f22e07b65330b769d333a8aae/test/Extensions/TesterAdoNet/StorageTests/RelationalStoreTests.cs#L179. (For context, this is actually testing timeouts while streaming data. This piece isn't the best possible otherwise, but should give perhaps food for thoughts in the future.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you also test for user initiated cancellation? I wonder if this way of testing introduces test failures on the infra or problems with other databases than SQL Server?
As for an example, here is one another way: https://github.com/dotnet/orleans/blob/585680f3c06cb82f22e07b65330b769d333a8aae/test/Extensions/TesterAdoNet/StorageTests/RelationalStoreTests.cs#L179. (For context, this is actually testing timeouts while streaming data. This piece isn't the best possible otherwise, but should give perhaps food for thoughts in the future.)
Could you elaborate on what 'user-initiated cancellation' would mean in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ashbhandare I was thinking that why fix the command timeout at https://github.com/dotnet/machinelearning/pull/4494/files#diff-6bd895b22ba4c7e741ba7526ac214e5aR33 when it could be per query? While maybe also providing an overload to a user supplied cancellation token and then writing a test that excercises that token and timeout. But maybe I misunderstood this and DatabaseSource
is intended to use per query or this is the behaviour hoped.
Sorry for the delay, I lost the plot for a while.
@@ -14,7 +14,8 @@ public sealed class DatabaseSource | |||
/// <param name="providerFactory">The factory used to create the <see cref="DbConnection"/>..</param> | |||
/// <param name="connectionString">The string used to open the connection.</param> | |||
/// <param name="commandText">The text command to run against the data source.</param> | |||
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText) | |||
/// <param name="commandTimeout">The time in seconds to wait for the command to execute. Default is 30 sec.</param> | |||
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText, int commandTimeout = 30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely worth making it explicitly clear that this is in seconds. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The XML description of the parameter does state that the time is in seconds already
In reply to: 349254631 [](ancestors = 349254631)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I may, as a bit of an outsider, for a reason or another it's not always easy to see XML comments. So being explicit like commandTimeoutInSeconds = 30
is easier to catch. I believe it's sort of a pattern used quite commonly too. (It's a pedantic nit, but maybe also using the whole word 'seconds' or the SI unit?, https://physics.nist.gov/cuu/Units/checklist.html) #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't matching the name of the actual property that is being set be more consistent. See https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand.commandtimeout?view=netcore-3.0.
In reply to: 351692404 [](ancestors = 351692404)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ML API here is not, strictly speaking, tied to SQL. It is meant to be usable with any database provider. It is more "tied" to DbCommand.CommandTimeout
: https://docs.microsoft.com/en-us/dotnet/api/system.data.common.dbcommand.commandtimeout?view=netframework-4.8#System_Data_Common_DbCommand_CommandTimeout
Adding InSeconds
just helps clarify the unit of measurement that the timeout is (which can be particularly helpful as .NET APIs frequently deal in milliseconds instead). #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
var model = pipeline.Fit(trainingData); | ||
} | ||
catch(Exception e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use Assert.Throws
instead.
Also - you should ensure the correct type of exception is thrown. #Resolved
@@ -25,9 +25,29 @@ public DatabaseSource(DbProviderFactory providerFactory, string connectionString | |||
CommandText = commandText; | |||
} | |||
|
|||
/// <summary>Creates a new instance of the <see cref="DatabaseSource" /> class.</summary> | |||
/// <param name="providerFactory">The factory used to create the <see cref="DbConnection"/>..</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the double ending ..
? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding - any further comments on this? |
Codecov Report
@@ Coverage Diff @@
## master #4494 +/- ##
=========================================
Coverage ? 75.12%
=========================================
Files ? 908
Lines ? 160268
Branches ? 17251
=========================================
Hits ? 120408
Misses ? 35046
Partials ? 4814
|
I'm voting strongly for this bugfix! |
Hi @thtemme, what is the upper bound you've seen for the timeout of SQL statements, and where have you seen them? We can update our timeouts if it is indeed the case that 30 seconds isn't enough to load big datasets in ML .NET. Thanks! |
@mstfbl This PR is NOT updating the timeout. It is providing an option to update the timeout at runtime. The user can specify whatever timeout they want once this PR is merged. You do not need to update the timeout as you say "then we can update our timeouts through this PR.". |
Fixes #4484
This change adds a way to specify the SQL command timeout https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand.commandtimeout?view=netframework-4.8
while creating the DatabaseSource.