Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL command timeout option to database loader. #4494

Closed
wants to merge 5 commits into from

Conversation

ashbhandare
Copy link
Contributor

Fixes #4484

This change adds a way to specify the SQL command timeout https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand.commandtimeout?view=netframework-4.8
while creating the DatabaseSource.

@ashbhandare ashbhandare requested a review from a team as a code owner November 21, 2019 18:36
@@ -14,7 +14,8 @@ public sealed class DatabaseSource
/// <param name="providerFactory">The factory used to create the <see cref="DbConnection"/>..</param>
/// <param name="connectionString">The string used to open the connection.</param>
/// <param name="commandText">The text command to run against the data source.</param>
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText)
/// <param name="commandTimeout">The time in seconds to wait for the command to execute. Default is 30 sec.</param>
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText, int commandTimeout = 30)
Copy link
Member

@eerhardt eerhardt Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change. You can't add a parameter (even an optional parameter) to an already shipped method.

You will need to create a new overload to add this new property. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing


In reply to: 349252694 [](ancestors = 349252694)

@@ -79,6 +79,63 @@ public void IrisLightGbm()
}).PredictedLabel);
}

[LightGBMFact]
public void IrisLightGbmConnectionTimeout()
Copy link
Member

@eerhardt eerhardt Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this testing the timeout? #Resolved

Copy link
Contributor Author

@ashbhandare ashbhandare Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just tests that a command timeout be specified, what would you recommend that the test look like to confirm that it is being exercised?


In reply to: 349252966 [](ancestors = 349252966)

Copy link
Member

@eerhardt eerhardt Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure. But my initial thinking is doing something that would make the command timeout, and ensuring the correct behavior occurs. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test where SQL query has delay added, and command timeout is exercised.


In reply to: 349721564 [](ancestors = 349721564)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also test for user initiated cancellation? I wonder if this way of testing introduces test failures on the infra or problems with other databases than SQL Server?

As for an example, here is one another way: https://github.com/dotnet/orleans/blob/585680f3c06cb82f22e07b65330b769d333a8aae/test/Extensions/TesterAdoNet/StorageTests/RelationalStoreTests.cs#L179. (For context, this is actually testing timeouts while streaming data. This piece isn't the best possible otherwise, but should give perhaps food for thoughts in the future.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you also test for user initiated cancellation? I wonder if this way of testing introduces test failures on the infra or problems with other databases than SQL Server?

As for an example, here is one another way: https://github.com/dotnet/orleans/blob/585680f3c06cb82f22e07b65330b769d333a8aae/test/Extensions/TesterAdoNet/StorageTests/RelationalStoreTests.cs#L179. (For context, this is actually testing timeouts while streaming data. This piece isn't the best possible otherwise, but should give perhaps food for thoughts in the future.)

Could you elaborate on what 'user-initiated cancellation' would mean in this context?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashbhandare I was thinking that why fix the command timeout at https://github.com/dotnet/machinelearning/pull/4494/files#diff-6bd895b22ba4c7e741ba7526ac214e5aR33 when it could be per query? While maybe also providing an overload to a user supplied cancellation token and then writing a test that excercises that token and timeout. But maybe I misunderstood this and DatabaseSource is intended to use per query or this is the behaviour hoped.

Sorry for the delay, I lost the plot for a while.

@@ -14,7 +14,8 @@ public sealed class DatabaseSource
/// <param name="providerFactory">The factory used to create the <see cref="DbConnection"/>..</param>
/// <param name="connectionString">The string used to open the connection.</param>
/// <param name="commandText">The text command to run against the data source.</param>
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText)
/// <param name="commandTimeout">The time in seconds to wait for the command to execute. Default is 30 sec.</param>
public DatabaseSource(DbProviderFactory providerFactory, string connectionString, string commandText, int commandTimeout = 30)
Copy link
Member

@tannergooding tannergooding Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely worth making it explicitly clear that this is in seconds. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XML description of the parameter does state that the time is in seconds already


In reply to: 349254631 [](ancestors = 349254631)

Copy link
Contributor

@veikkoeeva veikkoeeva Nov 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may, as a bit of an outsider, for a reason or another it's not always easy to see XML comments. So being explicit like commandTimeoutInSeconds = 30 is easier to catch. I believe it's sort of a pattern used quite commonly too. (It's a pedantic nit, but maybe also using the whole word 'seconds' or the SI unit?, https://physics.nist.gov/cuu/Units/checklist.html) #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't matching the name of the actual property that is being set be more consistent. See https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlcommand.commandtimeout?view=netcore-3.0.


In reply to: 351692404 [](ancestors = 351692404)

Copy link
Member

@tannergooding tannergooding Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ML API here is not, strictly speaking, tied to SQL. It is meant to be usable with any database provider. It is more "tied" to DbCommand.CommandTimeout: https://docs.microsoft.com/en-us/dotnet/api/system.data.common.dbcommand.commandtimeout?view=netframework-4.8#System_Data_Common_DbCommand_CommandTimeout

Adding InSeconds just helps clarify the unit of measurement that the timeout is (which can be particularly helpful as .NET APIs frequently deal in milliseconds instead). #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing.


In reply to: 352740161 [](ancestors = 352740161)


var model = pipeline.Fit(trainingData);
}
catch(Exception e)
Copy link
Member

@eerhardt eerhardt Dec 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use Assert.Throws instead.

Also - you should ensure the correct type of exception is thrown. #Resolved

@@ -25,9 +25,29 @@ public DatabaseSource(DbProviderFactory providerFactory, string connectionString
CommandText = commandText;
}

/// <summary>Creates a new instance of the <see cref="DatabaseSource" /> class.</summary>
/// <param name="providerFactory">The factory used to create the <see cref="DbConnection"/>..</param>
Copy link
Member

@eerhardt eerhardt Dec 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the double ending .. ? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing here and in the above constructor.


In reply to: 355678098 [](ancestors = 355678098)

@eerhardt
Copy link
Member

eerhardt commented Dec 9, 2019

@tannergooding - any further comments on this?

@codecov
Copy link

codecov bot commented Dec 10, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@c957445). Click here to learn what that means.
The diff coverage is 92.3%.

@@            Coverage Diff            @@
##             master    #4494   +/-   ##
=========================================
  Coverage          ?   75.12%           
=========================================
  Files             ?      908           
  Lines             ?   160268           
  Branches          ?    17251           
=========================================
  Hits              ?   120408           
  Misses            ?    35046           
  Partials          ?     4814
Flag Coverage Δ
#Debug 75.12% <92.3%> (?)
#production 70.52% <100%> (?)
#test 90.31% <88.88%> (?)
Impacted Files Coverage Δ
...ft.ML.Data/DataLoadSave/Database/DatabaseSource.cs 100% <100%> (ø)
...Data/DataLoadSave/Database/DatabaseLoaderCursor.cs 23.38% <100%> (ø)
test/Microsoft.ML.Tests/DatabaseLoaderTests.cs 90.86% <88.88%> (ø)

@thtemme
Copy link

thtemme commented May 8, 2020

I'm voting strongly for this bugfix!
In a big data pipeline long running SQL Statements are normal. So it is not clear why ML.NET is restricted to 30 Seconds timeout in SQL Statement.
Please fix this!

@mstfbl
Copy link
Contributor

mstfbl commented May 26, 2020

Hi @thtemme, what is the upper bound you've seen for the timeout of SQL statements, and where have you seen them? We can update our timeouts if it is indeed the case that 30 seconds isn't enough to load big datasets in ML .NET. Thanks!

@codemzs
Copy link
Member

codemzs commented May 26, 2020

@mstfbl You don't want to update timeouts statically as your response seems to imply but we must instead add an option that allows changing timeouts dynamically which is exactly what this change is trying to do.

CC: @harishsk

@mstfbl
Copy link
Contributor

mstfbl commented May 26, 2020

I understand @codemzs , I implied that if @thtemme can provide examples on where 30 seconds isn't enough to load in datasets, then we can update our timeouts through this PR.

@codemzs
Copy link
Member

codemzs commented May 26, 2020

@mstfbl This PR is NOT updating the timeout. It is providing an option to update the timeout at runtime. The user can specify whatever timeout they want once this PR is merged. You do not need to update the timeout as you say "then we can update our timeouts through this PR.".

@wangyems wangyems closed this Jul 8, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Command Timeouts While Using The Database Loader
8 participants