Changing the database cursor to return default for DBNull #4070

Merged
merged 3 commits into from
Aug 7, 2019

Conversation

tannergooding
Member

This updates the DatabaseLoaderCursor to support nullable columns by returning the column type's default value whenever the database value is DBNull.

@tannergooding
Member Author

Peek data in DataView: Showing 2 rows with the columns
######################################################
Row--> | Label:False| Label:False| Feat01:32| Feat02:3| Feat03:5| Feat04:| Feat05:1| Feat06:0| Feat07:0| Feat08:61| Feat09:5| Feat10:0| Feat11:1| Feat12:3157| Feat13:5| Cat14:e5f3fd8d| Cat15:a0aaffa6| Cat16:6faa15d5| Cat17:da8a3421| Cat18:3cd69f23| Cat19:6fcd6dcb| Cat20:ab16ed81| Cat21:43426c29| Cat22:1df5e154| Cat23:7de9c0a9| Cat24:6652dc64| Cat25:99eb4e27| Cat26:00c5ffb7| Cat27:be4ee537| Cat28:f3bbfe99| Cat29:4cdc3efa| Cat30:d20856aa| Cat31:a1eb1511| Cat32:9512c20b| Cat33:febfd863| Cat34:a3323ca1| Cat35:c8e1ee56| Cat36:1752e9e8| Cat37:75350c8a| Cat38:991321ea| Cat39:b757e957| Cat14Encoded:1| Cat14Encoded:Sparse vector of size 7, 0 explicit values| Cat15Encoded:1| Cat15Encoded:Sparse vector of size 8, 0 explicit values| Cat16Encoded:1| Cat16Encoded:Sparse vector of size 8, 0 explicit values| Cat17Encoded:1| Cat17Encoded:Sparse vector of size 7, 0 explicit values| Cat18Encoded:1| Cat18Encoded:Sparse vector of size 7, 0 explicit values| Cat19Encoded:1| Cat19Encoded:Sparse vector of size 3, 0 explicit values| Cat20Encoded:1| Cat20Encoded:Sparse vector of size 8, 0 explicit values| Cat21Encoded:1| Cat21Encoded:Sparse vector of size 7, 0 explicit values| Cat22Encoded:1| Cat22Encoded:Sparse vector of size 5, 0 explicit values| Cat23Encoded:1| Cat23Encoded:Sparse vector of size 7, 0 explicit values| Cat24Encoded:1| Cat24Encoded:Sparse vector of size 7, 0 explicit values| Cat25Encoded:1| Cat25Encoded:Sparse vector of size 8, 0 explicit values| Cat26Encoded:1| Cat26Encoded:Sparse vector of size 5, 0 explicit values| Cat27Encoded:1| Cat27Encoded:Sparse vector of size 5, 0 explicit values| Cat28Encoded:1| Cat28Encoded:Sparse vector of size 7, 0 explicit values| Cat29Encoded:1| Cat29Encoded:Sparse vector of size 4, 0 explicit values| Cat30Encoded:1| Cat30Encoded:Sparse vector of size 3, 0 explicit values| Cat31Encoded:1| Cat31Encoded:Sparse vector of size 5, 0 explicit values| Cat32Encoded:1| Cat32Encoded:Sparse vector of size 5, 0 explicit values| Cat33Encoded:1| 
Cat33Encoded:Sparse vector of size 7, 0 explicit values| Cat34Encoded:1| Cat34Encoded:Sparse vector of size 7, 0 explicit values| Cat35Encoded:1| Cat35Encoded:Sparse vector of size 7, 0 explicit values| Cat36Encoded:1| Cat36Encoded:Sparse vector of size 6, 0 explicit values| Cat37Encoded:1| Cat37Encoded:Sparse vector of size 8, 0 explicit values| Cat38Encoded:1| Cat38Encoded:Sparse vector of size 5, 0 explicit values| Cat39Encoded:1| Cat39Encoded:Sparse vector of size 6, 0 explicit values| Feat01Featurized:Sparse vector of size 100, 3 explicit values| Feat02Featurized:Sparse vector of size 248, 2 explicit values| Feat03Featurized:Sparse vector of size 65, 2 explicit values| Feat04Featurized:Sparse vector of size 104, 0 explicit values| Feat05Featurized:Sparse vector of size 57, 2 explicit values| Feat06Featurized:Sparse vector of size 19, 2 explicit values| Feat07Featurized:Dense vector of size 7| Feat08Featurized:Sparse vector of size 153, 3 explicit values| Feat09Featurized:Sparse vector of size 80, 2 explicit values| Feat10Featurized:Dense vector of size 6| Feat11Featurized:Sparse vector of size 25, 2 explicit values| Feat12Featurized:Sparse vector of size 366, 5 explicit values| Feat13Featurized:Sparse vector of size 63, 2 explicit values| Features:Sparse vector of size 1758, 41 explicit values

Row--> | Label:False| Label:False| Feat01:| Feat02:233| Feat03:1| Feat04:146| Feat05:1| Feat06:0| Feat07:0| Feat08:99| Feat09:7| Feat10:0| Feat11:1| Feat12:3101| Feat13:1| Cat14:62770d79| Cat15:ad984203| Cat16:62bec60d| Cat17:386c49ee| Cat18:e755064d| Cat19:6fcd6dcb| Cat20:b5f5eb62| Cat21:d1f2cc8b| Cat22:2e4e821f| Cat23:2e027dc1| Cat24:0c7c4231| Cat25:12716184| Cat26:00c5ffb7| Cat27:be4ee537| Cat28:f70f0d0b| Cat29:4cdc3efa| Cat30:d20856aa| Cat31:628f1b8d| Cat32:9512c20b| Cat33:c38e2f28| Cat34:14f65a5d| Cat35:25b1b089| Cat36:d7c1fc0b| Cat37:34a9b905| Cat38:ff654802| Cat39:ed10571d| Cat14Encoded:2| Cat14Encoded:Sparse vector of size 7, 1 explicit values| Cat15Encoded:2| Cat15Encoded:Sparse vector of size 8, 1 explicit values| Cat16Encoded:2| Cat16Encoded:Sparse vector of size 8, 1 explicit values| Cat17Encoded:2| Cat17Encoded:Sparse vector of size 7, 1 explicit values| Cat18Encoded:2| Cat18Encoded:Sparse vector of size 7, 1 explicit values| Cat19Encoded:1| Cat19Encoded:Sparse vector of size 3, 0 explicit values| Cat20Encoded:2| Cat20Encoded:Sparse vector of size 8, 1 explicit values| Cat21Encoded:2| Cat21Encoded:Sparse vector of size 7, 1 explicit values| Cat22Encoded:2| Cat22Encoded:Sparse vector of size 5, 1 explicit values| Cat23Encoded:2| Cat23Encoded:Sparse vector of size 7, 1 explicit values| Cat24Encoded:2| Cat24Encoded:Sparse vector of size 7, 1 explicit values| Cat25Encoded:2| Cat25Encoded:Sparse vector of size 8, 1 explicit values| Cat26Encoded:1| Cat26Encoded:Sparse vector of size 5, 0 explicit values| Cat27Encoded:1| Cat27Encoded:Sparse vector of size 5, 0 explicit values| Cat28Encoded:2| Cat28Encoded:Sparse vector of size 7, 1 explicit values| Cat29Encoded:1| Cat29Encoded:Sparse vector of size 4, 0 explicit values| Cat30Encoded:1| Cat30Encoded:Sparse vector of size 3, 0 explicit values| Cat31Encoded:2| Cat31Encoded:Sparse vector of size 5, 1 explicit values| Cat32Encoded:1| Cat32Encoded:Sparse vector of size 5, 0 explicit values| Cat33Encoded:2| 
Cat33Encoded:Sparse vector of size 7, 1 explicit values| Cat34Encoded:2| Cat34Encoded:Sparse vector of size 7, 1 explicit values| Cat35Encoded:2| Cat35Encoded:Sparse vector of size 7, 1 explicit values| Cat36Encoded:2| Cat36Encoded:Sparse vector of size 6, 1 explicit values| Cat37Encoded:2| Cat37Encoded:Sparse vector of size 8, 1 explicit values| Cat38Encoded:2| Cat38Encoded:Sparse vector of size 5, 1 explicit values| Cat39Encoded:2| Cat39Encoded:Sparse vector of size 6, 1 explicit values| Feat01Featurized:Sparse vector of size 100, 0 explicit values| Feat02Featurized:Sparse vector of size 248, 4 explicit values| Feat03Featurized:Sparse vector of size 65, 2 explicit values| Feat04Featurized:Sparse vector of size 104, 4 explicit values| Feat05Featurized:Sparse vector of size 57, 2 explicit values| Feat06Featurized:Sparse vector of size 19, 2 explicit values| Feat07Featurized:Dense vector of size 7| Feat08Featurized:Sparse vector of size 153, 3 explicit values| Feat09Featurized:Sparse vector of size 80, 2 explicit values| Feat10Featurized:Dense vector of size 6| Feat11Featurized:Sparse vector of size 25, 2 explicit values| Feat12Featurized:Sparse vector of size 366, 5 explicit values| Feat13Featurized:Sparse vector of size 63, 2 explicit values| Features:Sparse vector of size 1758, 64 explicit values

Training model...
elapsed time for training the model = 665277
Evaluating the model...
elapsed time for evaluating the model = 686433
************************************************************
*       Metrics for ====Evaluation Metrics for Large datasets stored in Database==== binary classification model
*-----------------------------------------------------------
*       Accuracy: 97.08%
*       Area Under Curve:      71.04%
*       Area under Precision recall Curve:  7.39%
*       F1Score:  NaN
*       LogLoss:  .18
*       LogLossReduction:  .06
*       PositivePrecision:
*       PositiveRecall:
*       NegativePrecision:  .97
*       NegativeRecall:  100.00%
************************************************************
=============== Press any key ===============

@@ -236,79 +236,79 @@ private Delegate CreateGetterDelegate<TValue>(int col)
 private ValueGetter<bool> CreateBooleanGetterDelegate(ColInfo colInfo)
 {
     int columnIndex = GetColumnIndex(colInfo);
-    return (ref bool value) => value = DataReader.GetBoolean(columnIndex);
+    return (ref bool value) => value = DataReader.IsDBNull(columnIndex) ? default : DataReader.GetBoolean(columnIndex);
Member


I don't think blindly turning null into default is really the correct thing to do here.

I wonder if we should have optional behaviors that the user can opt into. Potentially something like:

  1. (default behavior) throw on nulls so the user knows they have to make some decision.
  2. Turn null into default.
  3. Convert nullable integer columns into float/double, and use NaN to designate null values. They can then use the Replace N/A transforms available to them in the rest of the pipeline.
  4. The user can always change their schema (either by inserting the data into a different table and replacing nulls as appropriate, or by creating a special stored proc or SELECT statement to do the null conversion) to get around the exception as well.

See @TomFinley's comments at #673 (comment) for more thoughts here.

@codemzs @ebarsoumMS - any thoughts on the appropriate behavior here?
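
For illustration, the first three options listed above could be sketched roughly as follows. This is only a sketch under stated assumptions: the `NullHandling` enum, the `NullableColumnReader` class, and the `ReadInt32AsSingle` method are hypothetical names invented for this example and are not part of ML.NET or of this PR. Only `IDataRecord.IsDBNull` and `IDataRecord.GetInt32` are real `System.Data` calls.

```csharp
using System;
using System.Data;

// Hypothetical option names for this sketch; not an ML.NET API.
public enum NullHandling
{
    Throw,      // option 1: fail so the user has to make a decision
    UseDefault, // option 2: null becomes default(T)
    UseNaN      // option 3: expose the column as float and use NaN for null
}

public static class NullableColumnReader
{
    // Reads an integer column as a float, applying the chosen null policy.
    public static float ReadInt32AsSingle(IDataRecord record, int columnIndex, NullHandling handling)
    {
        if (!record.IsDBNull(columnIndex))
        {
            // Non-null values are widened from int to float.
            return record.GetInt32(columnIndex);
        }

        switch (handling)
        {
            case NullHandling.UseDefault:
                return default;
            case NullHandling.UseNaN:
                return float.NaN;
            default:
                throw new InvalidOperationException(
                    $"Column {columnIndex} contains a null value; choose a null-handling option.");
        }
    }
}
```

With `UseNaN`, a later missing-value replacement step in the pipeline could decide what a null should become, whereas `UseDefault` makes a null indistinguishable from a genuine zero.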

@eerhardt eerhardt merged commit bbb6b15 into dotnet:master Aug 7, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 20, 2022