System Information:
- OS & Version: Windows 10
- ML.NET Version: Microsoft.ML 3.0.0
Describe the bug
The Fit method queries the database twice instead of once.
Caching was enabled.
BatchSize was set to its maximum.
There is a delay of about 15 seconds between the two queries.
The database server runs on the same machine as the code, and there is no other load on the server.
To Reproduce
If you don't have time to reproduce the issue, please just look at my ipynb code.
Create a database loader for the data; see this ipynb (rename the file extension from .json to .ipynb).
Download and install QDB (QuestDB).
My data is from here, but you can use your own data.
Expected behavior
The LightGbm trainer should query the database once, not twice.
Screenshots, Code, Sample Projects
Database log:
2023-11-30T20:35:42.275822Z I server-main os scheduled worker started [name=ilpwriter_0]
2023-11-30T20:35:42.823066Z A server-main enjoy
2023-11-30T20:35:51.589288Z I pg-server connected [ip=127.0.0.1, fd=3436]
2023-11-30T20:35:51.591142Z I i.q.g.SqlCompilerImpl parse [fd=3436, q=xHole where date<'2017-01-01T00:00:00.000000Z']
2023-11-30T20:35:51.600643Z I i.q.g.SqlCompilerImpl plan [q=`select-choose date, store_nbr, family, sfZero, id, sf, sales, onpromotion, sfOpen, sfPromotion, transactions, dcoilwtico, city, local_event, local_type, local_desc, local_transferred, state, regional_event, regional_type, regional_desc, regional_transferred, national_event, national_type, national_desc, national_transferred, type, cluster, doywoy, yearcount, monthCount, weekOfYear, DayOfWeek, DayOfMonth, daysCounter, monthProgress, dayOfYear, yearProgress, Weekend, quarter, RANAD, familysfOpen from (select [date, store_nbr, family, sfZero, id, sf, sales, onpromotion, sfOpen, sfPromotion, transactions, dcoilwtico, city, local_event, local_type, local_desc, local_transferred, state, regional_event, regional_type, regional_desc, regional_transferred, national_event, national_type, national_desc, national_transferred, type, cluster, doywoy, yearcount, monthCount, weekOfYear, DayOfWeek, DayOfMonth, daysCounter, monthProgress, dayOfYear, yearProgress, Weekend, quarter, RANAD, familysfOpen] from xHole timestamp (date) where date < '2017-01-01T00:00:00.000000Z')`, fd=3436]
2023-11-30T20:35:51.605458Z I i.q.c.TableReader open partition C:\qdbroot\db\xHole\2013 [rowCount=650430, partitionNameTxn=-1, transientRowCount=433026, partitionIndex=0, partitionCount=5]
2023-11-30T20:35:51.608417Z I i.q.c.TableReader open partition C:\qdbroot\db\xHole\2014 [rowCount=655776, partitionNameTxn=-1, transientRowCount=433026, partitionIndex=1, partitionCount=5]
2023-11-30T20:35:51.611011Z I i.q.c.TableReader open partition C:\qdbroot\db\xHole\2015 [rowCount=650430, partitionNameTxn=-1, transientRowCount=433026, partitionIndex=2, partitionCount=5]
2023-11-30T20:35:51.613647Z I i.q.c.TableReader open partition C:\qdbroot\db\xHole\2016 [rowCount=669042, partitionNameTxn=-1, transientRowCount=433026, partitionIndex=3, partitionCount=5]
2023-11-30T20:36:04.985713Z I i.q.g.SqlCompilerImpl parse [fd=-1, q=DISCARD ALL]
2023-11-30T20:36:04.985903Z I i.q.c.p.PGConnectionContext exec [fd=3436, q=xHole where date<'2017-01-01T00:00:00.000000Z']
2023-11-30T20:36:04.985926Z I i.q.c.p.PGConnectionContext query cache used [fd=3436]
2023-11-30T20:36:31.160548Z I i.q.g.SqlCompilerImpl parse [fd=-1, q=DISCARD ALL]
2023-11-30T20:36:31.160893Z I i.q.g.SqlCompilerImpl parse [fd=3436, q=xHole where date<'2017-08-16T00:00:00.000000Z' AND date>='2017-01-01T00:00:00.000000Z']
2023-11-30T20:36:31.163041Z I i.q.g.SqlCompilerImpl plan [q=`select-choose date, store_nbr, family, sfZero, id, sf, sales, onpromotion, sfOpen, sfPromotion, transactions, dcoilwtico, city, local_event, local_type, local_desc, local_transferred, state, regional_event, regional_type, regional_desc, regional_transferred, national_event, national_type, national_desc, national_transferred, type, cluster, doywoy, yearcount, monthCount, weekOfYear, DayOfWeek, DayOfMonth, daysCounter, monthProgress, dayOfYear, yearProgress, Weekend, quarter, RANAD, familysfOpen from (select [date, store_nbr, family, sfZero, id, sf, sales, onpromotion, sfOpen, sfPromotion, transactions, dcoilwtico, city, local_event, local_type, local_desc, local_transferred, state, regional_event, regional_type, regional_desc, regional_transferred, national_event, national_type, national_desc, national_transferred, type, cluster, doywoy, yearcount, monthCount, weekOfYear, DayOfWeek, DayOfMonth, daysCounter, monthProgress, dayOfYear, yearProgress, Weekend, quarter, RANAD, familysfOpen] from xHole timestamp (date) where date < '2017-08-16T00:00:00.000000Z' and date >= '2017-01-01T00:00:00.000000Z')`, fd=3436]
2023-11-30T20:36:31.164073Z I i.q.c.TableReader open partition C:\qdbroot\db\xHole\2017 [rowCount=433026, partitionNameTxn=-1, transientRowCount=433026, partitionIndex=4, partitionCount=5]
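To make the double access easier to see in the log above, here is a small Python sketch (not part of the original report) that groups the client `parse`/`exec` events by query text and measures the time between the first and last occurrence. It assumes that every `parse` or `exec` line with a non-negative `fd` corresponds to one query request from the ML.NET loader; that matches this excerpt but is an assumption about QuestDB's log format. The function name `query_executions` is hypothetical, and `LOG` holds lines copied verbatim from the log above.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches log lines that record a query being parsed or executed, e.g.
# "... I i.q.g.SqlCompilerImpl parse [fd=3436, q=xHole where ...]".
# "plan", "open partition" and "query cache used" lines do not match.
QUERY_EVENT = re.compile(
    r"^(?P<ts>\S+) \w (?P<logger>\S+) (?P<event>parse|exec) "
    r"\[fd=(?P<fd>-?\d+), q=(?P<query>.*)\]$"
)


def query_executions(log_lines):
    """Hypothetical helper: map query text -> [(timestamp, event), ...] in log order."""
    runs = defaultdict(list)
    for line in log_lines:
        m = QUERY_EVENT.match(line.strip())
        # In this log the DISCARD ALL statements carry fd=-1; skip them.
        if not m or int(m["fd"]) < 0:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%fZ")
        runs[m["query"]].append((ts, m["event"]))
    return runs


# Relevant lines copied from the database log in this report.
LOG = """\
2023-11-30T20:35:51.591142Z I i.q.g.SqlCompilerImpl parse [fd=3436, q=xHole where date<'2017-01-01T00:00:00.000000Z']
2023-11-30T20:36:04.985713Z I i.q.g.SqlCompilerImpl parse [fd=-1, q=DISCARD ALL]
2023-11-30T20:36:04.985903Z I i.q.c.p.PGConnectionContext exec [fd=3436, q=xHole where date<'2017-01-01T00:00:00.000000Z']
2023-11-30T20:36:04.985926Z I i.q.c.p.PGConnectionContext query cache used [fd=3436]
2023-11-30T20:36:31.160893Z I i.q.g.SqlCompilerImpl parse [fd=3436, q=xHole where date<'2017-08-16T00:00:00.000000Z' AND date>='2017-01-01T00:00:00.000000Z']
"""

for query, events in query_executions(LOG.splitlines()).items():
    # Seconds between the first and the last time this query text was sent.
    gap = (events[-1][0] - events[0][0]).total_seconds()
    print(f"{len(events)}x  gap={gap:.1f}s  {query}")
```

Run on this excerpt, it lists the training query (`date<'2017-01-01...'`) twice, about 13.4 s apart (once parsed, once re-executed with "query cache used"), and the 2017 query once.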