Description
openedon Sep 24, 2019
Note:
- Issue Relational split query mode for collection includes #20892 has been created to track actual work on split queries in EF Core 5.0. The new issue is locked so that it can be subscribed to for updates on the implementation without noise.
- This older issue has been re-purposed to be discussion about different approaches, etc. It will not be locked because it is important to us that we don’t shut down channels of communication with the community.
Preview6 introduces AsSplitQuery
API to load collections using multiple queries. Details of the API has been posted #20892 (comment)
A way to rewrite the query to avoid large result set & mimick previous version behavior is posted https://gist.github.com/smitpatel/d4cb3619e5b33e8d9ea24d3f2a88333a
Note: Per @roji's request, I am opening this in response to the comments on #17455
I have a very large query that contains about 57 includes that is being used to copy down a large entity so that it can be modified and cloned.
With EF 2.2.6, this large query ran successfully in about 1-3 seconds (variable). With the changes in 3.0 (all includes create one entire SQL statement with joins), the query takes significantly longer and always times out with the default execution timeout settings.
Steps to reproduce
Use a (IMO nasty) Linq query similar to the one as follows:
dbContext.Projects
.Include(z => z.Financial)
.Include(z => z.ProjectProtocol)
.Include(z => z.ReportCategories)
.Include(z => z.ReportSubCategories)
.Include(x => x.SubProjects).ThenInclude(y => y.Address)
.Include(x => x.SubProjects).ThenInclude(y => y.LegacyAddress)
.Include(x => x.SubProjects).ThenInclude(z => z.BuildingTypes)
.Include(x => x.SubProjects).ThenInclude(z => z.Buildings).ThenInclude(b => b.BuildingType)
.Include(x => x.SubProjects).ThenInclude(z => z.Buildings).ThenInclude(b => b.Site)
.Include(x => x.SubProjects).ThenInclude(z => z.Sites).ThenInclude(s => s.Address)
.Include(x => x.SubProjects).ThenInclude(z => z.Participants).ThenInclude(p => p.Address)
.Include(x => x.SubProjects).ThenInclude(z => z.ExcelFileJson)
.Include(x => x.SubProjects).ThenInclude(z => z.CompanionAddress)
.Include(x => x.SubProjects).ThenInclude(z => z.UtilityTypes)
.Include(x => x.SubProjects).ThenInclude(z => z.InspectedUnits).ThenInclude(i => i.Building)
.Include(x => x.SubProjects).ThenInclude(z => z.InspectedUnits).ThenInclude(i => i.UnitType)
.Include(x => x.SubProjects).ThenInclude(z => z.Utilities).ThenInclude(i => i.UtilityType)
.Include(x => x.SubProjects).ThenInclude(z => z.Units).ThenInclude(i => i.Building)
.Include(x => x.SubProjects).ThenInclude(z => z.UnitTypes)
.Include(x => x.SubProjects).ThenInclude(z => z.CommonAreas).ThenInclude(ca => ca.Building)
.Include(x => x.SubProjects).ThenInclude(z => z.FixtureAreas).ThenInclude(w => w.Fixtures)
.Include(x => x.SubProjects).ThenInclude(y => y.LineItems).ThenInclude(z => z.EnergyOneItems)
.Include(x => x.SubProjects).ThenInclude(y => y.LineItems).ThenInclude(z => z.EnergyTwoItems)
.Include(x => x.SubProjects).ThenInclude(y => y.LineItems)
.ThenInclude(z => z.TraditionalItem).AsNoTracking().FirstOrDefault(x => x.Id == project.Id);
Results:
With EF Core 2.2.6 - I can see in the output via the SQL Server Profiler that EF is breaking up the LINQ statement into smaller queries. The overall process takes about 1-3 seconds.
With EF Core 3.0 - I can see in the output via the SQL Server Profiler that EF is emitting one massive query with lots and lots of joins. The overall process always times out with the default execution timeout setting.
At this point, I am open to the notion that this query needs to be either re-written or the process needs to be changed for handling the cloning. I would still like to hear if there are any workarounds, findings that this is a bug or other suggestions to avoid having to devote a significant amount of time on rewriting.
Edit
For now we worked around this by splitting the query up manually using EF Plus' "Includes Optimized" method and then looping through the change tracker to set all of the entities as untracked so we can then reset their keys so that the graph can be comitted as a clone (this gave me a flashback to my EF 6 days).
Note: The model changed somewhat between the time this issue was first encountered and now due to user requests and other factors. I should also note that the system is now in production and users are pretty happy with the performance.
var clone = dbContext.Projects
.IncludeOptimized(z => z.Financial)
.IncludeOptimized(z => z.ProjectProtocol)
.IncludeOptimized(z => z.ReportCategories)
.IncludeOptimized(z => z.ReportSubCategories)
.IncludeOptimized(z => z.SubProject)
.IncludeOptimized(z => z.SubProject.ValidationFlag)
.IncludeOptimized(z => z.SubProject.ExcelFileJson)
.IncludeOptimized(z => z.SubProject.EnergyAuditUtilities)
.IncludeOptimized(z => z.SubProject.EnergyAuditData)
.IncludeOptimized(z => z.SubProject.EnergyAuditData.EnergyAuditAreas)
.IncludeOptimized(z => z.SubProject.Address)
.IncludeOptimized(z => z.SubProject.Utilities)
.IncludeOptimized(z => z.SubProject.UtilityTypes)
.IncludeOptimized(z => z.SubProject.Units)
.IncludeOptimized(z => z.SubProject.Units.Select(y => y.Building))
.IncludeOptimized(z => z.SubProject.BuildingTypes)
.IncludeOptimized(z => z.SubProject.Buildings)
.IncludeOptimized(z => z.SubProject.Buildings.Select(y => y.BuildingType))
.IncludeOptimized(z => z.SubProject.Buildings.Select(y => y.Site))
.IncludeOptimized(z => z.SubProject.Sites)
.IncludeOptimized(z => z.SubProject.Sites.Select(y => y.Address))
.IncludeOptimized(z => z.SubProject.Participants)
.IncludeOptimized(z => z.SubProject.Participants.Select(y => y.Address))
.IncludeOptimized(z => z.SubProject.InspectedUnits)
.IncludeOptimized(z => z.SubProject.InspectedUnits.Select(y => y.Building))
.IncludeOptimized(z => z.SubProject.InspectedUnits.Select(y => y.UnitType))
.IncludeOptimized(z => z.SubProject.UnitTypes)
.IncludeOptimized(z => z.SubProject.CommonAreas)
.IncludeOptimized(z => z.SubProject.CommonAreas.Select(y => y.Building))
.IncludeOptimized(z => z.SubProject.CommonAreas.Select(y => y.Building).Select(z => z.Units))
.IncludeOptimized(z => z.SubProject.FixtureAreas)
.IncludeOptimized(z => z.SubProject.FixtureAreas.Select(y => y.Fixtures))
.IncludeOptimized(x => x.SubProject.LineItems)
.IncludeOptimized(x => x.SubProject.LineItems.Select(y => y.EnergyOneItem))
.IncludeOptimized(x => x.SubProject.LineItems.Select(y => y.EnergyTwoTwoItem))
.IncludeOptimized(x => x.SubProject.LineItems.Select(y => y.TraditionalItem))
.FirstOrDefault(x => x.Id == project.Id);
if (clone != null)
{
foreach (var entityEntry in dbContext.ChangeTracker.Entries())
{
if (entityEntry.Entity != null)
{
entityEntry.State = EntityState.Detached;
}
}
return clone;
}
My team was struggling with this at first due to the fact that EF was at first wiping out the entities when we detatched them due to an issue on #18982. Using the work around that was posted there allowed for things to work. The overall performance is actually better due to the fact there isn't any client side evaluation. However, I would still prefer if the behavior of Includes Optimized (which is pretty much what EF 2.x did, splitting the query) was something that came out of the box with EF Core. I also do not like it how this entire scenario can no longer be done with a no tracking query (or so it seems), as it was possible to do it before with EF Core 2.x.
Further technical details
EF Core version: 3.0.0
Database provider: Microsoft.EntityFramework.SqlServer
Target framework: .Net Core 3.0
Operating system: Windows 10 x64 / Windows Server 2016
IDE: Visual Studio 2019 Pro (16.3)