Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected query results with UseRelationalNulls and Sum() #34436

Open
GertArnold opened this issue Aug 14, 2024 · 4 comments
Open

Unexpected query results with UseRelationalNulls and Sum() #34436

GertArnold opened this issue Aug 14, 2024 · 4 comments

Comments

@GertArnold
Copy link

GertArnold commented Aug 14, 2024

In .net, see the output of the following statements:

new[] { default(int?), default(int?) }.Max()     // => null
new[] { default(int?), default(int?) }.Sum()     // => 0

And similar statements in SQL (SQL Server) :

DECLARE @numbers TABLE (Number int);
INSERT @numbers values (null), (null);
SELECT MAX(Number) FROM @numbers;     -- => null
SELECT SUM(Number) FROM @numbers;     -- => null

I.e. the database semantics for the Sum function is different than .net's.

Therefore, when using the UseRelationalNulls() option, I'd expect Sum in a LINQ expression to return null when all contributing items are null. However, it doesn't. It returns 0.

Reproducing code (paste in Linqpad as program, using EF8 NuGet for SQL Server and localdb installed.)

#define useRelationalNulls
void Main()
{
    //newDb();
    using var db = getContext(false);
    db.Tests.Sum(t => t.Number).Dump("Sum all, from queryable");
    db.Database.SqlQueryRaw<int?>("SELECT SUM(Number) AS value FROM Tests").AsEnumerable().First().Dump("Sum all, from raw SQL");
    db.Tests.Where(t => t.Number == null).Sum(t => t.Number).Dump("Sum all null, from queryable");
    db.Database.SqlQueryRaw<int?>("SELECT SUM(Number) AS value FROM Tests WHERE Number IS NULL").AsEnumerable().First().Dump("Sum all null, from raw SQL");

    db.Tests.Max(t => t.Number).Dump("Max all, from queryable");
    db.Database.SqlQueryRaw<int?>("SELECT MAX(Number) AS value FROM Tests").AsEnumerable().First().Dump("Max all, from raw SQL");
    db.Tests.Where(t => t.Number == null).Max(t => t.Number).Dump("Max all null, from queryable");
    db.Database.SqlQueryRaw<int?>("SELECT MAX(Number) AS value FROM Tests WHERE Number IS NULL").AsEnumerable().First().Dump("Max all null, from raw SQL");
}

class Test
{
    public int Id { get; set; }
    public int? Number { get; set; }
}

class MyContext : DbContext
{
    private readonly string _connString;
    private readonly bool _log;

    public MyContext(string connectionString, bool log) : base()
    {
        _connString = connectionString;
        _log = log;
    }
    
    public DbSet<Test> Tests { get; set; }

    protected override void OnModelCreating(ModelBuilder mb)
    {
        mb.Entity<Test>()
            .HasData(new[]
            {
                new Test { Id = 1, Number = 10 },
                new Test { Id = 2, Number = 20 },
                new Test { Id = 3, Number = null },
                new Test { Id = 4, Number = null },
            });
    }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        bool relationalNulls = false;
#if useRelationalNulls
        relationalNulls = true;
#endif
        optionsBuilder.UseSqlServer(_connString, o => o.UseRelationalNulls(relationalNulls));
        if (_log) optionsBuilder.LogTo(s => s.Dump(), LogLevel.Information);
    }
}

void newDb()
{
    using var db = getContext(false);
    db.Database.EnsureDeleted();
    db.Database.EnsureCreated();
}

MyContext getContext(bool log)
{
    var connectionString = @$"Server=(localDB)\MSSQLLocalDB;database=nullSemantics;Integrated Security=true;MultipleActiveResultSets=true;Encrypt=true;TrustServerCertificate=true;Application Name=Linqpad";
    return new MyContext(connectionString, log);
}

The output of the statements, both with and without UseRelationalNulls:

Method Result
Sum all, from queryable 30
Sum all, from raw SQL 30
Sum all null, from queryable 0
Sum all null, from raw SQL null
Max all, from queryable 20
Max all, from raw SQL 20
Max all null, from queryable null
Max all null, from raw SQL null

So we see that the difference in semantics between .net and the database (row 3 and 4) is not affected by relational nulls settings. For other aggregate functions like Max the .net and database semantics are always equal, as said above.

The problem is caused by the query translation that always contains SELECT COALESCE(SUM([t].[Number]), 0). I'd expect this COALESCE call to not be added when database null semantics are switched on.

EF Core version: 8.0.7
Database provider: Microsoft.EntityFrameworkCore.SqlServer 2019
Target framework: NET 8.0
Operating system: win 11

@roji
Copy link
Member

roji commented Aug 14, 2024

Makes sense; FWIW UseRelationalNulls() is quite a fringe feature that isn't used by many - there are likely to be various other issues with it (and the priority isn't very high). But we should fix these - putting on the backlog.

@roji roji added this to the Backlog milestone Aug 14, 2024
@GertArnold
Copy link
Author

GertArnold commented Aug 15, 2024

For the time being, a work-around is using this function mapping, although I don't like the hack:

public static int? SumNullable(int? number) => throw new InvalidOperationException($"{nameof(SumNullable)} cannot be called client side.");

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.HasDbFunction(this.GetType().GetRuntimeMethod(nameof(SumNullable), new[] { typeof(int?) }))
        .HasTranslation(args =>
            new SqlFunctionExpression(
                    functionName: "SUM ", // Hack: inserted space to prevent EF from reverting to its built-in "SUM" mapping.
                    arguments: args,
                    nullable: true,
                    argumentsPropagateNullability: new[] { true },
                    type: typeof(int),
                    typeMapping: null
            ));
}

Usage:

db.Tests.Where(t => t.Number == null).Select(t => MyContext.SumNullable(t.Number))

Result: null.

@GertArnold
Copy link
Author

The function above can only be used in single 1-dimensional queries. In projections, the generated query is illegal (in SQL Server) because of queried fields that are not in a grouping (I think some other db engines would allow that). I'm not aware of possibilities to replace Queryable.Sum by a custom function.

All in all, I would strongly prefer abandoning this path of making Sum emulate .Net behavior (which itself is disputable). It's much, much easier to force null to 0 in C# code, when desired, than dealing with an ambiguous 0 value. Of course, people should be aware of the difference, but wouldn't that only be one of many areas where developers have to be aware of the difference between LINQ-to-SQL-backend and LINQ-to-objects? It would make life a easier to just return what the database returns, also for EF developers.

@roji
Copy link
Member

roji commented Aug 20, 2024

@GertArnold I'm not aware of any particular problem with translating Sum() in a way that emulates the .NET behavior; that's generally something we strive to do for all of our translations, and make exceptions here only in very specific cases. The above trouble is specific to UseRelationalNulls() - and we should indeed fix that - but that doesn't seem like it should be a reason to stop coalescing SUM to zero in cases where UseRelationalNulls() isn't being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants