Proposal: Use value types to store query comprehensions' intermediate variables #372

Closed
@benjamin-hodgson

Problem statement

In high-throughput situations it's often desirable to minimise garbage. You often see advice like "don't use LINQ in hot code because it allocates a lot". One reason for this is that query comprehensions are translated to the query pattern using anonymous objects which live on the heap.

var q =
    from x in xs
    from y in ys
    from z in zs
    select x + y + z;
// translates to...
var q = xs
    .SelectMany(x => ys, (x, y) => new { x, y })  // new { x, y } is a reference type that lives on the heap
    .SelectMany(dummy => zs, (dummy, z) => dummy.x + dummy.y + z);

(Of course, in practice dummy will be a transparent identifier.) While dummy will often be short-lived and won't survive the nursery (generation 0 in .NET terms), if your goal is to minimise garbage it's still preferable to avoid allocating it altogether.
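
As a quick sanity check of the claim that the intermediate object is a reference type, here is a minimal sketch that inspects an anonymous object at run time. The Program class and the literal values are placeholders of mine, not part of the proposal.

```csharp
using System;

static class Program
{
    static void Main()
    {
        // An anonymous object of the same shape as the one the query translation creates.
        var dummy = new { x = 1, y = 2 };

        // Anonymous types are compiler-generated classes, so every instance is a heap allocation.
        Console.WriteLine(dummy.GetType().IsClass);     // prints True
        Console.WriteLine(dummy.GetType().IsValueType); // prints False
    }
}
```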

You can achieve this by writing your query manually and storing intermediate variables in a custom value type. In the example below the intermediate (x, y) pairs are structs, so they no longer cost a heap allocation per pair:

var q = xs
    .SelectMany(x => ys, (x, y) => new MyStruct(x, y))
    .SelectMany(dummy => zs, (dummy, z) => dummy.x + dummy.y + z);
// or
var q =
    from dummy in (
        from x in xs
        from y in ys
        select new MyStruct(x, y)
    )
    from z in zs
    select dummy.x + dummy.y + z;

struct MyStruct
{
    public int x { get; }
    public int y { get; }
    public MyStruct(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}

When your query is long or complicated this translation gets tedious quickly (although C# 7's new ValueTuple certainly eases some of the pain). I'd like to be able to use the nice original query syntax but be confident that it won't allocate a lot at run time.
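
For comparison, here is a sketch of what the ValueTuple workaround mentioned above might look like. It assumes C# 7 tuple syntax is available; the sample arrays and the name t are placeholders of mine. The tuple element names are spelled out so that it does not rely on name inference, which arrived in C# 7.1.

```csharp
using System;
using System.Linq;

static class Program
{
    static void Main()
    {
        // Placeholder inputs; any int sequences would do.
        int[] xs = { 1, 2 };
        int[] ys = { 10, 20 };
        int[] zs = { 100, 200 };

        // Method syntax: the intermediate pair is a ValueTuple<int, int>, which is a struct.
        var q1 = xs
            .SelectMany(x => ys, (x, y) => (x: x, y: y))
            .SelectMany(t => zs, (t, z) => t.x + t.y + z);

        // Query syntax: the nested query still has to be written by hand.
        var q2 =
            from t in (
                from x in xs
                from y in ys
                select (x: x, y: y)
            )
            from z in zs
            select t.x + t.y + z;

        Console.WriteLine(typeof(ValueTuple<int, int>).IsValueType); // prints True
        Console.WriteLine(q1.SequenceEqual(q2));                     // prints True
    }
}
```

This removes the need to declare MyStruct, but the query still has to be restructured manually, which is the tedium the proposal aims to eliminate.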

Proposed solution

My proposal is to (optionally) translate the original query into one which looks like the manually-written version, by generating MyStruct at compile time, much like how anonymous objects already work.

It's not always desirable to use value types - it can be expensive to copy large value types around, and existing query providers may not understand expressions that don't use anonymous objects. So I propose having this behaviour disabled by default. Users can enable the value-type translation on a per-method level using an attribute:

[StructQueries]  // all query comprehensions in this method will use an anonymous value type for their intermediate identifiers
public void MyMethod()
{
    var q =
        from x in xs
        from y in ys
        from z in zs
        select x + y + z;
}

// translates to...
[StructQueries]
public void MyMethod()
{
    var q = xs
        .SelectMany(x => ys, (x, y) => new <>AnonymousStruct0(x, y))
        .SelectMany(dummy => zs, (dummy, z) => dummy.x + dummy.y + z);
}

[CompilerGenerated]
struct <>AnonymousStruct0
{
    public int x { get; }
    public int y { get; }
    public <>AnonymousStruct0(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}
