Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: Support Span<T> Enumerator / Memory<T> in ObjectReader.Create #79

Open
joebone opened this issue Dec 16, 2019 · 6 comments

Comments

@joebone
Copy link

joebone commented Dec 16, 2019

In the case where we have massive Bulk Inserts pending ( which seems like a very common use case for this library ;) ), we often have to break those into smaller batches, often 100K or less from Lists or arrays of a couple of million.

Rather than allocating those intermediate arrays, it would be great to be able to pass a Sliced Span, or at the very least a Memory. Internally I see the private source is an Enumerator rather than the IEnumerable that is required by the method, and GetEnumerator is called at the end.

It would be beneficial to be able to either

Basic use case example of the intended use case:

private async Task BatchedBulkInsertAsync<T>(T[] arr, string tableName, string[] members, int chunkSize = 100000) {
  var mem = new ReadOnlyMemory<T>(arr);
  for (int i = 0; i < arr.Length; i += chunkSize) {
    var slice = mem.Slice(i, Math.Min(chunkSize, arr.Length - i));
    _logger.LogInformation("Inserting {size} {tableName} : Pos:{i}, Len:{pLength}, Total:{total} ", slice.Length, tableName);
    using (var bcp = new SqlBulkCopy(_connectionString))
    using (var reader = ObjectReader.Create<T>(slice.Span.GetEnumerator(), members)) {
      bcp.DestinationTableName = tableName;
      var sw = Stopwatch.StartNew();
      bcp.BulkCopyTimeout = 120;
      await bcp.WriteToServerAsync(reader);
      sw.Stop();
       _logger.LogWarning("Inserted {count} to {tableName} in {elapsed}", slice.Length, tableName, sw.Elapsed.TotalMilliseconds);
    }
}
@mgravell
Copy link
Owner

The notion of reusing the object instance has merit, but overall is unlikely to be a significant overhead in any scenarios that are talking about data access. I'd want to talk about perceived benefit before jumping in there, because then we also get into issues of lifetime management, disposal / recycling, etc. I suspect the complexity and risk outweigh any expected benefit.


We can't take Span<T> / Span<T>.Enumerator, as they are both ref struct and cannot be stored anywhere in this context (since we need to implement an interface here, we can't consider a custom ref struct implementation). In particular, re:

or at least be able to pass an IEnumerator such as Span<T>.GetEnumerator()

you'll notice that Span<T>.Enumerator is not in fact an IEnumerator, for this same reason.


Memory<T> is possible, but has significant issues here, as the .Span access should be considered relatively expensive - plus we'd need to slice or index into the span each time we progress the thing.


Overall, I'd say that it is unlikely that we can add huge benefit here. Is there a specific scenario that you're thinking of? Also, if there is something we want to do here, then in any scenario that is "bulk", I'd also be tempted to look past Span<T> / Memory<T>, and think more about ReadOnlySequence<T>, since most "bulk" scenarios usually benefit from discontiguous data support.

@joebone
Copy link
Author

joebone commented Dec 16, 2019

First off, thanks for the quick response, wasn't expecting you to be quite so on-the-ball.
And to be fair, you may well have a point on the perception of benefit vs the real.

Our particular use case would probably benefit more from the object reuse/pooling on type than from the saved allocations.

The code sample I provided above is not far from the actual implementation we are using, witht he distinction that the fantasy line:
var reader = ObjectReader.Create<T>(slice.Span.GetEnumerator(), members)
is actually just implemented as a slice.ToArray().

Yea.

However I had up until this moment completely forgotten about ArraySegment, which is basically what I was looking for, and does implement the required interfaces..

So I guess the saving would be having to recreate the TypeAccessor and iterate the members each time - admittedly not a lot compared to the catastrophic copies of 100K elements to the LOH, but still.. I'm cheap. If I can avoid paying twice for the same work, I will.

Perhaps a change in scope to the original request then - a Recreate(newSource) or equivalent API that
would set the private IEnumerator source?

@mgravell
Copy link
Owner

mgravell commented Dec 16, 2019 via email

@joebone
Copy link
Author

joebone commented Dec 16, 2019

To be sure I'm understanding correctly, are you suggesting a subclass of objectreader?

I hadn't actually considered that, but yea, would allow more control at the cost of flexibility, although following the whole pit of success axiom it would be useful then to include an analyzer/fixer with the nuget package which advises when the more optimal path can be used(for discoverability purposes).

Or perhaps an internal path like Count on IEnumerable when it is an Ilist?

Either way, just having the class would definitely benefit us 😊.

I do feel a bit guilty about just crying for features without at least contributing though so maybe I'll branch and have a stab at an implementation myself, if for no other reason than practice.

@mgravell
Copy link
Owner

mgravell commented Dec 16, 2019 via email

@joebone
Copy link
Author

joebone commented Dec 16, 2019

Yea, whatever i was thinking was far more over-engineered than that :p Awesome, sounds great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants