DataFrame crashes on attempt to create column with size higher than 2,147,483,591 bytes #6699

Closed
@asmirnov82

Description

DataFrame is said to work with columns larger than int size (Count and Length have type long).

However, it fails to create a column even when the number of items is below int.MaxValue.
This code crashes with System.OutOfMemoryException: 'Array dimensions exceeded supported range.':

PrimitiveDataFrameColumn<byte> column = new PrimitiveDataFrameColumn<byte>("test", int.MaxValue - 10);
DataFrame df = new DataFrame(column);

Some investigation shows that DataFrame internally uses a structure similar to ChunkedArray, with the size of each chunk equal to int.MaxValue bytes (2,147,483,647).
However, according to MSDN (https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcallowverylargeobjects-element), the actual maximum array size is less than this value:

The maximum size in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for arrays containing other types.

So despite the buffer collection and chunked arrays, DataFrame fails in the DataFrameBuffer.EnsureCapacity method on these lines:

var newCapacity = Math.Max(newLength * Size, ReadOnlyBuffer.Length * 2);
var memory = new Memory<byte>(new byte[newCapacity]);

whenever newCapacity is higher than 2,147,483,591.
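
One possible direction, shown only as a sketch and not as the library's actual fix: cap the requested capacity at the documented byte-array limit instead of letting the doubling step exceed it. The class name BufferCapacitySketch, the constant MaxByteArrayLength, and the helper ComputeCapacity are hypothetical names introduced for illustration; only the limit value 0x7FFFFFC7 comes from the documentation quoted above.

```csharp
using System;

// Hypothetical helper, not part of Microsoft.Data.Analysis.
public static class BufferCapacitySketch
{
    // Documented maximum length of a byte array (2,147,483,591).
    public const int MaxByteArrayLength = 0x7FFFFFC7;

    // Returns the number of bytes to allocate for a buffer that must hold
    // requiredLength elements of elementSize bytes each.
    public static int ComputeCapacity(long requiredLength, int elementSize, int currentCapacityInBytes)
    {
        // Use 64-bit arithmetic so the multiplication cannot wrap around.
        long requiredBytes = checked(requiredLength * elementSize);

        if (requiredBytes > MaxByteArrayLength)
        {
            // A single array cannot hold this much data; the caller would
            // have to spread the elements over several buffers instead.
            throw new ArgumentOutOfRangeException(nameof(requiredLength),
                "Requested size exceeds the maximum byte array length.");
        }

        // Grow by doubling, but never beyond what a byte array can hold.
        long doubled = (long)currentCapacityInBytes * 2;
        return (int)Math.Min(Math.Max(requiredBytes, doubled), MaxByteArrayLength);
    }
}
```

With this sketch, ComputeCapacity(1_500_000_000, 1, 1_200_000_000) returns 2,147,483,591 rather than attempting a 2.4 GB allocation, while a request for int.MaxValue - 10 bytes is rejected up front so the caller can fall back to an additional chunk.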
