
Add support for >2GB tensors via byte stream #1234


Merged · 5 commits into dotnet:main · Feb 14, 2024

Conversation

shaltielshmid
Contributor

@shaltielshmid shaltielshmid commented Feb 11, 2024

No description provided.


if (!is_contiguous()) throw new InvalidOperationException("SetBytes() called on non-contiguous tensor.");

unsafe {
Contributor Author

Should we check for CPU here?
We check in the bytes getter, but not the setter.

Contributor

If it's going to blow up if it's not CPU, then it's better to check and throw an exception with good, specific information about the problem rather than some generic I/O exception later.

Contributor Author

Sure, will add
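A device guard along the lines below would mirror the getter's check. This is only a sketch; the member names (`device_type`, `DeviceType.CPU`) are assumed from the surrounding TorchSharp code, not verified against it:

```csharp
// Sketch: device guard next to the contiguity check in the bytes setter.
// device_type / DeviceType.CPU are assumed names, mirroring the getter's check.
if (device_type != DeviceType.CPU)
    throw new InvalidOperationException("SetBytes() is only supported on CPU tensors.");
if (!is_contiguous())
    throw new InvalidOperationException("SetBytes() called on non-contiguous tensor.");
```

Failing fast here gives a specific error instead of a generic I/O failure later, which is the point of the review comment above.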

@@ -58,6 +59,7 @@ public static void write_file(string filename, Tensor data)
/// <param name="data">One dimensional <c>uint8</c> <cref>Tensor</cref>.</param>
public static async void write_file_async(string filename, Tensor data)
{
// Currently limited to 2GB - should we duplicate the code for WriteBytesToStream using async ops?
Contributor Author

Comment should be deleted before merge - wrote it for @NiklasGustafsson

Contributor

OK

Contributor Author

Should we write an async function?

@@ -8269,5 +8269,111 @@ Tensor nbins_ratio(int seed, int size)
}
}
}


[Fact(Skip = "Very heavy on the compute")]
Contributor Author

The tests are very heavy on compute, so I marked them with `Skip` (but I tested them myself).

/// </summary>
/// <param name="stream">Stream to write the bytes to</param>
/// <param name="bufferSize">The buffer size to use when writing to the stream</param>
public void WriteBytesToStream(Stream stream, int bufferSize = 65_536)
Contributor Author

Unfortunately, the `Stream.Write` overload that accepts a `Span` was added in a later version of .NET, so we can't use it here and instead fall back to a `byte[]` buffer.
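The buffered approach can be sketched as follows. This is a simplified illustration, not the exact implementation; `NumberOfBytes` and `DataPointer` are stand-ins for however the tensor exposes its payload size and native data pointer:

```csharp
// Sketch: write tensor bytes in fixed-size chunks so no single byte[]
// (and no single Write call) ever approaches the 2GB limit.
public void WriteBytesToStream(Stream stream, int bufferSize = 65_536)
{
    long totalBytes = NumberOfBytes;        // illustrative: total payload size
    byte[] buffer = new byte[bufferSize];
    long written = 0;
    while (written < totalBytes)
    {
        int chunk = (int)Math.Min(bufferSize, totalBytes - written);
        // Copy the next chunk out of native memory into the managed buffer.
        Marshal.Copy(new IntPtr(DataPointer.ToInt64() + written), buffer, 0, chunk);
        stream.Write(buffer, 0, chunk);     // byte[] overload exists on all TFMs
        written += chunk;
    }
}
```

The `byte[]` overload of `Stream.Write` is available on every targeted framework, which is the point of the buffer-based fallback.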

@shaltielshmid
Contributor Author

Tagging the discussion #1219

@shaltielshmid
Contributor Author

shaltielshmid commented Feb 11, 2024

@NiklasGustafsson Please don't merge this in just yet - I want to confirm that it works well with PyBridge, and to benchmark how long the load and save takes before and after.

@shaltielshmid
Contributor Author

Benchmarking:

Comparing loading from and saving to pre-allocated memory (to mitigate the disk speed factor). The model is around 11MB when saved to disk. The comparison is between the old (current) method and the new proposed method with various buffer sizes (4096, 2048, and 1024 bytes):

| Method  | Mean     | Error     | StdDev    | Allocated   |
|---------|----------|-----------|-----------|-------------|
| SaveOld | 7.760 ms | 0.0828 ms | 0.0734 ms | 22.04 KB    |
| LoadOld | 9.635 ms | 0.1364 ms | 0.1276 ms | 10929.37 KB |
| Save4K  | 8.016 ms | 0.0770 ms | 0.0720 ms | 62.27 KB    |
| Load4K  | 8.099 ms | 0.0855 ms | 0.0758 ms | 62.75 KB    |
| Save2K  | 7.882 ms | 0.1496 ms | 0.1470 ms | 42.26 KB    |
| Load2K  | 7.935 ms | 0.1165 ms | 0.1090 ms | 42.75 KB    |
| Save1K  | 7.867 ms | 0.1156 ms | 0.1081 ms | 32.27 KB    |
| Load1K  | 7.934 ms | 0.1237 ms | 0.1157 ms | 32.75 KB    |

@NiklasGustafsson
Contributor

> Benchmarking: Comparing loading from and saving to pre-allocated memory (to mitigate the disk speed factor). […]

So, noticeably better load speed, roughly the same save speed, and far less allocation on load.

@NiklasGustafsson
Contributor

@shaltielshmid -- you asked me to hold off on merging, so I haven't...

@shaltielshmid
Contributor Author

shaltielshmid commented Feb 13, 2024

> @shaltielshmid -- you asked me to hold off on merging, so I haven't...

Update: good to go on the merge as soon as we confirm the question below and I update the comment.

Regarding the write_file_async method in TorchVision. Do we want to duplicate the WriteBytesToStream function with async operations?

@shaltielshmid
Contributor Author

@NiklasGustafsson Just confirming you saw the previous message. I'm good to go with the merge as soon as we decide whether to add an async version of the function.

@NiklasGustafsson
Contributor

I saw it, but failed to recognize that it was holding up the PR. Yes, async would be good for reading and writing.

@shaltielshmid
Contributor Author

Okay, so it's not super trivial: Spans aren't allowed in async methods, so this would require some reworking to get right. I think we should add it to the backlog, unless you think it's important, in which case I'll work on it over the next few days.
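For the backlog item, one way to sidestep the Span restriction is to keep the same chunking but stay on `byte[]` buffers, which async methods do allow. A sketch only, with `NumberOfBytes` and `DataPointer` as assumed stand-ins for the tensor's payload size and native data pointer:

```csharp
// Sketch: async chunked write. No Span appears in the method body,
// so the "Spans aren't allowed in async methods" restriction doesn't bite.
public async Task WriteBytesToStreamAsync(Stream stream, int bufferSize = 65_536,
                                          CancellationToken ct = default)
{
    long totalBytes = NumberOfBytes;        // illustrative
    byte[] buffer = new byte[bufferSize];
    long written = 0;
    while (written < totalBytes)
    {
        int chunk = (int)Math.Min(bufferSize, totalBytes - written);
        // Marshal.Copy keeps the native-memory access out of the async body.
        Marshal.Copy(new IntPtr(DataPointer.ToInt64() + written), buffer, 0, chunk);
        await stream.WriteAsync(buffer, 0, chunk, ct);
        written += chunk;
    }
}
```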

@NiklasGustafsson
Contributor

Add it to the backlog.

@shaltielshmid
Contributor Author

> Add it to the backlog.

Sounds good! The PR is then ready to merge from my perspective. I removed the comment already.

@NiklasGustafsson NiklasGustafsson merged commit 53d74a6 into dotnet:main Feb 14, 2024