Add support for >2GB tensors via byte stream #1234
Conversation
if (!is_contiguous()) throw new InvalidOperationException("SetBytes() called on non-contiguous tensor.");
unsafe {
Should we check for CPU here?
We check in the bytes getter, but not the setter.
If it's going to blow up when the tensor isn't on the CPU, then it's better to check up front and throw an exception with good, specific information about the problem rather than some generic I/O exception later.
Sure, will add
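For reference, a minimal sketch of the kind of guard being discussed. The `DeviceType` enum, `device_type` property, and `FakeTensor` class below are illustrative stand-ins, not the real TorchSharp API; the exact names in the library may differ:

```csharp
using System;

// Hypothetical stand-ins for the TorchSharp types; names are illustrative only.
enum DeviceType { CPU, CUDA }

class FakeTensor
{
    public DeviceType device_type = DeviceType.CUDA;
    public bool is_contiguous() => true;

    public void SetBytes(byte[] bytes)
    {
        if (!is_contiguous())
            throw new InvalidOperationException("SetBytes() called on non-contiguous tensor.");
        // The guard under discussion: fail early with a specific message
        // instead of a generic I/O error later.
        if (device_type != DeviceType.CPU)
            throw new InvalidOperationException("SetBytes() called on non-CPU tensor.");
        // ... copy bytes into the native buffer ...
    }
}

class Program
{
    static void Main()
    {
        var t = new FakeTensor();  // simulated non-CPU tensor
        try { t.SetBytes(new byte[4]); }
        catch (InvalidOperationException e) { Console.WriteLine(e.Message); }
    }
}
```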
src/TorchVision/File.cs
Outdated
@@ -58,6 +59,7 @@ public static void write_file(string filename, Tensor data)
/// <param name="data">One dimensional <c>uint8</c> <cref>Tensor</cref>.</param>
public static async void write_file_async(string filename, Tensor data)
{
    // Currently limited to 2GB - should we duplicate the code for WriteBytesToStream using async ops?
This comment should be deleted before merge - I wrote it for @NiklasGustafsson.
OK
Should we write an async function?
@@ -8269,5 +8269,111 @@ Tensor nbins_ratio(int seed, int size)
}
}
}

[Fact(Skip = "Very heavy on the compute")]
The tests are very heavy on the compute, so I marked them with Skip (but I ran them myself).
src/TorchSharp/Tensor/Tensor.cs
Outdated
/// </summary>
/// <param name="stream">Stream to write the bytes to</param>
/// <param name="bufferSize">The buffer size to use when writing to the stream</param>
public void WriteBytesToStream(Stream stream, int bufferSize = 65_536)
Unfortunately, the Stream.Write overload that accepts a Span was only added in a later version of .NET, so we can't use it here and instead fall back to copying through an intermediate buffer.
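For context, the chunked-copy approach looks roughly like this. This is a self-contained sketch that copies from a `byte[]` source through a fixed-size buffer; the real `WriteBytesToStream` reads from the tensor's native memory, so the details differ:

```csharp
using System;
using System.IO;

class ChunkedWriteDemo
{
    // Sketch: write `data` to `stream` through a fixed-size intermediate buffer,
    // using the Stream.Write(byte[], int, int) overload that exists on all
    // .NET versions (unlike the Span-based overload).
    static void WriteChunked(byte[] data, Stream stream, int bufferSize = 65_536)
    {
        var buffer = new byte[bufferSize];
        long written = 0;
        while (written < data.LongLength)
        {
            int count = (int)Math.Min(bufferSize, data.LongLength - written);
            Array.Copy(data, written, buffer, 0, count);
            stream.Write(buffer, 0, count);
            written += count;
        }
    }

    static void Main()
    {
        var data = new byte[200_000];
        new Random(42).NextBytes(data);
        using var ms = new MemoryStream();
        WriteChunked(data, ms, bufferSize: 65_536);
        Console.WriteLine(ms.Length == data.Length);  // True
    }
}
```

Using a `long` cursor is what lifts the 2GB ceiling: the per-call count stays within `int`, but the total written can exceed `int.MaxValue`.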
Tagging the discussion #1219
@NiklasGustafsson Please don't merge this just yet - I want to confirm that it works well with PyBridge, and to benchmark how long the load and save take before and after.
Benchmarking: comparing loading from and saving to pre-allocated memory (to mitigate the disk-speed factor). The model is around 11MB in size when saved to disk. The comparison is between the "old" (current) method and the proposed new method with various buffer sizes (4096, 2048, 1024). [benchmark results table not preserved in this capture]
So, roughly the same load speed, slightly better save speed.
@shaltielshmid -- you asked me to hold off on merging, so I haven't...
Update: good to go on merging as soon as we settle the question below and I update the comment. Regarding the write_file_async method in TorchVision: do we want to duplicate the WriteBytesToStream function with async operations?
@NiklasGustafsson Just confirming you saw the previous message. I'm good to go with the merge as soon as we decide whether to add an "async" version of the function.
I saw it, but failed to recognize that it was holding up the PR. Yes, async would be good for both reading and writing.
Okay, so it's not super trivial - Spans aren't allowed in async methods, so this would require some work to get right. I think we should add it to the backlog, unless you think it's important, in which case I'll work on it over the next few days.
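One possible shape for such an async variant: do the chunking through a plain `byte[]`, which, unlike `Span<byte>` (a ref struct that cannot cross an `await` boundary), is legal in async methods. The `WriteChunkedAsync` name and signature below are hypothetical, not part of the PR:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class AsyncWriteDemo
{
    // Sketch of a hypothetical async counterpart to WriteBytesToStream:
    // the same chunked copy, but via Stream.WriteAsync with a byte[] buffer,
    // since Span<byte> locals cannot live across await points.
    static async Task WriteChunkedAsync(byte[] data, Stream stream, int bufferSize = 65_536)
    {
        var buffer = new byte[bufferSize];
        long written = 0;
        while (written < data.LongLength)
        {
            int count = (int)Math.Min(bufferSize, data.LongLength - written);
            Array.Copy(data, written, buffer, 0, count);
            await stream.WriteAsync(buffer, 0, count);
            written += count;
        }
    }

    static async Task Main()
    {
        var data = new byte[150_000];
        using var ms = new MemoryStream();
        await WriteChunkedAsync(data, ms);
        Console.WriteLine(ms.Length);  // 150000
    }
}
```

The remaining work the backlog item would cover is copying from the tensor's native memory into that managed buffer without materializing a Span across an await.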
Add it to the backlog. |
Sounds good! The PR is ready to merge from my perspective. I've removed the comment already.