Skip to content

Conversation

@invario
Copy link
Contributor

@invario invario commented Jul 16, 2025

In a nutshell, previews for remote videos were partially disabled for any files larger than 5MB for performance reasons. (To avoid the situation of say...downloading a 37GB video file from S3.)

(I have a separate pending PR #53634 for S3 that allows preview generation via a direct connection to S3)

See the following for more discussion and detail of the problem this addresses:

Summary

Thanks to an idea from Derkedes in this post: #53469 (comment)

Downloading only a small part of the video is a useful optimization, however it also needs to support video formats that place important data at the end of the file. Instead of downloading only the first 5MB, would be possible to create a sparse file and download the first and last 5MB? The middle of the file would be empty, but ffmpeg should be able to extract a single frame.

I think this issue should be classified as a bug, since video thumbnails are now broken for mp4 files on external storages.

This PR does the following:

  • It will search for the moov atom in the original MP4/MOV file
  • It will create a blank file of same size as the video
  • It will copy the first 10MB of the remote file to the beginning of the new sparse file
  • It will copy the moov atom that it located to the same offset in the new sparse file
  • This sparse file will be used to generate a preview using an image grab from 5 sec and (failing that) also 1, and 0 seconds in.
  • Adds a new test MovieTestRemoteFile to verify that sparse file generation is working by comparing to a reference preview
  • Adds a new higher bitrate video to use for testing

Important notes:

  • I increased the download amount from 5MB to 10MB for the leading download. This is necessary because 5MB is too little to achieve reliable preview generation from modern video files.
  • For reference, 1080p H265/HEVC 30FPS recorded on a GS25+ has a bitrate of about 10Mbps. This translates to requiring 6.25 MB for 5 seconds of video. 4K H265/HEVC 60FPS is 70Mbps, which translates to 44 MB for 5 seconds of video.
  • Under this PR's sparse file method, ffmpeg has returned an error when attempting to generate a preview/still using a timestamp that requires data from the empty/null part of the sparse file. This is good and expected behavior and will result in fallback from 5 to 1 to 0 seconds.

TODO

  • Testers? I have tested this on an external SMB storage mount using files as large as 2GB.
  • Input and discussion?

Checklist

@invario invario requested a review from a team as a code owner July 16, 2025 00:04
@invario invario requested review from icewind1991, provokateurin and sorbaugh and removed request for a team July 16, 2025 00:04
@provokateurin provokateurin added bug 3. to review Waiting for reviews labels Jul 16, 2025
@provokateurin provokateurin linked an issue Jul 16, 2025 that may be closed by this pull request
@provokateurin provokateurin added this to the Nextcloud 32 milestone Jul 16, 2025
@Derkades
Copy link

Wow, thank you for turning my suggestion into actual code, I did not expect this! I want to clarify one thing about sparse files. A sparse file is a file with a "hole"; you do not need to fill the middle with zeroes. If you write zeroes to the middle of the file, it is not a sparse file anymore.

I am not very familiar with PHP as I am with other languages, but here's an untested example:

$input = ...
$output = ...
$total_size = 10*1024*1024*1024; // 10GiB
$partial_size = 5242880; // 5MiB

ftruncate($output, $total_size); // create a large file without actually writing any data to disk
// Copy first part to start of file
fseek($output, 0);
stream_copy_to_stream($input, $output, $partial_size, 0);
// Copy last part to end of file
fseek($output, $total_size - $partial_size);
stream_copy_to_stream($input, $output, $partial_size, $total_size - $partial_size);

@invario
Copy link
Contributor Author

invario commented Jul 16, 2025

Wow, thank you for turning my suggestion into actual code, I did not expect this! I want to clarify one thing about sparse files. A sparse file is a file with a "hole"; you do not need to fill the middle with zeroes. If you write zeroes to the middle of the file, it is not a sparse file anymore.

I am not very familiar with PHP as I am with other languages, but here's an untested example:

$input = ...
$output = ...
$total_size = 10*1024*1024*1024; // 10GiB
$partial_size = 5242880; // 5MiB

ftruncate($output, $total_size); // create a large file without actually writing any data to disk
// Copy first part to start of file
fseek($output, 0);
stream_copy_to_stream($input, $output, $partial_size, 0);
// Copy last part to end of file
fseek($output, $total_size - $partial_size);
stream_copy_to_stream($input, $output, $partial_size, $total_size - $partial_size);

Thanks! So I am (incorrectly) saying it is zeroes. /dev/zero actually provides null zeroes. I quickly generated a file using ftruncate to the same size as another file I generated using /dev/zero with 0 diffs between them.

@invario invario force-pushed the preview-sparse-file branch from 5dc4c1b to 2595107 Compare July 16, 2025 13:51
@Derkades
Copy link

Derkades commented Jul 16, 2025

I quickly generated a file using ftruncate to the same size as another file I generated using /dev/zero with 0 diffs between them.

Indeed, there is no difference between the file content. But if you copy from /dev/zero to the file, actual data is written to disk.

Example:

dd if=/dev/zero of=testfile1 bs=1M count=100
truncate -s 100M testfile2

Both files appear to be 100MB of zero data:

ls -l testfile*
-rw-r--r--. 1 robin robin 104857600 Jul 16 16:17 testfile1
-rw-r--r--. 1 robin robin 104857600 Jul 16 16:17 testfile2

But one file consumes 100MB of disk space, and the other 0 bytes (only metadata):

du testfile*
102404  testfile1
0       testfile2

It will be a lot faster to use ftruncate and fseek, instead of writing from /dev/zero to file file. Using ftruncate, you only need to write 2x 5MB of data. If you copy /dev/zero to the file, you many need to write many gigabytes of zeros to the file.

@invario
Copy link
Contributor Author

invario commented Jul 16, 2025

I quickly generated a file using ftruncate to the same size as another file I generated using /dev/zero with 0 diffs between them.

Indeed, there is no difference between the file content. But if you copy from /dev/zero to the file, actual data is written to disk.

Example:

dd if=/dev/zero of=testfile1 bs=1M count=100
truncate -s 100M testfile2

Both files appear to be 100MB of zero data:

ls -l testfile*
-rw-r--r--. 1 robin robin 104857600 Jul 16 16:17 testfile1
-rw-r--r--. 1 robin robin 104857600 Jul 16 16:17 testfile2

But one file consumes 100MB of disk space, and the other 0 bytes (only metadata):

du testfile*
102404  testfile1
0       testfile2

It will be a lot faster to use ftruncate and fseek, instead of writing from /dev/zero to file file. Using ftruncate, you only need to write 2x 5MB of data. If you copy /dev/zero to the file, you many need to write many gigabytes of zeros to the file.

Understood, thanks for the explanation! Will implement this.

@invario invario force-pushed the preview-sparse-file branch from 2595107 to d3d3819 Compare July 16, 2025 19:52
@invario
Copy link
Contributor Author

invario commented Jul 16, 2025

Force pushed a new one using ftruncate. Thanks again!

Now onto the other issue of determining if an external storage will work properly with this. I thought that stream_get_meta_data would allow me to check for the seekable value but after testing against S3 and SMB, seekable comes up as false even though it appears to work properly on both.

@invario invario force-pushed the preview-sparse-file branch 2 times, most recently from be54fec to abdef72 Compare July 16, 2025 23:16
@invario
Copy link
Contributor Author

invario commented Jul 16, 2025

Spoke too soon, it was my mistake. [seekable] seems to be returning true so I have made the changes in the code to check first.

@invario invario force-pushed the preview-sparse-file branch from abdef72 to e62f6b8 Compare July 16, 2025 23:21
@invario invario force-pushed the preview-sparse-file branch 2 times, most recently from 5c31caf to 52ccc6e Compare July 18, 2025 03:24
@invario
Copy link
Contributor Author

invario commented Jul 18, 2025

@provokateurin Force pushed an update. Couldn't move the moov atom to the front like I originally planned (it's not as simple as moving it, and ffprobe/ffmpeg refuse to process it.) So I stuck with generating a sparse file. Added lots of comments to explain what's going on.

Also re-added in previews for the 5 sec timestamp for local files and limited the 1 and 0 sec ones for ones that require temp files.

Copy link
Member

@provokateurin provokateurin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the code comments, this was really useful to understand everything!

Would you be able to add some tests for this including real videos for testing the different scenarios? That would be really nice to ensure we don't have any regressions in the future.

Also once this PR is merged it would be cool to have an issue about the idea to let ffmpeg handle all of it. I still think we could make it work somehow and make it more reliable and effective that way.

@invario invario force-pushed the preview-sparse-file branch from 478d921 to c47bafb Compare July 18, 2025 13:15
@invario
Copy link
Contributor Author

invario commented Jul 18, 2025

Thanks for all the code comments, this was really useful to understand everything!

👍

Would you be able to add some tests for this including real videos for testing the different scenarios? That would be really nice to ensure we don't have any regressions in the future.

I don't know how to do that but I'll work on figuring it out. On a related note, I came across this video in the tests directory from 10 years ago.
https://github.com/nextcloud/server/blob/master/tests/data/testimage.mp4
It's no wonder 5MB was the amount used for the initial grab and 5 seconds for an initial attempt; this file is only 375 KB total for the entire 5 seconds, resolution of 560x320 and a bitrate of only 465 KB/s. Times sure have changed.

Also once this PR is merged it would be cool to have an issue about the idea to let ffmpeg handle all of it. I still think we could make it work somehow and make it more reliable and effective that way.

👍

auto-merge was automatically disabled August 29, 2025 13:42

Head branch was pushed to by a user without write access

@invario invario force-pushed the preview-sparse-file branch 2 times, most recently from aaef40b to c8069ea Compare August 29, 2025 13:46
@provokateurin provokateurin disabled auto-merge August 29, 2025 16:19
@provokateurin
Copy link
Member

Cypress is red because the PR is coming from a fork, but I think it's fine to merge it like this.

@provokateurin
Copy link
Member

But let's wait until after the branch off, so it gets into Nextcloud 33, as we are in the feature freeze phase already.

@invario
Copy link
Contributor Author

invario commented Aug 30, 2025

But let's wait until after the branch off, so it gets into Nextcloud 33, as we are in the feature freeze phase already.

😭👍

@provokateurin
Copy link
Member

Sorry 😞

@Vislike
Copy link

Vislike commented Sep 3, 2025

Just want to throw in some cents regarding using isLocal()
As I also mentioned here #52079 (comment)
If you use encryption isLocal() returns false even if it is a local storage.
So this might need to be tested when using local server encryption also.

@invario
Copy link
Contributor Author

invario commented Sep 3, 2025

Just want to throw in some cents regarding using isLocal() As I also mentioned here #52079 (comment) If you use encryption isLocal() returns false even if it is a local storage. So this might need to be tested when using local server encryption also.

In that scenario, it should work fine and generate the preview using the getSparseFile() method, which shouldn't be necessary if it had returned isLocal()=true. It shouldn't break, but it should still work.

…ownload

Co-authored-by: Kate <26026535+provokateurin@users.noreply.github.com>
Co-authored-by: Côme Chilliet <91878298+come-nc@users.noreply.github.com>
Signed-off-by: invario <67800603+invario@users.noreply.github.com>
@provokateurin
Copy link
Member

Closing this PR now, as @artonge created a PR from the nextcloud org, so that the entire CI will run. Thanks a lot @invario !

@artonge
Copy link
Contributor

artonge commented Sep 5, 2025

Just want to see if the local PR's CI gets reflected in this PR

@artonge artonge reopened this Sep 5, 2025
@artonge artonge merged commit 05b4403 into nextcloud:master Sep 5, 2025
614 of 792 checks passed
@artonge
Copy link
Contributor

artonge commented Sep 5, 2025

Yeah! 🥳

@invario invario deleted the preview-sparse-file branch September 5, 2025 12:45
@skjnldsv skjnldsv modified the milestones: Nextcloud 33, Nextcloud 32 Oct 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants