- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 4.6k
feat(previews): previews for large remote files without full file download #53952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Wow, thank you for turning my suggestion into actual code, I did not expect this! I want to clarify one thing about sparse files. A sparse file is a file with a "hole"; you do not need to fill the middle with zeroes. If you write zeroes to the middle of the file, it is not a sparse file anymore. I am not very familiar with PHP as I am with other languages, but here's an untested example: $input = ...
$output = ...
$total_size = 10*1024*1024*1024; // 10GiB
$partial_size = 5242880; // 5MiB
ftruncate($output, $total_size); // create a large file without actually writing any data to disk
// Copy first part to start of file
fseek($output, 0);
stream_copy_to_stream($input, $output, $partial_size, 0);
// Copy last part to end of file
fseek($output, $total_size - $partial_size);
stream_copy_to_stream($input, $output, $partial_size, $total_size - $partial_size); | 
| 
 Thanks! So I am (incorrectly) saying it is zeroes. /dev/zero actually provides null zeroes. I quickly generated a file using ftruncate to the same size as another file I generated using /dev/zero with 0 diffs between them. | 
5dc4c1b    to
    2595107      
    Compare
  
    | 
 Indeed, there is no difference between the file content. But if you copy from /dev/zero to the file, actual data is written to disk. Example: Both files appear to be 100MB of zero data: But one file consumes 100MB of disk space, and the other 0 bytes (only metadata): It will be a lot faster to use ftruncate and fseek, instead of writing from /dev/zero to file file. Using ftruncate, you only need to write 2x 5MB of data. If you copy /dev/zero to the file, you many need to write many gigabytes of zeros to the file. | 
| 
 Understood, thanks for the explanation! Will implement this. | 
2595107    to
    d3d3819      
    Compare
  
    | Force pushed a new one using ftruncate. Thanks again! Now onto the other issue of determining if an external storage will work properly with this.  I thought that  | 
be54fec    to
    abdef72      
    Compare
  
    | Spoke too soon, it was my mistake.   | 
abdef72    to
    e62f6b8      
    Compare
  
    5c31caf    to
    52ccc6e      
    Compare
  
    | @provokateurin Force pushed an update. Couldn't move the  Also re-added in previews for the 5 sec timestamp for local files and limited the 1 and 0 sec ones for ones that require temp files. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the code comments, this was really useful to understand everything!
Would you be able to add some tests for this including real videos for testing the different scenarios? That would be really nice to ensure we don't have any regressions in the future.
Also once this PR is merged it would be cool to have an issue about the idea to let ffmpeg handle all of it. I still think we could make it work somehow and make it more reliable and effective that way.
478d921    to
    c47bafb      
    Compare
  
    | 
 👍 
 I don't know how to do that but I'll work on figuring it out.  On a related note, I came across this video in the  
 👍 | 
Head branch was pushed to by a user without write access
aaef40b    to
    c8069ea      
    Compare
  
    | Cypress is red because the PR is coming from a fork, but I think it's fine to merge it like this. | 
| But let's wait until after the branch off, so it gets into Nextcloud 33, as we are in the feature freeze phase already. | 
| 
 😭👍 | 
| Sorry 😞 | 
| Just want to throw in some cents regarding using isLocal() | 
| 
 In that scenario, it should work fine and generate the preview using the  | 
…ownload Co-authored-by: Kate <26026535+provokateurin@users.noreply.github.com> Co-authored-by: Côme Chilliet <91878298+come-nc@users.noreply.github.com> Signed-off-by: invario <67800603+invario@users.noreply.github.com>
c8069ea    to
    97a0dde      
    Compare
  
    | Just want to see if the local PR's CI gets reflected in this PR | 
| Yeah! 🥳 | 
In a nutshell, previews for remote videos were partially disabled for any files larger than 5MB for performance reasons. (To avoid the situation of say...downloading a 37GB video file from S3.)
(I have a separate pending PR #53634 for S3 that allows preview generation via a direct connection to S3)
See the following for more discussion and detail of the problem this addresses:
[Bug]: Movie preview not generated on remote storages if file > 5MiB #53469
fix(previews): Allow to parse big files for previews #53496
fix(previews): avoid large file downloads for remote movie storage #52079
Summary
Thanks to an idea from Derkedes in this post: #53469 (comment)
This PR does the following:
moov atomin the original MP4/MOV filemoov atomthat it located to the same offset in the new sparse fileMovieTestRemoteFileto verify that sparse file generation is working by comparing to a reference previewImportant notes:
ffmpeghas returned an error when attempting to generate a preview/still using a timestamp that requires data from the empty/null part of the sparse file. This is good and expected behavior and will result in fallback from 5 to 1 to 0 seconds.TODO
Checklist