Skip to content

Conversation

@satyamtg
Copy link
Contributor

@satyamtg satyamtg commented May 9, 2020

This would add a new video module and fix #14

This currently uses the videos here for tests. We need to change that and update the changelog before merging.

@satyamtg satyamtg marked this pull request as ready for review May 11, 2020 14:56
@satyamtg satyamtg requested a review from rgaudin May 11, 2020 14:58
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @satyamtg for this big chunk!

General comments

  • there is no usage. As it's not as easy as using a single function with clear arguments, that's needed.
  • there's no docstring nor comments. Use docstrings so users can just help() to discover usage and use comments to help maintenance (where required, we don't want to over-comment)
  • I see no reason for process_video_dir to exist. it's too specific and might be duplicated with slight differences in all the scrapers. Also, as we are moving towards pylibzim, we'll be handling files one after another.
  • what's the use case for vidutil to be updated?
  • Should we promote get_video_info into the lib? What are all the details it gives on a file?
  • Although I understand its advantages, I don't really like the current API.

Looking at the code, it's a wrapper around ffmpeg but it's too narrow. It forces the use of many options but ffmpeg is not an easy beast to please so we need flexibility otherwise all scrapers will reimplement this.
Main objectives are code-sharing and simplicity for users.
With the current API, I can't reencode a video without setting quality range which is probematic if I just want to change fornat.
Also, I can't convert audio files while that's something we need.

Thus, I'd propose you change the API to have something simpler where the converter receives ffmpeg args.

Now that's for flexibility right? What about simplicity?

For that, we should provide presets. Your video module would have a presets submodule offering various options. Those presets don't have to be a list but needs to output a list of ffmpeg args.
In addition, you may want to include a configbuilder for those with simple needs but yet not satisfied with the presets.

Pseudo-code might make more sense:

from zimscraperlib.video.presets import Webm240pHard
from zimscraperlib.video import recompress, ConfigBuilder

recompress(src_path, dst_path, Webm240pHard(), delete_src=True)

config = Webm240pHard({"-q-min": 20})
config["-vf"] = "scale=426:240"
recompress(src_path, dst_path, config)

config = ConfigBuilder(
    max_video_bitrate="300k",
    min_video_bitrate=None,
    target_video_bitrate="300k",
    buffersize="1000k",
    audio_sampling_rate=44100,
    target_audio_bitrate="128k",
    quality_range=(30, 42),
    ffmpeg_video_scale="480:trunc(ow/a/2)*2",
    quality="best")
recompress(src_path, dst_path, config)

The preset could include a VERSION attr that would be handy for our S3 usage as well.

# just an example, probably clever way to do it
class VoiceMp348k(dict):
    VERSION = 1

    options = {
        "-codec:a": "mp3",
        "-ar": "44100", # set sample rate
        "-b:a": "48k", # constant bitrate
    }
    def __init__(self, **kwargs):
        super().__init__(self, **type(self).options)
        self.update(kwargs)

    def to_ffmpeg(self):
        args = []
        for k, v in self.items():
            args += [k, v]
        return args

c = VoiceMp348k(**{"-b:a": "44k"})
c["-ar"] = "48400"
print(c.to_ffmpeg())

Common comments (spans across different places)

  • abbreviations is bad; we should use them only when needed.
  • duplicating context into variable names should be avoided unless necessary (vid_info, ffargs, etc.)
  • variables like vidcompressioncfg are unreadable. python convention is to use _ to separate words. Although in this case config would have probably been enough.
  • we may want to launch ffmpeg/ffprobe via /usr/bin/env to prevent codefactor from complaining. Not sure why it's not complaining already tbh. Do you?
  • if you don't use a variable, it should not exist. if you use it only to return, return directly

ex:

    # in test_video.py
    vid_info = {
        "codecs": codecs,
        "duration": int(format_info[0].split(".")[0]),
        "bitrate": int(format_info[1]),
    }
    return vid_info
    # replacable with
    return {
        "codecs": codecs,
        "duration": int(format_info[0].split(".")[0]),
        "bitrate": int(format_info[1]),
    }

Good work! let me know your thoughts on the API change.

python-resize-image==1.1.19
jinja2==2.10.1
babel==2.7.0
humanfriendly==8.2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allow humanfriendly 8.x.
Actually, if we're using it only for test, it makes no sense to require it here. Maybe it's time we introduce a requirements-dev.txt ?

While you're at it, remove jinja2 and babel that are not used at all and allow for python-resize-image 1.1.*

"The video format with which VidUtil is initialized requires audio_codec, video_codec and extra_ffmpeg_params to be passed as arguments"
)

tmp_path = src_path.parent.joinpath(f"video.tmp.{self.video_format}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should use tempfile as well. It was fine next to download file for youtube and TED but we don't want to enforce that in the lib.

f"recompress {src_path} -> {dst_path} video format = {self.video_format}"
)
logger.debug(nicer_args_join(args))
subprocess.run(args, check=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will crash on error. you may want to do that in two steps to clean-up temp file if you're not using a context manager

("minrate", "min_video_bitrate"),
]
for index, option_data in enumerate(options_map):
option, attr = option_data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can replace that with for index, (option, attr) in enumerate(options_map):

from zimscraperlib.download import save_large_file


def get_video_info(src_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we move that to the library? It seems quite useful although we have not been using such info so far

# -*- coding: utf-8 -*-
# vim: ai ts=4 sts=4 et sw=4 nu

import pathlib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused import. you may want to integrate flake8 with your text editor

def test_video_recompression(
vidutil, vidcompressioncfg, temp_video_dir, hosted_video_links
):
temp_video_dir.mkdir(parents=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have to create it, it's not a temp_video_dir that you're getting



@pytest.fixture(scope="function")
def temp_video_dir():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we want to use tempfile and not have that along the code. tests can fail so it may not be cleaned-up.

@pytest.fixture(scope="function")
def hosted_video_links():
links = {
"mp4": "https://github.com/satyamtg/test-bucket/raw/master/video.mp4",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I see the files are not that big (700K and 1MB), we may consider adding them to the repo. not sure yet. Until that we'll move them to kiwix server.
I think it would make more sense to have test files with clear audio :) Even better maybe, a 1-2s long cut of an open video like big-buck-bunny or something from blender foundation?

@satyamtg
Copy link
Contributor Author

Closing this as this isn't mergeable and the issue would be fixed from the new PR from video_module.

@satyamtg satyamtg closed this May 14, 2020
@satyamtg satyamtg mentioned this pull request May 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for video post processing and convertion

2 participants