Create video module #16

satyamtg · 2020-05-09T16:41:23Z

This would add a new video module and fix #14

This currently uses the videos here for tests. We need to change that and update the changelog before merging.

…aperlib into video_module

rgaudin

Thank you @satyamtg for this big chunk!

General comments

there is no usage. As it's not as easy as using a single function with clear arguments, that's needed.
there's no docstring nor comments. Use docstrings so users can just help() to discover usage and use comments to help maintenance (where required, we don't want to over-comment)
I see no reason for process_video_dir to exist. it's too specific and might be duplicated with slight differences in all the scrapers. Also, as we are moving towards pylibzim, we'll be handling files one after another.
what's the use case for vidutil to be updated?
Should we promote get_video_info into the lib? What are all the details it gives on a file?
Although I understand its advantages, I don't really like the current API.

Looking at the code, it's a wrapper around ffmpeg but it's too narrow. It forces the use of many options but ffmpeg is not an easy beast to please so we need flexibility otherwise all scrapers will reimplement this.
Main objectives are code-sharing and simplicity for users.
With the current API, I can't reencode a video without setting quality range which is probematic if I just want to change fornat.
Also, I can't convert audio files while that's something we need.

Thus, I'd propose you change the API to have something simpler where the converter receives ffmpeg args.

Now that's for flexibility right? What about simplicity?

For that, we should provide presets. Your video module would have a presets submodule offering various options. Those presets don't have to be a list but needs to output a list of ffmpeg args.
In addition, you may want to include a configbuilder for those with simple needs but yet not satisfied with the presets.

Pseudo-code might make more sense:

from zimscraperlib.video.presets import Webm240pHard
from zimscraperlib.video import recompress, ConfigBuilder

recompress(src_path, dst_path, Webm240pHard(), delete_src=True)

config = Webm240pHard({"-q-min": 20})
config["-vf"] = "scale=426:240"
recompress(src_path, dst_path, config)

config = ConfigBuilder(
    max_video_bitrate="300k",
    min_video_bitrate=None,
    target_video_bitrate="300k",
    buffersize="1000k",
    audio_sampling_rate=44100,
    target_audio_bitrate="128k",
    quality_range=(30, 42),
    ffmpeg_video_scale="480:trunc(ow/a/2)*2",
    quality="best")
recompress(src_path, dst_path, config)

The preset could include a VERSION attr that would be handy for our S3 usage as well.

# just an example, probably clever way to do it
class VoiceMp348k(dict):
    VERSION = 1

    options = {
        "-codec:a": "mp3",
        "-ar": "44100", # set sample rate
        "-b:a": "48k", # constant bitrate
    }
    def __init__(self, **kwargs):
        super().__init__(self, **type(self).options)
        self.update(kwargs)

    def to_ffmpeg(self):
        args = []
        for k, v in self.items():
            args += [k, v]
        return args

c = VoiceMp348k(**{"-b:a": "44k"})
c["-ar"] = "48400"
print(c.to_ffmpeg())

Common comments (spans across different places)

abbreviations is bad; we should use them only when needed.
duplicating context into variable names should be avoided unless necessary (vid_info, ffargs, etc.)
variables like vidcompressioncfg are unreadable. python convention is to use _ to separate words. Although in this case config would have probably been enough.
we may want to launch ffmpeg/ffprobe via /usr/bin/env to prevent codefactor from complaining. Not sure why it's not complaining already tbh. Do you?
if you don't use a variable, it should not exist. if you use it only to return, return directly

ex:

    # in test_video.py
    vid_info = {
        "codecs": codecs,
        "duration": int(format_info[0].split(".")[0]),
        "bitrate": int(format_info[1]),
    }
    return vid_info
    # replacable with
    return {
        "codecs": codecs,
        "duration": int(format_info[0].split(".")[0]),
        "bitrate": int(format_info[1]),
    }

Good work! let me know your thoughts on the API change.

rgaudin · 2020-05-12T12:22:00Z

requirements.txt

 python-resize-image==1.1.19
 jinja2==2.10.1
 babel==2.7.0
+humanfriendly==8.2


Allow humanfriendly 8.x.
Actually, if we're using it only for test, it makes no sense to require it here. Maybe it's time we introduce a requirements-dev.txt ?

While you're at it, remove jinja2 and babel that are not used at all and allow for python-resize-image 1.1.*

rgaudin · 2020-05-12T12:26:03Z

src/zimscraperlib/video.py

+                "The video format with which VidUtil is initialized requires audio_codec, video_codec and extra_ffmpeg_params to be passed as arguments"
+            )
+
+        tmp_path = src_path.parent.joinpath(f"video.tmp.{self.video_format}")


should use tempfile as well. It was fine next to download file for youtube and TED but we don't want to enforce that in the lib.

rgaudin · 2020-05-12T12:26:27Z

src/zimscraperlib/video.py

+            f"recompress {src_path} -> {dst_path} video format = {self.video_format}"
+        )
+        logger.debug(nicer_args_join(args))
+        subprocess.run(args, check=True)


will crash on error. you may want to do that in two steps to clean-up temp file if you're not using a context manager

rgaudin · 2020-05-12T12:26:58Z

tests/video/test_video.py

+        ("minrate", "min_video_bitrate"),
+    ]
+    for index, option_data in enumerate(options_map):
+        option, attr = option_data


you can replace that with for index, (option, attr) in enumerate(options_map):

rgaudin · 2020-05-12T12:28:44Z

tests/video/test_video.py

+from zimscraperlib.download import save_large_file
+
+
+def get_video_info(src_path):


Should we move that to the library? It seems quite useful although we have not been using such info so far

rgaudin · 2020-05-12T12:30:39Z

tests/video/test_video.py

+# -*- coding: utf-8 -*-
+# vim: ai ts=4 sts=4 et sw=4 nu
+
+import pathlib


unused import. you may want to integrate flake8 with your text editor

rgaudin · 2020-05-12T12:31:31Z

tests/video/test_video.py

+def test_video_recompression(
+    vidutil, vidcompressioncfg, temp_video_dir, hosted_video_links
+):
+    temp_video_dir.mkdir(parents=True)


if you have to create it, it's not a temp_video_dir that you're getting

rgaudin · 2020-05-12T12:33:53Z

tests/video/conftest.py

+
+
+@pytest.fixture(scope="function")
+def temp_video_dir():


we want to use tempfile and not have that along the code. tests can fail so it may not be cleaned-up.

rgaudin · 2020-05-12T12:36:54Z

tests/video/conftest.py

+@pytest.fixture(scope="function")
+def hosted_video_links():
+    links = {
+        "mp4": "https://github.com/satyamtg/test-bucket/raw/master/video.mp4",


👍 I see the files are not that big (700K and 1MB), we may consider adding them to the repo. not sure yet. Until that we'll move them to kiwix server.
I think it would make more sense to have test files with clear audio :) Even better maybe, a 1-2s long cut of an open video like big-buck-bunny or something from blender foundation?

satyamtg · 2020-05-14T17:27:23Z

Closing this as this isn't mergeable and the issue would be fixed from the new PR from video_module.

satyamtg added 12 commits May 9, 2020 14:06

Create video module

381802d

Add VidUtil in video.py

2ef5c9f

Add tests for VidCompressionCfg

b38f5ea

Fix failing VidCompressionCfg test

0f2fafa

Add test for recompress_video()

dbfcd91

Add complete tests for video module

c42b72b

Fix codefactor

c72adf4

Add ffmpeg installation in ci.yml

65c2709

Add ffmpeg installation for qa

0669e32

Fix tests for python3.6

52e83e2

Merge branch 'video_module' of https://github.com/satyamtg/python_scr…

a38794f

…aperlib into video_module

Replace text with universal_newlines for backwards compatibility

daccd5b

satyamtg marked this pull request as ready for review May 11, 2020 14:56

satyamtg requested a review from rgaudin May 11, 2020 14:58

rgaudin requested changes May 12, 2020

View reviewed changes

satyamtg closed this May 14, 2020

satyamtg mentioned this pull request May 14, 2020

Video module #17

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Create video module #16

Create video module #16

Uh oh!

satyamtg commented May 9, 2020 •

edited

Loading

Uh oh!

rgaudin left a comment

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

rgaudin May 12, 2020

Uh oh!

satyamtg commented May 14, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		from zimscraperlib.download import save_large_file


		def get_video_info(src_path):

Uh oh!

Create video module #16

Create video module #16

Uh oh!

Conversation

satyamtg commented May 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rgaudin left a comment

Choose a reason for hiding this comment

General comments

Common comments (spans across different places)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

satyamtg commented May 14, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

satyamtg commented May 9, 2020 •

edited

Loading