
Update doc #195

Merged (29 commits, Jul 16, 2018)

Changes shown are from 1 commit. Commits in this pull request:

- `8b0ab27` Update docs; add models.md. (li-zhi, Jul 15, 2018)
- `26919a2` Update vmafossexec.md; move h5py dependency out of mixin.py. (li-zhi, Jul 15, 2018)
- `68deb49` Make loading svmutil on-demand in quality_runner.py. (li-zhi, Jul 15, 2018)
- `28c6c8e` Make loading svmutil on-demand in train_test_model.py. (li-zhi, Jul 15, 2018)
- `009374d` Update vmafossexec.md. (li-zhi, Jul 15, 2018)
- `bb4198b` Fix wrapper link. (li-zhi, Jul 15, 2018)
- `9afd2ad` Update vmafossexec.md. (li-zhi, Jul 15, 2018)
- `b298115` Update libvmaf.md. (li-zhi, Jul 15, 2018)
- `a068690` Update VMAF_Python_library.md; move models part to models.md. (li-zhi, Jul 15, 2018)
- `18ad448` Update VMAF_Python_library.md. (li-zhi, Jul 15, 2018)
- `b435c52` Update VMAF_Python_library.md. (li-zhi, Jul 15, 2018)
- `84563b0` Update VMAF_Python_library.md. (li-zhi, Jul 15, 2018)
- `08b713c` Add references.md and conf_interval.md; re-org docs. (li-zhi, Jul 15, 2018)
- `004e031` Misc. (li-zhi, Jul 16, 2018)
- `6958469` Update FAQ.md; misc. (li-zhi, Jul 16, 2018)
- `53f3df3` Update models.md. (li-zhi, Jul 16, 2018)
- `51cf3f5` Update models.md. (li-zhi, Jul 16, 2018)
- `814cfe6` Update conf_interval.md. (li-zhi, Jul 16, 2018)
- `b749cce` Add CI plot to conf_interval.md. (li-zhi, Jul 16, 2018)
- `0dd595b` Update image size. (li-zhi, Jul 16, 2018)
- `51015d7` Update image. (li-zhi, Jul 16, 2018)
- `7b79fee` Update conf_interval.md. (li-zhi, Jul 16, 2018)
- `908a578` Update references.md. (li-zhi, Jul 16, 2018)
- `ed890fb` Update references.md. (li-zhi, Jul 16, 2018)
- `eee1d81` Update references.md. (li-zhi, Jul 16, 2018)
- `bf9f458` Update references.md. (li-zhi, Jul 16, 2018)
- `5fbe097` Update references.md. (li-zhi, Jul 16, 2018)
- `aeffd08` Update references.md. (li-zhi, Jul 16, 2018)
- `2c824cf` Misc update of docs. (li-zhi, Jul 16, 2018)

Viewing commit a068690fdc34634d320c100f5b04d00d945102b5: Update VMAF_Python_library.md; move models part to models.md.
li-zhi committed Jul 15, 2018

4 changes: 2 additions & 2 deletions README.md
@@ -17,13 +17,13 @@ Refer to the [FAQ](FAQ.md) page.

## Usages

The VDK package offers multiple ways for a user to interact with the VMAF algorithm implementations. The core feature extraction library is written in C. The rest of the scripting code, including the classes for machine learning regression and for training and testing VMAF models, is written in Python. In addition, there is C++ "wrapper" code that partially replicates the logic in the regression classes, such that VMAF prediction (excluding training) is fully implemented in C/C++.
The VDK package offers a number of ways for a user to interact with the VMAF algorithm implementations. The core feature extraction library is written in C. The rest of the scripting code, including the classes for machine learning regression and for training and testing VMAF models, is written in Python. In addition, there is C++ "wrapper" code that partially replicates the logic in the regression classes, such that VMAF prediction (excluding training) is fully implemented in C/C++.

There are a number of ways one can use the package:

- [VMAF Python library](resource/doc/VMAF_Python_library.md) offers the full set of functionality, including running the basic VMAF command line, running VMAF on a batch of video files, training and testing a VMAF model on video datasets, and visualization tools. It also provides a command line tool `ffmpeg2vmaf` that can pipe FFmpeg-decoded raw videos to VMAF. Unlike the other command lines, `ffmpeg2vmaf` can take compressed video bitstreams as input.
- [`vmafossexec` - a C++ "wrapper" executable](resource/doc/vmafossexec.md) runs the prediction part of the algorithm in full, so that one can easily deploy VMAF in a production environment without needing to configure the Python dependencies. Additionally, `vmafossexec` offers a number of exclusive features, such as 1) speed optimization using multi-threading and frame skipping, and 2) optional computation of PSNR, SSIM and MS-SSIM metrics in the output (see the sketch after this list).
- [`libvmaf` - a static library](resource/doc/libvmaf.md) offers an interface to incorporate VMAF into your C/C++ code. Using this library, VMAF is now included as a filter in [FFmpeg](http://ffmpeg.org/) main branch, and can be configured using: `./configure --enable-libvmaf`.
- [`libvmaf.a` - a static library](resource/doc/libvmaf.md) offers an interface to incorporate VMAF into your C/C++ code. Using this library, VMAF is now included as a filter in [FFmpeg](http://ffmpeg.org/) main branch, and can be configured using: `./configure --enable-libvmaf`.
- [VMAF Dockerfile](Dockerfile) generates a VMAF docker image from the [VMAF Python library](resource/doc/VMAF_Python_library.md). Refer to [this](resource/doc/docker.md) document for detailed usages.
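
A rough sketch of how the `vmafossexec` features above might be invoked (the flag names, executable path and model path here are assumptions based on typical usage; run the executable without arguments to print the exact usage string for your version):

```
# hypothetical invocation: auto-detect thread count, compute only every 5th frame,
# and report PSNR, SSIM and MS-SSIM alongside VMAF
./wrapper/vmafossexec yuv420p 576 324 ref.yuv dis.yuv model/vmaf_v0.6.1.pkl \
    --thread 0 --subsample 5 --psnr --ssim --ms-ssim
```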

## Datasets
89 changes: 13 additions & 76 deletions resource/doc/VMAF_Python_library.md
@@ -3,6 +3,8 @@ VMAF Python Library

The VMAF Python library offers the full set of functionality, from running the basic VMAF command line and running VMAF on a batch of video files, to training and testing a VMAF model on video datasets, along with visualization tools. It is the playground for experimenting with the VMAF algorithm.

It also provides a command line tool [`ffmpeg2vmaf`](#using-ffmpeg2vmaf) that can pipe FFmpeg-decoded raw videos to VMAF. Unlike the other command lines, `ffmpeg2vmaf` can take compressed video bitstreams as input.

## Prerequisites

The VMAF Python library has its core feature extraction library written in C, and the rest of the scripting code written in Python. Building the C code requires `gcc` and `g++` (>= 4.8). Running the scripts and tests requires Python2 (>= 2.7) installed.
@@ -41,7 +43,7 @@ sudo -H pip install --upgrade pip
Then install the required Python packages:

```
pip install --user numpy scipy matplotlib notebook pandas sympy nose scikit-learn scikit-image h5py
pip install --user numpy scipy matplotlib pandas scikit-learn scikit-image h5py
```

Make sure your user install executable directory is on your PATH. Add this to the end of `~/.bashrc` and restart your shell:
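
On Linux this usually means pip's user-install binary directory; a minimal sketch of the line to append (the path is an assumption, since pip's user-install location varies by platform):

```
# assumes pip --user places executables in ~/.local/bin (the Linux default)
export PATH="$HOME/.local/bin:$PATH"
```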
@@ -130,11 +132,13 @@ After installation, run:
./unittest
```

and expect all tests to pass.

## Basic Usage

There are two basic execution modes to run VMAF – a single mode and a batch mode.
One can run VMAF either in single mode via `run_vmaf` or in batch mode via `run_vmaf_in_batch`. In addition, the command line tool `ffmpeg2vmaf` can take compressed video bitstreams as input.

### Running in Single Mode
### `run_vmaf` -- Running VMAF in Single Mode

To run VMAF on a single reference/distorted video pair, run:
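
A typical invocation, mirroring the example commands used later in this document (the test YUV pair ships with the repository, and `--out-fmt json` is optional):

```
./run_vmaf yuv420p 576 324 \
    python/test/resource/yuv/src01_hrc00_576x324.yuv \
    python/test/resource/yuv/src01_hrc01_576x324.yuv \
    --out-fmt json
```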

@@ -183,19 +187,9 @@ where `VMAF_score` is the final score and the others are the scores for VMAF's e
- `adm2`, `vif_scalex` scores range from 0 (worst) to 1 (best)
- `motion2` score typically ranges from 0 (static) to 20 (high-motion)


### Running in Batch Mode
### `run_vmaf_in_batch` -- Running VMAF in Batch Mode

To run VMAF in batch mode, create an input text file, where each line corresponds to the following format (check examples in [example_batch_input](resource/example/example_batch_input)):
To run VMAF in batch mode, create an input text file, where each line corresponds to the following format (check examples in [example_batch_input](../../resource/example/example_batch_input)):

```
format width height reference_path distorted_path
```

@@ -222,72 +216,15 @@ For example:

```
./run_vmaf_in_batch resource/example/example_batch_input --parallelize
```
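
Each line of the input file follows the format above; for instance (a sketch reusing the test clips bundled with the repository):

```
yuv420p 576 324 python/test/resource/yuv/src01_hrc00_576x324.yuv python/test/resource/yuv/src01_hrc01_576x324.yuv
```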

### Using `ffmpeg2vmaf`

There is also an `ffmpeg2vmaf` command line tool which can compare files in any format decodable by `ffmpeg`. `ffmpeg2vmaf` essentially pipes FFmpeg-decoded videos to VMAF. Note that you need a recent version of `ffmpeg` installed (the first time you run the command line, follow the prompt to specify the path of `ffmpeg`).

```
./ffmpeg2vmaf quality_width quality_height reference_path distorted_path [--model model_path] [--out-fmt out_fmt]
```

Here, `quality_width` and `quality_height` are the width and height to which the reference and distorted videos are scaled before the VMAF calculation. This is different from `run_vmaf`'s `width` and `height`, which specify the raw YUV's width and height. The input to `ffmpeg2vmaf` must already have this information specified in its header so that it is FFmpeg-decodable.
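
For instance (the file names here are hypothetical; any FFmpeg-decodable format should work):

```
./ffmpeg2vmaf 576 324 ref.mp4 dis.mp4 --out-fmt json
```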

## Advanced Usage

8 changes: 8 additions & 0 deletions resource/doc/libvmaf.md
@@ -27,6 +27,14 @@ int do_ms_ssim, char *pool_method, int thread, int subsample, int enable_conf_in

Here, `read_frame` is a callback function which can be used to pass data from a program to VMAF, and `user_data` is program-specific data that the callback function can use. For sample usage of `compute_vmaf`, refer to [`wrapper/src/main.cpp`](../../wrapper/src/main.cpp).
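
A minimal sketch of linking a C program against the installed library (the linker flags are assumptions that depend on your toolchain; since part of the library is C++, the C++ standard library typically needs to be linked in as well):

```
# hypothetical build line for a program that calls compute_vmaf()
gcc -o my_program my_program.c -lvmaf -lstdc++ -lpthread -lm
```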

To test the library, run:

```
make testlib
```

This command will build an executable `testlib` using this library together with [`wrapper/src/main.cpp`](../../wrapper/src/main.cpp).

To uninstall the library run:

```
make uninstall
```
67 changes: 66 additions & 1 deletion resource/doc/models.md
@@ -1,4 +1,69 @@
Models
===================

TODO.
### Predict Quality on a Cellular Phone Screen

VMAF v0.6.1 and later support a custom quality model for cellular phone screen viewing. This model can be invoked by adding the `--phone-model` option to the commands `run_vmaf` and `run_vmaf_in_batch` (and also to `run_testing` and `vmafossexec`, which are introduced elsewhere in the documentation):

```
./run_vmaf yuv420p 576 324 \
python/test/resource/yuv/src01_hrc00_576x324.yuv \
python/test/resource/yuv/src01_hrc01_576x324.yuv \
--phone-model

./run_vmaf_in_batch resource/example/example_batch_input --parallelize \
--phone-model
```

This model is trained using subjective data collected in a lab experiment, based on the [absolute categorical rating (ACR)](https://en.wikipedia.org/wiki/Absolute_Category_Rating) methodology, with the exception that after viewing a video sequence, a subject votes on a continuous scale (from "bad" to "excellent") instead of the more conventional five-level discrete scale. The test content consists of video clips selected from the Netflix catalog, each 10 seconds long. For each clip, a combination of 6 resolutions and 3 encoding parameters is used to generate the processed video sequences, resulting in 18 impairment conditions for testing. Instead of fixing the viewing distance, each subject is instructed to view the video at a distance he/she feels comfortable with. In the trained model, the score ranges from 0 to 100, linear with the subjective voting scale, where roughly "bad" maps to a score of 20 and "excellent" maps to a score of 100.

Invoking the phone model will generate VMAF scores higher than those of the regular model, which is better suited to laptop and TV viewing conditions. An example VMAF–bitrate relationship for the two models is shown below:

![regular vs phone model](/resource/images/phone_model.png)

From the figure it can be seen that, due to the factors of screen size and viewing distance, the same distorted video would be perceived as having higher quality when viewed on a phone screen than on a laptop/TV screen, and that once the quality score reaches its maximum (100), further increasing the encoding bitrate does not result in any perceptual improvement in quality.

### Predict Quality on a 4KTV Screen at 1.5H

As of June 2018, we have added a new 4K VMAF model at `model/vmaf_4k_v0.6.1.pkl`, which predicts the subjective quality of video displayed on a 4KTV and viewed from a distance of 1.5 times the height of the display device (1.5H). This model is trained with subjective data collected in a lab experiment, using the ACR methodology. The viewing distance of 1.5H is the critical distance for a human subject to appreciate the quality of 4K content (see [recommendation](https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.2022-0-201208-I!!PDF-E.pdf)).

To invoke this model, specify the model path using the `--model` option. For example:

```
./run_vmaf yuv420p 3840 2160 ref_path dis_path --model model/vmaf_4k_v0.6.1.pkl
```

### Invoking Prediction Confidence Interval

As of June 2018, we have introduced a way to quantify the level of confidence in VMAF predictions. Each VMAF prediction score can now come with a 95% confidence interval (CI), which quantifies the level of confidence that the prediction lies within the interval. The CI is a consequence of the fact that the VMAF model is trained on a sample of subjective scores, while the true population is unknown. The CI is established through bootstrapping on the prediction residue using the full training data. Essentially, it trains multiple models on the prediction residue using "resampling with replacement"; each model yields a slightly different prediction. The variability of these predictions quantifies the level of confidence: the closer these predictions are to one another, the higher the confidence in the prediction made using the full data.

To enable CI, use the option `--ci` in the command line tools with a bootstrapping model such as `model/vmaf_rb_v0.6.2/vmaf_rb_v0.6.2.pkl`.

For example, running

```
./run_vmaf yuv420p 576 324 python/test/resource/yuv/src01_hrc00_576x324.yuv \
python/test/resource/yuv/src01_hrc01_576x324.yuv \
--model model/vmaf_rb_v0.6.2/vmaf_rb_v0.6.2.pkl --out-fmt json --ci
```

yields:

```
...
"aggregate": {
"BOOTSTRAP_VMAF_bagging_score": 73.09994670135325,
"BOOTSTRAP_VMAF_score": 75.44304862545658,
"BOOTSTRAP_VMAF_stddev_score": 1.2301198524660464,
"VMAF_feature_adm2_score": 0.9345878077620574,
"VMAF_feature_motion2_score": 3.8953518541666665,
"VMAF_feature_vif_scale0_score": 0.36342081156994926,
"VMAF_feature_vif_scale1_score": 0.7666473878461729,
"VMAF_feature_vif_scale2_score": 0.8628533892781629,
"VMAF_feature_vif_scale3_score": 0.9159718691393048,
"method": "mean"
}
}
```

Here, `BOOTSTRAP_VMAF_score` is the final prediction result, analogous to `VMAF_score` without the `--ci` option. `BOOTSTRAP_VMAF_stddev_score` is the standard deviation of the bootstrapping predictions. Assuming a normal distribution, the 95% CI is `BOOTSTRAP_VMAF_score +/- 1.96 * BOOTSTRAP_VMAF_stddev_score`; plugging in the numbers above gives 75.44 +/- 1.96 * 1.23, i.e. a 95% CI of roughly [73.03, 77.85].
4 changes: 3 additions & 1 deletion resource/doc/vmafossexec.md
@@ -32,4 +32,6 @@ Optionally, one can test `vmafossexec` by running the [`vmafossexec_test.py`](..
```
pip install --user numpy scipy pandas sklearn
PYTHONPATH=python/src python python/test/vmafossexec_test.py
```

Expect all tests to pass.