Better JpegEncoder profiling & benchmarks #1534

antonfirsov · 2021-02-05T17:29:46Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This is a test/benchmark only change.

I've been playing around with JpegEncoder to get some detailed data about our current performance numbers. To get proper profiler results (with the PROFILING constant enabled), I had to change some of the [MethodImpl] attributes. Namely: switch to conditional inlining ([MethodImpl(InliningOptions.ShortMethod)]) in the Huffman methods, and always inline the Block8x8F indexer.
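For readers unfamiliar with the pattern, conditional inlining can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual code: `InliningOptionsSketch`, `HuffmanSketch` and `BitLength` are hypothetical names, and it assumes the constant resolves to `NoInlining` when `PROFILING` is defined, so that short methods remain visible as separate frames in the profiler.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a shared inlining-options class.
internal static class InliningOptionsSketch
{
#if PROFILING
    // Profiling builds: keep short methods as separate frames so the
    // profiler can attribute time to them individually.
    public const MethodImplOptions ShortMethod = MethodImplOptions.NoInlining;
#else
    // Normal builds: ask the JIT to inline short, hot methods.
    public const MethodImplOptions ShortMethod = MethodImplOptions.AggressiveInlining;
#endif
}

internal static class HuffmanSketch
{
    // Hypothetical helper: number of bits needed to represent value (0 for 0).
    [MethodImpl(InliningOptionsSketch.ShortMethod)]
    public static int BitLength(uint value)
    {
        int bits = 0;
        while (value != 0)
        {
            bits++;
            value >>= 1;
        }

        return bits;
    }
}
```

Because the option is a compile-time constant, the same attribute can stay on every method and the behavior flips with the build symbol alone.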

Additionally:

Changed the EncodeJpeg benchmark to work with a bigger image, and moved MemoryStream creation to GlobalSetup
Added JpegProfilingBenchmarks.EncodeJpeg_SingleMidSize

Current perf characteristics

(Updated) JpegEncoder benchmark results on my machine

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.200-preview.20614.14
  [Host]     : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
  DefaultJob : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
  
|                Method |      Mean |     Error |    StdDev | Ratio | RatioSD |
|---------------------- |----------:|----------:|----------:|------:|--------:|
| 'System.Drawing Jpeg' |  5.735 ms | 0.1122 ms | 0.2292 ms |  1.00 |    0.00 |
|     'ImageSharp Jpeg' | 73.348 ms | 1.9322 ms | 5.4498 ms | 13.06 |    1.17 |

Profiler results on my Surface Book

Running EncodeJpeg_SingleMidSize, which stresses encoding of the same image with default quality settings, results in the following profile in dotTrace:

100.00%   Encode420  •  2,185 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.Encode420(Image, CancellationToken, ref Byte)
  66.78%   WriteBlock  •  1,459 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.WriteBlock(QuantIndex, Int32, ref Block8x8F, ref Block8x8F, ref Block8x8F, ref Block8x8F, ref ZigZag, ref Byte)
    35.16%   EmitHuffRLE  •  768 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.EmitHuffRLE(HuffIndex, Int32, Int32, ref Byte)
      18.96%   EmitHuff  •  414 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.EmitHuff(HuffIndex, Int32, ref Byte)
        12.10%   Emit  •  264 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.Emit(UInt32, UInt32, ref Byte)
           4.67%   Write  •  102 ms  •  System.IO.MemoryStream.Write(Byte[], Int32, Int32)
      9.05%   Emit  •  198 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.Emit(UInt32, UInt32, ref Byte)
         4.11%   Write  •  90 ms  •  System.IO.MemoryStream.Write(Byte[], Int32, Int32)
       0.27%   get_Item  •  6 ms  •  System.ReadOnlySpan`1.get_Item(Int32)
      ►0.27%   get_BitCountLut  •  6 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.get_BitCountLut
    ►6.59%   TransformFDCT  •  144 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.FastFloatingPointDCT.TransformFDCT(ref Block8x8F, ref Block8x8F, ref Block8x8F, Boolean)
     5.22%   Quantize  •  114 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.Quantize(ref Block8x8F, ref Block8x8F, ref Block8x8F, ref ZigZag)
     2.47%   DivideRoundAll  •  54 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.DivideRoundAll(ref Block8x8F, ref Block8x8F)
     1.10%   MultiplyInPlace  •  24 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.MultiplyInPlace(Single)
    ►0.54%   EmitHuff  •  12 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.EmitHuff(HuffIndex, Int32, ref Byte)
  21.17%   Convert  •  463 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Encoder.YCbCrForwardConverter`1.Convert(ImageFrame, Int32, Int32, ref RowOctet)
    9.63%   Convert  •  211 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Encoder.RgbToYCbCrConverterVectorized.Convert(ReadOnlySpan, ref Block8x8F, ref Block8x8F, ref Block8x8F)
       2.74%   MultiplyAdd  •  60 ms  •  SixLabors.ImageSharp.SimdUtils+HwIntrinsics.MultiplyAdd(ref Vector256, ref Vector256, ref Vector256)
    5.22%   LoadAndStretchEdges  •  114 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.GenericBlock8x8`1.LoadAndStretchEdges(Buffer2D, Int32, Int32, ref RowOctet)
       1.65%   get_Item  •  36 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.RowOctet`1.get_Item(Int32)
    ►4.94%   ToRgb24  •  108 ms  •  SixLabors.ImageSharp.PixelFormats.Rgba32+PixelOperations.ToRgb24(Configuration, ReadOnlySpan, Span)
    ►0.27%   AsSpanUnsafe  •  6 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.GenericBlock8x8`1.AsSpanUnsafe
  7.66%   Update  •  167 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.RowOctet`1.Update(Buffer2D, Int32)
    ►6.29%   GetRowSpan  •  137 ms  •  SixLabors.ImageSharp.Memory.Buffer2D`1.GetRowSpan(Int32)
     0.27%   set_Item  •  6 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.RowOctet`1.set_Item(Int32, Span)
   1.10%   Scale16X16To8X8Vectorized  •  24 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.Scale16X16To8X8Vectorized(ref Block8x8F, ReadOnlySpan)

This means that our current primary bottleneck is Huffman encoding.

codecov · 2021-02-05T17:42:36Z

Codecov Report

Merging #1534 (4bd6c04) into master (27135a0) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1534   +/-   ##
=======================================
  Coverage   83.47%   83.47%           
=======================================
  Files         742      742           
  Lines       32830    32830           
  Branches     3667     3667           
=======================================
  Hits        27406    27406           
  Misses       4709     4709           
  Partials      715      715

| Flag | Coverage Δ |
|---|---|
| unittests | `83.47% <ø> (ø)` |

| Impacted Files | Coverage Δ |
|---|---|
| ...rc/ImageSharp/Formats/Jpeg/Components/Block8x8F.cs | `85.65% <ø> (ø)` |
| src/ImageSharp/Formats/Jpeg/Components/RowOctet.cs | `91.42% <ø> (ø)` |
| src/ImageSharp/Formats/Jpeg/JpegEncoderCore.cs | `94.86% <ø> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

JimBobSquarePants · 2021-02-05T17:54:05Z

Love it @antonfirsov this is great insight!

> This means that our current primary bottleneck is Huffman encoding.

I thought that would be the case. The other day I was reading a Cloudflare blog post about optimising jpegtran's Huffman encoding using SIMD, and I spotted that libjpeg-turbo also has specific SSE versions.

antonfirsov · 2021-02-07T17:10:43Z

That >10% feels very surprising to me. I'm worried that either there is something wrong with the benchmark, or we are doing something really, really wrong.

Anyways, found the Cloudflare article mentioned:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran/

There's also a good general article by Intel:
https://software.intel.com/content/www/us/en/develop/articles/fast-computation-of-huffman-codes.html

Also notice the Stream.Write bottleneck. We could probably use some double buffering, just like in the decoder.
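To make the idea concrete, here is a minimal sketch of putting a local buffer in front of the stream so that each emitted byte does not turn into its own `Stream.Write` call. It is a single intermediate buffer rather than true double buffering, and it is not taken from the decoder: `BufferedByteSink` and its members are hypothetical names, and the 4096-byte default capacity is an arbitrary assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper: collects bytes in a local buffer and forwards them
// to the target stream in larger chunks instead of one small Write per byte.
internal sealed class BufferedByteSink
{
    private readonly Stream target;
    private readonly byte[] buffer;
    private int count;

    public BufferedByteSink(Stream target, int capacity = 4096)
    {
        if (capacity <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(capacity));
        }

        this.target = target ?? throw new ArgumentNullException(nameof(target));
        this.buffer = new byte[capacity];
    }

    public void WriteByte(byte value)
    {
        // Only touch the underlying stream when the local buffer is full.
        if (this.count == this.buffer.Length)
        {
            this.Flush();
        }

        this.buffer[this.count++] = value;
    }

    // Must be called once at the end so the tail of the buffer is not lost.
    public void Flush()
    {
        if (this.count > 0)
        {
            this.target.Write(this.buffer, 0, this.count);
            this.count = 0;
        }
    }
}
```

An encoder loop would call `WriteByte` for every output byte and `Flush` once after the last block. Whether this helps when the target is already a MemoryStream would need to be measured with the benchmark above.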

JimBobSquarePants · 2021-02-08T06:14:19Z

I know Mango have been working in this area also. Perhaps they have some good ideas.
https://github.com/t0rakka/mango

antonfirsov added 2 commits February 5, 2021 18:18

better JpegEncoder profiling/benchmarks

9088d7e

change inlining options for RowOctet.Update()

4bd6c04

JimBobSquarePants approved these changes Feb 5, 2021

JimBobSquarePants merged commit 8e21937 into master Feb 5, 2021

JimBobSquarePants deleted the af/JpegEncoder-profiling branch February 5, 2021 20:15

JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021

Merge pull request #1534 from SixLabors/af/JpegEncoder-profiling

62a224b
