@antonfirsov antonfirsov commented Feb 5, 2021

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practices demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This is a test/benchmark only change.

Been playing around with JpegEncoder to get some detailed data about our current performance numbers. To get proper profiler results (with the PROFILING constant enabled), I had to change some of the [MethodImpl] attributes. Namely: switch to conditional inlining ([MethodImpl(InliningOptions.ShortMethod)]) in the Huffman methods, and always inline the Block8x8F indexer.
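For readers unfamiliar with the pattern, conditional inlining can be sketched roughly as below. This is an illustration only: it assumes the PROFILING build keeps short methods out-of-line so a profiler can attribute time to them, and `HuffmanMath`, `BitCount` and `TinyBlock` are hypothetical names, not the repository's code.

```csharp
using System.Runtime.CompilerServices;

// Sketch: a constant that resolves differently depending on the PROFILING symbol.
// Attribute arguments must be compile-time constants, so const fields are used.
internal static class InliningOptions
{
#if PROFILING
    // Assumption: under PROFILING, short methods stay as separate frames.
    public const MethodImplOptions ShortMethod = MethodImplOptions.NoInlining;
#else
    public const MethodImplOptions ShortMethod = MethodImplOptions.AggressiveInlining;
#endif
}

// Hypothetical helper standing in for a small method on the Huffman path.
internal static class HuffmanMath
{
    // Inlined in normal builds, visible to the profiler in PROFILING builds.
    [MethodImpl(InliningOptions.ShortMethod)]
    public static int BitCount(uint value)
    {
        int count = 0;
        while (value != 0)
        {
            count++;
            value >>= 1;
        }

        return count;
    }
}

// Hypothetical two-element block; the indexer is always inlined, because a
// non-inlined indexer call per element would distort the profile.
internal struct TinyBlock
{
    private float v0;
    private float v1;

    public float this[int index]
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => index == 0 ? this.v0 : this.v1;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        set
        {
            if (index == 0)
            {
                this.v0 = value;
            }
            else
            {
                this.v1 = value;
            }
        }
    }
}
```

Compiling with the PROFILING symbol defined (for example through `DefineConstants` in the project file) flips `ShortMethod` without touching any call site.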

Additionally:

  • Changed the EncodeJpeg benchmark to work with a bigger image, and moved MemoryStream creation to GlobalSetup
  • Added JpegProfilingBenchmarks.EncodeJpeg_SingleMidSize

Current perf characteristics

(Updated) JpegEncoder benchmark results on my machine

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=5.0.200-preview.20614.14
  [Host]     : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
  DefaultJob : .NET Core 3.1.11 (CoreCLR 4.700.20.56602, CoreFX 4.700.20.56604), X64 RyuJIT
  
|                Method |      Mean |     Error |    StdDev | Ratio | RatioSD |
|---------------------- |----------:|----------:|----------:|------:|--------:|
| 'System.Drawing Jpeg' |  5.735 ms | 0.1122 ms | 0.2292 ms |  1.00 |    0.00 |
|     'ImageSharp Jpeg' | 73.348 ms | 1.9322 ms | 5.4498 ms | 13.06 |    1.17 |

Profiler results on my Surface Book

Running EncodeJpeg_SingleMidSize, which stresses encoding of the same image with default quality settings, produces the following dotTrace profile:

100.00%   Encode420  •  2,185 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.Encode420(Image, CancellationToken, ref Byte)
  66.78%   WriteBlock  •  1,459 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.WriteBlock(QuantIndex, Int32, ref Block8x8F, ref Block8x8F, ref Block8x8F, ref Block8x8F, ref ZigZag, ref Byte)
    35.16%   EmitHuffRLE  •  768 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.EmitHuffRLE(HuffIndex, Int32, Int32, ref Byte)
      18.96%   EmitHuff  •  414 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.EmitHuff(HuffIndex, Int32, ref Byte)
        12.10%   Emit  •  264 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.Emit(UInt32, UInt32, ref Byte)
           4.67%   Write  •  102 ms  •  System.IO.MemoryStream.Write(Byte[], Int32, Int32)
      9.05%   Emit  •  198 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.Emit(UInt32, UInt32, ref Byte)
         4.11%   Write  •  90 ms  •  System.IO.MemoryStream.Write(Byte[], Int32, Int32)
       0.27%   get_Item  •  6 ms  •  System.ReadOnlySpan`1.get_Item(Int32)
      ►0.27%   get_BitCountLut  •  6 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.get_BitCountLut
    ►6.59%   TransformFDCT  •  144 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.FastFloatingPointDCT.TransformFDCT(ref Block8x8F, ref Block8x8F, ref Block8x8F, Boolean)
     5.22%   Quantize  •  114 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.Quantize(ref Block8x8F, ref Block8x8F, ref Block8x8F, ref ZigZag)
     2.47%   DivideRoundAll  •  54 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.DivideRoundAll(ref Block8x8F, ref Block8x8F)
     1.10%   MultiplyInPlace  •  24 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.MultiplyInPlace(Single)
    ►0.54%   EmitHuff  •  12 ms  •  SixLabors.ImageSharp.Formats.Jpeg.JpegEncoderCore.EmitHuff(HuffIndex, Int32, ref Byte)
  21.17%   Convert  •  463 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Encoder.YCbCrForwardConverter`1.Convert(ImageFrame, Int32, Int32, ref RowOctet)
    9.63%   Convert  •  211 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Encoder.RgbToYCbCrConverterVectorized.Convert(ReadOnlySpan, ref Block8x8F, ref Block8x8F, ref Block8x8F)
       2.74%   MultiplyAdd  •  60 ms  •  SixLabors.ImageSharp.SimdUtils+HwIntrinsics.MultiplyAdd(ref Vector256, ref Vector256, ref Vector256)
    5.22%   LoadAndStretchEdges  •  114 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.GenericBlock8x8`1.LoadAndStretchEdges(Buffer2D, Int32, Int32, ref RowOctet)
       1.65%   get_Item  •  36 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.RowOctet`1.get_Item(Int32)
    ►4.94%   ToRgb24  •  108 ms  •  SixLabors.ImageSharp.PixelFormats.Rgba32+PixelOperations.ToRgb24(Configuration, ReadOnlySpan, Span)
    ►0.27%   AsSpanUnsafe  •  6 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.GenericBlock8x8`1.AsSpanUnsafe
  7.66%   Update  •  167 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.RowOctet`1.Update(Buffer2D, Int32)
    ►6.29%   GetRowSpan  •  137 ms  •  SixLabors.ImageSharp.Memory.Buffer2D`1.GetRowSpan(Int32)
     0.27%   set_Item  •  6 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.RowOctet`1.set_Item(Int32, Span)
   1.10%   Scale16X16To8X8Vectorized  •  24 ms  •  SixLabors.ImageSharp.Formats.Jpeg.Components.Block8x8F.Scale16X16To8X8Vectorized(ref Block8x8F, ReadOnlySpan)

This means that our current primary bottleneck is Huffman encoding.

codecov bot commented Feb 5, 2021

Codecov Report

Merging #1534 (4bd6c04) into master (27135a0) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1534   +/-   ##
=======================================
  Coverage   83.47%   83.47%           
=======================================
  Files         742      742           
  Lines       32830    32830           
  Branches     3667     3667           
=======================================
  Hits        27406    27406           
  Misses       4709     4709           
  Partials      715      715           
| Flag      | Coverage Δ        |
|-----------|-------------------|
| unittests | 83.47% <ø> (ø)    |

| Impacted Files                                        | Coverage Δ     |
|-------------------------------------------------------|----------------|
| ...rc/ImageSharp/Formats/Jpeg/Components/Block8x8F.cs | 85.65% <ø> (ø) |
| src/ImageSharp/Formats/Jpeg/Components/RowOctet.cs    | 91.42% <ø> (ø) |
| src/ImageSharp/Formats/Jpeg/JpegEncoderCore.cs        | 94.86% <ø> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@JimBobSquarePants
Member

Love it @antonfirsov, this is great insight!

> This means that our current primary bottleneck is Huffman encoding.

I thought that would be the case. The other day I was reading a Cloudflare blog post about optimising jpegtran's Huffman encoding using SIMD, and I spotted that libjpeg-turbo also has specific SSE versions.

@JimBobSquarePants JimBobSquarePants merged commit 8e21937 into master Feb 5, 2021
@JimBobSquarePants JimBobSquarePants deleted the af/JpegEncoder-profiling branch February 5, 2021 20:15
@antonfirsov
Member Author

That >10% actually feels very surprising to me. I'm worried that either there is something wrong with the benchmark, or we are doing something really, really wrong.

Anyways, found the Cloudflare article mentioned:
https://blog.cloudflare.com/doubling-the-speed-of-jpegtran/

There's also a good general article by Intel:
https://software.intel.com/content/www/us/en/develop/articles/fast-computation-of-huffman-codes.html

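Also notice the Stream.Write bottleneck: Emit calls MemoryStream.Write directly, which shows up as roughly 4-5% under each Emit node in the profile. We could probably use some double buffering, just like in the decoder.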
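To make the buffering idea concrete, here is a minimal sketch. It uses one plain staging buffer in front of the stream, which is a simplification and not necessarily what the decoder does; `BufferedByteEmitter` and its members are hypothetical names, not existing ImageSharp types.

```csharp
using System;
using System.IO;

// Hypothetical helper: collects emitted bytes in a local array and forwards
// them to the target stream in chunks, so Stream.Write is called once per
// full buffer instead of once per emitted byte.
internal sealed class BufferedByteEmitter
{
    private readonly Stream target;
    private readonly byte[] buffer;
    private int count;

    public BufferedByteEmitter(Stream target, int capacity = 4096)
    {
        this.target = target ?? throw new ArgumentNullException(nameof(target));
        if (capacity <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(capacity));
        }

        this.buffer = new byte[capacity];
    }

    // Appends one byte, flushing first when the staging buffer is full.
    public void Emit(byte value)
    {
        if (this.count == this.buffer.Length)
        {
            this.Flush();
        }

        this.buffer[this.count++] = value;
    }

    // Writes any pending bytes to the stream; must be called once at the end.
    public void Flush()
    {
        if (this.count > 0)
        {
            this.target.Write(this.buffer, 0, this.count);
            this.count = 0;
        }
    }
}
```

An encoder would call `Emit` from its bit-writing routine and `Flush` once after the last block. Whether this actually helps would need to be confirmed with the same benchmark and profiler setup.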

@JimBobSquarePants
Member

I know Mango have been working in this area also. Perhaps they have some good ideas.
https://github.com/t0rakka/mango

JimBobSquarePants added a commit that referenced this pull request Mar 13, 2021
Better JpegEncoder profiling & benchmarks