This repository was archived by the owner on Mar 12, 2021. It is now read-only.

Performance regression with mapreduce  #611

Closed
@MasonProtter

Here's an example from my machine on the master branch:

julia> using BenchmarkTools, CuArrays

julia> function pi_mc_cu(nsamples)
           xs = CuArrays.rand(nsamples); ys = CuArrays.rand(nsamples)
           mapreduce((x, y) -> (x^2 + y^2) < 1.0, +, xs, ys, init=0) * 4/nsamples
       end
pi_mc_cu (generic function with 1 method)

julia> @benchmark pi_mc_cu(10000000)
BenchmarkTools.Trial: 
  memory estimate:  16.63 KiB
  allocs estimate:  473
  --------------
  minimum time:     1.620 ms (0.00% GC)
  median time:      1.666 ms (0.00% GC)
  mean time:        1.709 ms (1.60% GC)
  maximum time:     9.460 ms (7.77% GC)
  --------------
  samples:          2921
  evals/sample:     1

(@v1.4) pkg> st CuArrays 
Status `~/.julia/environments/v1.4/Project.toml`
  [3a865a2d] CuArrays v1.7.0 #master (https://github.com/JuliaGPU/CuArrays.jl.git)

(@v1.4) pkg> st CUDAnative
Status `~/.julia/environments/v1.4/Project.toml`
  [be33ccc6] CUDAnative v2.10.2 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)

and here's that same example on the latest tagged version:

julia> using BenchmarkTools, CuArrays

julia> function pi_mc_cu(nsamples)
           xs = CuArrays.rand(nsamples); ys = CuArrays.rand(nsamples)
           mapreduce((x, y) -> (x^2 + y^2) < 1.0, +, xs, ys, init=0) * 4/nsamples
       end
pi_mc_cu (generic function with 1 method)

julia> @benchmark pi_mc_cu(10000000)
BenchmarkTools.Trial: 
  memory estimate:  4.61 KiB
  allocs estimate:  126
  --------------
  minimum time:     594.302 μs (0.00% GC)
  median time:      659.321 μs (0.00% GC)
  mean time:        667.914 μs (1.58% GC)
  maximum time:     2.338 ms (39.61% GC)
  --------------
  samples:          7463
  evals/sample:     1

(@v1.4) pkg> st CuArrays
Status `~/.julia/environments/v1.4/Project.toml`
  [3a865a2d] CuArrays v1.7.2

(@v1.4) pkg> st CUDAnative
Status `~/.julia/environments/v1.4/Project.toml`
  [be33ccc6] CUDAnative v2.10.2

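As you can see, master is roughly 2.5x to 2.7x slower than the tagged release (median 1.666 ms vs. 659.321 μs, minimum 1.620 ms vs. 594.302 μs), and it makes almost four times as many allocations (473 vs. 126). I tested the master version both with and without JULIA_CUDA_USE_BINARYBUILDER=false, so BinaryBuilder is not the problem. The regression is likely due to #602.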
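For reference, the size of the regression can be worked out directly from the two benchmark reports. This is only a minimal sketch: the values are copied by hand from the output above, and the names `master`, `tagged`, and `ratio` are illustrative, not part of any package.

```julia
# Numbers copied from the two @benchmark reports in this issue.
# Times are in milliseconds, memory in KiB.
master = (min_ms = 1.620,    median_ms = 1.666,    allocs = 473, mem_kib = 16.63)
tagged = (min_ms = 0.594302, median_ms = 0.659321, allocs = 126, mem_kib = 4.61)

# Ratio master / tagged for one field; values above 1 mean master is worse.
ratio(field::Symbol) = getfield(master, field) / getfield(tagged, field)

for field in (:min_ms, :median_ms, :allocs, :mem_kib)
    println(rpad(string(field), 10), " ", round(ratio(field), digits = 2), "x")
end
```

Note that `pi_mc_cu` also generates the random inputs inside the timed function, so these ratios describe the whole function and not `mapreduce` in isolation.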


julia> versioninfo()
Julia Version 1.4.0-rc1.0
Commit b0c33b0cf5* (2020-01-23 17:23 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 5 2600 Six-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, znver1)
Environment:
  JULIA_NUM_THREADS = 6

[mason@mason-pc ~]$ sudo pacman -Q --info cuda
Name            : cuda
Version         : 10.2.89-3
Description     : NVIDIA's GPU programming toolkit
Architecture    : x86_64
URL             : https://developer.nvidia.com/cuda-zone
Licenses        : custom:NVIDIA
Groups          : None
Provides        : cuda-toolkit  cuda-sdk
Depends On      : gcc8-libs  gcc8  opencl-nvidia  nvidia-utils
Optional Deps   : gdb: for cuda-gdb
                  java-runtime=8: for nsight and nvvp
Required By     : cudnn
Optional For    : None
Conflicts With  : None
Replaces        : cuda-toolkit  cuda-sdk
Installed Size  : 4.04 GiB
Packager        : Sven-Hendrik Haase <svenstaro@gmail.com>
Build Date      : Tue 31 Dec 2019 01:07:53 AM MST
Install Date    : Wed 26 Feb 2020 03:04:42 PM MST
Install Reason  : Explicitly installed
Install Script  : Yes
Validated By    : Signature

[mason@mason-pc ~]$ lspci  -v -s  $(lspci | grep ' VGA ' | cut -d" " -f 1)
1f:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ZOTAC International (MCO) Ltd. TU106 [GeForce RTX 2060 Rev. A]
        Flags: bus master, fast devsel, latency 0, IRQ 71
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
