This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

Performance regression with mapreduce  #611

Closed

Description

Here's an example, run on the master branch:

julia> using BenchmarkTools, CuArrays

julia> function pi_mc_cu(nsamples)
           xs = CuArrays.rand(nsamples); ys = CuArrays.rand(nsamples)
           mapreduce((x, y) -> (x^2 + y^2) < 1.0, +, xs, ys, init=0) * 4/nsamples
       end
pi_mc_cu (generic function with 1 method)

julia> @benchmark pi_mc_cu(10000000)
BenchmarkTools.Trial: 
  memory estimate:  16.63 KiB
  allocs estimate:  473
  --------------
  minimum time:     1.620 ms (0.00% GC)
  median time:      1.666 ms (0.00% GC)
  mean time:        1.709 ms (1.60% GC)
  maximum time:     9.460 ms (7.77% GC)
  --------------
  samples:          2921
  evals/sample:     1

(@v1.4) pkg> st CuArrays 
Status `~/.julia/environments/v1.4/Project.toml`
  [3a865a2d] CuArrays v1.7.0 #master (https://github.com/JuliaGPU/CuArrays.jl.git)

(@v1.4) pkg> st CUDAnative
Status `~/.julia/environments/v1.4/Project.toml`
  [be33ccc6] CUDAnative v2.10.2 #master (https://github.com/JuliaGPU/CUDAnative.jl.git)

and here's that same example on the latest tagged version:

julia> using BenchmarkTools, CuArrays

julia> function pi_mc_cu(nsamples)
           xs = CuArrays.rand(nsamples); ys = CuArrays.rand(nsamples)
           mapreduce((x, y) -> (x^2 + y^2) < 1.0, +, xs, ys, init=0) * 4/nsamples
       end
pi_mc_cu (generic function with 1 method)

julia> @benchmark pi_mc_cu(10000000)
BenchmarkTools.Trial: 
  memory estimate:  4.61 KiB
  allocs estimate:  126
  --------------
  minimum time:     594.302 μs (0.00% GC)
  median time:      659.321 μs (0.00% GC)
  mean time:        667.914 μs (1.58% GC)
  maximum time:     2.338 ms (39.61% GC)
  --------------
  samples:          7463
  evals/sample:     1

(@v1.4) pkg> st CuArrays
Status `~/.julia/environments/v1.4/Project.toml`
  [3a865a2d] CuArrays v1.7.2

(@v1.4) pkg> st CUDAnative
Status `~/.julia/environments/v1.4/Project.toml`
  [be33ccc6] CUDAnative v2.10.2

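As you can see, I lost a factor of roughly 2.5 to 2.7 in performance on the new master (median 1.666 ms vs. 659.321 μs, minimum 1.620 ms vs. 594.302 μs), and the allocation count went from 126 to 473. I tested the master version with and without JULIA_CUDA_USE_BINARYBUILDER=false, so BinaryBuilder is not the problem. The regression is likely due to #602.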
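
For reference, the ratios between the two runs can be worked out directly from the figures reported above. This is only a sketch: the names `master`, `tagged`, and `slowdown` are made up for illustration, and the numbers are copied by hand from the two `@benchmark` outputs rather than taken from the `Trial` objects.

```julia
# Figures copied by hand from the two @benchmark outputs above.
# Times are in microseconds, memory in KiB. All names here are illustrative.
master = (min_us = 1620.0,  median_us = 1666.0,  mem_kib = 16.63, allocs = 473)
tagged = (min_us = 594.302, median_us = 659.321, mem_kib = 4.61,  allocs = 126)

# Field-by-field ratio of master to tagged; values above 1 mean master is worse.
slowdown = map(/, master, tagged)

for (name, r) in pairs(slowdown)
    println(rpad(string(name), 10), round(r, digits = 2), "x")
end
```

The time ratios come out between 2.5 and 2.8, while memory and allocation counts grow by more than 3.5x, which suggests the extra time may be tied to additional host-side work per call rather than to the kernel alone. That reading is an assumption, not something the numbers prove.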


julia> versioninfo()
Julia Version 1.4.0-rc1.0
Commit b0c33b0cf5* (2020-01-23 17:23 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 5 2600 Six-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, znver1)
Environment:
  JULIA_NUM_THREADS = 6

[mason@mason-pc ~]$ sudo pacman -Q --info cuda
Name            : cuda
Version         : 10.2.89-3
Description     : NVIDIA's GPU programming toolkit
Architecture    : x86_64
URL             : https://developer.nvidia.com/cuda-zone
Licenses        : custom:NVIDIA
Groups          : None
Provides        : cuda-toolkit  cuda-sdk
Depends On      : gcc8-libs  gcc8  opencl-nvidia  nvidia-utils
Optional Deps   : gdb: for cuda-gdb
                  java-runtime=8: for nsight and nvvp
Required By     : cudnn
Optional For    : None
Conflicts With  : None
Replaces        : cuda-toolkit  cuda-sdk
Installed Size  : 4.04 GiB
Packager        : Sven-Hendrik Haase <svenstaro@gmail.com>
Build Date      : Tue 31 Dec 2019 01:07:53 AM MST
Install Date    : Wed 26 Feb 2020 03:04:42 PM MST
Install Reason  : Explicitly installed
Install Script  : Yes
Validated By    : Signature

[mason@mason-pc ~]$ lspci  -v -s  $(lspci | grep ' VGA ' | cut -d" " -f 1)
1f:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2060 Rev. A] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ZOTAC International (MCO) Ltd. TU106 [GeForce RTX 2060 Rev. A]
        Flags: bus master, fast devsel, latency 0, IRQ 71
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia