Fixed LibDevice compilation on compute_100 and later.#1360

Merged
m4rs-mt merged 1 commit into m4rs-mt:branch/v1.5.x from MoFtZ:bug/libdevice-compute100
Jul 17, 2025

Conversation

@MoFtZ (Collaborator) commented Jul 16, 2025

A member of the community reported issues using LibDevice on a newer RTX 5090 device.

After some investigation, I found that the NVVM compiler was returning NVVM_ERROR_COMPILATION for the following NVVM IR:

```llvm
!nvvmir.version = !{!0}
!0 = !{i32 2, i32 0}

target triple = "nvptx64-unknown-cuda"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"

declare float @__nv_cosf(float %x)

define float @__ilgpu__nv_cosf(float %x)
{
entry:
    %call = call float @__nv_cosf(float %x)
    ret float %call
}
```

This could be reproduced by passing the compiler argument -arch=compute_100 or newer. For the mapping of GPUs to compute capabilities, see:
https://developer.nvidia.com/cuda-gpus

After further investigation, the fix is to place the target directives first.

I have also updated the target datalayout directive to include the i128 registers, as per the NVVM IR specification:
https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#data-layout
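
For illustration, the corrected module would look roughly like this sketch: the two target directives moved to the top, and an i128:128:128 entry added to the datalayout per the NVVM IR spec linked above (the exact datalayout string emitted by ILGPU may differ slightly):

```llvm
; Target directives must come first for compute_100+.
target triple = "nvptx64-unknown-cuda"
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-i128:128:128-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"

; NVVM IR version metadata follows the target directives.
!nvvmir.version = !{!0}
!0 = !{i32 2, i32 0}

declare float @__nv_cosf(float %x)

define float @__ilgpu__nv_cosf(float %x)
{
entry:
    %call = call float @__nv_cosf(float %x)
    ret float %call
}
```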

@MoFtZ MoFtZ added this to the v1.5.4 milestone Jul 16, 2025
@m4rs-mt m4rs-mt added the bug label Jul 16, 2025
@m4rs-mt m4rs-mt merged commit 27f05fa into m4rs-mt:branch/v1.5.x Jul 17, 2025
31 checks passed
@MoFtZ MoFtZ modified the milestones: v1.5.4, v1.6.0 Dec 15, 2025
@MoFtZ MoFtZ deleted the bug/libdevice-compute100 branch March 15, 2026 04:51