[Bug]: Offloading with NF4 weights #827

@dxqb

What happened?

enumerating sample paths: 0%| | 0/4 [00:00<?, ?it/s
Error misaligned address at line 696 in file /src/csrc/ops.cu

This is with CUDA_LAUNCH_BLOCKING=1; otherwise, a similar error is reported somewhere during the backward pass.
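For anyone reproducing this: CUDA kernel launches are asynchronous by default, so an error can surface far from the kernel that caused it. A minimal sketch of enabling blocking launches from Python follows. The helper name `enable_blocking_launches` is made up for illustration, and the variable has to be set before CUDA is initialized in the process (in practice, before importing torch).

```python
import os


def enable_blocking_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given mapping (defaults to os.environ).

    Hypothetical helper, not part of any library. It only has an effect
    if it runs before CUDA is initialized in this process.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_launches()
# Import torch (and anything else that initializes CUDA) only after this point.
```

Setting the variable in the shell before launching the trainer has the same effect.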

It can be triggered by using NF4 weights and a layer offload fraction > 0.

What did you expect would happen?

Is offloading supposed to work with NF4? If not, I would expect an error message instead of a CUDA crash.
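If the combination turns out to be unsupported, the check could be as simple as the sketch below. This is only an illustration of the expected behaviour: `TrainSettings`, its field names, and `check_offload_supported` are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class TrainSettings:
    """Hypothetical stand-in for the trainer's configuration object."""
    weight_dtype: str              # e.g. "NF4", "BF16"
    layer_offload_fraction: float  # 0.0 disables layer offloading


def check_offload_supported(settings: TrainSettings) -> None:
    """Fail early with a readable message instead of a CUDA error later on."""
    uses_nf4 = settings.weight_dtype.upper() == "NF4"
    if uses_nf4 and settings.layer_offload_fraction > 0:
        raise ValueError(
            "Layer offloading is not supported with NF4 weights "
            f"(layer_offload_fraction={settings.layer_offload_fraction}). "
            "Set the offload fraction to 0 or choose another weight data type."
        )


# The failing configuration from this report would be rejected up front:
try:
    check_offload_supported(TrainSettings("NF4", 0.5))
except ValueError as err:
    print(err)
```

Running such a check once at startup, before any weights are moved, would keep the failure away from the sampling and backward passes.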

Relevant log output

Generate and upload debug_report.log

absl-py==2.2.2
accelerate==1.3.0
aiodns==3.3.0
aiohappyeyeballs==2.6.1
aiohttp==3.11.18
aiohttp-retry==2.9.1
aiosignal==1.3.2
annotated-types==0.7.0
antlr4-python3-runtime==4.9.3
anyio==4.9.0
async-timeout==5.0.1
attrs==25.3.0
av==14.1.0
backoff==2.2.1
bcrypt==4.3.0
bitsandbytes==0.45.2
boto3==1.38.11
botocore==1.38.11
Brotli==1.1.0
certifi==2025.4.26
cffi==1.17.1
charset-normalizer==3.4.2
click==8.1.8
cloudpickle==3.1.1
colorama==0.4.6
coloredlogs==15.0.1
contourpy==1.3.2
cryptography==44.0.3
customtkinter==5.2.2
cycler==0.12.1
dadaptation==3.2
darkdetect==0.8.0
decorator==5.2.1
Deprecated==1.2.18
-e git+https://github.com/huggingface/diffusers.git@1d37f4205531ab44b34d54726505839c3f7048cd#egg=diffusers
dnspython==2.7.0
email_validator==2.2.0
exceptiongroup==1.2.2
fabric==3.2.2
fastapi==0.115.12
fastapi-cli==0.0.7
filelock==3.18.0
flatbuffers==25.2.10
fonttools==4.57.0
frozenlist==1.6.0
fsspec==2025.3.2
ftfy==6.3.1
grpcio==1.71.0
h11==0.16.0
httpcore==1.0.9
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.28.1
humanfriendly==10.0
idna==3.10
imagesize==1.4.1
importlib_metadata==8.7.0
inquirerpy==0.3.4
invisible-watermark==0.2.0
invoke==2.2.0
itsdangerous==2.2.0
Jinja2==3.1.6
jmespath==1.0.1
kiwisolver==1.4.8
lightning-utilities==0.14.3
lion-pytorch==0.2.3
Markdown==3.8
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.10.0
mdurl==0.1.2
-e git+https://github.com/Nerogar/mgds.git@2c67a5a567ac47058f7dcd36262f0343132073a6#egg=mgds
mpmath==1.3.0
multidict==6.4.3
networkx==3.4.2
numpy==2.2.2
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-cusparselt-cu12==0.6.2
nvidia-ml-py==12.575.51
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
omegaconf==2.3.0
-e git+https://github.com/Open-Model-Initiative/OMI-Model-Standards.git@e0f1291c17010cb86481c0f3521c33d92db07b47#egg=omi_model_standards
onnxruntime-gpu==1.20.1
open_clip_torch==2.30.0
opencv-python==4.11.0.86
orjson==3.10.18
packaging==25.0
paramiko==3.5.1
pfzy==0.3.4
pillow==11.1.0
platformdirs==4.3.8
pooch==1.8.2
prettytable==3.16.0
prodigy-plus-schedule-free==1.9.1
prodigyopt==1.1.2
prompt_toolkit==3.0.51
propcache==0.3.1
protobuf==6.30.2
psutil==6.1.1
py-cpuinfo==9.0.0
pycares==4.8.0
pycparser==2.22
pydantic==2.11.4
pydantic-extra-types==2.10.4
pydantic-settings==2.9.1
pydantic_core==2.33.2
Pygments==2.19.1
PyNaCl==1.5.0
pyparsing==3.2.3
python-dateutil==2.9.0.post0
python-dotenv==1.1.0
python-multipart==0.0.20
pytorch-lightning==2.5.0.post0
pytorch_optimizer==3.4.0
PyWavelets==1.8.0
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
rich==14.0.0
rich-toolkit==0.14.5
runpod==1.7.7
s3transfer==0.12.0
safetensors==0.5.2
scalene==1.5.51
scenedetect==0.6.6
schedulefree==1.4
scipy==1.15.1
sentencepiece==0.2.0
shellingham==1.5.4
six==1.17.0
sniffio==1.3.1
starlette==0.46.2
sympy==1.13.1
tensorboard==2.18.0
tensorboard-data-server==0.7.2
timm==1.0.15
tokenizers==0.21.1
tomli==2.2.1
tomlkit==0.13.2
torch==2.6.0+cu124
torchmetrics==1.7.1
torchvision==0.21.0+cu124
tqdm==4.67.1
tqdm-loggable==0.2
transformers==4.48.3
triton==3.2.0
typer==0.15.3
typing-inspection==0.4.0
typing_extensions==4.13.2
ujson==5.10.0
urllib3==2.4.0
uvicorn==0.34.2
uvloop==0.21.0
watchdog==6.0.0
watchfiles==1.0.5
wcwidth==0.2.13
websockets==15.0.1
Werkzeug==3.1.3
wrapt==1.17.2
yarl==1.20.0
yt-dlp==2025.4.30
zipp==3.21.0

Labels

bug: Something isn't working
