Description
🐛 Describe the bug
When building ExecuTorch on native Windows, the link step fails with an undefined symbol error from lld-link:
lld-link : error : undefined symbol: __declspec(dllimport) class std::tuple<class at::Tensor &, class at::Tensor &> __cdecl torch::executor::native::choose_qparams_tensor_out(class at::Tensor const &, __int64, __int64, double, enum c10::ScalarType, class at::Tensor &, class at::Tensor &)
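The symbol the linker is looking for returns std::tuple<Tensor&, Tensor&>, while the definition in kernels/quantized/cpu/op_choose_qparams.cpp returns std::tuple<Tensor, Tensor>. The MSVC ABI encodes the return type in the decorated name, so the two do not match at link time. This can be worked around with the following patch, which changes the definition's return type.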
diff --git a/kernels/quantized/cpu/op_choose_qparams.cpp b/kernels/quantized/cpu/op_choose_qparams.cpp
index 47f261407..9bda17192 100644
--- a/kernels/quantized/cpu/op_choose_qparams.cpp
+++ b/kernels/quantized/cpu/op_choose_qparams.cpp
@@ -149,7 +149,7 @@ void choose_qparams(
}
} // namespace
-std::tuple<Tensor, Tensor> choose_qparams_tensor_out(
+std::tuple<Tensor&, Tensor&> choose_qparams_tensor_out(
const Tensor& input,
int64_t quant_min,
int64_t quant_max,
@@ -164,7 +164,7 @@ std::tuple<Tensor, Tensor> choose_qparams_tensor_out(
return {scale_out, zero_point_out};
}
-::std::tuple<Tensor, Tensor> choose_qparams_tensor_out(
+::std::tuple<Tensor&, Tensor&> choose_qparams_tensor_out(
RuntimeContext& context,
const Tensor& input,
int64_t quant_min,
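To illustrate why the return type matters here, below is a minimal sketch that does not use the ExecuTorch sources. `FakeTensor`, `fill_by_ref` and `fill_by_value` are hypothetical names invented for this example. It shows that a tuple of references and a tuple of values are different types, and that only the reference version hands back the caller's out-arguments rather than copies.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for a tensor type; this is NOT the real at::Tensor.
struct FakeTensor {
  double value = 0.0;
};

using RefPair = std::tuple<FakeTensor&, FakeTensor&>;
using ValuePair = std::tuple<FakeTensor, FakeTensor>;

// The two return types are distinct. Under the MSVC ABI the return type is
// part of a function's decorated name, so a declaration using one and a
// definition using the other refer to different symbols.
static_assert(!std::is_same<RefPair, ValuePair>::value,
              "tuple of references and tuple of values are different types");

// Same shape as the patched signature: returns references to the out-arguments.
RefPair fill_by_ref(FakeTensor& scale_out, FakeTensor& zero_point_out) {
  scale_out.value = 0.5;
  zero_point_out.value = 3.0;
  return {scale_out, zero_point_out};
}

// Same shape as the unpatched signature: returns copies of the out-arguments.
ValuePair fill_by_value(FakeTensor& scale_out, FakeTensor& zero_point_out) {
  scale_out.value = 0.5;
  zero_point_out.value = 3.0;
  return {scale_out, zero_point_out};
}

// True when the first tuple element is the very object passed as scale_out.
bool ref_result_aliases_out_arg() {
  FakeTensor scale, zero_point;
  RefPair result = fill_by_ref(scale, zero_point);
  return &std::get<0>(result) == &scale;
}

// Same check for the by-value version; the element is a copy, so this is false.
bool value_result_aliases_out_arg() {
  FakeTensor scale, zero_point;
  ValuePair result = fill_by_value(scale, zero_point);
  return &std::get<0>(result) == &scale;
}
```

On toolchains that follow the Itanium ABI, the return type of an ordinary function is not part of the mangled name, which may be why this kind of mismatch only shows up when linking on Windows.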
Versions
Collecting environment information...
PyTorch version: 2.5.0.dev20240716+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: 18.1.8
CMake version: version 3.30.2
Libc version: N/A
Python version: 3.10.0 | packaged by conda-forge | (default, Nov 10 2021, 13:20:59) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: False
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Ti
Nvidia driver version: 551.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=3501
DeviceID=CPU0
Family=107
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3501
Name=AMD Ryzen Threadripper PRO 3975WX 32-Cores
ProcessorType=3
Revision=12544
Versions of relevant libraries:
[pip3] executorch==0.4.0a0+a70d070
[pip3] numpy==1.21.3
[pip3] torch==2.5.0.dev20240716+cpu
[pip3] torchaudio==2.4.0.dev20240716+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240716+cpu
[conda] executorch 0.4.0a0+a70d070 pypi_0 pypi
[conda] numpy 1.21.3 pypi_0 pypi
[conda] torch 2.5.0.dev20240716+cpu pypi_0 pypi
[conda] torchaudio 2.4.0.dev20240716+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240716+cpu pypi_0 pypi