
Initialize tensors with zeros #3660



Open

pbielak wants to merge 1 commit into main

Conversation

pbielak commented Jun 27, 2025

What does this PR do?

When initializing tensors with `torch.empty`, the values are random, often large, and near the dtype range limits. The `initialize_tensors` function creates the tensors on CPU. When moving them to the destination device (using the `send_to_device` function), some devices will throw an error if the dtype is not supported and is implicitly downcast. For example, on Gaudi in lazy mode the int64 dtype is not enabled by default, so if we create an empty int64 tensor and move it to "hpu", we will often get the following error message:

RuntimeError: Error when trying to cast Long to Int, Input values range [9223372036854775807, 9223372036854775807] exceeds Int range [-2147483648, 2147483647]

This commit changes the default initialization value of tensors created by `initialize_tensors` to zero, by replacing `torch.empty` with `torch.zeros`.
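A minimal sketch of the distinction on plain CPU PyTorch (the overflow itself only surfaces on devices that implicitly downcast; here the first tensor's contents are merely arbitrary):

import torch

# torch.empty returns uninitialized memory; for int64 the garbage values
# can land near the dtype limits, far outside the int32 range.
t = torch.empty(2, 3, dtype=torch.int64)

# torch.zeros pays a small initialization cost, but every value survives
# a downcast to int32 unchanged.
z = torch.zeros(2, 3, dtype=torch.int64)
assert z.to(torch.int32).long().equal(z)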

pbielak marked this pull request as ready for review July 1, 2025 09:44
pbielak (Author) commented Jul 4, 2025

Not sure whom to tag here, but maybe @IlyasMoutawwakil can help/review this one

IlyasMoutawwakil (Member) commented Jul 4, 2025

yeah, this is not an initialisation problem; it is HPU-specific, due to the way it supports (and at the same time doesn't support) int64.
`torch.empty` is faster than `torch.zeros`, so it doesn't make sense to penalise all accelerators with `torch.zeros`.
why not fix `torch.empty` in SynapseAI / torch+hpu?
or else something like this, applied only in the case of hpu:

original_torch_empty = torch.empty

def patched_torch_empty(*args, **kwargs):
    # Allocate as usual, then zero-fill so the tensor carries no garbage
    # values that could overflow during an implicit downcast on the device.
    tensor = original_torch_empty(*args, **kwargs)
    tensor.zero_()
    return tensor

torch.empty = patched_torch_empty
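For reference, a sketch of how that patch could be scoped so it only fires for HPU targets (the context manager and its name are illustrative, not an existing accelerate API):

import contextlib
import torch

@contextlib.contextmanager
def zeroed_torch_empty(device):
    # No-op on every accelerator other than HPU.
    if torch.device(device).type != "hpu":
        yield
        return

    original_torch_empty = torch.empty

    def patched_torch_empty(*args, **kwargs):
        tensor = original_torch_empty(*args, **kwargs)
        tensor.zero_()  # deterministic values survive any implicit downcast
        return tensor

    torch.empty = patched_torch_empty
    try:
        yield
    finally:
        torch.empty = original_torch_empty  # always restore the original

`initialize_tensors` could then run inside `with zeroed_torch_empty(device): ...`, and all other devices would keep the faster uninitialized allocation.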

SunMarc (Member) commented Jul 15, 2025

Ilyas' solution seems better; can you update the PR, @pbielak?
