Azure AD Auth fails on ARM64 #2475

Closed
george-zubrienko opened this issue May 2, 2024 · 28 comments
Labels
bug Something isn't working storage/azure Azure Blog storage related

@george-zubrienko
Contributor

Environment

Amazon Linux 2023, ARM64 arch host, container based on python3.11-slim-bookworm

Delta-rs version:

0.17.2

Binding:

python

Environment:

  • Cloud provider: AWS
  • OS: AL2023
  • Other: Python3.11

Bug

What happened:

When trying to load the table using AZURE_CLIENT_ID etc credentials, getting this:

OSError: Generic MicrosoftAzure error: Error performing token request: Error after 10 retries in 2.050817107s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://login.microsoftonline.com/.../oauth2/v2.0/token): error trying to connect: failed to get random bytes

Stack trace:

[screenshot of the stack trace; image not preserved]

What you expected to happen:

Table reads as before on 0.8.1 version on the same host/container

How to reproduce it:

Run table read on ARM64/AL2023 vm with Azure auth against az://... table path
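
A minimal sketch of what such a read could look like from Python, assuming service-principal credentials are provided through environment variables. Only `AZURE_CLIENT_ID` is named in the report; the other variable names, the lowercase option keys, and both helper functions are assumptions made for illustration, so check them against the deltalake documentation for your version.

```python
import os


def azure_storage_options(env=os.environ):
    """Collect service-principal settings from the environment.

    The variable names and the lowercase option keys are assumptions,
    not something confirmed in this thread.
    """
    names = (
        "AZURE_CLIENT_ID",
        "AZURE_CLIENT_SECRET",
        "AZURE_TENANT_ID",
        "AZURE_STORAGE_ACCOUNT_NAME",
    )
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {name.lower(): env[name] for name in names}


def read_table_version(table_uri, storage_options):
    """Open the table and return its current version number."""
    # Imported lazily so the helper above works without deltalake installed.
    from deltalake import DeltaTable

    # Per the report, loading the table is where the OSError was raised.
    table = DeltaTable(table_uri, storage_options=storage_options)
    return table.version()
```

Calling `read_table_version("az://...", azure_storage_options())` inside the ARM64 container would then be the step expected to fail on the affected wheels.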

More details:
N/A

george-zubrienko added the bug label May 2, 2024
@ion-elgreco
Collaborator

@george-zubrienko are you able to test the connection from non-arm environments?

@george-zubrienko
Contributor Author

@ion-elgreco yes if I change machine type to amd64 it works fine. Fun part, people on Mac M2's do not have this issue

@george-zubrienko
Contributor Author

george-zubrienko commented May 2, 2024

I'm building a new image rn with 0.17.3, fyi this was the wheel used by ARM64 build: deltalake-0.17.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Same result with 0.17.3. I assume the error is coming from Rust code, not python as it is reported as OSError

@ion-elgreco
Collaborator

> I'm building a new image rn with 0.17.3, fyi this was the wheel used by ARM64 build: deltalake-0.17.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
>
> Same result with 0.17.3. I assume the error is coming from Rust code, not python as it is reported as OSError

Could you do a bisect, to see which release this problem started occurring for you?

@george-zubrienko
Contributor Author

> > I'm building a new image rn with 0.17.3, fyi this was the wheel used by ARM64 build: deltalake-0.17.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
> > Same result with 0.17.3. I assume the error is coming from Rust code, not python as it is reported as OSError
>
> Could you do a bisect, to see which release this problem started occurring for you?

We just upgraded from 0.8.1 to 0.17.* and I got the error. I can try to narrow it down a little. In case this adds anything, we are loading data like this: https://github.com/SneaksAndData/adapta/blob/main/adapta/storage/delta_lake/_functions.py#L39-L90

@george-zubrienko
Contributor Author

@ion-elgreco it works up to 0.16.1, 0.16.2 breaks it
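
For anyone repeating this kind of narrowing down, the bisect can be sketched as below. The release list contains only versions named in this thread (a real run would include every release in between), and the `works` callback is a hypothetical stand-in for "install that version in the ARM64 container and try to read the table".

```python
def first_bad_release(releases, works):
    """Return the earliest release for which works(release) is False.

    Assumes releases are ordered oldest to newest, the first one works,
    the last one fails, and the failure never disappears in between.
    """
    lo, hi = 0, len(releases) - 1  # invariant: releases[lo] works, releases[hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(releases[mid]):
            lo = mid
        else:
            hi = mid
    return releases[hi]


# Only versions mentioned in this thread, oldest to newest.
releases = ["0.8.1", "0.16.1", "0.16.2", "0.17.2", "0.17.3"]

# Stand-in for the real check, using the outcomes reported above.
reported_working = {"0.8.1", "0.16.1"}

print(first_bad_release(releases, lambda v: v in reported_working))  # prints 0.16.2
```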

@ion-elgreco
Collaborator

ion-elgreco commented May 2, 2024

@george-zubrienko hmm strange, nothing implies there is a change that could have caused this in that release.

Maybe it's the rust version which it gets compiled with, are you able to compile with some older version and check that?

@george-zubrienko
Contributor Author

> @george-zubrienko hmm strange, nothing implies there is a change that could have caused this in that release.
>
> Maybe it's the rust version which it gets compiled with, are you able to compile with some older version and check that?

Aren't 0.16.2 and 0.16.1 compiled with the same version? I can try to force this package to compile from source rather than use the wheel

@george-zubrienko
Contributor Author

george-zubrienko commented May 2, 2024

@ion-elgreco

So, using this compiler version:

# rustc --version
rustc 1.78.0 (9b00956e5 2024-04-29)

Running this:

# pip install --upgrade deltalake==0.17.2 --no-binary :all:
Collecting deltalake==0.17.2
  Using cached deltalake-0.17.2.tar.gz (4.8 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pyarrow>=8 in /usr/local/lib/python3.11/site-packages (from deltalake==0.17.2) (16.0.0)
Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.11/site-packages (from deltalake==0.17.2) (0.6)
Requirement already satisfied: numpy>=1.16.6 in /usr/local/lib/python3.11/site-packages (from pyarrow>=8->deltalake==0.17.2) (1.26.4)
Building wheels for collected packages: deltalake
  Building wheel for deltalake (pyproject.toml) ... done
  Created wheel for deltalake: filename=deltalake-0.17.2-cp38-abi3-linux_aarch64.whl size=27450351 sha256=1c47967d8c6cd74a7414607321671e8c5fb093b6767035bdffc5562c95014f33
  Stored in directory: /root/.cache/pip/wheels/cf/6f/ad/d4179379730a3e8649079d3050ccba5bdf1ccad191074b0027
Successfully built deltalake
Installing collected packages: deltalake
  Attempting uninstall: deltalake
    Found existing installation: deltalake 0.17.3
    Uninstalling deltalake-0.17.3:
      Successfully uninstalled deltalake-0.17.3
Successfully installed deltalake-0.17.2

Everything works. So it is a problem with the wheel?

Kernel: Linux 6.1.84-99.169.amzn2023.aarch64 #1 SMP Mon Apr 8 19:19:24 UTC 2024 aarch64 GNU/Linux
OS: Debian 12 (bookworm)
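
To tell which of the two builds is actually installed in a container, one option is to read the platform tags recorded in the package metadata. This is only a sketch: `wheel_tags` and `installed_wheel_tags` are hypothetical helper names, and it assumes the installed distribution ships a standard `WHEEL` file in its dist-info directory. Going by the filenames above, the published wheel would show `manylinux` tags and the locally compiled one `linux_aarch64`.

```python
from importlib import metadata


def wheel_tags(wheel_text):
    """Extract the 'Tag:' values from the text of a dist-info WHEEL file."""
    tags = []
    for line in wheel_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Tag":
            tags.append(value.strip())
    return tags


def installed_wheel_tags(dist_name="deltalake"):
    """Return the tags of the installed distribution, or [] if none are recorded."""
    # distribution() raises PackageNotFoundError if the package is not installed;
    # read_text() returns None when the WHEEL file is absent.
    text = metadata.distribution(dist_name).read_text("WHEEL")
    return wheel_tags(text or "")
```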

@ion-elgreco
Collaborator

ion-elgreco commented May 2, 2024

@george-zubrienko ok that's helpful!

I think it's because of the rust compiler.

  • v0.16.1 was compiled with rust 1.76.0
  • v0.16.2 through v0.17.3 were compiled with rust 1.77.x

You seem to get it working when it's compiled with 1.78.0

@george-zubrienko
Contributor Author

That is the case! Any chance we can get compiler bumped for 0.17.4? 🙏

@ion-elgreco
Collaborator

> That is the case! Any chance we can get compiler bumped for 0.17.4? 🙏

Probably next release. Rust 1.78 got released today and our release just missed that version

@gaurav7261

I'm also facing the same issue with AWS STS, using the latest deltalake version

@gaurav7261

gaurav7261 commented May 7, 2024

@ion-elgreco tag me here once you release the latest version, right now we are reverting back to amd

rtyler added the storage/azure label May 7, 2024
rtyler added this to the Rust v0.18 milestone May 7, 2024
@rtyler
Member

rtyler commented May 7, 2024

I would be surprised if this was the compiler. It is more likely a dependency for which we have a loose version range specifier. Either way, I figure the next release should take care of this for ya 😄

@gaurav7261

@rtyler we are facing the same with AWS as well

@ion-elgreco
Collaborator

@george-zubrienko can you check against v0.18.0 please?

@george-zubrienko
Contributor Author

> @george-zubrienko can you check against v0.18.0 please?

Will check on Monday!

@george-zubrienko
Contributor Author

Bit delayed - deploying this as of now, in case I don't get to actually run the validation, will ping tomorrow

@george-zubrienko
Contributor Author

Side note: it seems macOS environments now have issues importing deltalake:

[screenshot of the import error; image not preserved]

@ion-elgreco
Collaborator

> Side note: it seems macOS environments now have issues importing deltalake:
>
> [screenshot of the import error; image not preserved]

This is already reported, see the issues board. Someone also has a workaround to still get it installed

@george-zubrienko
Contributor Author

george-zubrienko commented Jun 11, 2024

@ion-elgreco I just tried with deltalake 0.18.0 on ARM machine and I'm getting the same error (library installed from wheel, I can try to build from source tomorrow if you want to confirm it still works in that case)

@ion-elgreco
Collaborator

@george-zubrienko you might give 0.18.1 a shot which includes latest object store version, not sure if it's going to have any impact though

@george-zubrienko
Contributor Author

> @george-zubrienko you might give 0.18.1 a shot which includes latest object store version, not sure if it's going to have any impact though

Will try, but... isn't this a bit strange? I've checked the Mac issue as well and I see people also resolve it with the --no-binary flag to pip. So there are two issues at this point that are resolved by simply recompiling the Rust library, which for me at least suggests that this is not related to the code. Maybe something changed in your release process?

@ion-elgreco
Collaborator

> > @george-zubrienko you might give 0.18.1 a shot which includes latest object store version, not sure if it's going to have any impact though
>
> Will try, but... isn't this a bit strange? I've checked the Mac issue as well and I see people also resolve it with the --no-binary flag to pip. So there are two issues at this point that are resolved by simply recompiling the Rust library, which for me at least suggests that this is not related to the code. Maybe something changed in your release process?

Yeah the Mac issue got fixed by just bumping the os version of the runners

@george-zubrienko
Contributor Author

@ion-elgreco we just tested with 0.18.1 for both Azure and AWS, the error is gone
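
Summarising the reports in this thread as a guard that could run at container start-up: a rough sketch only. The function names are hypothetical, and it assumes every prebuilt Linux ARM64 wheel from 0.16.2 up to and including 0.18.0 behaves like the versions that were actually tested here (0.16.2, 0.17.2, 0.17.3 and 0.18.0).

```python
def parse_version(text):
    """Turn 'X.Y.Z' into a tuple of ints; pre-release suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))


def wheel_reported_broken(version, system, machine):
    """True if this thread reported the prebuilt wheel failing for this combination.

    Based only on the reports above; versions between the tested ones are
    assumed, not known, to be affected.
    """
    if system != "Linux" or machine not in ("aarch64", "arm64"):
        return False
    return parse_version("0.16.2") <= parse_version(version) < parse_version("0.18.1")
```

In practice the arguments could come from `importlib.metadata.version("deltalake")`, `platform.system()` and `platform.machine()`. A build compiled from source with `--no-binary` reportedly worked regardless of version, which this check does not account for.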

@george-zubrienko
Contributor Author

I believe this can be closed as we rolled 0.18.1 on prod with mostly ARM machines and I do not see any failures :)
Thanks a lot!

@ion-elgreco
Collaborator

@george-zubrienko thanks for the update!
