-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Resolve serialization for numpy bool 1.x and 2.x compatibility #53690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve serialization for numpy bool 1.x and 2.x compatibility #53690
Conversation
See the failing test case. |
d5231d4 to
5735f6b
Compare
Just found out the rest of the code will not be executed by using pytest.xfail in this way. Will think about another way to do it. |
|
Are there any similar cases we need to account for with float or int between numpy 1.x and 2.x? |
I have a test DAG #52753 (comment) which check for the integer types and the floating point types listed in the serializers. I think it is good with those types. However, as mentioned here #53690 (comment). The |
|
I still need some time to get back to this, due to limited bandwidth this week. Will likely follow up again on this with updates early next week. |
|
I linked #54097 issue that is very much related (about float serialization) - I think the PR should attempt to address both as they are very related. |
584dc0b to
64d68f7
Compare
|
Hi @potiuk , @bolkedebruin , @uranusjr , and @ashb , thank you very much for your time and patient in reviewing this PR. I've pushed some small updates to see if it could resolve the concerns in our discussion. Updates to
|
| def test_serializers_importable_and_str(self): | |
| """Test if all distributed serializers are lazy loading and can be imported""" | |
| import airflow.serialization.serializers | |
| for _, name, _ in iter_namespace(airflow.serialization.serializers): | |
| if name == "airflow.serialization.serializers.iceberg": | |
| try: | |
| import pyiceberg # noqa: F401 | |
| except ImportError: | |
| continue | |
| if name == "airflow.serialization.serializers.deltalake": | |
| try: | |
| import deltalake # noqa: F401 | |
| except ImportError: | |
| continue | |
| mod = import_module(name) | |
| for s in getattr(mod, "serializers", list()): | |
| if not isinstance(s, str): | |
| raise TypeError(f"{s} is not of type str. This is required for lazy loading") | |
| try: | |
| import_string(s) | |
| except ImportError: | |
| raise AttributeError(f"{s} cannot be imported (located in {name})") |
The above code is the current implementation in main branch. It is a single test case with some "special handles" for either compatibility considerations or optional dependencies. Basically, I think the test attempts to simulate the process of registering serializers and check if each serializer is importable.
I modified the test case by making it "parameterized" and the parameters are generated by the function generate_serializers_importable_tests. Therefore, for each serializer, a test case will be created, and we can have a control on each individual test from the test generation process, such as skipping some test case due to compatibility issue.
I can then run the test in breeze environment with different version of NumPy installed.
with NumPy 1.x installed. The numpy.bool test will be skipped because it is deprecated in NumPy 1.x.

with NumPy 2.x installed. Both numpy.bool and numpy.bool_ test cases will be executed because both are valid.

When a new serializer is added, this function can also capture those automatically to test for import, so it is less likely to miss any serializer. I also try to improve the documentation to see if it could help to provide more clarity. But, I am open for further feedback you have regarding this implementation.
Issue related to #54097
Looks like the issue is resolved through #54139. @potiuk please let me know if there is anything related to the issue that is still needed to be resolved here.
Thanks.
64d68f7 to
620eda9
Compare
Thanks! Looking into the CI failure, it doesn’t look like related to the changes here. Will have another look into it later today. |
620eda9 to
231c377
Compare
231c377 to
be136cd
Compare
get update from main
…ility between v1 and v2
… individually to numpy bool
be136cd to
5e68223
Compare
|
checking |
|
Ok I like it. The change in code is now small, tests cases are updated. To an extend I wonder if we are over engineering this (how many people still use numpy 1.X?) |
Precisely those people who will complain loudest :) |
|
Faster than me in merging. Geez @potiuk :-P |
Backport failed to create: v3-0-test. View the failure log Run details
You can attempt to backport this manually by running: cherry_picker 5babf15 v3-0-testThis should apply the commit to the v3-0-test branch and leave the commit in conflict state marking After you have resolved the conflicts, you can continue the backport process by running: cherry_picker --continue |
Always watching 👀 |
…ility (apache#53690) * fix serialization for np.bool in numpy 2.x get update from main * update serializers importable test case to handle numpy bool compatibility between v1 and v2 * use pytest xfail to capture np.bool import error in numpy version less than 2 * parameterize the serializers importable test, so xfail can be applied individually to numpy bool * add textwrap dedent for message * add np.float32 to serializer and improve test description (cherry picked from commit 5babf15) Co-authored-by: Kevin Yang <85313829+sjyangkevin@users.noreply.github.com>
|
Backport attempt in #54881 |
|
Thanks and very appreciate your time in review. I would like to learn more about the back porting. Is the reason we are doing this because we want to make the changes available in 3.0.x since the main branch is targeting 3.1.0? In this case, I think we need to backport the changes in another PR that updates the serde.py and the function signature from classname to class. Probably it is the main reason for the failed test in backport. Thanks. |
Yeah. I saw it already and there were conflicts, so I gave up on backporting. It's not worth it if backporting is somewhat complicated and issue is not too serious (and it has certain risks when backporting requires some manual changes). So... no backporting this time |
Got it, thanks for the feedback! |
…e#53690) * fix serialization for np.bool in numpy 2.x get update from main * update serializers importable test case to handle numpy bool compatibility between v1 and v2 * use pytest xfail to capture np.bool import error in numpy version less than 2 * parameterize the serializers importable test, so xfail can be applied individually to numpy bool * add textwrap dedent for message * add np.float32 to serializer and improve test description
…e#53690) * fix serialization for np.bool in numpy 2.x get update from main * update serializers importable test case to handle numpy bool compatibility between v1 and v2 * use pytest xfail to capture np.bool import error in numpy version less than 2 * parameterize the serializers importable test, so xfail can be applied individually to numpy bool * add textwrap dedent for message * add np.float32 to serializer and improve test description



Create this PR to help review the solution for #52753
Closes: #52753
Unit Test
the testing ran in the breeze environment with numpy
1.26.4installed. The following test case failed because it attempted to importnumpy.bool.np.boolwas a deprecated alias for the built-in Pythonbooltype, introduced in NumPy version1.20.0.np.bool_is the Numpy scalar. If we keep both"numpy.bool"and"numpy.bool_"in the list of serializers and the list of deserializers, we might need to update this test case to run based on the installed numpy version. Or, update the code to check which numpy version we have and import and check the right bool (#52753 (comment)).then, I created a init file
files/airflow-breeze-config/init.shto runuv pip install --upgrade pandas. This upgrade both pandas and numpy to the following versions. We may not only update one package, because the pandas depends on the numpy package and can result in the error mentioned #52753 (comment). The test case that failed when numpy1.26.4is installed now pass becausenumpy.boolis the NumPy scalar in2.3.1.If we only upgrade numpy to 2.x and didn't update pandas, the following error is raised. (
uv pip install --upgrade numpy)DAG Test (Run-time Behavior)
Numpy version 1.26.4

Numpy version 2.3.1

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.