[Data] Deflake test_from_pandas_refs_e2e by removing ordering assumptions #60595
Conversation
Code Review
This pull request effectively addresses the flakiness in test_from_pandas_refs_e2e by removing the assumption of deterministic row ordering. The switch to using the rows_same utility for order-agnostic comparison is a solid approach.
I've added one suggestion to cache the concatenated DataFrame to improve readability and avoid redundant computation, which mirrors the pattern in the original test code.
Overall, the changes are correct and improve test stability.
```python
df2 = pd.DataFrame({"one": [4, 5, 6], "two": ["e", "f", "g"]})

ds = ray.data.from_pandas_refs([ray.put(df1), ray.put(df2)])
values = [(r["one"], r["two"]) for r in ds.take(6)]
rows = [(r.one, r.two) for _, r in pd.concat([df1, df2]).iterrows()]
assert values == rows
assert rows_same(ds.to_pandas(), pd.concat([df1, df2]))
# Check that metadata fetch is included in stats.
assert "FromPandas" in ds.stats()
assert ds._plan._logical_plan.dag.name == "FromPandas"

# Test chaining multiple operations
ds2 = ds.map_batches(lambda x: x)
values = [(r["one"], r["two"]) for r in ds2.take(6)]
assert values == rows
assert rows_same(ds2.to_pandas(), pd.concat([df1, df2]))
```
To improve readability and avoid re-computing the concatenated DataFrame, you can store pd.concat([df1, df2]) in a variable and reuse it for both assertions. This also aligns with the pattern in the original test code where the expected rows were computed once.
Suggested change:

```diff
-df2 = pd.DataFrame({"one": [4, 5, 6], "two": ["e", "f", "g"]})
-ds = ray.data.from_pandas_refs([ray.put(df1), ray.put(df2)])
-values = [(r["one"], r["two"]) for r in ds.take(6)]
-rows = [(r.one, r.two) for _, r in pd.concat([df1, df2]).iterrows()]
-assert values == rows
-assert rows_same(ds.to_pandas(), pd.concat([df1, df2]))
-# Check that metadata fetch is included in stats.
-assert "FromPandas" in ds.stats()
-assert ds._plan._logical_plan.dag.name == "FromPandas"
-# Test chaining multiple operations
-ds2 = ds.map_batches(lambda x: x)
-values = [(r["one"], r["two"]) for r in ds2.take(6)]
-assert values == rows
-assert rows_same(ds2.to_pandas(), pd.concat([df1, df2]))
+df2 = pd.DataFrame({"one": [4, 5, 6], "two": ["e", "f", "g"]})
+expected_df = pd.concat([df1, df2])
+ds = ray.data.from_pandas_refs([ray.put(df1), ray.put(df2)])
+assert rows_same(ds.to_pandas(), expected_df)
+# Check that metadata fetch is included in stats.
+assert "FromPandas" in ds.stats()
+assert ds._plan._logical_plan.dag.name == "FromPandas"
+# Test chaining multiple operations
+ds2 = ds.map_batches(lambda x: x)
+assert rows_same(ds2.to_pandas(), expected_df)
```
I have implemented this suggestion
Force-pushed from 81593de to d79cd36
…sumptions

Replaced list-based equality assertions with the rows_same utility in test_from_pandas_refs_e2e to handle non-deterministic row ordering in distributed execution. Refined the implementation based on reviewer feedback to cache the expected DataFrame in an expected_df variable, improving readability and avoiding redundant computation. This aligns the test with the project's standard testing patterns.

Signed-off-by: Parth Ghayal <parthmghayal@gmail.com>
Force-pushed from d79cd36 to 3858d57
@bveeramani Can I get a review on this please? Related Issue: 60553
bveeramani
left a comment
Nice
This PR addresses intermittent test failures in the execution optimizer integration suite. The test test_from_pandas_refs_e2e previously assumed a deterministic row ordering when reading from multiple pandas references, which is not guaranteed by the Ray Data interface in a distributed execution environment.
Changes
Replaced manual tuple-list comparisons with the rows_same utility from ray.data._internal.util.
Refactored assertions to use ds.to_pandas() for robust, order-agnostic data validation (see the sketch below).
Applied the fix to all three affected assertions within the test function.
Fixes 60553
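For readers unfamiliar with order-agnostic comparison, here is a minimal sketch of what such a check can look like. The helper name rows_same_sketch, the contents of df1, and the sorting-based comparison are assumptions made for illustration only; the actual rows_same utility referenced in this PR may be implemented differently.

```python
import pandas as pd

def rows_same_sketch(actual: pd.DataFrame, expected: pd.DataFrame) -> bool:
    # Hypothetical helper: compare two DataFrames while ignoring row order.
    # The real rows_same utility used by the test may differ.
    columns = sorted(expected.columns)
    actual_sorted = actual[columns].sort_values(by=columns).reset_index(drop=True)
    expected_sorted = expected[columns].sort_values(by=columns).reset_index(drop=True)
    return actual_sorted.equals(expected_sorted)

# df1's contents are assumed for illustration; df2 matches the test snippet above.
df1 = pd.DataFrame({"one": [1, 2, 3], "two": ["a", "b", "c"]})
df2 = pd.DataFrame({"one": [4, 5, 6], "two": ["e", "f", "g"]})
expected_df = pd.concat([df1, df2])

# A shuffled result still passes, because only row content (not order) is compared.
shuffled = expected_df.sample(frac=1, random_state=0)
assert rows_same_sketch(shuffled, expected_df)
```

This is why the test can compare ds.to_pandas() against expected_df even when the underlying blocks are produced in a different order across runs.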