Skip to content

Conversation

@huleilei
Copy link
Contributor

@huleilei huleilei commented Dec 28, 2025

This PR migrates the implementation of utf8.right in src/daft-functions-utf8/src/right.rs from arrow2 to arrow-rs.

Changes Made

  • Removed #![allow(deprecated, reason = "arrow2 migration")] attribute.
  • Replaced arrow2 array usage with arrow-rs equivalents (e.g., LargeStringArray, Int64Array).
  • Updated implementation to use to_arrow() instead of the deprecated as_arrow2().

Related Issues

Part of the migration effort tracked in #5741.

Verification

  • Ran cargo test -p daft-functions-utf8 locally and all tests passed.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 28, 2025

Greptile Summary

This PR successfully migrates the utf8.right function from arrow2 to arrow-rs. The migration replaces deprecated arrow2 APIs with their arrow-rs equivalents while maintaining the same functional behavior.

Key changes:

  • Removed the deprecation marker #![allow(deprecated, reason = "arrow2 migration")]
  • Replaced AsArrow trait with FromArrow
  • Updated imports to use arrow::array::{LargeStringArray, AsArray} instead of daft_arrow::array::Utf8Array<i64>
  • Converted array operations from as_arrow2() to to_arrow() followed by casting to Int64Type
  • Changed result construction from Utf8Array::from((name, Box::new(arrow_result))) to Utf8Array::from_arrow(Field::new(name, DataType::Utf8), Arc::new(result))
  • Simplified the multi-value branch by removing intermediate variable and directly collecting into LargeStringArray

The implementation correctly handles null values through the existing parse_inputs check, making the double unwrap on line 116 safe.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The migration is straightforward and follows standard patterns for converting from arrow2 to arrow-rs. The implementation maintains the same logic flow, properly handles null values, and the tests have been verified locally. No breaking changes to the API or behavior.
  • No files require special attention

Important Files Changed

Filename Overview
src/daft-functions-utf8/src/right.rs Migrated utf8.right from arrow2 to arrow-rs, replacing deprecated APIs with arrow-rs equivalents (LargeStringArray, Int64Type, to_arrow, from_arrow)

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Right Function
    participant parse_inputs
    participant nchars Array
    participant Arrow-rs
    participant Result Builder

    Caller->>Right Function: right_impl(arr, nchars)
    Right Function->>parse_inputs: Check array lengths & nulls
    parse_inputs-->>Right Function: (is_full_null, expected_size)
    
    alt is_full_null
        Right Function-->>Caller: Return full_null array
    else expected_size == 0
        Right Function-->>Caller: Return empty array
    else Normal processing
        Right Function->>nchars Array: to_arrow()
        nchars Array-->>Right Function: Arrow array
        Right Function->>Arrow-rs: cast to Int64Type
        Arrow-rs-->>Right Function: Int64 primitive array
        
        alt nchars.len() == 1
            Right Function->>Arrow-rs: Get single value (double unwrap)
            Arrow-rs-->>Right Function: Single i64 value
            Right Function->>Result Builder: Collect with fixed nchar
        else nchars.len() > 1
            Right Function->>Result Builder: Zip and collect with variable nchar
        end
        
        Result Builder-->>Right Function: LargeStringArray
        Right Function->>Right Function: from_arrow conversion
        Right Function-->>Caller: Utf8Array result
    end
Loading

@codecov
Copy link

codecov bot commented Dec 28, 2025

Codecov Report

❌ Patch coverage is 80.95238% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.47%. Comparing base (535941d) to head (6845af4).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-functions-utf8/src/right.rs 80.95% 4 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5889      +/-   ##
==========================================
- Coverage   72.48%   72.47%   -0.01%     
==========================================
  Files         966      966              
  Lines      125838   125830       -8     
==========================================
- Hits        91211    91194      -17     
- Misses      34627    34636       +9     
Files with missing lines Coverage Δ
src/daft-functions-utf8/src/right.rs 75.29% <80.95%> (+0.29%) ⬆️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@huleilei huleilei marked this pull request as draft December 28, 2025 04:53
Copy link
Member

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Thanks @huleilei.

@universalmind303 universalmind303 changed the title Refactor: Migrate utf8.right to use arrow-rs instead of arrow2 refactor(arrow2): Migrate utf8.right to use arrow-rs instead of arrow2 Dec 29, 2025
@huleilei
Copy link
Contributor Author

Thanks. Currently, the unit test cases are in use, and there is already a PR #5878 being worked on to fix the issue. Once this PR is merged, I will proceed with a rebase and conduct CI testing.

@universalmind303
Copy link
Member

@huleilei it looks like the fmt precommit check is failing. Could you please reformat the code. We typically use precommit: pre-commit run --all --all-files to check prior to committing.

Once the fmt check is passing, i think we can merge this!

@huleilei huleilei force-pushed the migrate-utf8-right-arrow2 branch from 969993d to 6845af4 Compare January 6, 2026 03:20
@huleilei huleilei marked this pull request as ready for review January 6, 2026 04:18
@huleilei
Copy link
Contributor Author

huleilei commented Jan 6, 2026

@universalmind303 I've made the revisions. Please review them when you have time. Thank you

@universalmind303 universalmind303 merged commit dc57405 into Eventual-Inc:main Jan 6, 2026
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants