Skip to content

Conversation

@cswartzvi
Copy link
Contributor

Added a new modifier called unpack_fields which allows for the extraction of fields from a tuple output (a cross between extract_columns and extract_fields).

Changes

  • Added unpack_fields decorator to hamilton/function_modifiers/__init__.py and implemented its logic in hamilton/function_modifiers/expanders.py. This decorator enables the extraction of fields from a tuple output, expanding a single function into multiple nodes.

How I tested this

  • Added multiple test cases in tests/function_modifiers/test_expanders.py to validate the functionality of the unpack_fields decorator, including tests for valid and invalid type annotations and different tuple configurations.

Notes

  • Example usage with an fixed size tuple:

      @unpack_fields("X_train" "X_validation", "X_test")
      def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
          """Randomly split data into train, validation, test"""
          X_train, X_validation, X_test = random_split(X)
          return X_train, X_validation, X_test
  • Example usage with a subset of fixed length tuples:

      @unpack_fields("X_train")
      def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
          """Randomly split data into train, validation, test"""
          X_train, X_validation, X_test = random_split(X)
          return X_train, X_validation, X_test
  • Example usage with an indeterminate length tuple:

      @unpack_fields("X_train" "X_validation", "X_test")
      def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, ...]:
          """Randomly split data into train, validation, test"""
          X_train, X_validation, X_test = random_split(X)
          return X_train, X_validation, X_test

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to 46608a6 in 2 minutes and 21 seconds

More details
  • Looked at 351 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 8 drafted comments based on config settings.
1. hamilton/function_modifiers/expanders.py:952
  • Draft comment:
    Consider using an explicit check (e.g. if field_types is None:) here instead of a generic falsy check to clarify that a valid empty list is not expected.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50%
    Looking at _process_unpack_fields(), it explicitly returns None if the base_type is not a tuple. The function never returns an empty list. So while an explicit None check would be more precise, the current falsy check works correctly and is idiomatic Python. The suggestion doesn't fix a real issue.
    The comment makes a valid point about being explicit. In some cases, using is None can prevent subtle bugs when falsy values are valid. However, in this specific case...
    Looking at the implementation, None is the only falsy value that can be returned by _process_unpack_fields(). An empty list is never a valid return value, so the current check is appropriate.
    The comment should be deleted. The current code is correct and clear enough - the function explicitly returns None for invalid cases and the falsy check is a standard Python idiom that works correctly here.
2. hamilton/function_modifiers/expanders.py:994
  • Draft comment:
    The extractor functions (both async and sync) are defined inline. For clarity and maintainability, consider extracting them to a helper function to avoid closure-related pitfalls.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 30% vs. threshold = 50%
    The comment raises a valid code quality concern - defining functions in loops with closures can be tricky and error-prone. The code is using default arguments (field_index=idx) to avoid closure issues, but extracting to helper functions could make this pattern more explicit and maintainable. However, the current implementation seems to be working as intended and the closure pitfall is already handled.
    The comment may be overstating the issue since the code already handles closure problems via default arguments. Moving the functions could add complexity without clear benefit.
    While the suggestion is technically valid, the current implementation is already handling closure issues appropriately and the code is reasonably clear as is.
    The comment should be deleted since it suggests a refactor that isn't clearly beneficial given the current working implementation.
3. hamilton/function_modifiers/expanders.py:890
  • Draft comment:
    When handling indeterminate length tuples (with ellipsis), the code creates a new tuple by repeating the first type. Ensure that this behavior is well-documented (e.g., in Sphinx docs) so users understand that unpacked types will be repeated based on the number of fields provided.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
    The comment is suggesting adding documentation about an implementation detail. While the behavior is important, the comment is asking for documentation rather than pointing out a code issue that needs to be fixed. The rules state "Do NOT make purely informative comments" and "Do NOT comment unless there is clearly a code change required". Documentation suggestions should go through other channels.
    The behavior being described is non-obvious and could surprise users - having it documented would help prevent confusion. The comment is trying to improve code clarity.
    While documentation would be helpful, the review rules explicitly state not to make purely informative comments or documentation requests. These should be handled through other means like documentation tickets.
    Delete the comment since it is purely requesting documentation and doesn't require any code changes, which violates the review rules.
4. tests/function_modifiers/test_expanders.py:528
  • Draft comment:
    Typo: The test function name 'test_upack_fields_valid_indeterminate_tuple' should be 'test_unpack_fields_valid_indeterminate_tuple' for consistency.
  • Reason this comment was not posted:
    Marked as duplicate.
5. docs/:0
  • Draft comment:
    Please add Sphinx documentation for the new unpack_fields modifier, including usage examples, accepted return types (explicit and indeterminate tuples), and error cases.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. hamilton/function_modifiers/expanders.py:1108
  • Draft comment:
    Typo: In the comment on line 1108, 'maeks' should be corrected to 'makes' for clarity.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
7. tests/function_modifiers/test_expanders.py:28
  • Draft comment:
    Typographical error: The word 'non_existant' appears in the parameters. It should be spelled 'non_existent'.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. tests/function_modifiers/test_expanders.py:527
  • Draft comment:
    Typographical inconsistency: The test function name 'test_upack_fields_valid_indeterminate_tuple' seems to contain a typo ('upack' instead of 'unpack'). Please rename it to 'test_unpack_fields_valid_indeterminate_tuple' for consistency with the rest of the code.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_cm8UzLgA8Mp8aXYO


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@cswartzvi
Copy link
Contributor Author

Geez - apparently I have a problem remembering to include backward compatible type hints 🙃. Regardless, this is ready to be looked at!

Copy link
Contributor

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Thank you! I'm really looking forward to shipping this. Some small comments.

@cswartzvi
Copy link
Contributor Author

cswartzvi commented Apr 8, 2025

Wooh boy, TIL that improper use of an ellipsis in typing.Tuple (ex: Tuple[..., int], Tuple[int, int, ...], etc.) is an error prior to Python 3.11, even though it's perfectly fine in tuple for python 3.9+ (ex: tuple[..., int], tuple[int, int, ...], etc.) 🤷🏻‍♂️🤷🏻‍♂️. I have greatly expanded tests of tuple vs typing.Tuple in both valid and invalid cases using a little trick with eval. Note that this was the only way I could come up with where earlier python versions didn't complain about indexing tuple. Please let me know if it's too ... crazy.

@cswartzvi
Copy link
Contributor Author

Just wanted to mention one other thing ... in slack @zilto mentioned that

' .. with tuple[str, ...] the @unpack_fields should be authoritative for the number of expected tuple values ...'.

Currently, @unpack_fields is extracting the first $n$ fields of an $m$ length tuple with only a check for $0 \leq n < m$ in the local extractor function. I could make the (sync/async) tuple_generator check that the requested fields match the ouput, but I wanted to get everyone's take first.

Sorry @zilto, I didn't see your response until after I started pushing commits!

@zilto
Copy link
Contributor

zilto commented Apr 9, 2025

@cswartzvi Great job on the PR! It makes me very happy to see docs pages included 😄

Regarding tuple length, I see the merit of your approach and implementation. For example, in this snippet:

  @unpack_fields("X_train")
  def dataset_splits(X: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
      """Randomly split data into train, validation, test"""
      X_train, X_validation, X_test = random_split(X)
      return X_train, X_validation, X_test

It seems reasonable to allow @unpack_fields("X_train") to only unpack X_train, the first element of the tuple. This matches how @extract_fields can selectively extract keys of a dict. It avoids creating unnecessary nodes and the node dataset_splits allows to get the full tuple.

This was referenced May 9, 2025
Copy link
Contributor

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@skrawcz skrawcz merged commit ebfddc1 into apache:main Jun 23, 2025
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants