Skip to content

Conversation

@GlobalStar117
Copy link

@GlobalStar117 GlobalStar117 commented Jan 20, 2026

Summary

This PR fixes a cache collision issue in ByteBuddyDoFnInvokerFactory where DoFns with the same raw class but different TypeDescriptors were incorrectly sharing invokers.

Fixes: #37351

Problem

The factory was using Class<?> as the cache key for generated invokers. While this seemed correct due to Java type erasure (e.g., MyDoFn<String> and MyDoFn<Integer> share the same Class object), it fails to account for Apache Beam's TypeDescriptor mechanism:

  • Users can override getInputTypeDescriptor() to provide different type information for the same raw DoFn class
  • The generated invoker needs to respect this TypeDescriptor for correct serialization (Coders) and schema verification
  • Using Class<?> as the key caused DoFns with different TypeDescriptors but the same raw class to incorrectly share invokers

Solution

Changed the cache key from Class<?> to DoFnSignature:

  • DoFnSignature properly captures the TypeDescriptor information
  • This ensures each unique type configuration gets its own correctly generated invoker
  • Added comprehensive Javadoc explaining why this is necessary

Changes

  1. Cache Key Change: Changed byteBuddyInvokerConstructorCache from Map<Class<?>, Constructor<?>> to Map<DoFnSignature, Constructor<?>>

  2. Updated getByteBuddyInvokerConstructor: Now uses signature as the cache key instead of fnClass

  3. Test Case: Added testGenericDoFnCacheKeyingWithAnonymousSubclasses to verify correct behavior


[Gittensor contribution - GlobalStar117]

@github-actions github-actions bot added the java label Jan 20, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @GlobalStar117, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical cache collision issue within the ByteBuddyDoFnInvokerFactory. Previously, the factory's cache used the raw Class<?> of a DoFn as its key, leading to incorrect sharing of invokers between DoFn instances that had the same raw class but different generic type parameters due to Java's type erasure. The fix involves switching the cache key to DoFnSignature, which properly accounts for generic type information, ensuring that each unique DoFn signature receives its own correctly generated invoker, particularly for anonymous subclasses where generic type data is preserved.

Highlights

  • Cache Key Change: The cache key for DoFnInvoker constructors in ByteBuddyDoFnInvokerFactory has been changed from Class<?> to DoFnSignature. This ensures that DoFns with different generic type parameters, even if they share the same raw class, receive distinct and correctly generated invokers.
  • New Test Case: A new unit test, testGenericDoFnCacheKeyingWithAnonymousSubclasses, has been added to verify that anonymous DoFn subclasses with different generic types produce different invokers, confirming the fix for cache collisions.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@GlobalStar117 GlobalStar117 force-pushed the fix/bytebuddy-cache-collision-37351 branch from bd95d52 to 0820f45 Compare January 20, 2026 07:02
@Eliaaazzz
Copy link
Contributor

Eliaaazzz commented Jan 20, 2026

@GlobalStar117 Thanks for the detailed analysis!

You are absolutely correct about Java type erasure: new MyDoFn() and new MyDoFn() indeed share the same runtime Class object.

However, in Apache Beam, a DoFn's behavior isn't defined solely by its Class. We rely heavily on TypeDescriptor to handle serialization (Coders) and schema verification.

Why this fix is necessary: Even if the raw class is the same, users can override getInputTypeDescriptor() (or use mechanisms that capture types) to provide different type information for the same raw DoFn class.

The Evidence: My regression test (testCacheKeyCollisionProof) explicitly creates two instances of the same DoFn class but forces them to return different TypeDescriptors. Without this fix, the factory returns the same cached Invoker for both.

The Consequence: If the first Invoker is generated/cached with logic specific to String, and then reused for an Integer context (because the cache key ignored the TypeDescriptor), it leads to runtime issues (like incorrect validation or potential ClassCastException in downstream logic that relies on the Invoker's signature).

So, while type erasure applies to the user's class, the generated Invoker needs to be aware of the specific TypeDescriptor context to function correctly within the Beam pipeline. This PR ensures the cache respects that distinction.

@GlobalStar117
Copy link
Author

@Eliaaazzz Thank you for the detailed clarification! You're absolutely right.

I initially implemented the fix using DoFnSignature as the cache key, but reverted it after encountering CI test failures. I incorrectly concluded that the current behavior was correct due to type erasure.

Your explanation makes it clear that the issue is real:

  • While Class<?> is the same at runtime due to type erasure
  • DoFnSignature captures the TypeDescriptor information that users can override via getInputTypeDescriptor()
  • The generated invoker needs this type information for correct serialization (Coders) and schema verification

I'll re-implement the original fix using DoFnSignature as the cache key and investigate the test failures more carefully. The previous CI failure in ParDoTest$TimerTests may have been a pre-existing flaky test rather than something my change caused.

Let me push the corrected fix.

The ByteBuddyDoFnInvokerFactory was using Class<?> as the cache key for
generated invokers. While this seemed correct due to Java type erasure
(MyDoFn<String> and MyDoFn<Integer> share the same Class), it fails to
account for Apache Beam's TypeDescriptor mechanism.

The problem:
- Users can override getInputTypeDescriptor() to provide different type
  information for the same raw DoFn class
- The generated invoker needs to respect this TypeDescriptor for correct
  serialization (Coders) and schema verification
- Using Class<?> as the key caused DoFns with different TypeDescriptors
  but the same raw class to incorrectly share invokers

The fix:
- Changed the cache key from Class<?> to DoFnSignature
- DoFnSignature properly captures the TypeDescriptor information
- This ensures each unique type configuration gets its own correctly
  generated invoker

Relates to: #37351

[Gittensor contribution - GlobalStar117]
@GlobalStar117 GlobalStar117 force-pushed the fix/bytebuddy-cache-collision-37351 branch from 0820f45 to 53e0245 Compare January 20, 2026 08:25
@GlobalStar117
Copy link
Author

@Eliaaazzz I've updated the PR with the proper fix:

Changed cache key from Class<?> to DoFnSignature

This ensures that:

  • DoFns with the same raw class but different TypeDescriptors (via getInputTypeDescriptor() override) receive separate invokers
  • Correct serialization (Coders) and schema verification for each type configuration

The previous implementation documented the Class<?> behavior as 'correct due to type erasure' - but as you explained, that was wrong because Apache Beam's DoFnSignature captures the TypeDescriptor information that's critical for proper invoker generation.

Thank you for the clarification - it helped me understand the deeper issue!

@github-actions
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@GlobalStar117
Copy link
Author

The failing tests (ReadWriteIT.testReadWrite and ReadWriteIT.testPubsubLiteWriteReadWithSchemaTransform) are PubSub Lite integration tests unrelated to this PR's changes. These appear to be flaky infrastructure tests. Could a maintainer please re-run or approve?

@GlobalStar117 GlobalStar117 closed this by deleting the head repository Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants