Skip to content

Conversation

@hadipash
Copy link

@hadipash hadipash commented Apr 23, 2025

This PR fixes condition config mutation in prepare_visual_condition during training. Specifically, because dictionaries are passed by reference, even a single short video would be enough to completely remove v2v_head, v2v_tail, v2v_head_easy, and v2v_tail_easy from the condition config.

@hadipash hadipash changed the title fix a bug that caused to pop all v2v conditions fix a bug that caused to pop all v2v conditions during training Apr 23, 2025
@botbw
Copy link
Contributor

botbw commented May 9, 2025

@hadipash Could you please specify any exact bugs this usage causes? Poping a config might be useful if we don't want it to be consumed twice (but I'm not sure if it's the case here)

@hadipash
Copy link
Author

hadipash commented May 9, 2025

@botbw It doesn't cause bugs with the current configs (as they use i2v configuration only), but if one were to add v2v configurations, a single short video would cause those configurations to pop permanently in the following lines:

if T <= 32 // model_ae.time_compression_ratio:
condition_config.pop("v2v_head", None) # given first 32 frames
condition_config.pop("v2v_tail", None) # given last 32 frames
condition_config.pop("v2v_head_easy", None) # given first 64 frames
condition_config.pop("v2v_tail_easy", None) # given last 64 frames
if T <= 64 // model_ae.time_compression_ratio:
condition_config.pop("v2v_head_easy", None) # given first 64 frames
condition_config.pop("v2v_tail_easy", None) # given last 64 frames

The popping above is supposed to be local and applied to a specific batch, rather than modifying the global configuration.

@botbw
Copy link
Contributor

botbw commented May 9, 2025

@botbw It doesn't cause bugs with the current configs (as they use i2v configuration only), but if one were to add v2v configurations, a single short video would cause those configurations to pop permanently in the following lines:

if T <= 32 // model_ae.time_compression_ratio:
condition_config.pop("v2v_head", None) # given first 32 frames
condition_config.pop("v2v_tail", None) # given last 32 frames
condition_config.pop("v2v_head_easy", None) # given first 64 frames
condition_config.pop("v2v_tail_easy", None) # given last 64 frames
if T <= 64 // model_ae.time_compression_ratio:
condition_config.pop("v2v_head_easy", None) # given first 64 frames
condition_config.pop("v2v_tail_easy", None) # given last 64 frames

The popping above is supposed to be local and applied to a specific batch, rather than modifying the global configuration.

@hadipash I see, but I think the caller should be aware of potential modifications when executing this (although surely it should be documented), there is no optimal design (IMO) as python doesn't have any ownership concept.

@hadipash
Copy link
Author

@botbw Right. Moved the dictionary copying under prepare_visual_condition to avoid confusion, as it's supposed to be internals of the API.

@botbw
Copy link
Contributor

botbw commented May 12, 2025

@zhengzangw A tiny change for the robustness of open-sourced code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants