porting gamecraft to fastvideo #941
base: main
Conversation
Summary of Changes

Hello @MihirJagtap, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces the Hunyuan-GameCraft model into the fastvideo framework, marking a significant step towards interactive game video generation. The core changes integrate a specialized transformer model capable of incorporating camera control and action sequences directly into the video synthesis process. The update provides a comprehensive inference pipeline, complete with distributed processing capabilities, FP8 optimization for enhanced performance, and a user-friendly Gradio interface, making it a powerful tool for creating dynamic and controllable video content in game environments.
Code Review
This pull request ports the Hunyuan-GameCraft model to the fastvideo framework. It introduces a significant amount of new code, including model definitions, configurations, inference pipelines, and a Gradio-based user interface. The changes are extensive and cover the full stack from model architecture to a user-facing demo.
My review has identified several issues of varying severity. There is a critical issue in the API server that will cause it to be unresponsive due to a long sleep call. The Gradio application has a high-severity issue related to its use of global state, which will lead to incorrect behavior in multi-session scenarios. There are also a few high and medium severity issues in the model and data loading code that should be addressed to ensure correctness and robustness. Overall, the porting effort is substantial, but these key issues need to be resolved before merging.
```python
with open(trigger_tmp, 'w') as f:
    json.dump(trigger_data, f, indent=2, ensure_ascii=False)
os.replace(trigger_tmp, TRIGGER_FILE)  # Atomically replace/create
time.sleep(100)
```
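As flagged in the review summary, the `time.sleep(100)` here blocks the request handler for 100 seconds, which leaves the API server unresponsive. One way to avoid the fixed sleep is to poll for the worker's result with a short interval and an overall timeout. A minimal sketch (the result-file name, timeout, and polling interval are illustrative assumptions, not values from this PR):

```python
import json
import os
import time

def wait_for_result(result_file: str, timeout_s: float = 120.0, poll_s: float = 0.5) -> dict:
    """Poll for a result file written by the worker instead of sleeping for a fixed 100 s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(result_file):
            with open(result_file) as f:
                return json.load(f)
        time.sleep(poll_s)  # short sleeps keep the wait bounded and responsive
    raise TimeoutError(f"no result at {result_file} within {timeout_s}s")
```

If the server is asyncio-based, the wait should additionally be moved off the event loop (for example with asyncio.to_thread) so other requests keep being served.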
```python
latents = x0 * t + x1 * (1 - t)
latents = torch.randn_like(x1)
```
The `latents` variable calculated from linear interpolation on line 531 is immediately overwritten on line 532. This makes the interpolation logic ineffective and appears to be a bug. The `TODO: correct` comment on line 524 also suggests this section may be incomplete or incorrect. Please review this logic to ensure the intended latent initialization is performed.
Suggested change:

```diff
- latents = x0 * t + x1 * (1 - t)
- latents = torch.randn_like(x1)
+ latents = x0 * t + x1 * (1 - t)
```
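For context, the usual flow-matching-style initialization draws the noise first and then blends it with the clean latents at the sampled timestep, so nothing overwrites the interpolation afterwards. A minimal sketch under that assumption (treating `x1` as the clean latents, `x0` as Gaussian noise, and `t` as a timestep broadcastable to the latent shape; these roles are inferred from the snippet, not confirmed by the PR):

```python
import torch

def init_latents(x1: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Blend clean latents with fresh Gaussian noise at timestep t in [0, 1]."""
    x0 = torch.randn_like(x1)       # draw the noise first
    return x0 * t + x1 * (1 - t)    # then interpolate; do not overwrite the result
```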
```python
previous_video_path = None
previous_first_frame = None
all_video_paths = []
current_input_type = None  # record input type: "image" or "video"
```
The application uses global variables (previous_video_path, previous_first_frame, all_video_paths, current_input_type) to manage state between user interactions. This is problematic in a web application as it creates shared state across all user sessions, leading to race conditions and incorrect behavior. Please refactor this to use gr.State for managing session-specific state. All variables that need to be preserved between interactions for a single user session should be passed as inputs and outputs to your Gradio functions via gr.State components.
For example:

```python
# In the UI definition
...
video_state = gr.State(value={"previous_video_path": None, "all_video_paths": []})
...

# In the click handler
generate_btn.click(
    fn=generate_custom_video,
    inputs=[..., video_state],
    outputs=[..., video_state],
)

# In the function definition
def generate_custom_video(..., video_state_dict):
    # read from video_state_dict
    previous_video_path = video_state_dict["previous_video_path"]
    ...
    # update state
    video_state_dict["previous_video_path"] = new_video_path
    ...
    return ..., video_state_dict
```
```python
# Reverse mapping for saving checkpoints: custom -> hf
reverse_param_names_mapping: dict = field(default_factory=lambda: {})
```
The reverse_param_names_mapping is initialized as an empty dictionary. The comment indicates it's for saving checkpoints. If saving checkpoints is intended to be functional with this PR, this dictionary should be populated. If it's a work-in-progress, consider adding a TODO comment to make it clear that this functionality is not yet implemented.
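If the config already carries a forward hf -> custom mapping, one option is to derive the reverse mapping from it at construction time instead of maintaining both by hand. A minimal sketch under that assumption (the class and field names are illustrative, and it presumes a flat, one-to-one str -> str forward mapping):

```python
from dataclasses import dataclass, field

@dataclass
class ArchConfig:  # illustrative stand-in for the PR's config class
    # Forward mapping used when loading checkpoints: hf -> custom
    param_names_mapping: dict = field(default_factory=dict)
    # Reverse mapping for saving checkpoints: custom -> hf
    reverse_param_names_mapping: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Derive the reverse mapping when it is not provided explicitly,
        # assuming the forward mapping is flat and one-to-one.
        if not self.reverse_param_names_mapping and self.param_names_mapping:
            self.reverse_param_names_mapping = {
                v: k for k, v in self.param_names_mapping.items()
            }
```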
```python
except:
    # Fallback: PIL for special formats
    face_image_masked = Image.open(image_path).convert('RGB')
```
Using a bare except: clause is generally discouraged as it can catch unexpected exceptions like SystemExit or KeyboardInterrupt, making the program harder to debug or terminate. It's better to catch a more specific exception, such as Exception.
Suggested change:

```diff
- except:
+ except Exception:
      # Fallback: PIL for special formats
      face_image_masked = Image.open(image_path).convert('RGB')
```
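If the primary reader inside the try block is OpenCV (the try body is not visible in this diff), the handler could be narrowed further to the failures that actually occur there. A sketch under that assumption, with a hypothetical helper name and the variable names reused from the snippet:

```python
import cv2
from PIL import Image

def load_rgb_image(image_path: str) -> Image.Image:
    try:
        # Primary path: cv2.imread returns None for formats it cannot decode.
        bgr = cv2.imread(image_path)
        if bgr is None:
            raise ValueError(f"cv2 could not decode {image_path}")
        return Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    except (cv2.error, ValueError, OSError):
        # Fallback: PIL for special formats
        return Image.open(image_path).convert('RGB')
```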
```python
if CPU_OFFLOAD: torch.cuda.empty_cache()

camera_condition = cam_latents
assert camera_condition is not None, print("plucker_embedding is not provided")
```
The `assert` statement uses a `print()` call for its message. Since `print()` returns `None`, the raised AssertionError carries no message when the assertion fails; the text is only written to stdout as a side effect. Pass the message string directly as the second argument to `assert`.
Suggested change:

```diff
- assert camera_condition is not None, print("plucker_embedding is not provided")
+ assert camera_condition is not None, "plucker_embedding is not provided"
```
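Beyond the suggested change, note that assert statements are stripped entirely when Python runs with -O, so if this condition must always hold at inference time, an explicit check is more robust. A minimal sketch (the helper name is illustrative):

```python
from typing import Optional

import torch

def require_camera_condition(cam_latents: Optional[torch.Tensor]) -> torch.Tensor:
    # An explicit check keeps working even under `python -O`, where asserts are removed.
    if cam_latents is None:
        raise ValueError("plucker_embedding is not provided")
    return cam_latents
```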