Conversation

@mohnjiles (Contributor)

Changed

  • Changed Nvidia GPU detection to use compute capability level instead of the GPU name for certain feature gates / torch indexes

Fixed

  • Fixed #1266 - crash when moving or deleting Lora models in the Checkpoint Manager
  • Fixed #1268 - wrong torch index used for Nvidia 1000-series GPUs and older
  • Fixed #1269, #1257, #1234 - "no such file or directory" errors when updating certain packages after folder migration
  • Fixed #1274, #1276 - incorrect torch installed when updating to InvokeAI v5.12+
  • Fixed missing shared folder links for SwarmUI's diffusion_models and clip folders
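
The compute-capability gates behind these fixes can be sketched as a small decision helper. The following is a hypothetical Python illustration of the thresholds described elsewhere in this PR (legacy Nvidia GPUs below compute capability 7.5, such as the 1000-series, get the cu126 torch index; newer GPUs get cu128; Blackwell starts at 12.0). The function names are illustrative and are not the project's actual C# API.

```python
def is_legacy_nvidia(compute_capability: float) -> bool:
    """GPUs below compute capability 7.5 (e.g. the 1000-series) count as legacy."""
    return compute_capability < 7.5

def is_blackwell(compute_capability: float) -> bool:
    """Blackwell-generation GPUs report compute capability 12.0 or higher."""
    return compute_capability >= 12.0

def torch_index(compute_capability: float) -> str:
    """Pick the PyTorch CUDA index: cu126 for legacy GPUs, cu128 otherwise."""
    return "cu126" if is_legacy_nvidia(compute_capability) else "cu128"
```

For example, a GTX 1080 reports compute capability 6.1 and would select cu126, while an RTX 3090 reports 8.6 and would select cu128 — matching the behavior the fix for #1268 restores.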

Supporters

🌟 Visionaries

Our deepest gratitude to the brilliant Visionary-tier Patrons: Waterclouds, bluepopsicle, Bob S, Ibixat, and Corey T! Your incredible backing is instrumental in shaping the future of Stability Matrix and empowering us to deliver cutting-edge features. Thank you for believing in our vision! 🙏

🚀 Pioneers

A huge shout-out to our fantastic Pioneer-tier Patrons: Mr. Unknown, tankfox, Szir777, Noah M, USATechDude, Thom, TheTekknician, and SeraphOfSalem! Your consistent support and active engagement are vital to our community's growth and our ongoing development efforts. You truly make a difference! ✨

ionite34 and others added 20 commits May 20, 2025 03:31
[dev to main] backport: Fix DynamicData sorting crash on delete model (1074)
Update GpuInfo parsing to include Compute Capability for nvidia GPUs …

(cherry picked from commit c422920)

# Conflicts:
#	CHANGELOG.md
#	StabilityMatrix.Core/Models/Packages/ComfyUI.cs
#	StabilityMatrix.Core/Models/Packages/KohyaSs.cs
#	StabilityMatrix.Core/Models/Packages/RuinedFooocus.cs
[dev to main] backport: Update GpuInfo parsing to include Compute Capability for nvidia GPUs … (1076)
shoutout-chagenlog

(cherry picked from commit 75a96d3)
[dev to main] backport: shoutout-chagenlog (1079)
Add SharedFolderThing for TextEncoders in SwarmUI …

(cherry picked from commit 2886449)

# Conflicts:
#	CHANGELOG.md
[dev to main] backport: Add SharedFolderThing for TextEncoders in SwarmUI … (1086)

@gemini-code-assist (bot) left a comment

Hello @mohnjiles, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary for this pull request titled "v2.14.2". This PR appears to be a release preparation containing several important bug fixes and a key change in how Nvidia GPUs are detected and handled, particularly for selecting the correct PyTorch index based on compute capability rather than just the GPU name. It addresses issues ranging from Lora model management crashes to incorrect package installations and shared folder linking problems.

Highlights

  • Nvidia GPU Detection Improvement: The method for detecting and classifying Nvidia GPUs has been updated to use the compute capability level instead of relying solely on the GPU name. This is used for determining compatibility with certain features and selecting the appropriate PyTorch index (e.g., cu126 vs cu128).
  • Fixed Lora Model Management Crash: Addresses a crash that occurred when moving or deleting Lora models within the Checkpoint Manager.
  • Corrected Torch Index for Older Nvidia GPUs: Fixes an issue where the wrong PyTorch index (cu128) was being used for older Nvidia 1000-series GPUs and potentially others with lower compute capability, now correctly using cu126 where appropriate.
  • Resolved Folder Migration Update Errors: Fixes 'no such file or directory' errors that could occur when updating certain packages after a folder migration.
  • Fixed InvokeAI Torch Installation: Corrects the installation process for InvokeAI v5.12+ to ensure the correct PyTorch version and index are installed.
  • Restored SwarmUI Shared Folder Links: Fixes missing shared folder links for SwarmUI's diffusion_models and clip folders.
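
The GPU-detection highlight above hinges on reading the compute capability from nvidia-smi. A minimal sketch of parsing such output, assuming a query like `nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv,noheader,nounits` (the diff summary below notes that `compute_cap` was added to the query and that three fields are now expected); the class and field names here are illustrative, not the project's actual C# types:

```python
from dataclasses import dataclass

@dataclass
class GpuInfo:
    name: str
    memory_mib: int
    compute_capability: float  # 0.0 when the driver does not report a value

def parse_nvidia_smi_line(line: str) -> GpuInfo:
    """Parse one CSV row of nvidia-smi output: name, memory.total, compute_cap."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    name, memory, cap = parts
    try:
        cap_value = float(cap)
    except ValueError:
        cap_value = 0.0  # e.g. "[N/A]" from very old drivers
    return GpuInfo(name=name, memory_mib=int(memory), compute_capability=cap_value)
```

Falling back to 0.0 (rather than raising) when the capability is unreadable keeps detection from crashing on unusual driver output, at the cost of treating such GPUs as legacy.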

Changelog

  • CHANGELOG.md
    • Added entry for version v2.14.2, detailing the changes, fixes, and supporters.
  • Directory.Packages.props
    • Updated the DynamicData package version from 9.0.1 to 9.3.1.
  • StabilityMatrix.Avalonia/Services/InferenceClientManager.cs
    • Removed unused code related to connecting and binding the loraModelsSource.
  • StabilityMatrix.Avalonia/ViewModels/MainWindowViewModel.cs
    • Added a using statement for StabilityMatrix.Core.Helper.HardwareInfo.
    • Added commas to properties in switch expressions and dialog initializations (lines 84, 160, 473, 506).
    • Added a call to AddComputeCapabilityIfNecessary to run on initial load (line 266).
    • Added a comma to the analytics launch data properties (line 300).
    • Added a new private method AddComputeCapabilityIfNecessary to update the preferred GPU setting with compute capability if it's missing (lines 525-549).
  • StabilityMatrix.Core/Helper/HardwareInfo/GpuInfo.cs
    • Added ComputeCapability string property (line 8).
    • Added ComputeCapabilityValue decimal property to parse and store the compute capability as a number (lines 14-15).
    • Added a comma to the MemoryLevel switch expression (line 23).
    • Updated IsBlackwellGpu method to check ComputeCapabilityValue >= 12.0m instead of parsing the name (line 44).
    • Updated IsAmpereOrNewerGpu method to check ComputeCapabilityValue >= 8.6m instead of parsing the name (line 52).
    • Added IsLegacyNvidiaGpu method to check ComputeCapabilityValue < 7.5m (lines 55-61).
  • StabilityMatrix.Core/Helper/HardwareInfo/HardwareHelper.cs
    • Added a comma to the HardwareInfoLazy initialization (line 21).
    • Added a comma to the RedirectStandardOutput property in RunBashCommand (line 30).
    • Added a comma to the MemoryBytes property in IterGpuInfoLinux (line 106).
    • Added a comma to the MemoryBytes property in IterGpuInfoMacos (line 130).
    • Added a comma to the MemoryBytes property in IterGpuInfo (line 171).
    • Modified the nvidia-smi arguments to include compute_cap in the query (line 208).
    • Updated the parsing logic for nvidia-smi output to expect 3 data points instead of 2 (line 227).
    • Added ComputeCapability to the GpuInfo initialization from nvidia-smi output (line 238).
    • Updated HasBlackwellGpu to use ComputeCapabilityValue >= 12.0m (line 257).
    • Added HasLegacyNvidiaGpu method using ComputeCapabilityValue < 7.5m (lines 260-263).
    • Added commas to properties in GetMemoryInfoImplWindows (line 327) and GetMemoryInfoImplGeneric (lines 342, 350).
  • StabilityMatrix.Core/Models/Packages/BaseGitPackage.cs
    • Added logic to temporarily remove symlinks for shared folders if the Symlink method is used, before performing the update (lines 442-462). This helps prevent 'no such file or directory' errors during updates after folder migration.
  • StabilityMatrix.Core/Models/Packages/ComfyUI.cs
    • Added a check for isLegacyNvidia based on the new IsLegacyNvidiaGpu method (lines 347-352).
    • Updated the TorchIndex.Cuda case in the pip install arguments to use cu126 if isLegacyNvidia is true, otherwise use cu128 (line 368).
  • StabilityMatrix.Core/Models/Packages/Config/FdsConfigSharingStrategy.cs
    • Changed how paths are set in the FDS config, joining multiple paths with a semicolon ; instead of just taking the first one (line 103). This fixes issues with SwarmUI's shared folder configuration.
  • StabilityMatrix.Core/Models/Packages/ForgeClassic.cs
    • Added a check for isLegacyNvidia based on the new IsLegacyNvidiaGpu method (line 157).
    • Updated the torchExtraIndex variable to be cu126 if isLegacyNvidia is true, otherwise cu128 (line 158).
    • Updated the WithTorchExtraIndex argument to use the determined torchExtraIndex (line 165).
  • StabilityMatrix.Core/Models/Packages/InvokeAI.cs
    • Added a using statement for StabilityMatrix.Core.Helper.HardwareInfo (line 11).
    • Added commas to shared folder path initializations (lines 67, 71, 74).
    • Added commas to launch option definitions (lines 89, 95, 97).
    • Added a comma to the prerequisites list (line 118).
    • Added a check for isLegacyNvidiaGpu based on the new IsLegacyNvidiaGpu method (lines 152-157).
    • Updated the TorchIndex.Cuda installation logic:
    • Determined torchIndex based on isLegacyNvidiaGpu (cu126 or cu128) (line 164).
    • Updated Torch, TorchVision, TorchAudio, and XFormers versions to 2.7.0, 0.22.0, 2.7.0, and 0.0.30 respectively (lines 166-169).
    • Updated the WithTorchExtraIndex and pipCommandArgs to use the determined torchIndex (lines 170, 174).
    • Added commas to the multiline string literal for the Python code (lines 334-337).
    • Added commas to the SetupModelFolders and RemoveModelFolderLinks method signatures (lines 398, 403).
    • Added a comma to the ContentSerializer options (line 424).
    • Added a comma to the InstallModelRequest description property (line 441).
    • Added a comma to the install status check condition (line 456).
    • Added commas to the process output text strings (lines 467, 484).
  • StabilityMatrix.Core/Models/Packages/KohyaSs.cs
    • Added a using statement for StabilityMatrix.Core.Models.FileInterfaces (line 6).
    • Added setuptools and uv to the initial pip install arguments (line 129).
    • Added a check for isLegacyNvidia based on the new IsLegacyNvidiaGpu method (lines 137-138).
    • Determined torchExtraIndex based on isLegacyNvidia (cu126 or cu128) (line 139).
    • Replaced the Blackwell-specific torch/xformers/bitsandbytes installation logic with a general torch/torchvision/torchaudio/xformers install using the determined torchExtraIndex (lines 142-155).
    • Adjusted the Windows requirements parsing to exclude bitsandbytes==0.44.0 from the requirements file and install it separately (lines 174-179).
  • StabilityMatrix.Core/Models/Packages/RuinedFooocus.cs
    • Added a check for isLegacyNvidia based on the new IsLegacyNvidiaGpu method (lines 92-94).
    • Determined torchExtraIndex based on isLegacyNvidia (cu126 or cu128) (line 95).
    • Updated the WithTorchExtraIndex argument in the pip install arguments to use the determined torchExtraIndex instead of hardcoded cu121 (line 99).
  • StabilityMatrix.Core/Models/Packages/StableSwarm.cs
    • Added a comma to the return value of GetExtraCommands (line 70).
    • Added commas to launch option definitions (lines 80, 87, 93, 99, 105, 111, 117, 119).
    • Updated SharedFolderLayout rules to include TextEncoders mapped to Models/clip (lines 166-171).
    • Added a comma to the prerequisites list (line 187).
    • Added a comma to the NuGet source name argument (line 226).
    • Added a comma to the SDClipVisionFolder path initialization (line 286).
    • Added commas to the multiline string literal for the ZLUDA wrapper script content (lines 324-326).
    • Added a comma to the ExtraArgs property when saving settings (line 343).
    • Added a comma to the ExtraArgs property when saving settings for ComfyUI (line 357).
    • Added a comma to the GIT environment variable path (line 380).
    • Added logic to create a symlink for the DiffusionModels shared folder (lines 409-431).
    • Added a comma to the dotnet run arguments (line 442).
    • Added a comma to the dotnet publish output path argument (line 542).
    • Added a comma to the SDClipVisionFolder path initialization in SetupModelFoldersConfig (line 609).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature                 Command                Description
Code Review             /gemini review         Performs a code review for the current pull request in its current state.
Pull Request Summary    /gemini summary        Provides a summary of the current pull request in its current state.
Comment                 @gemini-code-assist    Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                    /gemini help           Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A bug appears, a hidden plight,
Compute cap brings the torch index right.
Old GPUs cheer,
Updates are clear,
Stability shines ever so bright.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment

Code Review

This pull request (v2.14.2) primarily focuses on improving Nvidia GPU detection by using compute capability levels and addresses several important bug fixes. The changes appear well-targeted and align with the descriptions provided. The refactoring of GPU detection logic is a significant improvement for accuracy in feature gating and torch index selection.

Most changes are clean and directly address the listed issues. The updates to dependency versions and package installation scripts seem appropriate for resolving compatibility problems and supporting newer hardware configurations.

Summary of Findings

  • Improved GPU Detection: The change to use Nvidia GPU compute capability level instead of GPU name for feature gates and torch indexes is a solid improvement, likely leading to more accurate and future-proof hardware support.
  • Bug Fixes: The PR addresses several critical bugs, including crashes related to Lora models, incorrect torch CUDA binaries for older Nvidia GPUs and InvokeAI, file path issues after migration, and missing shared folder links for SwarmUI. These fixes should enhance stability and user experience.
  • Exception Handling in AddComputeCapabilityIfNecessary: In StabilityMatrix.Avalonia/ViewModels/MainWindowViewModel.cs, the AddComputeCapabilityIfNecessary method silently ignores exceptions. This could mask problems with updating GPU information, potentially undermining the benefits of the new compute capability detection. It's recommended to log these exceptions.
  • Dependency Update: The DynamicData package was updated from 9.0.1 to 9.3.1. This is a standard dependency update.

Merge Readiness

This pull request introduces valuable fixes and a significant improvement to GPU detection. However, there is one high-severity issue regarding silent exception handling in the AddComputeCapabilityIfNecessary method within MainWindowViewModel.cs.

I recommend addressing this exception handling concern by adding logging, at a minimum, to ensure that any failures in updating GPU compute capability can be diagnosed. Once this is addressed, the PR should be in good shape for merging.

As an AI reviewer, I am not authorized to approve pull requests. Please ensure further review and approval from authorized team members before merging.

Comment on lines +545 to +548
catch (Exception)
{
    // ignored
}

Severity: high

The AddComputeCapabilityIfNecessary method catches all exceptions and ignores them. While this prevents the application from crashing if nvidia-smi fails or there's an issue updating settings, it also means that failures to update the GPU's compute capability will happen silently. This could lead to the system not using the new compute capability-based detection as intended, potentially causing incorrect feature gating or torch index selection for Nvidia GPUs.

Could we consider logging the exception here? This would help in diagnosing issues if the compute capability isn't being updated correctly for some users.

For example:

catch (Exception ex)
{
    Logger.Warn(ex, "Failed to add compute capability during startup.");
    // Optionally, re-throw if this failure is critical enough to halt
    // or if specific exceptions should be handled differently.
}

@mohnjiles mohnjiles merged commit efcf9e7 into LykosAI:main May 24, 2025
3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators May 24, 2025