
Conversation

inureyes (Member) commented on Jan 2, 2026

Summary

  • Add AllSmi client struct with high-level, ergonomic API for hardware monitoring (see the usage sketch after this list)
  • Create unified Error type using thiserror for comprehensive error handling
  • Add prelude module for convenient imports of all common types
  • Implement platform-specific initialization (macOS NativeMetricsManager, Linux HlsmiManager)
  • Ensure thread safety with Send + Sync bounds on AllSmi
  • Add comprehensive documentation with examples in lib.rs and README.md
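
A minimal usage sketch of the API summarized above, assuming the prelude re-exports AllSmi and that the get_*_info() accessors return plain collections; the exact return types are not spelled out in this description:

    use all_smi::prelude::*;

    fn main() {
        // Creating the client initializes the platform-specific managers
        // (NativeMetricsManager on macOS, HlsmiManager on Linux).
        let smi = AllSmi::new().expect("failed to initialize all-smi");

        // High-level accessors; each is assumed to return a collection that is
        // simply empty when the corresponding hardware is not present.
        let gpus = smi.get_gpu_info();
        let cpus = smi.get_cpu_info();
        let memory = smi.get_memory_info();

        println!("GPUs: {}", gpus.len());
        println!("CPU entries: {}", cpus.len());
        println!("Memory entries: {}", memory.len());
    }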

Changes

New Files

  • src/client.rs - Main AllSmi struct with get_gpu_info(), get_cpu_info(), get_memory_info(), get_process_info(), get_chassis_info()
  • src/error.rs - Unified Error enum with PlatformInit, NoDevicesFound, DeviceAccess, PermissionDenied, NotSupported, Io variants (a hedged sketch follows this list)
  • src/prelude.rs - Re-exports for AllSmi, Error, Result, core types, traits, and factory functions
  • examples/library_usage.rs - Comprehensive example demonstrating library API usage
  • tests/library_api_test.rs - Integration tests for the library API
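
A hedged sketch of what the unified Error enum in src/error.rs might look like given the variants listed above; the variant payloads and error messages are assumptions, not the PR's actual definitions:

    use thiserror::Error;

    #[derive(Debug, Error)]
    pub enum Error {
        #[error("platform initialization failed: {0}")]
        PlatformInit(String),
        #[error("no devices found")]
        NoDevicesFound,
        #[error("device access failed: {0}")]
        DeviceAccess(String),
        #[error("permission denied: {0}")]
        PermissionDenied(String),
        #[error("not supported on this platform: {0}")]
        NotSupported(String),
        #[error("I/O error")]
        Io(#[from] std::io::Error),
    }

    // Convenience alias re-exported through the prelude.
    pub type Result<T> = std::result::Result<T, Error>;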

Modified Files

  • src/lib.rs - Updated to export new public API with comprehensive documentation
  • README.md - Added "Library API" section with usage examples

Test Plan

  • Run cargo test - all 17 library API tests pass
  • Run cargo clippy - no warnings
  • Run cargo run --example library_usage - example executes successfully
  • Verify AllSmi is Send + Sync (see the compile-time assertion sketch after this list)
  • Verify graceful handling when no GPUs are detected
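
A common way to verify the Send + Sync bound at compile time is a zero-sized assertion helper; the snippet below is only a sketch of the idea, not necessarily how tests/library_api_test.rs implements it:

    use all_smi::prelude::*;

    // Compiles only if AllSmi is Send + Sync, so a regression shows up as a
    // build error rather than a runtime failure.
    fn assert_send_sync<T: Send + Sync>() {}

    #[test]
    fn all_smi_is_send_and_sync() {
        assert_send_sync::<AllSmi>();
    }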

Closes #106

Introduce high-level AllSmi client struct with ergonomic API for querying
GPU/NPU, CPU, memory, and chassis information. Add unified Error type using
thiserror, prelude module for convenient imports, and comprehensive examples
and tests for library usage.

Closes #106
inureyes added the labels type:enhancement, status:review, priority:medium, mode:api, device:nvidia-gpu, device:apple-silicon, device:npu on Jan 2, 2026

inureyes (Member, Author) commented on Jan 2, 2026

Security & Performance Review - PR #109

Analysis Summary

  • PR Branch: feature/issue-106-library-api
  • Scope: changed-files
  • Languages: Rust
  • Total issues: 5
  • CRITICAL: 0 | HIGH: 1 | MEDIUM: 3 | LOW: 1

Prioritized Issue List

HIGH (1 issue)

1. Unsafe Send + Sync implementation lacks detailed safety documentation

  • File: src/client.rs (lines 433-435)
  • Current code:
    // Safety: AllSmi uses thread-safe readers
    unsafe impl Send for AllSmi {}
    unsafe impl Sync for AllSmi {}
  • Issue: The safety comment is too brief. For unsafe impl, Rust conventions require detailed documentation explaining:
    1. Why the struct is actually Send/Sync safe
    2. What invariants are maintained
    3. What types inside make this safe
  • Impact: While the underlying traits (GpuReader, CpuReader, etc.) all require Send + Sync, the struct contains Box<dyn Trait> which by itself is not Send + Sync without the trait bounds. The current implementation IS correct because the traits require Send + Sync, but future maintainers might not understand this relationship.
  • Recommendation: Expand the safety comment:
    // SAFETY: AllSmi is Send + Sync because:
    // 1. All reader traits (GpuReader, CpuReader, MemoryReader, ChassisReader) 
    //    require Send + Sync bounds
    // 2. Box<dyn Trait + Send + Sync> is Send + Sync
    // 3. Platform-specific fields (_macos_initialized, _gaudi_initialized) are 
    //    primitive bool types which are inherently Send + Sync
    // 4. The underlying platform managers use thread-safe primitives (Arc, Mutex, RwLock)
    unsafe impl Send for AllSmi {}
    unsafe impl Sync for AllSmi {}

MEDIUM (3 issues)

2. Multiple AllSmi instances may cause race conditions with platform managers

  • File: src/client.rs (lines 177-226)
  • Issue: Creating multiple AllSmi instances could lead to:
    • On macOS: Multiple calls to initialize_native_metrics_manager (handled gracefully via Mutex guard)
    • On Linux with Gaudi: Multiple calls to initialize_hlsmi_manager (also handled gracefully)
    • On Drop: Multiple shutdowns could race
  • Current mitigation: The managers use a Lazy<Mutex<Option<Arc<...>>>> pattern, which handles this correctly.
  • Impact: LOW - The implementation correctly handles this, but documentation should clarify the behavior.
  • Recommendation: Add documentation to AllSmi::new() explaining that multiple instances share the same underlying platform managers.
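  • Sketch: For reference, a generic version of the Lazy<Mutex<Option<Arc<...>>>> guard pattern described above; the once_cell dependency and the manager/initializer shapes here are assumptions, not the crate's actual code:
    use once_cell::sync::Lazy;
    use std::sync::{Arc, Mutex};

    struct NativeMetricsManager;

    // One shared slot: every AllSmi instance goes through the same guarded
    // Option, so repeated initialization just clones the existing Arc.
    static MANAGER: Lazy<Mutex<Option<Arc<NativeMetricsManager>>>> =
        Lazy::new(|| Mutex::new(None));

    fn initialize_native_metrics_manager() -> Arc<NativeMetricsManager> {
        let mut slot = MANAGER.lock().unwrap();
        slot.get_or_insert_with(|| Arc::new(NativeMetricsManager))
            .clone()
    }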

3. Missing #[must_use] attribute on Result-returning methods

  • Files: src/client.rs, src/error.rs
  • Issue: The AllSmi::new() and AllSmi::with_config() methods return Result but lack #[must_use] attributes.
  • Impact: Users could accidentally ignore initialization errors.
  • Recommendation: Add #[must_use = "this returns a Result that should be checked"] to both methods.
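  • Example: a minimal, self-contained illustration of the suggested attribute on new() (with_config() would be analogous); the struct body, Result alias, and constructor body below are placeholders, not the PR's actual code:
    pub struct AllSmi; // placeholder: real reader fields elided
    pub type Result<T> = std::result::Result<T, std::io::Error>; // placeholder alias

    impl AllSmi {
        #[must_use = "this returns a Result that should be checked"]
        pub fn new() -> Result<Self> {
            // Real platform-specific initialization goes here.
            Ok(AllSmi)
        }
    }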

4. Documentation warnings from cargo doc

  • Files: Multiple
  • Issues found:
    • src/error.rs:88 - Ambiguous link to Error (both enum and derive macro)
    • src/device/hlsmi/parser.rs:52 - Unresolved link [suffix]
    • src/parsing/common.rs:68 - Unresolved link [a-zA-Z0-9_]
    • src/parsing/macros.rs:17,24 - Unclosed HTML tags
  • Impact: Documentation may be confusing for users.
  • Recommendation: Fix doc comments by escaping brackets and disambiguating links.
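  • Example fixes (hedged): the item names below are placeholders showing the kinds of doc-comment changes suggested above, not the crate's actual functions:
    /// Wrap literal bracket expressions in backticks so rustdoc does not treat
    /// `[suffix]` or `[a-zA-Z0-9_]` as broken intra-doc links (escaping them as
    /// \[suffix\] also works).
    pub fn parse_metric_suffix() {}

    /// When both the `Error` enum and the `thiserror::Error` derive macro are in
    /// scope, add an item-kind disambiguator to the link target, for example
    /// `[Error](enum@crate::Error)` instead of the ambiguous `[Error]`.
    pub fn error_link_example() {}

    /// Put generics such as `Option<u64>` in backticks so rustdoc does not parse
    /// the angle brackets as an unclosed HTML tag.
    pub fn html_tag_example() {}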

LOW (1 issue)

5. has_gpus() checks reader count, not actual GPU count

  • File: src/client.rs (lines 403-405)
  • Current code:
    pub fn has_gpus(&self) -> bool {
        !self.gpu_readers.is_empty()
    }
  • Issue: This returns true if any GPU reader is available, not if any GPU was actually detected. A system with NVIDIA drivers installed but no physical GPU would still have a reader, yet get_gpu_info() would return an empty collection.
  • Impact: Minor - documentation already clarifies this, but the method name is potentially misleading.
  • Recommendation: Either rename to has_gpu_readers() or change implementation to !self.get_gpu_info().is_empty() (with caching for performance).
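  • Sketch of the second option, as a drop-in body inside the existing impl block; it assumes get_gpu_info() returns a collection with is_empty() and omits the caching mentioned above:
    pub fn has_gpus(&self) -> bool {
        // Report whether any GPU was actually enumerated, not merely whether a
        // reader backend is registered for this platform.
        !self.get_gpu_info().is_empty()
    }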

Positive Findings

  1. Thread safety is correctly implemented: The underlying traits all require Send + Sync, and the platform managers use proper synchronization primitives.

  2. Error handling is comprehensive: The Error enum covers all common failure modes with descriptive messages.

  3. API design follows Rust conventions: Builder pattern for AllSmiConfig, Result type alias, prelude module for convenience.

  4. Tests are thorough: Integration tests verify Send + Sync bounds, creation, and all API methods.

  5. Documentation is extensive: Comprehensive doc comments with examples on all public items.

  6. Graceful degradation: The API doesn't fail if hardware isn't available - it simply returns empty collections.


Test Results

cargo clippy: PASSED (no warnings)
cargo test: PASSED (153 unit tests + 17 integration tests + 15 doc tests)
cargo doc: PASSED (5 minor warnings noted above)

Recommendations Summary

Priority   Action                                     Status
HIGH       Expand unsafe impl safety documentation    Suggested
MEDIUM     Document multiple instance behavior        Suggested
MEDIUM     Add #[must_use] attributes                 Suggested
MEDIUM     Fix documentation warnings                 Suggested
LOW        Clarify has_gpus() semantics               Suggested

Conclusion

This PR introduces a well-designed library API for all-smi. The implementation is sound and safe. The issues identified are primarily documentation and API ergonomics improvements rather than functional bugs. The thread safety implementation is correct despite the minimal safety comment, and the error handling is comprehensive.

Recommendation: APPROVE with minor documentation improvements suggested.

inureyes merged commit 188f817 into main on Jan 2, 2026
2 checks passed
inureyes deleted the feature/issue-106-library-api branch on January 2, 2026 at 10:08
inureyes self-assigned this on Jan 3, 2026
