-
Notifications
You must be signed in to change notification settings - Fork 112
Initial RPM database detector #1534
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new RPM database detector to scan SQLite-based RPM databases commonly found in modern RPM-based Linux distributions (Azure Linux, Fedora 33+, RHEL 9+). The implementation includes:
- A new
RpmDbDetectorthat scansrpmdb.sqlitefiles - A generic
SystemPackageDetectorbase class for future system package detectors (DPKG, APK, etc.) - Support for parsing RPM headers from SQLite BLOBs
- Dependency graph construction using RPM Provides/Requires relationships
- Linux distribution detection via os-release parsing
Key Changes
- Adds
RpmComponenttyped component with PackageURL support for RPM packages - Implements binary RPM header parser for extracting package metadata
- Creates extensible base class for system-level package detection across Linux distributions
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Microsoft.ComponentDetection.Orchestrator/Extensions/ServiceCollectionExtensions.cs | Registers the new RpmDbDetector in the DI container |
| src/Microsoft.ComponentDetection.Detectors/rpm/RpmPackageInfo.cs | Internal model for RPM package information extracted from headers |
| src/Microsoft.ComponentDetection.Detectors/rpm/RpmHeaderParser.cs | Binary parser for RPM header BLOBs stored in SQLite databases |
| src/Microsoft.ComponentDetection.Detectors/rpm/RpmDbDetector.cs | Main detector implementation that reads SQLite RPM databases |
| src/Microsoft.ComponentDetection.Detectors/Microsoft.ComponentDetection.Detectors.csproj | Adds Microsoft.Data.Sqlite package reference |
| src/Microsoft.ComponentDetection.Contracts/TypedComponent/RpmComponent.cs | Typed component representing RPM packages with PackageURL generation |
| src/Microsoft.ComponentDetection.Contracts/TypedComponent/ComponentType.cs | Adds Rpm component type enum value |
| src/Microsoft.ComponentDetection.Contracts/DetectorClass.cs | Adds SystemPackages detector category |
| src/Microsoft.ComponentDetection.Common/SystemPackageDetector.cs | Abstract base class for system package detectors with dependency graph building |
| src/Microsoft.ComponentDetection.Common/LinuxDistribution.cs | Parser for os-release files to identify Linux distributions |
| Directory.Packages.props | Adds Microsoft.Data.Sqlite version specification |
| /// Detector for RPM packages from SQLite-format RPM databases. | ||
| /// Supports Azure Linux, Fedora 33+, RHEL 9+, and other modern RPM-based distributions. | ||
| /// </summary> | ||
| public sealed class RpmDbDetector : SystemPackageDetector |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the project guidelines, all new detectors must start as IDefaultOffComponentDetector and be explicitly enabled via DetectorArgs until maintainers promote them through the lifecycle stages (DefaultOff → Experimental → Default). The class declaration should be:
public sealed class RpmDbDetector : SystemPackageDetector, IDefaultOffComponentDetector| using System.Buffers.Binary; | ||
| using System.Collections.Generic; | ||
| using System.IO; | ||
| using System.Linq; | ||
| using System.Text; |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imports System.Buffers.Binary, System.IO, System.Linq, and System.Text are not used in this file. Consider removing them to keep the code clean.
| using System.Buffers.Binary; | |
| using System.Collections.Generic; | |
| using System.IO; | |
| using System.Linq; | |
| using System.Text; | |
| using System.Collections.Generic; |
| var metadata = (RpmMetadata)package.Metadata; | ||
|
|
||
| // Create the RPM component | ||
| var component = new RpmComponent( | ||
| name: package.Name, | ||
| version: package.Version, | ||
| arch: metadata.Arch, | ||
| release: metadata.Release, | ||
| epoch: metadata.Epoch, | ||
| sourceRpm: metadata.SourceRpm, | ||
| vendor: metadata.Vendor, | ||
| provides: [.. package.Provides], | ||
| requires: [.. package.Requires] | ||
| ); |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata.Arch and metadata.Release could be null, but the RpmComponent constructor validates these as required fields via ValidateRequiredInput. This will cause an exception if these fields are missing from the RPM header. Consider adding validation here or in the ParsePackagesAsync method to skip packages with missing required fields (Arch and Release), similar to how Name and Version are validated on line 87.
| case RPM_STRING_ARRAY_TYPE: | ||
| // Find the end of the string array (double null or end of data) | ||
| dataLength = 0; | ||
| var consecutiveNulls = 0; | ||
| for (var i = entry.Offset; i < dataStore.Length; i++) | ||
| { | ||
| dataLength++; | ||
| if (dataStore[i] == 0) | ||
| { | ||
| consecutiveNulls++; | ||
| if (consecutiveNulls >= 2) | ||
| { | ||
| break; | ||
| } | ||
| } | ||
| else | ||
| { | ||
| consecutiveNulls = 0; | ||
| } | ||
| } |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string array extraction logic searches for consecutive null bytes to determine the end of the array, but doesn't validate against entry.Count which specifies the expected number of strings. This could lead to incorrect parsing if the data is malformed. Consider adding validation to ensure the number of extracted strings matches entry.Count, or at least add a comment explaining why entry.Count is not used for string arrays.
| sourceRpm: metadata.SourceRpm, | ||
| vendor: metadata.Vendor, | ||
| provides: [.. package.Provides], | ||
| requires: [.. package.Requires] |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The distro parameter is not being used when creating the RpmComponent. According to the PackageURL spec for RPM packages and the comment in RpmComponent.cs (line 112-113), the namespace should be set based on the distribution ID (e.g., "fedora", "rhel", "mariner"). Consider using distro?.Id as the namespace in the RpmComponent constructor, similar to how LinuxComponent uses distribution in its PackageURL generation.
Note: This would require modifying RpmComponent to accept and use a namespace parameter.
| requires: [.. package.Requires] | |
| requires: [.. package.Requires], | |
| @namespace: distro?.Id |
| var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase); | ||
|
|
||
| foreach (var line in content.Split(['\r', '\n'], StringSplitOptions.RemoveEmptyEntries)) | ||
| { | ||
| var trimmedLine = line.Trim(); | ||
|
|
||
| // Skip comments and empty lines | ||
| if (string.IsNullOrEmpty(trimmedLine) || trimmedLine.StartsWith('#')) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| var parts = trimmedLine.Split('=', 2); | ||
| if (parts.Length != 2) | ||
| { | ||
| continue; | ||
| } | ||
|
|
||
| var key = parts[0].Trim(); | ||
| var value = parts[1].Trim(); | ||
|
|
||
| // Remove quotes if present | ||
| if ( | ||
| value.Length >= 2 | ||
| && ( | ||
| (value.StartsWith('\"') && value.EndsWith('\"')) | ||
| || (value.StartsWith('\'') && value.EndsWith('\'')) | ||
| ) | ||
| ) | ||
| { | ||
| value = value[1..^1]; | ||
| } | ||
|
|
||
| values[key] = value; | ||
| } | ||
|
|
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.
| var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase); | |
| foreach (var line in content.Split(['\r', '\n'], StringSplitOptions.RemoveEmptyEntries)) | |
| { | |
| var trimmedLine = line.Trim(); | |
| // Skip comments and empty lines | |
| if (string.IsNullOrEmpty(trimmedLine) || trimmedLine.StartsWith('#')) | |
| { | |
| continue; | |
| } | |
| var parts = trimmedLine.Split('=', 2); | |
| if (parts.Length != 2) | |
| { | |
| continue; | |
| } | |
| var key = parts[0].Trim(); | |
| var value = parts[1].Trim(); | |
| // Remove quotes if present | |
| if ( | |
| value.Length >= 2 | |
| && ( | |
| (value.StartsWith('\"') && value.EndsWith('\"')) | |
| || (value.StartsWith('\'') && value.EndsWith('\'')) | |
| ) | |
| ) | |
| { | |
| value = value[1..^1]; | |
| } | |
| values[key] = value; | |
| } | |
| var values = content | |
| .Split(['\r', '\n'], StringSplitOptions.RemoveEmptyEntries) | |
| .Select(line => line.Trim()) | |
| .Where(trimmedLine => !string.IsNullOrEmpty(trimmedLine) && !trimmedLine.StartsWith('#')) | |
| .Select(trimmedLine => | |
| { | |
| var parts = trimmedLine.Split('=', 2); | |
| if (parts.Length != 2) | |
| { | |
| return null; | |
| } | |
| var key = parts[0].Trim(); | |
| var value = parts[1].Trim(); | |
| // Remove quotes if present | |
| if ( | |
| value.Length >= 2 | |
| && ( | |
| (value.StartsWith('\"') && value.EndsWith('\"')) | |
| || (value.StartsWith('\'') && value.EndsWith('\'')) | |
| ) | |
| ) | |
| { | |
| value = value[1..^1]; | |
| } | |
| return new { key, value }; | |
| }) | |
| .Where(x => x != null) | |
| .ToDictionary(x => x.key, x => x.value, StringComparer.OrdinalIgnoreCase); |
| catch (Exception ex) | ||
| { | ||
| this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path); | ||
| } |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic catch clause.
| catch (Exception ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path); | |
| } | |
| catch (IOException ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path); | |
| } | |
| catch (UnauthorizedAccessException ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path); | |
| } | |
| catch (FormatException ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to parse os-release file at {Path}", path); | |
| } | |
| catch (InvalidOperationException ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to parse os-release file at {Path}", path); | |
| } |
| catch (Exception ex) | ||
| { | ||
| this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB, skipping package"); | ||
| } |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic catch clause.
| catch (Exception ex) | |
| { | |
| this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB, skipping package"); | |
| } | |
| catch (FormatException ex) | |
| { | |
| this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB due to format error, skipping package"); | |
| } | |
| catch (InvalidDataException ex) | |
| { | |
| this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB due to invalid data, skipping package"); | |
| } |
| catch (Exception ex) | ||
| { | ||
| this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile); | ||
| } |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic catch clause.
| catch (Exception ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile); | |
| } | |
| catch (IOException ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile); | |
| } | |
| catch (UnauthorizedAccessException ex) | |
| { | |
| this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile); | |
| } |
| catch (Exception) | ||
| { | ||
| // Skip malformed entries | ||
| continue; | ||
| } |
Copilot
AI
Nov 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generic catch clause.
| catch (Exception) | |
| { | |
| // Skip malformed entries | |
| continue; | |
| } | |
| catch (FormatException) | |
| { | |
| // Skip malformed entries | |
| continue; | |
| } | |
| catch (ArgumentException) | |
| { | |
| // Skip malformed entries | |
| continue; | |
| } | |
| catch (InvalidOperationException) | |
| { | |
| // Skip malformed entries | |
| continue; | |
| } | |
| catch (Exception) | |
| { | |
| // Unexpected exception - rethrow to avoid hiding bugs | |
| throw; | |
| } |
This change introduces support for scanning system-level packages, starting with sqlite-based RPM databases
Here's a sample scanning
mcr.microsoft.com/azurelinux/base/core:3.0: https://gist.github.com/JamieMagee/8213dcc0353f70cd1bb4519bf2c2db4f