Skip to content

Conversation

@JamieMagee
Copy link
Member

@JamieMagee JamieMagee commented Nov 15, 2025

This change introduces support for scanning system-level packages, starting with sqlite-based RPM databases

Here's a sample scanning mcr.microsoft.com/azurelinux/base/core:3.0: https://gist.github.com/JamieMagee/8213dcc0353f70cd1bb4519bf2c2db4f

Copilot AI review requested due to automatic review settings November 15, 2025 00:15
Copilot finished reviewing on behalf of JamieMagee November 15, 2025 00:21
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new RPM database detector to scan SQLite-based RPM databases commonly found in modern RPM-based Linux distributions (Azure Linux, Fedora 33+, RHEL 9+). The implementation includes:

  • A new RpmDbDetector that scans rpmdb.sqlite files
  • A generic SystemPackageDetector base class for future system package detectors (DPKG, APK, etc.)
  • Support for parsing RPM headers from SQLite BLOBs
  • Dependency graph construction using RPM Provides/Requires relationships
  • Linux distribution detection via os-release parsing

Key Changes

  • Adds RpmComponent typed component with PackageURL support for RPM packages
  • Implements binary RPM header parser for extracting package metadata
  • Creates extensible base class for system-level package detection across Linux distributions

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/Microsoft.ComponentDetection.Orchestrator/Extensions/ServiceCollectionExtensions.cs Registers the new RpmDbDetector in the DI container
src/Microsoft.ComponentDetection.Detectors/rpm/RpmPackageInfo.cs Internal model for RPM package information extracted from headers
src/Microsoft.ComponentDetection.Detectors/rpm/RpmHeaderParser.cs Binary parser for RPM header BLOBs stored in SQLite databases
src/Microsoft.ComponentDetection.Detectors/rpm/RpmDbDetector.cs Main detector implementation that reads SQLite RPM databases
src/Microsoft.ComponentDetection.Detectors/Microsoft.ComponentDetection.Detectors.csproj Adds Microsoft.Data.Sqlite package reference
src/Microsoft.ComponentDetection.Contracts/TypedComponent/RpmComponent.cs Typed component representing RPM packages with PackageURL generation
src/Microsoft.ComponentDetection.Contracts/TypedComponent/ComponentType.cs Adds Rpm component type enum value
src/Microsoft.ComponentDetection.Contracts/DetectorClass.cs Adds SystemPackages detector category
src/Microsoft.ComponentDetection.Common/SystemPackageDetector.cs Abstract base class for system package detectors with dependency graph building
src/Microsoft.ComponentDetection.Common/LinuxDistribution.cs Parser for os-release files to identify Linux distributions
Directory.Packages.props Adds Microsoft.Data.Sqlite version specification

/// Detector for RPM packages from SQLite-format RPM databases.
/// Supports Azure Linux, Fedora 33+, RHEL 9+, and other modern RPM-based distributions.
/// </summary>
public sealed class RpmDbDetector : SystemPackageDetector
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the project guidelines, all new detectors must start as IDefaultOffComponentDetector and be explicitly enabled via DetectorArgs until maintainers promote them through the lifecycle stages (DefaultOff → Experimental → Default). The class declaration should be:

public sealed class RpmDbDetector : SystemPackageDetector, IDefaultOffComponentDetector

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +4 to +8
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The imports System.Buffers.Binary, System.IO, System.Linq, and System.Text are not used in this file. Consider removing them to keep the code clean.

Suggested change
using System.Buffers.Binary;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Collections.Generic;

Copilot uses AI. Check for mistakes.
Comment on lines +157 to +170
var metadata = (RpmMetadata)package.Metadata;

// Create the RPM component
var component = new RpmComponent(
name: package.Name,
version: package.Version,
arch: metadata.Arch,
release: metadata.Release,
epoch: metadata.Epoch,
sourceRpm: metadata.SourceRpm,
vendor: metadata.Vendor,
provides: [.. package.Provides],
requires: [.. package.Requires]
);
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The metadata.Arch and metadata.Release could be null, but the RpmComponent constructor validates these as required fields via ValidateRequiredInput. This will cause an exception if these fields are missing from the RPM header. Consider adding validation here or in the ParsePackagesAsync method to skip packages with missing required fields (Arch and Release), similar to how Name and Version are validated on line 87.

Copilot uses AI. Check for mistakes.
Comment on lines +204 to +223
case RPM_STRING_ARRAY_TYPE:
// Find the end of the string array (double null or end of data)
dataLength = 0;
var consecutiveNulls = 0;
for (var i = entry.Offset; i < dataStore.Length; i++)
{
dataLength++;
if (dataStore[i] == 0)
{
consecutiveNulls++;
if (consecutiveNulls >= 2)
{
break;
}
}
else
{
consecutiveNulls = 0;
}
}
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string array extraction logic searches for consecutive null bytes to determine the end of the array, but doesn't validate against entry.Count which specifies the expected number of strings. This could lead to incorrect parsing if the data is malformed. Consider adding validation to ensure the number of extracted strings matches entry.Count, or at least add a comment explaining why entry.Count is not used for string arrays.

Copilot uses AI. Check for mistakes.
sourceRpm: metadata.SourceRpm,
vendor: metadata.Vendor,
provides: [.. package.Provides],
requires: [.. package.Requires]
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distro parameter is not being used when creating the RpmComponent. According to the PackageURL spec for RPM packages and the comment in RpmComponent.cs (line 112-113), the namespace should be set based on the distribution ID (e.g., "fedora", "rhel", "mariner"). Consider using distro?.Id as the namespace in the RpmComponent constructor, similar to how LinuxComponent uses distribution in its PackageURL generation.

Note: This would require modifying RpmComponent to accept and use a namespace parameter.

Suggested change
requires: [.. package.Requires]
requires: [.. package.Requires],
@namespace: distro?.Id

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +79
var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

foreach (var line in content.Split(['\r', '\n'], StringSplitOptions.RemoveEmptyEntries))
{
var trimmedLine = line.Trim();

// Skip comments and empty lines
if (string.IsNullOrEmpty(trimmedLine) || trimmedLine.StartsWith('#'))
{
continue;
}

var parts = trimmedLine.Split('=', 2);
if (parts.Length != 2)
{
continue;
}

var key = parts[0].Trim();
var value = parts[1].Trim();

// Remove quotes if present
if (
value.Length >= 2
&& (
(value.StartsWith('\"') && value.EndsWith('\"'))
|| (value.StartsWith('\'') && value.EndsWith('\''))
)
)
{
value = value[1..^1];
}

values[key] = value;
}

Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This foreach loop immediately maps its iteration variable to another variable - consider mapping the sequence explicitly using '.Select(...)'.

Suggested change
var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
foreach (var line in content.Split(['\r', '\n'], StringSplitOptions.RemoveEmptyEntries))
{
var trimmedLine = line.Trim();
// Skip comments and empty lines
if (string.IsNullOrEmpty(trimmedLine) || trimmedLine.StartsWith('#'))
{
continue;
}
var parts = trimmedLine.Split('=', 2);
if (parts.Length != 2)
{
continue;
}
var key = parts[0].Trim();
var value = parts[1].Trim();
// Remove quotes if present
if (
value.Length >= 2
&& (
(value.StartsWith('\"') && value.EndsWith('\"'))
|| (value.StartsWith('\'') && value.EndsWith('\''))
)
)
{
value = value[1..^1];
}
values[key] = value;
}
var values = content
.Split(['\r', '\n'], StringSplitOptions.RemoveEmptyEntries)
.Select(line => line.Trim())
.Where(trimmedLine => !string.IsNullOrEmpty(trimmedLine) && !trimmedLine.StartsWith('#'))
.Select(trimmedLine =>
{
var parts = trimmedLine.Split('=', 2);
if (parts.Length != 2)
{
return null;
}
var key = parts[0].Trim();
var value = parts[1].Trim();
// Remove quotes if present
if (
value.Length >= 2
&& (
(value.StartsWith('\"') && value.EndsWith('\"'))
|| (value.StartsWith('\'') && value.EndsWith('\''))
)
)
{
value = value[1..^1];
}
return new { key, value };
})
.Where(x => x != null)
.ToDictionary(x => x.key, x => x.value, StringComparer.OrdinalIgnoreCase);

Copilot uses AI. Check for mistakes.
Comment on lines +126 to +129
catch (Exception ex)
{
this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path);
}
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic catch clause.

Suggested change
catch (Exception ex)
{
this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path);
}
catch (IOException ex)
{
this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path);
}
catch (UnauthorizedAccessException ex)
{
this.Logger.LogTrace(ex, "Failed to read os-release file at {Path}", path);
}
catch (FormatException ex)
{
this.Logger.LogTrace(ex, "Failed to parse os-release file at {Path}", path);
}
catch (InvalidOperationException ex)
{
this.Logger.LogTrace(ex, "Failed to parse os-release file at {Path}", path);
}

Copilot uses AI. Check for mistakes.
Comment on lines +112 to +115
catch (Exception ex)
{
this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB, skipping package");
}
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic catch clause.

Suggested change
catch (Exception ex)
{
this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB, skipping package");
}
catch (FormatException ex)
{
this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB due to format error, skipping package");
}
catch (InvalidDataException ex)
{
this.Logger.LogWarning(ex, "Failed to parse RPM header BLOB due to invalid data, skipping package");
}

Copilot uses AI. Check for mistakes.
Comment on lines +142 to +145
catch (Exception ex)
{
this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile);
}
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic catch clause.

Suggested change
catch (Exception ex)
{
this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile);
}
catch (IOException ex)
{
this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile);
}
catch (UnauthorizedAccessException ex)
{
this.Logger.LogTrace(ex, "Failed to delete temporary file {TempFile}", tempFile);
}

Copilot uses AI. Check for mistakes.
Comment on lines +168 to +172
catch (Exception)
{
// Skip malformed entries
continue;
}
Copy link

Copilot AI Nov 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generic catch clause.

Suggested change
catch (Exception)
{
// Skip malformed entries
continue;
}
catch (FormatException)
{
// Skip malformed entries
continue;
}
catch (ArgumentException)
{
// Skip malformed entries
continue;
}
catch (InvalidOperationException)
{
// Skip malformed entries
continue;
}
catch (Exception)
{
// Unexpected exception - rethrow to avoid hiding bugs
throw;
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants