Skip to content

feat(fsck): implement fsck command#371

Open
KishibeKafka wants to merge 66 commits into
web3infra-foundation:mainfrom
KishibeKafka:main
Open

feat(fsck): implement fsck command#371
KishibeKafka wants to merge 66 commits into
web3infra-foundation:mainfrom
KishibeKafka:main

Conversation

@KishibeKafka
Copy link
Copy Markdown

Summary

Implements fsck command for Libra, analogous to git fsck. Verifies repository integrity by checking objects, refs, and index.

Features

  • Object integrity: Verifies SHA1/SHA256 hash and format (blob, tree, commit, tag)
  • Ref validation: Checks all refs point to valid objects
  • Index validation: Verifies index file structure
  • Reachability analysis: Detects dangling/unreachable objects via BFS
  • Options: --verbose, --root, --tags, --dangling, --unreachable, --lost-found, --connectivity-only

KishibeKafka and others added 30 commits April 24, 2026 19:41
Implement `libra fsck` command for verifying repository integrity:
- Object hash verification using Git-style SHA1 (header + content)
- Ref consistency validation against SQLite backend
- Index file integrity check
- Cross-reference validation for trees and commits
- Human-readable and JSON output formats
- Exit codes: 0 (pass), 1 (corrupted), 2 (broken refs), 3 (index corruption)
- Flags: --verbose, --no-cross-ref-check, --no-index-check, --objects-only, --fix

Includes 37 unit tests covering:
- Object verification (blob, tree, commit)
- Missing object detection
- Hash mismatch detection
- Cross-reference validation
- JSON output structure
- CLI argument parsing
Add fsck subcommand to CLI enum and dispatch handler.
Add public getter method for base_path field to enable
filesystem traversal in fsck command.
Add comprehensive documentation for `libra fsck` command:
- Synopsis and description of integrity checks
- All options with examples (--verbose, --no-cross-ref-check, etc.)
- Human output examples (clean, verbose, with errors)
- JSON structured output schema
- Design rationale (Git-style hash computation, optional cross-ref validation)
- Parameter comparison with Git
- Exit code behavior and error handling table
- Usage examples for routine checks and CI/CD integration

Also update docs/commands/README.md to include fsck in the command index.
- Add .cargo/ and other build artifacts to .gitignore
- Update CLAUDE.md project structure comments
- Replace the stub check_index() with a complete implementation
- Validates the binary index file (DIRC header, entry modes, stages, cross-references to object storage).
- Differentiate exit codes: 1 for object corruption, 2 for broken refs, 3 for index corruption.
…pted index

- Delete refs pointing to missing or invalid objects from SQLite
- Rebuild corrupted index from HEAD commit's tree structure
- Track broken ref names in RefCheckResult for targeted deletion
…sh verification

fsck.rs previously hardcoded SHA1_FOR_LEGACY_USE_ONLY in verify_object(), causing all objects in SHA256 repos to be incorrectly flagged as corrupted.
Now reads the algorithm from get_hash_kind() (set at init/dispatch time via core.objectformat) and uses the correct ring digest accordingly.
…bleErrorCode

Migrate fsck from hardcoded std::process::exit(1/2/3) to a bitmask-based
exit code scheme that encodes all failure categories in a single value:
- bit 0 (1): object corruption
- bit 1 (2): broken refs
- bit 2 (4): index corruption
Add 10 integration tests covering fsck behavior under SHA-256 object format:
empty repo, committed repo, verbose/json output, corruption detection, missing object detection, single object verification, indexed files, invalid hash handling.
Ensures the dynamic SHA1/SHA256 selection works end-to-end through the CLI entrypoint.
- streamline issue report generation
- enhance object check handling
- remove dead testcase.
- Refactor check flow into 6 stages: directory, objects, HEAD, reflogs, index, connectivity
- Default mode only shows progress; detailed output moved to --verbose flag
- Add check_head function to detect unborn branch, avoid false notice
- Simplify FsckArgs to only keep object/no_reflogs/verbose options
- Output format matches git fsck (e.g., "Checking object directory: 100% (256/256), done.")
Refactor fsck to output the four standard git fsck diagnostic types:
- missing <type> <object>: Object referenced but doesn't exist
- hash mismatch <type> <object>: Object hash doesn't match content
- dangling <type> <object>: Object exists but not referenced
- unreachable <type> <object>: Object completely isolated

Key changes:
- Add object_type field to IssueReport for git-style output
- Implement reachability analysis using BFS from refs/reflogs/index
- Distinguish dangling (in reflog/index) vs unreachable (isolated) objects
- Add print_diagnostic_messages() for git fsck-style output format
- Dangling/unreachable objects don't cause non-zero exit code (matches git)
- Remove test module temporarily due to major API changes

Output format now matches git fsck:
  dangling blob 8ae04589c000f892abfbe27a7803ac4433846be8
  missing tree abc123...
  hash mismatch commit def456...
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (2)

tests/command/fsck_test.rs:304

  • This assertion is a tautology — !x || x is always true for any boolean — so the test never actually verifies anything about the exit status. Either drop the assertion entirely (if the only goal is "must not crash"), or replace it with a concrete expectation, e.g. that the exit code is one of a specific allowed set, similar to test_fsck_with_empty_object_id.
    assert!(
        !output.status.success() || output.status.success(),
        "fsck should handle zero hash without crashing"
    );

tests/command/fsck_test.rs:512

  • Same tautological assertion as in test_fsck_with_zero_hash: !output.status.success() || output.status.success() is always true, so this test does not exercise any behavior beyond "the process didn't hang". Replace with a concrete expectation (e.g. an allowed set of exit codes, or assert on stderr containing a specific diagnostic).
    assert!(
        !output.status.success() || output.status.success(),
        "fsck should handle invalid reflog reference"
    );

Comment thread CHANGELOG.md
Comment on lines +7 to +15
- **`libra fsck`**: Repository integrity checker analogous to `git fsck`. Verifies
object hash integrity (SHA1/SHA256), object format validity, ref consistency,
index integrity, and cross-reference validation (including object type mismatch
detection for tree entries). Supports `--verbose`, `--json`, `--objects-only`,
`--no-cross-ref-check`, `--no-index-check`, and `--fix` (auto-repair broken refs
and rebuild corrupted index). Exit codes use a bitmask scheme:
bit 0 = object corruption, bit 1 = broken refs, bit 2 = index corruption.
- **`docs/commands/fsck.md`**: Comprehensive documentation for the `fsck` command
including parameter comparison with Git, design rationale, and CI/CD examples.
Comment thread src/command/fsck.rs
Comment on lines +322 to +350
pub async fn execute(args: FsckArgs) {
let storage = ClientStorage::init(path::objects());

let result = if let Some(ref object_id) = args.object {
check_single_object(object_id, &storage).await
} else {
check_all_objects(&args, &storage).await
};

match result {
Ok(fsck_result) => {
// Exit with failure code only for serious issues (not dangling/unreachable)
if fsck_result.has_errors {
std::process::exit(1);
}
}
Err(e) => {
eprintln!("fatal: {}", e);
std::process::exit(1);
}
}
}

pub async fn execute_safe(args: FsckArgs, _output: &OutputConfig) -> CliResult<()> {
// execute_safe is called from cli.rs but currently just delegates to execute
// JSON output is not supported for fsck
execute(args).await;
Ok(())
}
Comment thread src/command/fsck.rs
Comment on lines +770 to +775
fn print_notices(head_is_unborn: bool, _result: &FsckResult) {
if head_is_unborn {
eprintln!("notice: HEAD points to an unborn branch (main)");
eprintln!("notice: No default references");
}
}
Comment thread src/command/fsck.rs
Comment on lines +1169 to +1173
let lost_found_dir = storage
.base_path()
.parent()
.expect("storage should have parent")
.join("lost-found");
Comment on lines +190 to +198
// fsck with empty argument should not crash
let output = run_libra_command(&["fsck", ""], repo.path());
// Should return non-zero exit code for invalid input, but not crash
assert!(
output.status.code() == Some(1)
|| output.status.code() == Some(128)
|| output.status.code() == Some(0),
"fsck with empty arg should return valid exit code, got: {:?}",
output.status.code()
Comment thread src/command/fsck.rs
Comment on lines +1119 to +1158
/// Find and report tagged commits
/// Output format matches git fsck --tags:
/// - For annotated tags: "tagged commit <commit-hash> (<tag-name>) in <tag-object-hash>"
async fn find_and_report_tags() -> CliResult<()> {
use sea_orm::EntityTrait;

use crate::internal::model::reference;

let db_conn = db::get_db_conn_instance().await;

// Load all refs that are tags (refs/tags/*)
let refs = reference::Entity::find()
.all(&db_conn)
.await
.map_err(|e| CliError::fatal(format!("failed to load refs: {}", e)))?;

for ref_entry in refs {
let ref_name = match &ref_entry.name {
Some(name) => name,
None => continue,
};

// Only process tag refs (refs/tags/*)
if !ref_name.starts_with("refs/tags/") {
continue;
}

let tag_name = ref_name.strip_prefix("refs/tags/").unwrap();
let commit_hash = match &ref_entry.commit {
Some(hash) => hash,
None => continue,
};

// Check if this is an annotated tag (tag object exists)
// For now, just report the tagged commit
println!("tagged commit {} ({})", commit_hash, tag_name);
}

Ok(())
}
Comment thread src/command/fsck.rs
Comment on lines +1375 to +1377
// Add header: "<type> <size>\0"
let header = format!("{} {}\0", obj_type.to_string().to_lowercase(), size);
ctx.update(header.as_bytes());
Comment thread src/command/fsck.rs
Comment on lines +687 to +713
async fn build_object_name_map() -> std::collections::HashMap<String, String> {
let mut name_map: std::collections::HashMap<String, String> = std::collections::HashMap::new();
let db_conn = db::get_db_conn_instance().await;

// Collect names from refs (e.g., "refs/heads/master")
let refs = reference::Entity::find()
.all(&db_conn)
.await
.unwrap_or_default();

for ref_entry in refs {
if let Some(commit_hash) = &ref_entry.commit {
let ref_name = ref_entry.name.clone().unwrap_or_default();
// Store ref name (may have multiple names per object)
name_map
.entry(commit_hash.clone())
.and_modify(|e| *e = format!("{}, {}", e, ref_name))
.or_insert(ref_name);
}
}

// Collect names from reflogs (e.g., "HEAD@{1234567890}")
let reflogs = reflog::Entity::find()
.all(&db_conn)
.await
.unwrap_or_default();

Fixes test_add_force_dot_includes_ignored_directory test failure.

Changes:
- src/command/status.rs: Add list_workdir_files_split_force() and
  changes_to_be_staged_split_force() functions that recurse into
  ignored directories to discover files within them
- src/command/add.rs: Use force variant when --force flag is set,
  merging ignored changes into visible changes for staging
- src/command/add.rs: Skip directories in stage_a_file() to avoid
  attempting to read them as files
- tests/command/publish_test.rs: Update test to reflect that
  publish sync is now implemented (no longer expects LBR-UNSUPPORTED-001)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e3ceb01369

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +502 to +505
let refs_dir = repo.path().join(".libra").join("refs").join("heads");
fs::create_dir_all(&refs_dir).unwrap();
let broken_ref = refs_dir.join("broken");
fs::write(&broken_ref, "invalid-hash-not-exist").unwrap();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Corrupt reflog data instead of creating a ref file in test

This test claims to validate broken reflog handling, but it mutates .libra/refs/heads/* files instead of the reflog storage that fsck actually reads (reflog::Entity in src/command/fsck.rs). As written, regressions in reflog validation can ship undetected because the setup does not exercise the intended code path; please seed an invalid row in the reflog table (or helper API) so the scenario is real.

Useful? React with 👍 / 👎.

Comment on lines +509 to +510
assert!(
!output.status.success() || output.status.success(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Assert a concrete failure condition in reflog test

The assertion !output.status.success() || output.status.success() is a tautology, so this test cannot fail regardless of fsck behavior. That removes protection for a high-risk integrity path and can let silent regressions in reflog error handling pass CI; assert a specific expected outcome (for example, non-zero exit code and/or a corruption diagnostic) instead.

Useful? React with 👍 / 👎.

Copilot AI review requested due to automatic review settings May 15, 2026 03:15
Signed-off-by: Haj1M1 <1614778789@qq.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.

Comment thread CHANGELOG.md
Comment on lines +7 to +15
- **`libra fsck`**: Repository integrity checker analogous to `git fsck`. Verifies
object hash integrity (SHA1/SHA256), object format validity, ref consistency,
index integrity, and cross-reference validation (including object type mismatch
detection for tree entries). Supports `--verbose`, `--json`, `--objects-only`,
`--no-cross-ref-check`, `--no-index-check`, and `--fix` (auto-repair broken refs
and rebuild corrupted index). Exit codes use a bitmask scheme:
bit 0 = object corruption, bit 1 = broken refs, bit 2 = index corruption.
- **`docs/commands/fsck.md`**: Comprehensive documentation for the `fsck` command
including parameter comparison with Git, design rationale, and CI/CD examples.
Comment thread src/command/fsck.rs
Comment on lines +322 to +348
pub async fn execute(args: FsckArgs) {
let storage = ClientStorage::init(path::objects());

let result = if let Some(ref object_id) = args.object {
check_single_object(object_id, &storage).await
} else {
check_all_objects(&args, &storage).await
};

match result {
Ok(fsck_result) => {
// Exit with failure code only for serious issues (not dangling/unreachable)
if fsck_result.has_errors {
std::process::exit(1);
}
}
Err(e) => {
eprintln!("fatal: {}", e);
std::process::exit(1);
}
}
}

pub async fn execute_safe(args: FsckArgs, _output: &OutputConfig) -> CliResult<()> {
// execute_safe is called from cli.rs but currently just delegates to execute
// JSON output is not supported for fsck
execute(args).await;
Comment thread src/command/fsck.rs
Comment on lines +1169 to +1173
let lost_found_dir = storage
.base_path()
.parent()
.expect("storage should have parent")
.join("lost-found");
Comment thread src/command/fsck.rs
Comment on lines +770 to +775
fn print_notices(head_is_unborn: bool, _result: &FsckResult) {
if head_is_unborn {
eprintln!("notice: HEAD points to an unborn branch (main)");
eprintln!("notice: No default references");
}
}
Comment on lines +300 to +303
// Zero hash should be invalid or not found, but should not crash
assert!(
!output.status.success() || output.status.success(),
"fsck should handle zero hash without crashing"
Comment on lines +474 to +491
if object_path.exists() {
fs::remove_file(&object_path).unwrap();
let output = run_libra_command(&["fsck"], repo.path());
assert!(
!output.status.success(),
"fsck should fail on missing object"
);
let combined = format!(
"{}{}",
String::from_utf8_lossy(&output.stdout),
String::from_utf8_lossy(&output.stderr)
);
assert!(
combined.contains("missing") || combined.contains("not found"),
"should report missing object, got: {}",
combined
);
}
Comment thread src/command/fsck.rs
Comment on lines +566 to +574
// Print directory progress
if verbose {
// --verbose: match git fsck output
println!("Checking object directory");
} else {
// default: show progress
println!("Checking object directories: 100% (256/256), done.");
}

Comment thread src/command/fsck.rs
Comment on lines +1367 to +1380
// Verify hash integrity using ring crate.
// Git/Libra computes hash as: SHAx(type + ' ' + size + '\0' + content)
// The algorithm is determined by the repo's core.objectformat config.
let mut ctx = Context::new(match get_hash_kind() {
HashKind::Sha256 => &SHA256,
_ => &SHA1_FOR_LEGACY_USE_ONLY,
});

// Add header: "<type> <size>\0"
let header = format!("{} {}\0", obj_type.to_string().to_lowercase(), size);
ctx.update(header.as_bytes());
ctx.update(&data);
let computed_hash = ctx.finish();
let computed_bytes = computed_hash.as_ref();
Comment thread src/command/fsck.rs
object::{ObjectTrait, blob::Blob, commit::Commit, tree::Tree, types::ObjectType},
},
};
use hex;
Comment thread src/command/fsck.rs
Comment on lines +1601 to +1611
result.broken += 1;
let ref_name = ref_entry.name.clone().unwrap_or_default();
result.broken_ref_names.push(ref_name.clone());
report(FsckMsgId::Missing, "commit", commit_hash_str);
}
} else {
result.broken += 1;
let ref_name = ref_entry.name.clone().unwrap_or_default();
result.broken_ref_names.push(ref_name.clone());
// Invalid hash format - report as bad ref content
eprintln!("bad ref content: {}: invalid hash format", ref_name);
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5f79024b0a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/cli.rs
Comment on lines +280 to +283
#[command(about = "Update remote refs along with associated objects")]
Push(command::push::PushArgs),
#[command(about = "Download objects and refs from another repository")]
Fetch(command::fetch::FetchArgs),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Deduplicate subcommand variants in Commands enum

This commit adds Push/Fetch/Pull (and also Diff/Grep/Blame) variants that are already defined elsewhere in the same Commands enum, which makes the enum definition invalid and prevents the CLI from compiling at all. Keeping only one definition per variant is required for the command dispatcher to build.

Useful? React with 👍 / 👎.

let output = run_libra_command(&["fsck", zero_hash], repo.path());
// Zero hash should be invalid or not found, but should not crash
assert!(
!output.status.success() || output.status.success(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Replace tautological assertion in zero-hash fsck test

The assertion !output.status.success() || output.status.success() is always true, so this test can never fail and does not validate fsck behavior for the zero-hash edge case. That leaves this changed behavior unprotected against regressions; assert a concrete expected outcome (for example, specific exit code and/or error text) instead.

Useful? React with 👍 / 👎.

Copilot AI review requested due to automatic review settings May 16, 2026 04:19
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

tests/command/fsck_test.rs:512

  • This assertion !output.status.success() || output.status.success() is always true and verifies nothing. The test name suggests it should validate that fsck handles an invalid ref hash without crashing, but as written it would also pass on a crash. Consider asserting that the process did not terminate abnormally (e.g. output.status.code().is_some()) and/or that an expected message is produced.
    assert!(
        !output.status.success() || output.status.success(),
        "fsck should handle invalid reflog reference"
    );

Comment thread CHANGELOG.md
Comment on lines +7 to +15
- **`libra fsck`**: Repository integrity checker analogous to `git fsck`. Verifies
object hash integrity (SHA1/SHA256), object format validity, ref consistency,
index integrity, and cross-reference validation (including object type mismatch
detection for tree entries). Supports `--verbose`, `--json`, `--objects-only`,
`--no-cross-ref-check`, `--no-index-check`, and `--fix` (auto-repair broken refs
and rebuild corrupted index). Exit codes use a bitmask scheme:
bit 0 = object corruption, bit 1 = broken refs, bit 2 = index corruption.
- **`docs/commands/fsck.md`**: Comprehensive documentation for the `fsck` command
including parameter comparison with Git, design rationale, and CI/CD examples.
Comment thread src/command/fsck.rs
Comment on lines +317 to +344
pub async fn execute(args: FsckArgs) {
let storage = ClientStorage::init(path::objects());

let result = if let Some(ref object_id) = args.object {
check_single_object(object_id, &storage).await
} else {
check_all_objects(&args, &storage).await
};

match result {
Ok(fsck_result) => {
// Exit with failure code only for serious issues (not dangling/unreachable)
if fsck_result.has_errors {
std::process::exit(1);
}
}
Err(e) => {
eprintln!("fatal: {}", e);
std::process::exit(1);
}
}
}

pub async fn execute_safe(args: FsckArgs, _output: &OutputConfig) -> CliResult<()> {
// execute_safe is called from cli.rs but currently just delegates to execute
// JSON output is not supported for fsck
execute(args).await;
Ok(())
Comment thread src/command/fsck.rs
Comment on lines +1164 to +1168
let lost_found_dir = storage
.base_path()
.parent()
.expect("storage should have parent")
.join("lost-found");
Comment on lines +300 to +303
// Zero hash should be invalid or not found, but should not crash
assert!(
!output.status.success() || output.status.success(),
"fsck should handle zero hash without crashing"
Comment on lines +474 to +491
if object_path.exists() {
fs::remove_file(&object_path).unwrap();
let output = run_libra_command(&["fsck"], repo.path());
assert!(
!output.status.success(),
"fsck should fail on missing object"
);
let combined = format!(
"{}{}",
String::from_utf8_lossy(&output.stdout),
String::from_utf8_lossy(&output.stderr)
);
assert!(
combined.contains("missing") || combined.contains("not found"),
"should report missing object, got: {}",
combined
);
}
Comment thread src/command/fsck.rs
Comment on lines +765 to +770
fn print_notices(head_is_unborn: bool, _result: &FsckResult) {
if head_is_unborn {
eprintln!("notice: HEAD points to an unborn branch (main)");
eprintln!("notice: No default references");
}
}
Comment thread src/command/fsck.rs
Comment on lines +490 to +499
// Stage 7: Check connectivity (re-verify all objects in storage order)
check_connectivity(
&all_hashes,
storage,
&mut result,
args.verbose,
args.name_objects,
args.connectivity_only,
)
.await?;
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc881ddee1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/command/fsck.rs

let overall_status = match check_result.status {
CheckStatus::Ok => {
println!("Object {} is valid", object_id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid reporting object as valid after emitted fsck errors

check_single_object prints Object <id> is valid whenever check_result.status == Ok, but verify_object can still emit hard-error diagnostics (for example missing author/committer fields) while returning Ok status. In that case fsck exits non-zero yet still prints a success message, which is misleading for operators and machine parsers consuming stdout. Gate the success message on !has_errors (or propagate a non-Ok status when semantic checks fail).

Useful? React with 👍 / 👎.

Comment thread docs/commands/fsck.md
Comment on lines +159 to +163
Diagnostic messages are printed to stdout and do NOT cause non-zero exit codes:

```text
missing <type> <object-id>
hash mismatch <type> <object-id>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align fsck docs with actual failure semantics for missing objects

The docs classify missing/hash mismatch as diagnostics that do not cause non-zero exit codes, but the implementation treats them as failure-causing conditions (FsckMsgId::causes_failure), so CI or scripts following this section will mis-handle command outcomes. Update this section to match the shipped behavior so users can rely on exit-code contracts.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b3bbfeacb6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/command/add.rs
Comment on lines +978 to +979
if file_abs.is_dir() {
return Ok(StagedAction::Unchanged);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid treating symlinked paths as unstaged directories

The new file_abs.is_dir() guard returns Unchanged before status evaluation, but Path::is_dir() follows symlinks. On Unix, a symlink to a directory is a valid index entry (120000) and should be staged as a symlink; with this change, libra add silently skips it. This creates a real correctness regression for repositories that track directory symlinks (or convert a path into such a symlink), because the change cannot be staged at all.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76febfd0fb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/command/fsck.rs
Comment on lines +1575 to +1576
if let Some(commit_hash_str) = &ref_entry.commit {
if let Some(hash) = parse_object_hash(commit_hash_str) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Treat refs with missing target OIDs as corruption

check_refs only validates rows where ref_entry.commit is Some(...); rows with None are silently skipped, so a corrupted branch/tag reference with a missing target OID is counted as checked but never marked broken. In a repo where that is the only corruption, fsck can still report success, which is a false negative for integrity checks. Add an else path that reports bad ref content (at least for non-HEAD refs) and increments the broken count.

Useful? React with 👍 / 👎.

Comment on lines +660 to +663
let tags_dir = repo.path().join(".libra").join("refs").join("tags");
fs::create_dir_all(&tags_dir).unwrap();
let broken_tag = tags_dir.join("broken-tag");
fs::write(&broken_tag, "nonexistent-commit-hash").unwrap();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Seed broken tag refs in DB for fsck --tags coverage

This test writes .libra/refs/tags/broken-tag, but fsck --tags loads tags from the database (reference::Entity) rather than loose ref files, so the fixture does not exercise the code path it claims to validate. As a result, regressions in tag-ref validation/reporting can pass CI undetected. Create the broken tag in the reference table (or through a helper that populates it) so the test actually covers find_and_report_tags behavior.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@KishibeKafka
Copy link
Copy Markdown
Author

why the checks still pending?

@genedna
Copy link
Copy Markdown
Member

genedna commented May 17, 2026

why the checks still pending?

It's GitHub Action OOM problem, we are working on it.

- Add list_workdir_files_split_force() to recurse into ignored dirs
- Add changes_to_be_staged_split_force() for --force status checks
- Skip directories in stage_a_file() to avoid opening dirs as files
- Fix cli.rs duplicate command definitions (Diff, Grep, Blame, Fetch, Pull, Push)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2af4ce53f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/command/fsck.rs
Comment on lines +666 to +667
match Head::current_result().await {
Ok(Head::Branch(name)) => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle missing repository DB without panicking

check_head() calls Head::current_result() unconditionally, but that API relies on get_db_conn_instance() which panics when .libra/libra.db is missing or unreadable. In those cases (e.g., running libra fsck outside a repo or with a broken DB), fsck crashes the CLI thread and exits via panic instead of returning a user-facing CliError, so callers get a generic failure rather than actionable diagnostics. Please switch this path to a fallible DB access flow and propagate a structured fatal error.

Useful? React with 👍 / 👎.

Comment thread src/command/fsck.rs
Comment on lines +390 to +392
if sub_path.is_file()
&& let Some(hash) = try_parse_loose_object(dir_name, &sub_path)
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Report malformed loose-object filenames as corruption

The object scan only records entries when try_parse_loose_object(...) returns Some, so any non-hex or wrong-length filename under .libra/objects/<2-char>/ is silently ignored. That means fsck can return success while corrupted loose-object directory entries are present, because those files never reach object/ref/connectivity validation. This should emit a corruption diagnostic and mark the run as failed instead of skipping invalid filenames.

Useful? React with 👍 / 👎.

Copilot AI review requested due to automatic review settings May 17, 2026 12:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

tests/command/fsck_test.rs:512

  • Same tautological assertion pattern as in test_fsck_with_zero_hash: !output.status.success() || output.status.success() is always true regardless of fsck's behavior, so this test only verifies the process exits at all. Please pin the expected exit code/output explicitly.
    assert!(
        !output.status.success() || output.status.success(),
        "fsck should handle invalid reflog reference"
    );

Comment thread CHANGELOG.md
Comment on lines +7 to +15
- **`libra fsck`**: Repository integrity checker analogous to `git fsck`. Verifies
object hash integrity (SHA1/SHA256), object format validity, ref consistency,
index integrity, and cross-reference validation (including object type mismatch
detection for tree entries). Supports `--verbose`, `--json`, `--objects-only`,
`--no-cross-ref-check`, `--no-index-check`, and `--fix` (auto-repair broken refs
and rebuild corrupted index). Exit codes use a bitmask scheme:
bit 0 = object corruption, bit 1 = broken refs, bit 2 = index corruption.
- **`docs/commands/fsck.md`**: Comprehensive documentation for the `fsck` command
including parameter comparison with Git, design rationale, and CI/CD examples.
Comment thread src/command/fsck.rs
Comment on lines +1164 to +1168
let lost_found_dir = storage
.base_path()
.parent()
.expect("storage should have parent")
.join("lost-found");
Comment thread src/command/fsck.rs
Comment on lines +765 to +770
fn print_notices(head_is_unborn: bool, _result: &FsckResult) {
if head_is_unborn {
eprintln!("notice: HEAD points to an unborn branch (main)");
eprintln!("notice: No default references");
}
}
Comment thread src/command/fsck.rs
Comment on lines +488 to +510
check_index(storage, &mut result, args.verbose)?;

// Stage 7: Check connectivity (re-verify all objects in storage order)
check_connectivity(
&all_hashes,
storage,
&mut result,
args.verbose,
args.name_objects,
args.connectivity_only,
)
.await?;

// Stage 8: Find dangling and unreachable objects
find_dangling_unreachable(
storage,
&mut result,
args.unreachable,
args.no_reflogs,
args.dangling_enabled(),
args.lost_found,
)
.await?;
Comment on lines +300 to +303
// Zero hash should be invalid or not found, but should not crash
assert!(
!output.status.success() || output.status.success(),
"fsck should handle zero hash without crashing"
Comment on lines +666 to +678
// Should report error or handle gracefully
let combined = format!(
"{}{}",
String::from_utf8_lossy(&output.stdout),
String::from_utf8_lossy(&output.stderr)
);
// fsck may report "unknown" for invalid commit hashes in tags
assert!(
combined.contains("unknown")
|| combined.contains("error")
|| combined.contains("not found")
|| output.status.success(),
"should handle broken tag reference, got: {}",
Comment thread src/command/fsck.rs
Comment on lines +317 to +345
pub async fn execute(args: FsckArgs) {
let storage = ClientStorage::init(path::objects());

let result = if let Some(ref object_id) = args.object {
check_single_object(object_id, &storage).await
} else {
check_all_objects(&args, &storage).await
};

match result {
Ok(fsck_result) => {
// Exit with failure code only for serious issues (not dangling/unreachable)
if fsck_result.has_errors {
std::process::exit(1);
}
}
Err(e) => {
eprintln!("fatal: {}", e);
std::process::exit(1);
}
}
}

pub async fn execute_safe(args: FsckArgs, _output: &OutputConfig) -> CliResult<()> {
// execute_safe is called from cli.rs but currently just delegates to execute
// JSON output is not supported for fsck
execute(args).await;
Ok(())
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants