| 1 | +# GitHub Copilot Instructions for RecursiveExtractor |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +RecursiveExtractor is a cross-platform .NET library and CLI tool for parsing archive files and disk images, including nested archives. It provides a unified interface to extract arbitrary archives using libraries like SharpCompress and DiscUtils. |
| 6 | + |
| 7 | +## Tech Stack |
| 8 | + |
| 9 | +- **Language**: C# 10.0 |
| 10 | +- **Target Frameworks**: .NET Standard 2.0, .NET Standard 2.1, .NET 8.0, .NET 9.0, .NET 10.0 |
| 11 | +- **Testing Framework**: xUnit (based on project structure) |
| 12 | +- **Key Dependencies**: SharpCompress, LTRData.DiscUtils, NLog, Glob |
| 13 | + |
| 14 | +## Building and Testing |
| 15 | + |
| 16 | +### Build Commands |
| 17 | +```bash |
| 18 | +# Build the entire solution |
| 19 | +dotnet build RecursiveExtractor.sln |
| 20 | + |
| 21 | +# Build a specific project |
| 22 | +dotnet build RecursiveExtractor/RecursiveExtractor.csproj |
| 23 | +``` |
| 24 | + |
| 25 | +### Test Commands |
| 26 | +```bash |
| 27 | +# Run all tests |
| 28 | +dotnet test RecursiveExtractor.sln |
| 29 | + |
| 30 | +# Run tests for a specific project |
| 31 | +dotnet test RecursiveExtractor.Tests/RecursiveExtractor.Tests.csproj |
| 32 | +dotnet test RecursiveExtractor.Cli.Tests/RecursiveExtractor.Cli.Tests.csproj |
| 33 | +``` |
| 34 | + |
| 35 | +### Restore Packages |
| 36 | +```bash |
| 37 | +dotnet restore RecursiveExtractor.sln |
| 38 | +``` |
| 39 | + |
| 40 | +## NuGet Configuration |
| 41 | + |
| 42 | +⚠️ **Important**: The repository uses a private NuGet feed configured in `nuget.config`: |
| 43 | +- The `nuget.config` file points to a private Azure DevOps feed: `https://pkgs.dev.azure.com/microsoft-sdl/General/_packaging/PublicRegistriesFeed/nuget/v3/index.json` |
| 44 | +- **When working as an agent, you may need to temporarily modify `nuget.config` to use public NuGet feeds** (e.g., `https://api.nuget.org/v3/index.json`) to restore packages successfully |
| 45 | +- **ALWAYS restore the `nuget.config` to its original configuration before completing your work** |
| 46 | +- The original configuration must be preserved to maintain consistency with the team's workflow |
| 47 | + |
| 48 | +Example of temporarily switching to the public feed: |
| 49 | +```xml |
| 50 | +<?xml version="1.0" encoding="utf-8"?> |
| 51 | +<configuration> |
| 52 | +  <packageSources> |
| 53 | +    <clear /> |
| 54 | +    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" /> |
| 55 | +  </packageSources> |
| 56 | +</configuration> |
| 57 | +``` |
| 58 | + |
| 59 | +## Code Style Guidelines |
| 60 | + |
| 61 | +### Follow .editorconfig Settings |
| 62 | +- Use 4 spaces for indentation (no tabs) |
| 63 | +- CRLF line endings |
| 64 | +- Open braces on new lines |
| 65 | +- Use `var` for local variables when type is apparent |
| 66 | +- Follow PascalCase for types, methods, and properties |
| 67 | +- Interfaces should begin with 'I' |
| 68 | +- Do not use `this.` qualifier unless necessary |
| 69 | + |
| 70 | +### Naming Conventions |
| 71 | +- **Interfaces**: Start with 'I' (e.g., `ICustomAsyncExtractor`) |
| 72 | +- **Classes**: PascalCase (e.g., `FileEntry`, `Extractor`) |
| 73 | +- **Methods**: PascalCase (e.g., `Extract`, `ExtractAsync`) |
| 74 | +- **Properties**: PascalCase (e.g., `FullPath`, `Content`) |
| 75 | +- **Parameters**: camelCase (e.g., `fileEntry`, `options`) |
| 76 | + |
| 77 | +### C# Best Practices |
| 78 | +- Enable nullable reference types (project uses `<Nullable>Enable</Nullable>`) |
| 79 | +- Prefer pattern matching over `as` with null checks |
| 80 | +- Use expression-bodied members for simple properties and accessors |
| 81 | +- Prefer `null` propagation (`?.`) when appropriate |
| 82 | +- Use async/await for I/O operations |
| 83 | +- Implement both synchronous and asynchronous versions of extraction methods |
| 84 | + |
| 85 | +## Testing Practices |
| 86 | + |
| 87 | +### Test Organization |
| 88 | +- Unit tests go in the `RecursiveExtractor.Tests` project |
| 89 | +- CLI tests go in the `RecursiveExtractor.Cli.Tests` project |
| 90 | +- Use xUnit as the testing framework |
| 91 | +- Test files should mirror the structure of source files |
| 92 | + |
| 93 | +### Test Naming |
| 94 | +- Use descriptive test names that explain what is being tested |
| 95 | +- Follow pattern: `MethodName_StateUnderTest_ExpectedBehavior` |
| 96 | + |
| 97 | +### Test Data |
| 98 | +- Test archives and files should be placed in appropriate test data directories |
| 99 | +- Include edge cases: nested archives, encrypted files, malformed content, zip bombs |
| 100 | + |
| 101 | +## Security Considerations |
| 102 | + |
| 103 | +- The library includes protections against ZipSlip, Quines, and Zip Bombs |
| 104 | +- Always validate file paths to prevent directory traversal attacks |
| 105 | +- Handle malformed archives gracefully without crashes |
| 106 | +- Implement proper resource cleanup (dispose streams, file handles) |
| 107 | + |
| 108 | +## Documentation |
| 109 | + |
| 110 | +- Add XML documentation comments for public APIs |
| 111 | +- Keep README.md updated with new features or changes |
| 112 | +- Document breaking changes clearly |
| 113 | +- Include code examples for new public APIs |
| 114 | + |
| 115 | +## Project Structure |
| 116 | + |
| 117 | +``` |
| 118 | +RecursiveExtractor/ # Main library project |
| 119 | +RecursiveExtractor.Tests/ # Unit tests for library |
| 120 | +RecursiveExtractor.Cli/ # Command-line interface project |
| 121 | +RecursiveExtractor.Cli.Tests/ # Tests for CLI |
| 122 | +``` |
| 123 | + |
| 124 | +## Common Patterns |
| 125 | + |
| 126 | +### Extraction Pattern |
| 127 | +- Use `Extractor` class as the main entry point |
| 128 | +- Support both `Extract()` (sync) and `ExtractAsync()` (async) methods |
| 129 | +- Return `IEnumerable<FileEntry>` or `IAsyncEnumerable<FileEntry>` |
| 130 | +- Each `FileEntry` contains a content `Stream` that the consumer must dispose |
| 131 | + |
| 132 | +### Custom Extractors |
| 133 | +- Implement `ICustomAsyncExtractor` for new archive formats |
| 134 | +- Include a `CanExtract()` method that detects the file format via magic bytes |
| 135 | +- Preserve the stream position in `CanExtract()`: restore the original position before returning |
| 136 | +- Support both sync and async extraction |
| 137 | + |
| 138 | +### Error Handling |
| 139 | +- Throw `OverflowException` for detected quines or zip bombs |
| 140 | +- Throw `TimeoutException` when timing limits are exceeded |
| 141 | +- Log errors and skip invalid files during extraction |
| 142 | +- Use the `ExtractSelfOnFail` option to return the original archive on failure |
| 143 | + |
| 144 | +## Important Notes |
| 145 | + |
| 146 | +- Because the library multi-targets, all code must compile against .NET Standard 2.0, the lowest target |
| 147 | +- Some features (like WIM support) are Windows-only |
| 148 | +- The library automatically detects archive types |
| 149 | +- Streams in FileEntry objects should be disposed by consumers |
| 150 | +- Avoid multiple enumeration of extraction results |
| 151 | +- For parallel processing, use the batching mechanism documented in the README |
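
The NuGet Configuration section asks for a temporary switch to the public feed followed by a restore of the original `nuget.config`. A minimal sketch of that round trip is shown here. The function names `use_public_feed` and `restore_original_feed` and the `.orig` backup suffix are illustrative assumptions, not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch only: swap nuget.config for a public-feed version, then put the
# original back. Function names and the .orig suffix are hypothetical.

use_public_feed() {
    local config="${1:-nuget.config}"
    # Keep a byte-for-byte copy of the original configuration.
    cp -p "$config" "$config.orig"
    # Overwrite the file with the public-feed configuration from the guide.
    cat > "$config" <<'EOF'
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
  </packageSources>
</configuration>
EOF
}

restore_original_feed() {
    local config="${1:-nuget.config}"
    # Move the backup over the temporary file so no .orig file is left behind.
    mv "$config.orig" "$config"
}

# Usage (from the repository root):
#   use_public_feed
#   dotnet restore RecursiveExtractor.sln
#   restore_original_feed
```

Running `git diff --exit-code nuget.config` afterwards is one way to confirm that the file matches the committed version again.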