Commit 42a66d7

initial commit
1 parent 45dd257 commit 42a66d7

4 files changed, with 163 additions and 0 deletions.
.vscode/settings.json

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
{
    "cSpell.words": [
        "Rutkowski"
    ]
}

README.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@

# Duplicate Guard

**Duplicate Guard** is a lightweight GitHub Action that blocks pull requests whose added or modified files duplicate other files in the repository. This reduces repository bloat, keeps downloadable app sizes small, and simplifies asset management. Duplicate files can significantly increase the size of compressed artifacts such as ZIP files, because ZIP compresses each entry independently and therefore stores every copy in full. The action keeps your repository clean and efficient by detecting and blocking redundant files.

---

## 🚀 Features
- Detects and blocks unintentionally duplicated files in pull requests.
- Helps reduce downloadable app sizes by eliminating redundant assets.
- Supports `.gitignore`-like syntax for excluding specific files or directories.

---
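As a rough check of the archive-size claim in the introduction, the following sketch builds two in-memory ZIP archives, one with a single asset and one with the same asset stored twice. The file names and the 256 KiB random payload are assumptions made for illustration; random bytes stand in for an already-compressed asset such as an image.

```python
import io
import os
import zipfile


def zip_size(files):
    """Return the size in bytes of an in-memory ZIP holding {name: data} entries."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return len(buffer.getvalue())


# Hypothetical asset: random bytes barely compress, much like a PNG or JPEG.
asset = os.urandom(256 * 1024)

single = zip_size({"icon.png": asset})
duplicated = zip_size({"icon.png": asset, "icon_copy.png": asset})

print(f"one copy: {single} bytes, two copies: {duplicated} bytes")
```

Because each entry is compressed on its own, the second archive should come out at roughly twice the size of the first.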

## 🛠️ Usage

### 1. **Create an Ignore File**
Add a `duplicate_guard.ignore` file to the root of your repository to list patterns for files or directories that should be excluded from duplicate checks. Patterns use `.gitignore`-like glob syntax, one per line; lines starting with `#` are treated as comments.

**Example `duplicate_guard.ignore`:**
```gitignore
test/*
logs/*
*.log
```
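To see how the example patterns behave, here is a minimal sketch that mirrors the `should_ignore` helper from `duplicate_guard.py`, which matches paths with Python's `fnmatch`. The sample paths are made up for illustration. Since `fnmatch` is a plain shell-style glob matcher, some `.gitignore` features, such as a bare directory pattern like `logs/`, are not expected to work the same way.

```python
import fnmatch

# Patterns from the example ignore file above.
PATTERNS = ["test/*", "logs/*", "*.log"]


def should_ignore(path, patterns):
    """Return True if the path matches any glob pattern."""
    return any(fnmatch.fnmatch(path, pattern) for pattern in patterns)


# Hypothetical repository paths.
print(should_ignore("test/unit/data.bin", PATTERNS))  # True: "*" also spans "/"
print(should_ignore("build/output.log", PATTERNS))    # True: matched by "*.log"
print(should_ignore("assets/icon.png", PATTERNS))     # False: no pattern matches

# A gitignore-style directory pattern is not a glob for its contents here.
print(should_ignore("logs/app.txt", ["logs/"]))       # False
```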

---

### 2. **Add the GitHub Action**
Create a GitHub Actions workflow in `.github/workflows/duplicate_guard.yml`:

```yaml
name: Duplicate Guard
on:
  pull_request:
    branches:
      - master
  workflow_dispatch:

jobs:
  duplicate_guard:
    runs-on: ubuntu-latest
    steps:
      - name: Duplicate Guard
        uses: chris-rutkowski/duplicate-guard@v1.0.0
```

---

## ⚙️ Configuration

### **Specify a Custom Ignore File Path**
If your `duplicate_guard.ignore` file is not in the root directory, specify its location using the `ignore_file` input:

```yaml
steps:
  - name: Duplicate Guard
    uses: chris-rutkowski/duplicate-guard@v1.0.0
    with:
      ignore_file: ./my/path/my_duplicate_guard.ignore
```

---

## 📄 License
This project is licensed under the [MIT License](LICENSE).

action.yml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
name: "Duplicate Guard"
description: "Blocks pull requests with unintentionally duplicated files"
author: "Chris Rutkowski"
inputs:
  ignore_file:
    description: "Path to the ignore file"
    required: true
    default: "./duplicate_guard.ignore"

runs:
  using: "composite"
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4

    - name: Get changed files
      id: changed-files
      uses: tj-actions/changed-files@v45
      with:
        separator: ","

    - name: Run Duplicate Guard
      run: |
        files="${{ steps.changed-files.outputs.added_files }},${{ steps.changed-files.outputs.modified_files }}"
        python3 ${GITHUB_ACTION_PATH}/duplicate_guard.py ${{ inputs.ignore_file }} "$files"
      shell: bash
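The last step joins the added and modified file lists with a comma and passes them as one argument, which the script then splits again. A small sketch of that round trip follows; `parse_changed_files` is a hypothetical helper name, not part of the commit. It shows why empty entries appear when one list is empty, and that a file name containing a comma would be split into pieces.

```python
def parse_changed_files(added, modified):
    """Join two comma-separated lists as the bash step does, then split them."""
    joined = f"{added},{modified}"
    # An empty list on either side leaves an empty entry; drop those,
    # as the script does with its `if not file` check.
    return [path for path in joined.split(",") if path]


print(parse_changed_files("", "README.md,src/app.py"))
print(parse_changed_files("assets/a,b.png", ""))  # the comma splits one name in two
```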

duplicate_guard.py

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
import fnmatch
import hashlib
import os
import sys

def load_ignore_patterns(ignore_file):
    with open(ignore_file, "r") as f:
        return [line.strip() for line in f if line.strip() and not line.startswith("#")]

def should_ignore(file, patterns):
    return any(fnmatch.fnmatch(file, pattern) for pattern in patterns)

def calculate_checksum(file_path):
    sha256_hash = hashlib.sha256()
    with open(file_path, "rb") as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()

def get_all_repository_files(ignore_patterns):
    repo_files = []
    for root, _, files in os.walk("."):
        for file in files:
            file_path = os.path.join(root, file)
            relative_path = os.path.relpath(file_path, ".")
            if not should_ignore(relative_path, ignore_patterns):
                repo_files.append(relative_path)
    return repo_files

ignore_file = sys.argv[1]
files = sys.argv[2].split(",")
ignore_patterns = load_ignore_patterns(ignore_file)

# Step 1: Build a checksum map for all existing repository files
print("Calculating checksums for all repository files...")
checksums = {}
for file in get_all_repository_files(ignore_patterns):
    checksum = calculate_checksum(file)
    checksums[checksum] = file
print(f"Done, {len(checksums)} checksums")

# Step 2: Check new/modified files against the repository and themselves
exit_code = 0

for file in files:
    if not file or not os.path.isfile(file):
        continue

    if should_ignore(file, ignore_patterns):
        print(f"Ignoring: '{file}'")
        continue

    print(f"Processing: '{file}'")

    checksum = calculate_checksum(file)

    if checksum in checksums:
        if checksums[checksum] == file:
            continue

        print(f"Error: '{file}' is a duplicate of '{checksums[checksum]}'")
        exit_code = 1
    else:
        checksums[checksum] = file

sys.exit(exit_code)
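One thing to watch in the script above: the map keeps a single path per checksum, and the initial walk also visits the changed files themselves. If the walk happens to record the changed file after its older twin, the `checksums[checksum] == file` check skips it and the duplicate appears to go unreported. The sketch below shows one possible alternative that keeps every path per checksum, so the result no longer depends on walk order. `sha256_of` and `find_duplicates` are hypothetical names, and this is not the author's implementation.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, block_size=4096):
    """Hash a file in fixed-size blocks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(repo_files, changed_files):
    """Return {changed_file: [other files with identical content]}."""
    paths_by_checksum = defaultdict(set)
    for path in repo_files:
        paths_by_checksum[sha256_of(path)].add(os.path.normpath(path))

    duplicates = {}
    for path in changed_files:
        # Every file with the same content, except the changed file itself.
        others = paths_by_checksum[sha256_of(path)] - {os.path.normpath(path)}
        if others:
            duplicates[path] = sorted(others)
    return duplicates
```

Unlike the original loop, this variant reports both files when two changed files duplicate each other, which may or may not be the desired behavior.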
