Skip to content

Conversation

@itsrafsanjani
Copy link
Member

@itsrafsanjani itsrafsanjani commented Sep 18, 2025

  • Added a script update-bad-bot-blocker.sh to automate the fetching and updating of the bad bot lists from the nginx-ultimate-bad-bot-blocker repository.
  • The script includes error handling, logging, and validation of the Nginx configuration after updates.

Summary by CodeRabbit

  • New Features
    • Introduces an automated updater for Nginx bad-bot and bad-referrer blocklists with retries and clear, colorized logs.
    • Adds generated configuration snippets to flag malicious user agents and referrers and return 403 responses when matched.
    • Performs post-update Nginx configuration validation and reports item counts, results, and next steps for reloading.
    • Provides safe execution with cleanup on exit and sensible handling when referrer list updates are unavailable.

@coderabbitai
Copy link

coderabbitai bot commented Sep 18, 2025

Walkthrough

Adds a Bash script to fetch bad bot/referrer lists and generate Nginx map/blocker snippets, then validates Nginx configuration. Introduces new Nginx config fragments to map and block bad user agents and referrers using $bad_user_agent and $bad_referer variables, intended for inclusion in http and server blocks respectively.

Changes

Cohort / File(s) Summary
Updater Script
scripts/update-bad-bot-blocker.sh
New Bash script to download bad-user-agents and bad-referrers lists with retries, generate four Nginx config fragments (maps and server-level blockers), and optionally validate via nginx -t. Includes logging utilities and a main orchestration flow.
Nginx Bad-Bot Configs
nginx/common/block-bad-bots-list.conf, nginx/common/block-bad-bots-blocker.conf, nginx/common/block-bad-referrers-blocker.conf
Adds Nginx snippets: a map setting $bad_user_agent to 3 for matching UAs; server-block snippets that return 403 when $bad_user_agent or $bad_referer are set. Intended for inclusion in http (map) and server (blockers) contexts.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Dev as Developer
    participant Script as update-bad-bot-blocker.sh
    participant Git as nginx-ultimate-bad-bot-blocker (GitHub)
    participant FS as Filesystem (nginx/common)
    participant Nginx as nginx

    Dev->>Script: Run script
    Script->>Git: Download bad-user-agents list (retries)
    Git-->>Script: UA list (or error)
    Script->>Git: Download bad-referrers list (retries)
    Git-->>Script: Referrers list (or warning on failure)

    rect rgba(230,245,255,0.6)
    note right of Script: Generate config fragments
    Script->>FS: Write block-bad-bots-...*.conf (map + blocker)
    Script->>FS: Write block-bad-referrers-...*.conf (map + blocker)
    end

    alt nginx present
      Script->>Nginx: nginx -t
      Nginx-->>Script: Validation result
    else nginx absent
      Script-->>Dev: Warning: nginx not found
    end

    Script-->>Dev: Paths written + next steps (reload guidance)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A bunny taps a shell with cheer,
Pulls lists from clouds, keeps mischief clear.
Maps in place, the bots get barred,
Referrers rude are quickly scarred.
Nginx nods—tests all green—
Hippity-hop, our configs clean! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "feat: bad bot blocker configuration and script" accurately and concisely describes the main change set—adding Nginx bad-bot blocking configuration and a script to fetch/update the lists—as reflected in the PR summary and file diffs, so it clearly communicates the primary intent to reviewers.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/bad-bot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (8)
scripts/update-bad-bot-blocker.sh (6)

7-7: Harden shell options.

Use stricter flags for safer failure modes.

Apply:

-set -e  # Exit on any error
+set -Eeuo pipefail  # Stricter: exit on errors, unset vars, and pipeline failures

12-13: Ensure output directory exists.

Creates target dir before writing files to avoid partial failures on fresh clones or different CI runners.

 NGINX_COMMON_DIR="$PROJECT_ROOT/nginx/common"
 
+# Ensure target directory exists
+mkdir -p "$NGINX_COMMON_DIR"
+

44-55: Follow redirects and simplify retry with curl’s built-ins.

GitHub endpoints can redirect; also prefer curl’s native retry knobs.

-    while [[ $retry_count -lt $max_retries ]]; do
-        if curl -s --fail "$url" -o "$output_file"; then
+    while [[ $retry_count -lt $max_retries ]]; do
+        if curl -fsSL --retry 3 --retry-delay 2 --retry-connrefused \
+            "$url" -o "$output_file"; then
             log_info "Successfully downloaded: $url"
             return 0
         else
             retry_count=$((retry_count + 1))
             log_warn "Download failed (attempt $retry_count/$max_retries): $url"
             if [[ $retry_count -lt $max_retries ]]; then
                 sleep 2
             fi
         fi
     done

199-200: Fix trap quoting (ShellCheck SC2064).

Avoid early expansion; quote variables at trap execution time.

-    trap "rm -f $temp_user_agents $temp_referrers" EXIT
+    trap 'rm -f "$temp_user_agents" "$temp_referrers"' EXIT

176-186: Don’t hide nginx -t output; surface diagnostics.

You’re discarding stderr, which is where nginx writes validation messages.

 validate_nginx_config() {
     if command -v nginx >/dev/null 2>&1; then
         log_info "Validating Nginx configuration..."
-        if nginx -t 2>/dev/null; then
-            log_info "Nginx configuration is valid"
-        else
-            log_warn "Nginx configuration validation failed. Please check manually."
-        fi
+        if output=$(nginx -t 2>&1); then
+            log_info "Nginx configuration is valid"
+            echo "$output"
+        else
+            log_warn "Nginx configuration validation failed:"
+            echo "$output"
+            return 1
+        fi
     else
         log_warn "Nginx not found in PATH, skipping configuration validation"
     fi
 }

228-239: Consider printing absolute paths and next steps only on success.

Minor polish: only emit reload guidance if validation passed.

nginx/common/block-bad-bots-http.conf (1)

9-690: Potential overblocking and performance sanity.

A few patterns (e.g., Firefox/7.0, Joomla) can hit legitimate legacy clients or admin tools. Consider a small allowlist map (e.g., uptime checks, SE bots) evaluated before this map.

Would you like a minimal allowlist snippet you can drop into http {}?

nginx/common/block-bad-bots.conf (1)

6-9: Style nit: avoid quoting the source in map.

map $http_user_agent $bad_user_agent is the canonical form per nginx docs. Quoted works but is unconventional.

-map "$http_user_agent" $bad_user_agent {
+map $http_user_agent $bad_user_agent {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f4dcb9 and 76423b4.

📒 Files selected for processing (5)
  • nginx/common/block-bad-bots-http.conf (1 hunks)
  • nginx/common/block-bad-bots-server.conf (1 hunks)
  • nginx/common/block-bad-bots.conf (1 hunks)
  • nginx/common/block-bad-referrers-server.conf (1 hunks)
  • scripts/update-bad-bot-blocker.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
scripts/update-bad-bot-blocker.sh

[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)


[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

🔇 Additional comments (3)
scripts/update-bad-bot-blocker.sh (1)

213-217: Accurate counts: exclude comments and blank lines.

Current grep counts blank lines as entries.
[ suggest_nitpick_refactor ]

-    user_agents_count=$(grep -c '^[^#]' "$temp_user_agents" || echo "0")
-    referrers_count=$(grep -c '^[^#]' "$temp_referrers" || echo "0")
+    user_agents_count=$(grep -Evc '^\s*(#|$)' "$temp_user_agents" || echo "0")
+    referrers_count=$(grep -Evc '^\s*(#|$)' "$temp_referrers" || echo "0")
nginx/common/block-bad-bots-http.conf (1)

6-9: Confirm no duplicate $bad_user_agent definition

Repo search for includes of block-bad-bots.* returned no matches; manually verify no consolidated block-bad-bots.conf (or any other file) defines $bad_user_agent to avoid variable redefinition.

nginx/common/block-bad-bots-server.conf (1)

6-9: Simple, effective server guard — verify include order

Works as intended if $bad_user_agent is defined via an http-level map. I couldn't find any include referencing block-bad-bots-(http|server).conf in the repo (ripgrep returned no matches); confirm the http-level map that defines $bad_user_agent is included before nginx/common/block-bad-bots-server.conf (check nginx.conf / http-level includes).

- Introduced a new configuration file `block-bad-bots.conf` to block known bad user agents and referrers in Nginx.
- Added a script `update-bad-bot-blocker.sh` to automate the fetching and updating of the bad bot lists from the nginx-ultimate-bad-bot-blocker repository.
- The script includes error handling, logging, and validation of the Nginx configuration after updates.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
nginx/common/block-bad-bots-list.conf (1)

7-690: Avoid double-defining $bad_user_agent across “consolidated” vs “split” configs.

Make sure only one map defines $bad_user_agent at runtime. If nginx/common/block-bad-bots.conf (consolidated) exists or is included elsewhere, don’t also include this list file to prevent redefinition conflicts.

scripts/update-bad-bot-blocker.sh (3)

219-224: Prevent config duplication with any consolidated blocker.

If a consolidated nginx/common/block-bad-bots.conf exists/is included, skip generating or including split files to avoid redefining $bad_user_agent.


79-87: Escape quotes and backslashes in UA patterns to avoid broken maps.

Unescaped " or \ in the source list will corrupt the generated config.

-    while IFS= read -r user_agent || [[ -n "$user_agent" ]]; do
+    while IFS= read -r user_agent || [[ -n "$user_agent" ]]; do
         # Skip empty lines and comments
         [[ -z "$user_agent" ]] && continue
         [[ "$user_agent" =~ ^[[:space:]]*# ]] && continue
 
-        # Add to configuration with proper formatting (using printf for correct tabs)
-        printf '\t"~*%s"\t\t%s;\n' "(?:\b)${user_agent}(?:\b)" "3" >> "$output_file"
+        # Escape for safe embedding inside quoted Nginx regex token
+        ua_escaped=${user_agent//\\/\\\\}
+        ua_escaped=${ua_escaped//\"/\\\"}
+        printf '\t"~*(?:\\b)%s(?:\\b)"\t\t%s;\n' "$ua_escaped" "3" >> "$output_file"
     done < "$temp_user_agents"

136-144: Apply identical escaping for referrer patterns.

Same breakage risk exists for referrers.

-    while IFS= read -r referrer || [[ -n "$referrer" ]]; do
+    while IFS= read -r referrer || [[ -n "$referrer" ]]; do
         # Skip empty lines and comments
         [[ -z "$referrer" ]] && continue
         [[ "$referrer" =~ ^[[:space:]]*# ]] && continue
 
-        # Add to configuration with proper formatting (using printf for correct tabs)
-        printf '\t"~*%s"\t\t%s;\n' "(?:\b)${referrer}(?:\b)" "1" >> "$output_file"
+        # Escape quotes and backslashes
+        ref_escaped=${referrer//\\/\\\\}
+        ref_escaped=${ref_escaped//\"/\\\"}
+        printf '\t"~*(?:\\b)%s(?:\\b)"\t\t%s;\n' "$ref_escaped" "1" >> "$output_file"
     done < "$temp_referrers"
🧹 Nitpick comments (5)
nginx/common/block-bad-referrers-blocker.conf (1)

6-9: Ensure the map is loaded before this guard.

Confirm nginx/common/block-bad-referrers-list.conf is included in http {} before this server-level check; otherwise $bad_referer will be unset and the guard won’t trigger. If you want to silently drop instead of 403, consider 444 at your discretion.

scripts/update-bad-bot-blocker.sh (4)

7-7: Harden shell options.

Use errexit + nounset + pipefail (and preserve ERR trap propagation) for safer failure modes.

Apply:

-set -e  # Exit on any error
+set -Eeuo pipefail  # Safer defaults: exit-on-error, undefined vars, and pipeline failures

9-13: Create target directory before writing outputs.

Prevent failures on fresh clones/paths.

 NGINX_COMMON_DIR="$PROJECT_ROOT/nginx/common"
 
+# Ensure output directory exists
+mkdir -p "$NGINX_COMMON_DIR"

175-183: Don’t swallow nginx -t diagnostics.

Expose validation errors to aid debugging.

-        if nginx -t 2>/dev/null; then
-            log_info "Nginx configuration is valid"
-        else
-            log_warn "Nginx configuration validation failed. Please check manually."
-        fi
+        if out=$(nginx -t 2>&1); then
+            log_info "Nginx configuration is valid"
+        else
+            log_warn "Nginx configuration validation failed:"
+            echo "$out" >&2
+        fi

199-200: Quote trap command to avoid premature expansion (SC2064).

Use single quotes and quote vars inside.

-    trap "rm -f $temp_user_agents $temp_referrers" EXIT
+    trap 'rm -f "$temp_user_agents" "$temp_referrers"' EXIT
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76423b4 and 70196fa.

📒 Files selected for processing (4)
  • nginx/common/block-bad-bots-blocker.conf (1 hunks)
  • nginx/common/block-bad-bots-list.conf (1 hunks)
  • nginx/common/block-bad-referrers-blocker.conf (1 hunks)
  • scripts/update-bad-bot-blocker.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
scripts/update-bad-bot-blocker.sh

[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)


[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.

(SC2064)

🔇 Additional comments (2)
nginx/common/block-bad-bots-blocker.conf (1)

6-9: Guard is fine; verify inclusion order.

Ensure nginx/common/block-bad-bots-list.conf is loaded in http {} before this server-level blocker so $bad_user_agent is defined.

scripts/update-bad-bot-blocker.sh (1)

212-218: Count only non-empty, non-comment entries.

Current regex counts blank lines; tweak for accuracy.
[raise_nitpick_refactor]

-    user_agents_count=$(grep -c '^[^#]' "$temp_user_agents" || echo "0")
-    referrers_count=$(grep -c '^[^#]' "$temp_referrers" || echo "0")
+    user_agents_count=$(grep -Ec '^[[:space:]]*[^#[:space:]]' "$temp_user_agents" || echo "0")
+    referrers_count=$(grep -Ec '^[[:space:]]*[^#[:space:]]' "$temp_referrers" || echo "0")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants