-
Couldn't load subscription status.
- Fork 4
feat: bad bot blocker configuration and script #13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughAdds a Bash script to fetch bad bot/referrer lists and generate Nginx map/blocker snippets, then validates Nginx configuration. Introduces new Nginx config fragments to map and block bad user agents and referrers using $bad_user_agent and $bad_referer variables, intended for inclusion in http and server blocks respectively. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Dev as Developer
participant Script as update-bad-bot-blocker.sh
participant Git as nginx-ultimate-bad-bot-blocker (GitHub)
participant FS as Filesystem (nginx/common)
participant Nginx as nginx
Dev->>Script: Run script
Script->>Git: Download bad-user-agents list (retries)
Git-->>Script: UA list (or error)
Script->>Git: Download bad-referrers list (retries)
Git-->>Script: Referrers list (or warning on failure)
rect rgba(230,245,255,0.6)
note right of Script: Generate config fragments
Script->>FS: Write block-bad-bots-...*.conf (map + blocker)
Script->>FS: Write block-bad-referrers-...*.conf (map + blocker)
end
alt nginx present
Script->>Nginx: nginx -t
Nginx-->>Script: Validation result
else nginx absent
Script-->>Dev: Warning: nginx not found
end
Script-->>Dev: Paths written + next steps (reload guidance)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (8)
scripts/update-bad-bot-blocker.sh (6)
7-7: Harden shell options.Use stricter flags for safer failure modes.
Apply:
-set -e # Exit on any error +set -Eeuo pipefail # Stricter: exit on errors, unset vars, and pipeline failures
12-13: Ensure output directory exists.Creates target dir before writing files to avoid partial failures on fresh clones or different CI runners.
NGINX_COMMON_DIR="$PROJECT_ROOT/nginx/common" +# Ensure target directory exists +mkdir -p "$NGINX_COMMON_DIR" +
44-55: Follow redirects and simplify retry with curl’s built-ins.GitHub endpoints can redirect; also prefer curl’s native retry knobs.
- while [[ $retry_count -lt $max_retries ]]; do - if curl -s --fail "$url" -o "$output_file"; then + while [[ $retry_count -lt $max_retries ]]; do + if curl -fsSL --retry 3 --retry-delay 2 --retry-connrefused \ + "$url" -o "$output_file"; then log_info "Successfully downloaded: $url" return 0 else retry_count=$((retry_count + 1)) log_warn "Download failed (attempt $retry_count/$max_retries): $url" if [[ $retry_count -lt $max_retries ]]; then sleep 2 fi fi done
199-200: Fix trap quoting (ShellCheck SC2064).Avoid early expansion; quote variables at trap execution time.
- trap "rm -f $temp_user_agents $temp_referrers" EXIT + trap 'rm -f "$temp_user_agents" "$temp_referrers"' EXIT
176-186: Don’t hide nginx -t output; surface diagnostics.You’re discarding stderr, which is where nginx writes validation messages.
validate_nginx_config() { if command -v nginx >/dev/null 2>&1; then log_info "Validating Nginx configuration..." - if nginx -t 2>/dev/null; then - log_info "Nginx configuration is valid" - else - log_warn "Nginx configuration validation failed. Please check manually." - fi + if output=$(nginx -t 2>&1); then + log_info "Nginx configuration is valid" + echo "$output" + else + log_warn "Nginx configuration validation failed:" + echo "$output" + return 1 + fi else log_warn "Nginx not found in PATH, skipping configuration validation" fi }
228-239: Consider printing absolute paths and next steps only on success.Minor polish: only emit reload guidance if validation passed.
nginx/common/block-bad-bots-http.conf (1)
9-690: Potential overblocking and performance sanity.A few patterns (e.g.,
Firefox/7.0,Joomla) can hit legitimate legacy clients or admin tools. Consider a small allowlist map (e.g., uptime checks, SE bots) evaluated before this map.Would you like a minimal allowlist snippet you can drop into
http {}?nginx/common/block-bad-bots.conf (1)
6-9: Style nit: avoid quoting the source in map.
map $http_user_agent $bad_user_agentis the canonical form per nginx docs. Quoted works but is unconventional.-map "$http_user_agent" $bad_user_agent { +map $http_user_agent $bad_user_agent {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
nginx/common/block-bad-bots-http.conf(1 hunks)nginx/common/block-bad-bots-server.conf(1 hunks)nginx/common/block-bad-bots.conf(1 hunks)nginx/common/block-bad-referrers-server.conf(1 hunks)scripts/update-bad-bot-blocker.sh(1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
scripts/update-bad-bot-blocker.sh
[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
🔇 Additional comments (3)
scripts/update-bad-bot-blocker.sh (1)
213-217: Accurate counts: exclude comments and blank lines.Current grep counts blank lines as entries.
[ suggest_nitpick_refactor ]- user_agents_count=$(grep -c '^[^#]' "$temp_user_agents" || echo "0") - referrers_count=$(grep -c '^[^#]' "$temp_referrers" || echo "0") + user_agents_count=$(grep -Evc '^\s*(#|$)' "$temp_user_agents" || echo "0") + referrers_count=$(grep -Evc '^\s*(#|$)' "$temp_referrers" || echo "0")nginx/common/block-bad-bots-http.conf (1)
6-9: Confirm no duplicate $bad_user_agent definitionRepo search for includes of block-bad-bots.* returned no matches; manually verify no consolidated block-bad-bots.conf (or any other file) defines $bad_user_agent to avoid variable redefinition.
nginx/common/block-bad-bots-server.conf (1)
6-9: Simple, effective server guard — verify include orderWorks as intended if $bad_user_agent is defined via an http-level map. I couldn't find any include referencing block-bad-bots-(http|server).conf in the repo (ripgrep returned no matches); confirm the http-level map that defines $bad_user_agent is included before nginx/common/block-bad-bots-server.conf (check nginx.conf / http-level includes).
- Introduced a new configuration file `block-bad-bots.conf` to block known bad user agents and referrers in Nginx. - Added a script `update-bad-bot-blocker.sh` to automate the fetching and updating of the bad bot lists from the nginx-ultimate-bad-bot-blocker repository. - The script includes error handling, logging, and validation of the Nginx configuration after updates.
76423b4 to
70196fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
nginx/common/block-bad-bots-list.conf (1)
7-690: Avoid double-defining $bad_user_agent across “consolidated” vs “split” configs.Make sure only one map defines $bad_user_agent at runtime. If nginx/common/block-bad-bots.conf (consolidated) exists or is included elsewhere, don’t also include this list file to prevent redefinition conflicts.
scripts/update-bad-bot-blocker.sh (3)
219-224: Prevent config duplication with any consolidated blocker.If a consolidated nginx/common/block-bad-bots.conf exists/is included, skip generating or including split files to avoid redefining $bad_user_agent.
79-87: Escape quotes and backslashes in UA patterns to avoid broken maps.Unescaped " or \ in the source list will corrupt the generated config.
- while IFS= read -r user_agent || [[ -n "$user_agent" ]]; do + while IFS= read -r user_agent || [[ -n "$user_agent" ]]; do # Skip empty lines and comments [[ -z "$user_agent" ]] && continue [[ "$user_agent" =~ ^[[:space:]]*# ]] && continue - # Add to configuration with proper formatting (using printf for correct tabs) - printf '\t"~*%s"\t\t%s;\n' "(?:\b)${user_agent}(?:\b)" "3" >> "$output_file" + # Escape for safe embedding inside quoted Nginx regex token + ua_escaped=${user_agent//\\/\\\\} + ua_escaped=${ua_escaped//\"/\\\"} + printf '\t"~*(?:\\b)%s(?:\\b)"\t\t%s;\n' "$ua_escaped" "3" >> "$output_file" done < "$temp_user_agents"
136-144: Apply identical escaping for referrer patterns.Same breakage risk exists for referrers.
- while IFS= read -r referrer || [[ -n "$referrer" ]]; do + while IFS= read -r referrer || [[ -n "$referrer" ]]; do # Skip empty lines and comments [[ -z "$referrer" ]] && continue [[ "$referrer" =~ ^[[:space:]]*# ]] && continue - # Add to configuration with proper formatting (using printf for correct tabs) - printf '\t"~*%s"\t\t%s;\n' "(?:\b)${referrer}(?:\b)" "1" >> "$output_file" + # Escape quotes and backslashes + ref_escaped=${referrer//\\/\\\\} + ref_escaped=${ref_escaped//\"/\\\"} + printf '\t"~*(?:\\b)%s(?:\\b)"\t\t%s;\n' "$ref_escaped" "1" >> "$output_file" done < "$temp_referrers"
🧹 Nitpick comments (5)
nginx/common/block-bad-referrers-blocker.conf (1)
6-9: Ensure the map is loaded before this guard.Confirm nginx/common/block-bad-referrers-list.conf is included in http {} before this server-level check; otherwise $bad_referer will be unset and the guard won’t trigger. If you want to silently drop instead of 403, consider 444 at your discretion.
scripts/update-bad-bot-blocker.sh (4)
7-7: Harden shell options.Use errexit + nounset + pipefail (and preserve ERR trap propagation) for safer failure modes.
Apply:
-set -e # Exit on any error +set -Eeuo pipefail # Safer defaults: exit-on-error, undefined vars, and pipeline failures
9-13: Create target directory before writing outputs.Prevent failures on fresh clones/paths.
NGINX_COMMON_DIR="$PROJECT_ROOT/nginx/common" +# Ensure output directory exists +mkdir -p "$NGINX_COMMON_DIR"
175-183: Don’t swallow nginx -t diagnostics.Expose validation errors to aid debugging.
- if nginx -t 2>/dev/null; then - log_info "Nginx configuration is valid" - else - log_warn "Nginx configuration validation failed. Please check manually." - fi + if out=$(nginx -t 2>&1); then + log_info "Nginx configuration is valid" + else + log_warn "Nginx configuration validation failed:" + echo "$out" >&2 + fi
199-200: Quote trap command to avoid premature expansion (SC2064).Use single quotes and quote vars inside.
- trap "rm -f $temp_user_agents $temp_referrers" EXIT + trap 'rm -f "$temp_user_agents" "$temp_referrers"' EXIT
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
nginx/common/block-bad-bots-blocker.conf(1 hunks)nginx/common/block-bad-bots-list.conf(1 hunks)nginx/common/block-bad-referrers-blocker.conf(1 hunks)scripts/update-bad-bot-blocker.sh(1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
scripts/update-bad-bot-blocker.sh
[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
[warning] 199-199: Use single quotes, otherwise this expands now rather than when signalled.
(SC2064)
🔇 Additional comments (2)
nginx/common/block-bad-bots-blocker.conf (1)
6-9: Guard is fine; verify inclusion order.Ensure nginx/common/block-bad-bots-list.conf is loaded in http {} before this server-level blocker so $bad_user_agent is defined.
scripts/update-bad-bot-blocker.sh (1)
212-218: Count only non-empty, non-comment entries.Current regex counts blank lines; tweak for accuracy.
[raise_nitpick_refactor]- user_agents_count=$(grep -c '^[^#]' "$temp_user_agents" || echo "0") - referrers_count=$(grep -c '^[^#]' "$temp_referrers" || echo "0") + user_agents_count=$(grep -Ec '^[[:space:]]*[^#[:space:]]' "$temp_user_agents" || echo "0") + referrers_count=$(grep -Ec '^[[:space:]]*[^#[:space:]]' "$temp_referrers" || echo "0")
update-bad-bot-blocker.shto automate the fetching and updating of the bad bot lists from the nginx-ultimate-bad-bot-blocker repository.Summary by CodeRabbit