HARMONY-2244: Excessive alert emails from Kubernetes cluster #845

Merged
blackone-sudo merged 2 commits into main from HARMONY-2244 on Feb 5, 2026
blackone-sudo (Collaborator) commented Feb 3, 2026

Jira Issue ID

HARMONY-2244

Description

Excessive alert emails from Kubernetes cluster (Pods in Pending/Unknown state)

Local Test Steps

  1. Deploy Harmony the normal way.
  2. Spin up a pod and ensure it stays in a Pending state to trigger alerts.
  3. Modify the repeat interval to 5 hours.
  4. Check email to see when the next alert arrives after the first alert email.
  5. If this checks out, then with the new value we shouldn't get a repeat alert for 7 days after the first one.
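
Step 2 needs a pod that never leaves Pending. One way to get that, offered here as an assumption and not as the author's actual method, is a pod whose `nodeSelector` matches no node, so the scheduler can never place it. The pod name, the label key, and the image below are all hypothetical.

```yaml
# Hypothetical test pod: no node carries the selected label,
# so the pod cannot be scheduled and stays in Pending.
apiVersion: v1
kind: Pod
metadata:
  name: pending-alert-test              # hypothetical name
spec:
  nodeSelector:
    example.com/no-such-node: "true"    # assumed label that no node has
  containers:
    - name: sleeper
      image: busybox                    # never pulled, since the pod is never scheduled
      command: ["sleep", "3600"]
```

After `kubectl apply -f` on this manifest, `kubectl get pod pending-alert-test` should keep reporting `Pending`. Whether that state produces an email depends on the alert rules deployed in the cluster.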

PR Acceptance Checklist

  • [x] Acceptance criteria met
  • Tests added/updated (if needed) and passing
  • Documentation updated (if needed)
  • Harmony in a Box tested (if changes made to microservices or new dependencies added)

Summary by CodeRabbit

  • Chores

    • Alert repetition interval increased: ongoing alerts will now be resent weekly (instead of every two hours) to reduce notification volume and fatigue.
  • Documentation

    • Updated developer setup guide: clarified platform-specific instructions, updated example Node/Python versions, removed M1/Rosetta notes, added new environment variables and an npm/gyp tip.

coderabbitai bot commented Feb 3, 2026

Warning

Rate limit exceeded

@blackone-sudo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 22 minutes and 15 seconds before requesting another review.

📝 Walkthrough

Updated alert-manager route repeat_interval from 2h to 168h (7 days). Developer guide docs/guides/develop.md revised: version numbers, platform notes, environment variable examples, and formatting adjusted; no application logic changed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Alert Manager Configuration: `config/alert-manager.yaml` | Changed `route.repeat_interval` from `2h` to `168h`. Minor formatting-only change to a `targetPort` line in the service section. |
| Developer Guide Docs: `docs/guides/develop.md` | Multiple documentation edits: Python bumped to 3.13, Node/npm example versions updated, M1/Rosetta notes removed, new env vars added (`QUERY_CMR_IMAGE`, `QUERY_CMR_SERVICE_QUEUE_URLS`, `SERVICE_RUNNER_IMAGE`), platform-specific env blocks restructured, and assorted formatting/heading tweaks. |
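
For orientation, the `route.repeat_interval` change summarized above would look roughly like the fragment below. This is a sketch and not the contents of `config/alert-manager.yaml`: the receiver name and the grouping values are assumptions added only to make the fragment a plausible Alertmanager route. The only values taken from this PR are the old `2h` and the new `168h`.

```yaml
# Sketch of an Alertmanager route block (not copied from the repository).
route:
  receiver: email-notifications   # hypothetical receiver name
  group_by: ['alertname']         # assumed grouping
  group_wait: 30s                 # Alertmanager's default
  group_interval: 5m              # Alertmanager's default
  repeat_interval: 168h           # was 2h; 168h = 7 days
```

Alertmanager re-sends a notification for a still-firing alert group only once `repeat_interval` has passed since the last successful notification, which is why raising this one value reduces the email volume.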

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • indiejames
  • flamingbear
  • ygliuvt

Poem

🐰 I hopped through configs, neat and spry,
I stretched two hours to seven days high,
Tucked docs in order, versions aligned,
Nibbled a carrot, left a note behind,
Hooray — the alerts can now nap and sigh.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly identifies the main change: modifying alert configuration to reduce excessive Kubernetes alert emails by adjusting the repeat interval from 2h to 168h. |
| Description check | ✅ Passed | The description follows the template with all required sections: Jira Issue ID, Description, Local Test Steps, and PR Acceptance Checklist. The acceptance criteria is marked as met. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
config/alert-manager.yaml (1)

56-56: ⚠️ Potential issue | 🟡 Minor

Update AlertManager image to a newer version.

The image prom/alertmanager:v0.23.0 is approximately 4–5 years old. The current stable release is v0.31.0 (released February 2, 2026), with multiple minor and patch versions released since v0.23.0. Consider updating to the latest version to gain bug fixes, security patches, and performance improvements.

🧹 Nitpick comments (1)
config/alert-manager.yaml (1)

16-16: Significant increase in repeat interval from 2h to 168h (7 days).

This change effectively suppresses duplicate alerts for a week. While this addresses alert fatigue from pods stuck in Pending/Unknown states, consider the trade-off: if an issue persists unresolved, teams won't receive reminder notifications for 7 days.

A middle-ground like 24h or 48h might balance noise reduction while still providing periodic reminders for unresolved issues. However, if the current alert volume is severe and resolution workflows are well-established, 168h may be acceptable.
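
One way to get the middle ground described above while keeping the weekly default is a child route with its own `repeat_interval`. The fragment below is only a sketch: the `severity` label, its `critical` value, and the receiver name are assumptions and are not taken from this repository's alert rules.

```yaml
# Sketch only: label names and receiver are hypothetical.
route:
  receiver: email-notifications     # hypothetical receiver name
  repeat_interval: 168h             # weekly default introduced by this PR
  routes:
    # Hypothetical child route: alerts labelled severity="critical"
    # are re-sent daily instead of weekly.
    - matchers:
        - severity="critical"
      repeat_interval: 24h
```

A child route inherits unset fields such as `receiver` from its parent, so only the interval needs to be overridden. The `matchers` field is available in Alertmanager v0.22 and later, which includes the v0.23.0 image referenced in this review.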

coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@docs/guides/develop.md`:
- Around line 19-20: The phrase "command line interface" should be corrected to
use the compound modifier "command-line interface"; update the second bullet
that currently reads "Install
[kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl), a command line
interface to Kubernetes." to "Install
[kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl), a command-line
interface to Kubernetes." so the document follows the compound modifier grammar
rule.
- Line 89: Replace the asterisk-based emphasis around the leading "Tip" to use
underscores to satisfy MD049: locate the "*Tip*:" occurrence in the
docs/guides/develop.md content and change it to "_Tip_:" (leave surrounding
inline code spans like `gyp ERR!`, `v8.4.1`, and `pip install setuptools`
unchanged).
- Line 116: Update the phrase "Mac OS X specific env:" to use a hyphenated
compound modifier, i.e., change the text token "Mac OS X specific env" (found in
the docs/guides/develop.md line containing that phrase) to "Mac OS X-specific
env" so the compound modifier before the noun is correctly hyphenated.
- Line 35: Update the docs line that currently reads "Python version 3.13" to
reflect the supported range by changing it to "Python 3.10+" so the guide
accurately states support for Python 3.10–3.13+ (locate the text "Python version
3.13" in docs/guides/develop.md and replace it with "Python 3.10+").

Comment on lines 19 to 20
* Install [minikube](https://kubernetes.io/docs/tasks/tools/#minikube), a single-node Kubernetes cluster useful for local linux development.
* Install [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl), a command line interface to Kubernetes.
⚠️ Potential issue | 🟡 Minor

Use “command-line” as a compound modifier.

This matches the grammar rule and the static lint hint.

✏️ Proposed fix
-* Install [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl), a command line interface to Kubernetes.
+* Install [kubectl](https://kubernetes.io/docs/tasks/tools/#kubectl), a command-line interface to Kubernetes.
🧰 Tools
🪛 LanguageTool

[grammar] ~20-~20: Use a hyphen to join words.
Context: ...io/docs/tasks/tools/#kubectl), a command line interface to Kubernetes. * [Docker ...

(QB_NEW_EN_HYPHEN)


Optional:
-* [Python](https://www.python.org) version 3.11 - Useful for locally running and testing harmony-docker and other backend services
+* [Python](https://www.python.org) version 3.13 - Useful for locally running and testing harmony-docker and other backend services
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check workload/requirements.txt for Python version specifications
echo "=== Checking workload/requirements.txt ===" 
if [ -f "workload/requirements.txt" ]; then
  cat -n workload/requirements.txt | head -20
else
  echo "File not found, searching for requirements files..."
  find . -name "requirements.txt" -type f 2>/dev/null | head -10
fi

Repository: nasa/harmony

Length of output: 353


🏁 Script executed:

#!/bin/bash
# Check docs/guides/develop.md at line 35
echo "=== Checking docs/guides/develop.md ===" 
if [ -f "docs/guides/develop.md" ]; then
  sed -n '30,40p' docs/guides/develop.md | cat -n
else
  echo "File not found, searching for develop.md..."
  find . -name "develop.md" -type f 2>/dev/null
fi

Repository: nasa/harmony

Length of output: 677


🏁 Script executed:

#!/bin/bash
# Search for all Python version references in documentation
echo "=== Python version references in docs ===" 
rg -n "Python.*3\.(10|11|12|13)" docs/ 2>/dev/null || echo "No matches in docs/"
rg -n "python.*3\.(10|11|12|13)" docs/ 2>/dev/null || echo "No matches in docs/"

Repository: nasa/harmony

Length of output: 411


🏁 Script executed:

#!/bin/bash
# Search for Python version constraints in setup.py or pyproject.toml
echo "=== Checking for Python version constraints ===" 
rg -n "python_requires|requires-python" -A2 2>/dev/null | head -20

Repository: nasa/harmony

Length of output: 103


🏁 Script executed:

#!/bin/bash
# Look for setup.py or pyproject.toml files
fd -n "setup.py|pyproject.toml|setup.cfg" -type f

Repository: nasa/harmony

Length of output: 286


🏁 Script executed:

#!/bin/bash
# Look for setup.py or pyproject.toml files (fix syntax)
fd "setup.py|pyproject.toml|setup.cfg" -type f

Repository: nasa/harmony

Length of output: 226


🏁 Script executed:

#!/bin/bash
# Search for python_requires or requires-python in all files
rg "python_requires|requires-python" -A2

Repository: nasa/harmony

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Look for setup.py or pyproject.toml files
find . -name "setup.py" -o -name "pyproject.toml" -o -name "setup.cfg" 2>/dev/null

Repository: nasa/harmony

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Check if there's a .python-version file
find . -name ".python-version" 2>/dev/null

Repository: nasa/harmony

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Check GitHub workflow files or CI config for Python version specifications
find . -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -20 | xargs grep -l "python" 2>/dev/null

Repository: nasa/harmony

Length of output: 82


🏁 Script executed:

#!/bin/bash
# Check the services-prod.yml file for Python version info
cat ./config/services-prod.yml | grep -A5 -B5 -i python

Repository: nasa/harmony

Length of output: 498


🏁 Script executed:

#!/bin/bash
# Search more broadly in workflow and config files for Python versions
find . -path "./.git" -prune -o -type f \( -name "*.yml" -o -name "*.yaml" -o -name "Dockerfile*" \) -print 2>/dev/null | xargs grep -i "python.*3\.(10|11|12|13)" 2>/dev/null | head -20

Repository: nasa/harmony

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Check if there's a tox.ini or similar test config
find . -name "tox.ini" -o -name ".nvmrc" -o -name ".node-version" 2>/dev/null

Repository: nasa/harmony

Length of output: 64


🏁 Script executed:

#!/bin/bash
# Look for Dockerfile references with Python versions
find . -name "Dockerfile*" 2>/dev/null | xargs cat 2>/dev/null | grep -i "python" | head -20

Repository: nasa/harmony

Length of output: 1074


🏁 Script executed:

#!/bin/bash
# Check for docker-compose files
find . -name "docker-compose*" 2>/dev/null

Repository: nasa/harmony

Length of output: 38


🏁 Script executed:

#!/bin/bash
# Let's check the main README for Python version guidance
if [ -f "README.md" ]; then
  grep -n -i "python" README.md | head -10
fi

Repository: nasa/harmony

Length of output: 38


Update Python version guidance to reflect full supported range.

The documentation currently specifies "Python version 3.13", but the project supports Python 3.10–3.13+. Change this to "Python 3.10+" to accurately reflect the supported versions.

✏️ Proposed fix
-* [Python](https://www.python.org) version 3.13 - Useful for locally running and testing harmony-docker and other backend services
+* [Python](https://www.python.org) version 3.10+ - Useful for locally running and testing harmony-docker and other backend services

```bash
npm install
```
*Tip*: if you get `gyp ERR!` you might be working with a very old version (v8.4.1) to build sqlite and you will need to pip install `setuptools` into your python environment before continuing.
⚠️ Potential issue | 🟡 Minor

Fix emphasis style to satisfy markdownlint (MD049).

Switch to underscores for emphasis.

✏️ Proposed fix
-*Tip*: if you get `gyp ERR!` you might be working with a very old version (v8.4.1) to build sqlite and you will need to pip install `setuptools` into your python environment before continuing.
+_Tip_: if you get `gyp ERR!` you might be working with a very old version (v8.4.1) to build sqlite and you will need to pip install `setuptools` into your python environment before continuing.
🧰 Tools
🪛 LanguageTool

[style] ~89-~89: As an alternative to the over-used intensifier ‘very’, consider replacing this phrase.
Context: ...et gyp ERR! you might be working with a very old version (v8.4.1) to build sqlite and yo...

(EN_WEAK_ADJECTIVE)

🪛 markdownlint-cli2 (0.20.0)

[warning] 89-89: Emphasis style
Expected: underscore; Actual: asterisk

(MD049, emphasis-style)




SERVICE_RUNNER_IMAGE=harmonyservices/service-runner:latest
```

Mac OS X specific env:
⚠️ Potential issue | 🟡 Minor

Hyphenate “Mac OS X-specific”.

Compound modifier before a noun should be hyphenated.

✏️ Proposed fix
-Mac OS X specific env:
+Mac OS X-specific env:
🧰 Tools
🪛 LanguageTool

[grammar] ~116-~116: Use a hyphen to join words.
Context: ...ices/service-runner:latest Mac OS X specific env: text BACKEND_HOST=host...

(QB_NEW_EN_HYPHEN)


blackone-sudo merged commit 61aa330 into main on Feb 5, 2026
4 of 6 checks passed