
chore: Add visibility and error handling for BH cron jobs#38438

Open
KevLehman wants to merge 2 commits into develop from chore/agenda-logs-error

@KevLehman
Member

@KevLehman KevLehman commented Jan 30, 2026

Proposed changes (including videos or screenshots)

Issue(s)

https://rocketchat.atlassian.net/browse/SUP-876

Steps to test or reproduce

Further comments

Summary by CodeRabbit

  • Bug Fixes

    • Business hours workflows now handle and log errors so individual failures don’t stop other operations.
    • Scheduler/job database interactions now capture, report, and propagate errors to prevent silent failures.
  • Chores

    • Added broader runtime logging and observability across scheduling lifecycle for easier troubleshooting.
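The first Bug Fixes bullet describes per-item isolation: a failure while opening one business hour is logged, and the loop continues with the rest. A minimal sketch of that pattern follows. The `BusinessHour` shape, `openAll`, and the `openOne` callback are hypothetical illustrations, not the PR's actual code.

```typescript
// Hypothetical shape for this sketch; the real business-hour type has more fields.
interface BusinessHour {
  _id: string;
  name: string;
}

type Opener = (bh: BusinessHour) => Promise<void>;

interface StartupReport {
  opened: string[];
  failed: { id: string; error: unknown }[];
}

// Hypothetical helper: opens every business hour, logging each failure
// instead of letting it abort the loop.
async function openAll(
  businessHours: BusinessHour[],
  openOne: Opener,
  log: (msg: string, err: unknown) => void = (msg, err) => console.error(msg, err),
): Promise<StartupReport> {
  const report: StartupReport = { opened: [], failed: [] };
  for (const bh of businessHours) {
    try {
      await openOne(bh);
      report.opened.push(bh._id);
    } catch (error) {
      // One bad business hour should not stop the others from opening.
      log(`Failed to open business hour ${bh._id} (${bh.name})`, error);
      report.failed.push({ id: bh._id, error });
    }
  }
  return report;
}
```

Returning a report rather than `void` lets a caller decide whether a partial startup counts as a failure overall.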


@changeset-bot

changeset-bot bot commented Jan 30, 2026

⚠️ No Changeset found

Latest commit: 56e1c65

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types.


@dionisio-bot
Contributor

dionisio-bot bot commented Jan 30, 2026

Looks like this PR is not ready to merge, because of the following issues:

  • This PR is missing the 'stat: QA assured' label
  • This PR is targeting the wrong base branch. It should target 8.2.0, but it targets 8.1.0

Please fix the issues and try again.

If you have any trouble, please check the PR guidelines.

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

Walkthrough

Added try/catch error handling and logging across livechat business-hour flows and Agenda DB operations; introduced enhanced observability and logger integration in the cron scheduler. Public APIs and method signatures remain unchanged.

Changes

Livechat Business-Hour Core
  • Files: apps/meteor/app/livechat/server/business-hour/BusinessHourManager.ts, apps/meteor/app/livechat/server/business-hour/Helper.ts, apps/meteor/app/livechat/server/business-hour/Single.ts, apps/meteor/ee/app/livechat-enterprise/server/business-hour/Multiple.ts
  • Summary: Wrapped business-hour opening/start flows in try/catch blocks, added local logging on exceptions, changed some call sites to await the default opener, and added per-business-hour error handling so one failure does not abort the startup loop. No public signatures changed.
Agenda DB Error Handling
  • Files: packages/agenda/src/Agenda.ts
  • Summary: Wrapped MongoDB/Agenda operations (find, insert, update, delete, lock/unlock) in try/catch; on failure they emit error:database events, log at debug/error level, and rethrow. Added safe defaults for destructured DB results.
Cron Logging & Dependency
  • Files: packages/cron/package.json, packages/cron/src/index.ts
  • Summary: Added the @rocket.chat/logger dependency and a module-level Logger; added lifecycle and action logging (start, complete, success, fail, database error, scheduling/removal) with job metadata and timing. No public API changes.
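The Agenda DB row describes a wrap, log, emit, rethrow pattern around each MongoDB call. A minimal sketch, assuming a Node `EventEmitter` and a hypothetical `withDbErrorHandling` helper; only the `error:database` event name comes from the summary, and the payload shape is an assumption.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: runs a DB operation; on failure it logs, emits
// 'error:database' (assumed payload: { operation, error }), and rethrows
// so the caller still sees the failure instead of a silent default.
async function withDbErrorHandling<T>(
  emitter: EventEmitter,
  operation: string,
  fn: () => Promise<T>,
): Promise<T> {
  try {
    return await fn();
  } catch (error) {
    console.error(`Agenda DB operation "${operation}" failed`, error);
    // Unlike the special 'error' event, emitting 'error:database' with no
    // listener attached does not throw.
    emitter.emit("error:database", { operation, error });
    throw error;
  }
}
```

Rethrowing matters as much as the event: listeners get visibility, while callers such as `start()` still observe the failure.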

Sequence Diagram(s)

sequenceDiagram
  participant Cron as Cron Scheduler
  participant Agenda as Agenda Service
  participant DB as MongoDB
  participant Logger as Logger (module)

  rect rgba(200,230,201,0.5)
  Cron->>Agenda: schedule/add job
  Agenda->>DB: insert job document
  DB-->>Agenda: ack/insert result
  Agenda-->>Logger: debug "job scheduled"
  end

  rect rgba(187,222,251,0.5)
  Cron->>Agenda: run job
  Agenda->>DB: fetch & lock job
  DB-->>Agenda: job document
  Agenda->>Logger: log "start"
  Agenda->>DB: update job status/result
  DB-->>Agenda: update ack
  Agenda->>Logger: log "success" or "fail"
  Agenda->>Logger: emit "error:database" on DB error
  end
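The "run job" lane shows the cron side logging a start, then a success or a failure. A sketch of such a wrapper with timing follows. `JobLogger` and `runWithLifecycleLogs` are hypothetical, and the logger interface is a minimal assumption, not the `@rocket.chat/logger` API.

```typescript
// Minimal logger shape assumed for this sketch; not the @rocket.chat/logger API.
interface JobLogger {
  info(obj: Record<string, unknown>): void;
  error(obj: Record<string, unknown>): void;
}

// Hypothetical wrapper: logs start, then success or failure with duration.
// Failures are rethrown so the scheduler can still record the run as failed.
async function runWithLifecycleLogs<T>(
  logger: JobLogger,
  jobName: string,
  job: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  logger.info({ msg: "Cron job started", jobName });
  try {
    const result = await job();
    logger.info({ msg: "Cron job succeeded", jobName, durationMs: Date.now() - startedAt });
    return result;
  } catch (error) {
    logger.error({ msg: "Cron job failed", jobName, durationMs: Date.now() - startedAt, error });
    throw error;
  }
}
```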

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

stat: ready to merge, stat: QA assured

Suggested reviewers

  • sampaiodiego
  • d-gubert
  • ricardogarim

Poem

🐰
With nimble paws I hop and see,
I catch the faults that try to flee,
I log each tumble, one, two, three,
So systems wake more safely, wee! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'chore: Add visibility and error handling for BH cron jobs' accurately reflects the main changes: adding error handling and logging to business hour cron job processing across multiple files.
  • Linked Issues check: ✅ Passed. The PR addresses the linked issue SUP-876 by implementing error handling and logging for business hour cron jobs, improving reliability and making BH activation failures easier to diagnose.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to error handling and logging for business hour cron jobs, directly addressing the investigation objective of SUP-876 with no unrelated modifications.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@dougfabris changed the title from "visibility & error handling for bh cron jobs" to "chore: Add visibility and error handling for BH cron jobs" on Jan 30, 2026
@codecov

codecov bot commented Jan 30, 2026

Codecov Report

❌ Patch coverage is 0% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.42%. Comparing base (7a10990) to head (56e1c65).
⚠️ Report is 5 commits behind head on develop.

Additional details and impacted files


@@             Coverage Diff             @@
##           develop   #38438      +/-   ##
===========================================
- Coverage    70.85%   70.42%   -0.43%     
===========================================
  Files         3161     3161              
  Lines       109785   110159     +374     
  Branches     19688    19896     +208     
===========================================
- Hits         77783    77576     -207     
- Misses       29973    30549     +576     
- Partials      2029     2034       +5     
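The headline figures in the diff are internally consistent: tracked lines split into hits, misses, and partials, and coverage is hits divided by tracked lines. A quick check follows, assuming two-decimal display precision; the exact rounding mode Codecov uses is an assumption, but rounding and truncation agree for these numbers.

```typescript
// Coverage percentage = hits / lines * 100, shown to two decimals.
function coveragePct(hits: number, lines: number): string {
  return ((hits / lines) * 100).toFixed(2);
}

// Tracked lines are the sum of hits, misses, and partials.
const headLines = 77576 + 30549 + 2034; // 110159, matches "Lines" for #38438

const base = coveragePct(77783, 109785); // develop (7a10990)
const head = coveragePct(77576, headLines); // this PR (56e1c65)
const delta = (Number(head) - Number(base)).toFixed(2);

console.log(base, head, delta); // 70.85 70.42 -0.43
```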
Flag Coverage Δ
e2e 60.33% <ø> (-0.04%) ⬇️
e2e-api 47.79% <ø> (+0.01%) ⬆️
unit 71.45% <0.00%> (-0.60%) ⬇️

Flags with carried forward coverage won't be shown.

@github-actions
Contributor

github-actions bot commented Jan 30, 2026

📦 Docker Image Size Report

➡️ Changes

Service Current Baseline Change Percent
sum of all images 0B 0B 0B
account-service 0B 0B 0B
authorization-service 0B 0B 0B
ddp-streamer-service 0B 0B 0B
omnichannel-transcript-service 0B 0B 0B
presence-service 0B 0B 0B
queue-worker-service 0B 0B 0B
rocketchat 0B 0B 0B

📊 Historical Trend

Chart: Image Size Evolution by Service (Last 30 Days + This PR); x-axis: build timestamps from 11/18 to 12/30, plus this PR (01/30); y-axis: Size (GB), 0 to 0.5; one line per service (account-service, authorization-service, ddp-streamer-service, omnichannel-transcript-service, presence-service, queue-worker-service, rocketchat).

Statistics (last 30 days):

  • 📊 Average: 1.5GiB
  • ⬇️ Minimum: 1.4GiB
  • ⬆️ Maximum: 1.6GiB
  • 🎯 Current PR: 0B
ℹ️ About this report

This report compares Docker image sizes from this build against the develop baseline.

  • Tag: pr-38438
  • Baseline: develop
  • Timestamp: 2026-01-30 19:20:31 UTC
  • Historical data points: 30

Updated: Fri, 30 Jan 2026 19:20:31 GMT

@KevLehman KevLehman marked this pull request as ready for review January 30, 2026 18:04
@KevLehman KevLehman requested a review from a team as a code owner January 30, 2026 18:04
@KevLehman KevLehman added this to the 8.2.0 milestone Jan 30, 2026
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


5 issues found across 8 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/agenda/src/Agenda.ts">

<violation number="1" location="packages/agenda/src/Agenda.ts:188">
P1: In `_lockOnTheFly`, if the database operation fails, the job (which was already popped from `_jobsToLock`) is permanently lost because it is not returned to the queue. Additionally, the function recursively calls itself immediately after the error, which will rapidly drain the remaining queue and trigger a storm of database errors if the issue is persistent.

Recommended fix: Return the job to the queue and exit the function to prevent job loss and avoid a tight error loop.</violation>

<violation number="2" location="packages/agenda/src/Agenda.ts:188">
P1: The `database()` method catches connection errors and emits an event but fails to rethrow the error. This causes `Agenda.start()` to hang indefinitely because it awaits `this._ready`, which only resolves when the `'ready'` event is emitted (inside `dbInit`). Since `dbInit` is skipped on connection failure, `'ready'` is never emitted.</violation>
</file>
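The first Agenda.ts finding recommends returning the popped job to the queue and stopping, instead of recursing straight into the next job. The following is a simplified sketch of that control flow. `JobQueueLocker` and the injected `lockJob` function are hypothetical stand-ins, not Agenda's actual internals, and a `while` loop replaces the recursion.

```typescript
interface Job {
  id: string;
}

// Hypothetical, simplified stand-in for Agenda's lock-on-the-fly queue.
class JobQueueLocker {
  private jobsToLock: Job[] = [];
  private locking = false;
  private readonly lockJob: (job: Job) => Promise<boolean>;

  constructor(lockJob: (job: Job) => Promise<boolean>) {
    this.lockJob = lockJob;
  }

  enqueue(job: Job): void {
    this.jobsToLock.push(job);
  }

  get pending(): number {
    return this.jobsToLock.length;
  }

  // Locks queued jobs one at a time. On a DB error the current job is pushed
  // back (so pop() takes it next) and the loop stops: no job is lost, and a
  // persistent outage does not turn into a tight error loop.
  async lockOnTheFly(): Promise<Job[]> {
    const locked: Job[] = [];
    if (this.locking) return locked;
    this.locking = true;
    try {
      while (this.jobsToLock.length > 0) {
        const job = this.jobsToLock.pop()!;
        try {
          if (await this.lockJob(job)) locked.push(job);
        } catch (error) {
          console.error(`Failed to lock job ${job.id}; returning it to the queue`, error);
          this.jobsToLock.push(job);
          break; // retry on the next processing tick, not immediately
        }
      }
    } finally {
      this.locking = false;
    }
    return locked;
  }
}
```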

<file name="apps/meteor/app/livechat/server/business-hour/BusinessHourManager.ts">

<violation number="1" location="apps/meteor/app/livechat/server/business-hour/BusinessHourManager.ts:229">
P2: Swallowing the error prevents the Cron system from recording the failure in `CronHistory` and triggers a false 'success' event in Agenda. The `AgendaCronJobs` wrapper relies on the job function throwing an error to correctly mark the job as failed and log it via the Cron logger. Rethrow the error to ensure observability.</violation>

<violation number="2" location="apps/meteor/app/livechat/server/business-hour/BusinessHourManager.ts:237">
P2: Swallowing the error prevents the Cron system from recording the failure in `CronHistory` and triggers a false 'success' event in Agenda. The `AgendaCronJobs` wrapper relies on the job function throwing an error to correctly mark the job as failed and log it via the Cron logger. Rethrow the error to ensure observability.</violation>
</file>
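Both BusinessHourManager.ts findings say the job function must throw for `AgendaCronJobs` to record a failure. The following sketch contrasts the recommended log-then-rethrow shape with the swallowing one. `recordRun`, `runScheduledOpen`, and `runScheduledOpenSwallowing` are hypothetical stand-ins for the cron wrapper and the job body.

```typescript
type RunStatus = "success" | "failed";

// Hypothetical stand-in for the cron wrapper: it can only mark a run as
// failed if the job function rejects.
async function recordRun(job: () => Promise<void>): Promise<RunStatus> {
  try {
    await job();
    return "success";
  } catch {
    return "failed";
  }
}

// Logs locally for context, then rethrows so the failure reaches the wrapper.
async function runScheduledOpen(openBusinessHours: () => Promise<void>): Promise<void> {
  try {
    await openBusinessHours();
  } catch (error) {
    console.error("Failed to open business hours on schedule", error);
    throw error;
  }
}

// Swallowing variant, for comparison: the wrapper sees a false success.
async function runScheduledOpenSwallowing(openBusinessHours: () => Promise<void>): Promise<void> {
  try {
    await openBusinessHours();
  } catch (error) {
    console.error("Failed to open business hours on schedule", error);
  }
}
```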

<file name="apps/meteor/app/livechat/server/business-hour/Single.ts">

<violation number="1" location="apps/meteor/app/livechat/server/business-hour/Single.ts:12">
P2: Redundant error handling. The `openBusinessHourDefault` function (in `Helper.ts`) already wraps its logic in a try-catch block and suppresses errors. Consequently, this `catch` block is unreachable, and the `try` wrapper is unnecessary.</violation>
</file>
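The Single.ts finding is about reachability: if the helper already catches and suppresses every error, the caller's `catch` can never run. The following minimal illustration uses hypothetical functions. `openDefaultSuppressing` mirrors what the finding says about `openBusinessHourDefault` and is not the real helper.

```typescript
// Hypothetical helper that catches and suppresses every error, as the
// finding describes for openBusinessHourDefault.
async function openDefaultSuppressing(open: () => Promise<void>): Promise<void> {
  try {
    await open();
  } catch (error) {
    console.error("openBusinessHourDefault failed", error);
  }
}

// Caller wrapping the helper in its own try/catch. Because the helper never
// rejects, this catch block is unreachable.
async function callerWithRedundantCatch(open: () => Promise<void>): Promise<boolean> {
  try {
    await openDefaultSuppressing(open);
    return false;
  } catch {
    return true; // never reached
  }
}
```

The alternative is to have the helper rethrow, which makes the outer `catch` meaningful. Which layer owns the error is a design choice for the PR.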



Labels

None yet

Projects

None yet


2 participants