-
Notifications
You must be signed in to change notification settings - Fork 81
docs(package): Add user documents for retention control; Add garbage collector to the multi-node guide. #1181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (4)
docs/src/user-guide/guides-retention.md (4)
24-25
: Use MyST/Sphinx cross-refs instead of raw fragment links to the figureSwitch
[Figure 1](#figure-1)
to a{ref}
link for robustness and to satisfy MD051. You already have the label(figure-1)=
, so this is a drop-in change.-criteria shown in [Figure 1](#figure-1): +criteria shown in {ref}`Figure 1 <figure-1>`:-For archives, $data\_timestamp$ (in the expiry criteria equation from [Figure 1](#figure-1)) is the +For archives, $data\_timestamp$ (in the expiry criteria equation from {ref}`Figure 1 <figure-1>`) is the-For search results, $data\_timestamp$ (in the expiry criteria equation from [Figure 1](#figure-1)) +For search results, $data\_timestamp$ (in the expiry criteria equation from {ref}`Figure 1 <figure-1>`)Also applies to: 85-86, 141-142
117-122
: State units for signed_utc_offset and add a quick example after the formulaThis prevents misconfiguration and aligns with prior reviewer guidance. Per the team’s agreed logic, the signed subtraction is correct: adjusted = retention − signed_offset.
$$adjusted\_retention\_period = retention\_period - signed\_utc\_offset$$ + +where `signed_utc_offset` is the time zone offset in minutes (e.g., UTC+8 = +480, UTC−4 = −240). +For example, if `retention_period = 60` and logs are in UTC−4, then +`adjusted_retention_period = 60 - (−240) = 300` minutes.
186-188
: Trim trailing whitespace (MD009)There’s trailing space at the end of Line 186. Remove it to keep diffs clean and pass markdownlint.
-* If any query job is running, CLP conservatively calculates a **safe expiry timestamp** based on +* If any query job is running, CLP conservatively calculates a **safe expiry timestamp** based on
20-20
: Clarify that current_time is UTCMake UTC explicit where you define
current_time
to reduce confusion later in the document.-| $current\_time$ | The time at which the garbage collector is performing a check. | +| $current\_time$ | The time (UTC) at which the garbage collector is performing a check. |
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
-
docs/src/user-guide/guides-retention.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-15T21:48:46.674Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:57-66
Timestamp: 2025-08-15T21:48:46.674Z
Learning: When dealing with timezone offset adjustments for retention periods in CLP, the correct formula is `adjusted_retention_period = retention_period - UTC_offset` (not using absolute value). For negative UTC offsets, this increases the retention period; for positive UTC offsets, this decreases it, properly compensating for how local timestamps are interpreted by the UTC-based garbage collector.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
PR: y-scope/clp#662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/guides-retention.md
🪛 LanguageTool
docs/src/user-guide/guides-retention.md
[typographical] ~110-~110: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ...The equation then becomes
(DASH_RULE)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-retention.md
24-24: Link fragments should be valid
(MD051, link-fragments)
85-85: Link fragments should be valid
(MD051, link-fragments)
141-141: Link fragments should be valid
(MD051, link-fragments)
186-186: Trailing spaces
Expected: 0 or 2; Actual: 1
(MD009, no-trailing-spaces)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (macos-15)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (2)
docs/src/user-guide/guides-retention.md (2)
117-121
: Units and example for the signed UTC offset (reduces misconfiguration risk)Spell out units and add a short numeric example below the formula.
$$adjusted\_retention\_period = retention\_period - signed\_utc\_offset$$ +where `signed_utc_offset` is the time zone offset in minutes (e.g., UTC+8 = +480, UTC−4 = −240). +For example, if `retention_period = 60` and the logs are in UTC−4, then +`adjusted_retention_period = 60 - (−240) = 300` minutes.
3-4
: Clarify scope: retention applies to both local filesystem and object storageMake it explicit up front where deletions occur.
-CLP can automatically delete *archives* and/or *search results* once they're older than a configured -retention period. This guide explains: +CLP can automatically delete *archives* and/or *search results* from both the local filesystem and +object storage once they're older than a configured retention period. This guide explains:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
-
docs/src/user-guide/guides-retention.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-15T21:48:46.674Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:57-66
Timestamp: 2025-08-15T21:48:46.674Z
Learning: When dealing with timezone offset adjustments for retention periods in CLP, the correct formula is `adjusted_retention_period = retention_period - UTC_offset` (not using absolute value). For negative UTC offsets, this increases the retention period; for positive UTC offsets, this decreases it, properly compensating for how local timestamps are interpreted by the UTC-based garbage collector.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
PR: y-scope/clp#662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/guides-retention.md
🪛 LanguageTool
docs/src/user-guide/guides-retention.md
[typographical] ~110-~110: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ...The equation then becomes
(DASH_RULE)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-retention.md
24-24: Link fragments should be valid
(MD051, link-fragments)
85-85: Link fragments should be valid
(MD051, link-fragments)
141-141: Link fragments should be valid
(MD051, link-fragments)
🔇 Additional comments (2)
docs/src/user-guide/guides-retention.md (2)
117-121
: Retention offset formula is correct (signed subtraction is required)Using a signed offset matches the intended behaviour: negative offsets increase TTL; positive offsets decrease it. Good catch.
122-137
: Add default for search result retention periodInclude a one-liner under the YAML example to document the out-of-the-box default:
File: docs/src/user-guide/guides-retention.md
Location: immediately after the closing ``` of theresults_cache
YAML blockProposed diff:
```yaml retention_period: 1440
- By default,
results_cache.retention_period
is60
minutes.</details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly lgtm.
The only big comment I have is about the data expiry criteria in Figure 1. I really think that it should be arranged as current_time - data_timestamp > retention_period
as it was before (though I do like the new variable names and the formatting). My rationale is that the current_time
and data_timestamp
variables are both times of day, while retention_period
is a duration. I think that it is much more intuitive to subtract two times of day from one another to get a duration, and then compare those two durations with the >
operator. If we leave it as it currently is, then we're subtracting a duration from a time of day, and then comparing two times of day with the >
operator, which does not feel intuitive to me (to the point where I had to write it out and check whether it was correct).
That being said, I won't push back any farther than this comment; @kirkrodrigues I leave it to you to decide.
Also left three specific comments.
(figure-1)= | ||
:::{card} | ||
|
||
$$data\_timestamp < current\_time - retention\_period$$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See high-level comment.
Co-authored-by: Quinn Taylor Mitchell <q.mitchell@mail.utoronto.ca> Co-authored-by: haiqi96 <14502009+haiqi96@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (7)
docs/src/user-guide/guides-retention.md (7)
94-96
: Add an explicit MyST label for the non-UTC sectionEnables
{ref}
cross-references to resolve.+ (handling-log-events-with-non-utc-timestamps)= #### Handling log events with non-UTC timestamps
76-81
: Prefer{ref}
over hash fragments for the non-UTC section linkImproves stability and consistency with other cross-refs.
-If your log events use timestamps that *aren't* in the UTC time zone, you will need to adjust the -configured retention period to ensure expired archives are deleted at the correct time. See -[Handling log events with non-UTC timestamps](#handling-log-events-with-non-utc-timestamps) for -details. +If your log events use timestamps that *aren't* in the UTC time zone, you will need to adjust the +configured retention period to ensure expired archives are deleted at the correct time. See +{ref}`Handling log events with non-UTC timestamps <handling-log-events-with-non-utc-timestamps>` for +details.
141-142
: Cross-reference Figure 1 with{ref}
(search results section)Aligns with earlier Figure link changes.
-For search results, $data\_timestamp$ (in the expiry criteria equation from [Figure 1](#figure-1)) +For search results, $data\_timestamp$ (in the expiry criteria equation from {ref}`Figure 1 <figure-1>`)
85-86
: Cross-reference Figure 1 with{ref}
Keeps link style consistent and resolves the markdownlint MD051 hint.
-For archives, $data\_timestamp$ (in the expiry criteria equation from [Figure 1](#figure-1)) is the +For archives, $data\_timestamp$ (in the expiry criteria equation from {ref}`Figure 1 <figure-1>`) is the
23-25
: Use MyST cross-ref for Figure 1 (fixes MD051 and is more robust)Switch the fragment link to a
{ref}
.-When the garbage collector wakes up, it will scan for and delete any data that fits the expiry -criteria shown in [Figure 1](#figure-1): +When the garbage collector wakes up, it will scan for and delete any data that fits the expiry +criteria shown in {ref}`Figure 1 <figure-1>`:
117-121
: Clarify units for the UTC offset and add a quick numeric exampleKeeps the correct signed-offset formula and makes it harder to misconfigure.
-To avoid this issue, you can adjust the retention period to account for the offset of the log -events' time zone from UTC: +To avoid this issue, you can adjust the retention period to account for the offset of the log +events' time zone from UTC (use minutes for both values): $$adjusted\_retention\_period = retention\_period - signed\_utc\_offset$$ + +where `signed_utc_offset` is the time zone offset in minutes (e.g., UTC+8 = +480, UTC−4 = −240). +For example, if `retention_period = 60` and the logs are in UTC−4, then +`adjusted_retention_period = 60 - (−240) = 300` minutes.
16-22
: Add a brief global UTC note after the definitionsGives readers an early heads-up and a stable link to the detailed guidance.
| $data\_timestamp$ | The end of the time range for the data being evaluated for expiration (e.g., for an archive, this is the timestamp of the most recent log event contained in the archive). | +:::{note} +CLP assumes timestamps are in UTC. If your logs use local (non-UTC) timestamps, see +{ref}`Handling log events with non-UTC timestamps <handling-log-events-with-non-utc-timestamps>` to +adjust your retention period accordingly. +:::
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
-
docs/src/user-guide/guides-retention.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-15T21:48:46.674Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:57-66
Timestamp: 2025-08-15T21:48:46.674Z
Learning: When dealing with timezone offset adjustments for retention periods in CLP, the correct formula is `adjusted_retention_period = retention_period - UTC_offset` (not using absolute value). For negative UTC offsets, this increases the retention period; for positive UTC offsets, this decreases it, properly compensating for how local timestamps are interpreted by the UTC-based garbage collector.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
PR: y-scope/clp#662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/guides-retention.md
🪛 LanguageTool
docs/src/user-guide/guides-retention.md
[typographical] ~110-~110: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ...The equation then becomes
(DASH_RULE)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-retention.md
24-24: Link fragments should be valid
(MD051, link-fragments)
85-85: Link fragments should be valid
(MD051, link-fragments)
141-141: Link fragments should be valid
(MD051, link-fragments)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (1)
docs/src/user-guide/guides-retention.md (1)
130-137
: Document the default forresults_cache.retention_period
(symmetry with archives)You state the default for
archive_output.retention_period
(null/disabled) but not forresults_cache.retention_period
. Please confirm the default and add it here for consistency.Would you confirm the default value for
results_cache.retention_period
in the current release? If it's not null, add a short note after the YAML block (e.g., “By default, results are retained for X minutes.”). I can follow up with a patch once you confirm the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
docs/src/user-guide/guides-retention.md (2)
76-81
: Add a brief global UTC note earlier for quicker orientation.You have a solid warning here; consider also adding a short note right after the definitions so readers see the UTC assumption before the first figure/example.
Proposed insertion just after the definitions table (after Line 22):
+:::{note} +CLP assumes timestamps are in UTC. If your logs use local (non‑UTC) timestamps, see +{ref}`Handling log events with non-UTC timestamps <handling-log-events-with-non-utc-timestamps>` to +adjust the configured retention period. +:::
120-124
: Clarify units and add a one-line numeric example to the signed-offset formula.The formula is correct; specifying units and a quick example reduces misconfiguration.
-To avoid this issue, you can adjust the retention period to account for the offset of the log -events' time zone from UTC: +To avoid this issue, you can adjust the retention period to account for the offset of the log +events' time zone from UTC (use minutes for both values): $$adjusted\_retention\_period = retention\_period - signed\_utc\_offset$$ + +For example, if `retention_period = 60` and logs are in UTC−4 (`signed_utc_offset = −240`), +then `adjusted_retention_period = 60 - (−240) = 300` minutes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
-
docs/src/user-guide/guides-retention.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-15T21:48:46.674Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:57-66
Timestamp: 2025-08-15T21:48:46.674Z
Learning: When dealing with timezone offset adjustments for retention periods in CLP, the correct formula is `adjusted_retention_period = retention_period - UTC_offset` (not using absolute value). For negative UTC offsets, this increases the retention period; for positive UTC offsets, this decreases it, properly compensating for how local timestamps are interpreted by the UTC-based garbage collector.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
PR: y-scope/clp#662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/guides-retention.md
🪛 LanguageTool
docs/src/user-guide/guides-retention.md
[typographical] ~112-~112: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... then becomes...
(DASH_RULE)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-retention.md
24-24: Link fragments should be valid
(MD051, link-fragments)
85-85: Link fragments should be valid
(MD051, link-fragments)
157-157: Link fragments should be valid
(MD051, link-fragments)
🔇 Additional comments (2)
docs/src/user-guide/guides-retention.md (2)
206-207
: Verify runtime behaviour: scheduler avoidance of expired archives.“CLP will not search an archive once it is considered expired” is a strong behavioural guarantee. Please confirm the scheduler/engine enforces this (e.g., by filtering on a safe expiry timestamp in queries) so users don’t rely on an assumption that might change.
If confirmed, consider linking to the relevant module or adding a brief note about how the safe cut-off is computed and applied.
142-143
: The verification script is running—I'll review the default value definitions for the config objects and confirm the actual defaults once it completes.
When the garbage collector runs, it will evaluate the archive's expiry criteria, substituting | ||
$08:00$ for $data\_timestamp$, and $01:01$ for $current\_time$, since $09:01$ AWST = $01:01$ UTC. | ||
The equation then becomes... | ||
|
||
$$is\_expired = (01:01 - 08:00 > 01:00)$$ | ||
|
||
... which evaluates to false. Thus, the garbage collector won't delete the archive; in fact, it | ||
won't delete it until $09:01$ UTC, which is 8 hours later than it should've been deleted. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Optional: add a short parenthetical noting omitted dates/seconds.
The time-of-day arithmetic is correct, but readers may wonder about dates. A brief parenthetical keeps the example unambiguous.
-... which evaluates to false. Thus, the garbage collector won't delete the archive; in fact, it
-won't delete it until $09:01$ UTC, which is 8 hours later than it should've been deleted.
+... which evaluates to false. Thus, the garbage collector won't delete the archive; in fact, it
+won't delete it until $09:01$ UTC (dates/seconds omitted for brevity), which is 8 hours later than
+it should've been deleted.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
When the garbage collector runs, it will evaluate the archive's expiry criteria, substituting | |
$08:00$ for $data\_timestamp$, and $01:01$ for $current\_time$, since $09:01$ AWST = $01:01$ UTC. | |
The equation then becomes... | |
$$is\_expired = (01:01 - 08:00 > 01:00)$$ | |
... which evaluates to false. Thus, the garbage collector won't delete the archive; in fact, it | |
won't delete it until $09:01$ UTC, which is 8 hours later than it should've been deleted. | |
... which evaluates to false. Thus, the garbage collector won't delete the archive; in fact, it | |
won't delete it until $09:01$ UTC (dates/seconds omitted for brevity), which is 8 hours later than | |
it should've been deleted. |
🧰 Tools
🪛 LanguageTool
[typographical] ~112-~112: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... then becomes...
(DASH_RULE)
🤖 Prompt for AI Agents
docs/src/user-guide/guides-retention.md around lines 108 to 116: the example's
time-of-day arithmetic omits dates and seconds which can confuse readers; add a
short parenthetical after the sentence that introduces the substituted times
clarifying that dates and seconds are omitted for brevity (e.g., “(dates and
seconds omitted for clarity)”), so the example remains unambiguous without
changing the math or times used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM @kirkrodrigues!
retained for 60 minutes (1 hour). | ||
|
||
:::{note} | ||
When a user runs consecutive queries in the webui without refreshing the page, the results of one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about? or maybe something like "current query"
When a user runs consecutive queries in the webui without refreshing the page, the results of one | |
When a user runs consecutive queries in the webui without refreshing the page, the results of the previous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (4)
docs/src/user-guide/guides-retention.md (4)
103-107
: Prefer numeric UTC offsets over “AWST” to avoid DST/locale ambiguity (optional).This keeps examples globally unambiguous; if you want to keep AWST for regional familiarity, no blocker.
Apply:
-* your log events use timestamps in the AWST timezone (UTC+8); +* your log events use timestamps in a time zone with UTC+8;
76-81
: Optional: switch the intra-doc link to a MyST {ref} and add a section label.Keeps cross-references consistent with the Figure link.
Apply:
-See -[Handling log events with non-UTC timestamps](#handling-log-events-with-non-utc-timestamps) for -details. +See {ref}`Handling log events with non-UTC timestamps <handling-log-events-with-non-utc-timestamps>` for details.And add a label immediately above the subsection header (Line 94):
+(handling-log-events-with-non-utc-timestamps)= #### Handling log events with non-UTC timestamps
Note: If you intentionally avoid explicit labels, feel free to skip—this is purely for cross-ref consistency.
22-23
: Add an early, global UTC note after the definitions (navigation aid).Provides an immediate heads-up and a stable pointer to the detailed section.
Apply:
| $data\_timestamp$ | The end of the time range for the data being evaluated for expiration (e.g., for an archive, this is the timestamp of the most recent log event contained in the archive). | +:::{note} +CLP assumes timestamps are in UTC. If your logs use local (non-UTC) timestamps, see +[Handling log events with non-UTC timestamps](#handling-log-events-with-non-utc-timestamps) to adjust your retention period accordingly. +::: + When the garbage collector wakes up, it will scan for and delete any data that satisfies the expiry criteria shown in {ref}`Figure 1 <figure-1>`:If you prefer to avoid hash fragments, I can convert that link to a
{ref}
by adding a label above the subsection.
120-124
: Clarify units and add a concrete example for the signed UTC offset formula.Prevents misconfiguration and makes the guidance immediately actionable. Retain the signed subtraction; do not use absolute value.
Apply:
To avoid this issue, you can adjust the retention period to account for the offset of the log -events' time zone from UTC: +events' time zone from UTC (use minutes for both values): $$adjusted\_retention\_period = retention\_period - signed\_utc\_offset$$ + +where `signed_utc_offset` is the time-zone offset in minutes (e.g., UTC+8 = +480, UTC−4 = −240). +Example: for `retention_period = 60` and logs in UTC−4, `adjusted_retention_period = 60 - (−240) = 300` minutes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
-
docs/src/user-guide/guides-retention.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-15T21:48:46.674Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:57-66
Timestamp: 2025-08-15T21:48:46.674Z
Learning: When dealing with timezone offset adjustments for retention periods in CLP, the correct formula is `adjusted_retention_period = retention_period - UTC_offset` (not using absolute value). For negative UTC offsets, this increases the retention period; for positive UTC offsets, this decreases it, properly compensating for how local timestamps are interpreted by the UTC-based garbage collector.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
PR: y-scope/clp#662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/guides-retention.md
🪛 LanguageTool
docs/src/user-guide/guides-retention.md
[typographical] ~112-~112: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... then becomes...
(DASH_RULE)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-retention.md
24-24: Link fragments should be valid
(MD051, link-fragments)
85-85: Link fragments should be valid
(MD051, link-fragments)
157-157: Link fragments should be valid
(MD051, link-fragments)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (1)
docs/src/user-guide/guides-retention.md (1)
88-92
: The script will search across the repository for how "expiry", "Retention", "garbage", and "timestamp" are handled in the retention/garbage-collector code to confirm whether archives lacking message timestamps are excluded from retention. Once the results come back, I’ll verify the edge-case behavior and then update the note accordingly.
5. (Optional) Configure retention periods for archives and search results. See | ||
[retention control](guides-retention) for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit out of scope of this doc, right?
docs/src/user-guide/index.md
Outdated
@@ -63,6 +63,7 @@ guides-overview | |||
guides-using-object-storage/index | |||
guides-multi-node | |||
guides-using-presto | |||
guides-retention |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this above guides-multi-node
? Same for the card in the overview.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
docs/src/user-guide/guides-multi-node.md
(2 hunks)docs/src/user-guide/guides-overview.md
(2 hunks)docs/src/user-guide/index.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/guides-overview.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
docs/src/user-guide/guides-multi-node.md (1)
93-99
: Confirming garbage_collector Group 2 placement and UTC-based expiry
- The retention guide explicitly states that the garbage collector runs in the UTC time zone (see the warning and “Handling log events with non-UTC timestamps” section noting GC’s UTC-based scheduling).
- The garbage_collector component is correctly placed in Group 2: it only depends on Group 1 services (database, queue, Redis, results_cache) and does not rely on any Group 3 workers to compute expirations or enqueue deletion tasks.
No changes needed here.
docs/src/user-guide/index.md (1)
62-67
: Good placement of the new guide in the toctreeAdding guides-retention before guides-multi-node aligns with prior feedback about ordering. No issues from a Sphinx/MyST perspective with :glob: plus explicit entries here.
docs/src/user-guide/guides-overview.md (1)
16-20
: Title casing and positioning look consistent“Retention control” uses sentence case and appears before “Multi-node deployment,” matching the requested ordering. Looks good.
| garbage_collector | Background process for retention control | | ||
::: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Add cross-link for discoverability to the new component row
Point readers from the components table directly to the new guide.
-| garbage_collector | Background process for retention control |
+| garbage_collector | Background process for retention control; see [Retention control](guides-retention) |
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| garbage_collector | Background process for retention control | | |
::: | |
| garbage_collector | Background process for retention control; see [Retention control](guides-retention) | | |
::: |
🤖 Prompt for AI Agents
In docs/src/user-guide/guides-multi-node.md around lines 34–35, the components
table row for "garbage_collector" should include a cross-link to the new guide
for discoverability; update the table cell to make the component name or its
description a Markdown link pointing to the new guide (for example the relative
path docs/src/user-guide/guides-garbage-collector.md or the correct guide
filename), ensuring the link text remains clear (e.g., "garbage_collector —
Background process for retention control") and the link URL points to the new
guide.
:::{grid-item-card} | ||
:link: guides-using-presto | ||
Using Presto with CLP | ||
^^^ | ||
How to use Presto to query compressed logs in CLP. | ||
::: | ||
:::: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Optional copy tweak for parallelism across cards
Minor style suggestion to make descriptions crisper and consistent with the “How to …” phrasing used elsewhere.
-How to use Presto to query compressed logs in CLP.
+How to use Presto to query compressed logs stored by CLP.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
:::{grid-item-card} | |
:link: guides-using-presto | |
Using Presto with CLP | |
^^^ | |
How to use Presto to query compressed logs in CLP. | |
::: | |
:::: | |
:::{grid-item-card} | |
:link: guides-using-presto | |
Using Presto with CLP | |
How to use Presto to query compressed logs stored by CLP. | |
::: | |
:::: |
🤖 Prompt for AI Agents
In docs/src/user-guide/guides-overview.md around lines 29 to 35, update the card
copy to use the "How to ..." phrasing for parallelism: change the card title
from "Using Presto with CLP" to "How to use Presto with CLP" (or otherwise
ensure the header matches the descriptive sentence), and confirm the description
reads "How to use Presto to query compressed logs in CLP." so the header and
body follow the same "How to ..." style used across other cards.
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
docs/src/user-guide/guides-overview.md (1)
29-34
: Optional: align title with “How to …” phrasing for parallelismFor consistency with the description and other cards that use “How to …”, consider updating the title.
-Using Presto with CLP +How to use Presto with CLP @@ -How to use Presto to query compressed logs in CLP. +How to use Presto to query compressed logs stored by CLP.docs/src/user-guide/guides-retention.md (2)
16-22
: Add a short, upfront UTC note (addresses the UTC reminder and improves discoverability)You explain non‑UTC handling later, but an early heads‑up reduces surprises and gives readers a quick pointer.
| $data\_timestamp$ | The end of the time range for the data being evaluated for expiration (e.g., for an archive, this is the timestamp of the most recent log event contained in the archive). | +:::{note} +CLP assumes timestamps are in UTC. If your logs use local (non‑UTC) timestamps, see “Handling log events with non‑UTC timestamps” below to adjust your retention period accordingly. +:::
120-124
: Clarify units for signed_utc_offset and add a concrete examplePrevents misconfiguration and makes the formula directly actionable.
-To avoid this issue, you can adjust the retention period to account for the offset of the log -events' time zone from UTC: +To avoid this issue, you can adjust the retention period to account for the offset of the log +events' time zone from UTC (use minutes for both values): $$adjusted\_retention\_period = retention\_period - signed\_utc\_offset$$ + +where `signed_utc_offset` is the time zone offset in minutes (e.g., UTC+8 = +480, UTC−4 = −240). +For example, if `retention_period = 60` and the logs are in UTC−4, then +`adjusted_retention_period = 60 - (−240) = 300` minutes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
-
docs/src/user-guide/guides-overview.md
(2 hunks) -
docs/src/user-guide/guides-retention.md
(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-guide/guides-overview.md
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-08-15T21:48:46.674Z
Learnt from: quinntaylormitchell
PR: y-scope/clp#1181
File: docs/src/user-guide/guides-retention.md:57-66
Timestamp: 2025-08-15T21:48:46.674Z
Learning: When dealing with timezone offset adjustments for retention periods in CLP, the correct formula is `adjusted_retention_period = retention_period - UTC_offset` (not using absolute value). For negative UTC offsets, this increases the retention period; for positive UTC offsets, this decreases it, properly compensating for how local timestamps are interpreted by the UTC-based garbage collector.
Applied to files:
docs/src/user-guide/guides-overview.md
docs/src/user-guide/guides-retention.md
📚 Learning: 2025-01-13T21:18:54.629Z
Learnt from: haiqi96
PR: y-scope/clp#662
File: components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py:167-186
Timestamp: 2025-01-13T21:18:54.629Z
Learning: In the CLP system, streams can be deterministically recreated if needed for debugging purposes, so it's acceptable to delete them even if their S3 upload fails.
Applied to files:
docs/src/user-guide/guides-retention.md
🪛 LanguageTool
docs/src/user-guide/guides-retention.md
[typographical] ~112-~112: If you want to indicate numerical ranges or time ranges, consider using an en dash.
Context: ... then becomes...
(DASH_RULE)
🪛 markdownlint-cli2 (0.17.2)
docs/src/user-guide/guides-retention.md
24-24: Link fragments should be valid
(MD051, link-fragments)
85-85: Link fragments should be valid
(MD051, link-fragments)
157-157: Link fragments should be valid
(MD051, link-fragments)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (3)
docs/src/user-guide/guides-overview.md (2)
16-20
: Retention card reads clearly and matches the new guideLink target, title, and description are concise and consistent with neighbouring cards.
29-31
: Link slug verified: Presto guide existsI’ve confirmed that the bare slug
guides-using-presto
corresponds to an existing file in docs/src/user-guide (and is listed in the toctree at index.md), so the grid‐item‐card link won’t break.• docs/src/user-guide/index.md includes
guides-using-presto
in its toctree
• docs/src/user-guide/guides-using-presto.md exists and renders as expectedNo changes are strictly required here. If you’d like to standardize on directory‐style links (e.g.
guides-using-presto/index
), you can update the link—but it’s purely optional.docs/src/user-guide/guides-retention.md (1)
73-75
: Retention defaults verifiedThe documented defaults for
archive_output.retention_period
andresults_cache.retention_period
match the code’s shipped defaults. No changes are required.• In components/clp-py-utils/clp_py_utils/clp_config.py (around line 417),
ResultsCache.retention_period: Optional[int] = 60
confirms the 60-minute default.
• In components/clp-py-utils/clp_py_utils/clp_config.py (around line 658),ArchiveOutput.retention_period: Optional[int] = None
confirms thenull
default.
When the garbage collector wakes up, it will scan for and delete any data that satisfies the expiry | ||
criteria shown in [Figure 1](#figure-1): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹 Nitpick (assertive)
Use MyST {ref} cross‑refs for Figure 1 (fixes MD051 and is more robust)
Replace hash fragments with {ref} to the existing “(figure-1)=” target.
-criteria shown in [Figure 1](#figure-1):
+criteria shown in {ref}`Figure 1 <figure-1>`:
@@
-For archives, $data\_timestamp$ (in the expiry criteria equation from [Figure 1](#figure-1)) is the
+For archives, $data\_timestamp$ (in the expiry criteria equation from {ref}`Figure 1 <figure-1>`) is the
@@
-For search results, $data\_timestamp$ (in the expiry criteria equation from [Figure 1](#figure-1))
+For search results, $data\_timestamp$ (in the expiry criteria equation from {ref}`Figure 1 <figure-1>`)
Also applies to: 85-86, 157-158
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
24-24: Link fragments should be valid
(MD051, link-fragments)
🤖 Prompt for AI Agents
In docs/src/user-guide/guides-retention.md around lines 23-25 (and similarly at
85-86 and 157-158), replace inline hash-fragment links like "(#figure-1)" with
MyST {ref} cross-refs pointing to the existing target label (figure-1) — e.g.
use {ref}`figure-1` style — so update each occurrence to the MyST {ref} syntax,
preserving surrounding text and punctuation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edited the PR title directly.
Description
As the title describes, this PR adds the user doc for retention control.
Note that one feature described in this PR depends on #1208
Checklist
breaking change.
Validation performed
Locally built the docs and confirmed it looks as expected.
Summary by CodeRabbit