Skip to content

Conversation

Eden-D-Zhang
Copy link
Contributor

@Eden-D-Zhang Eden-D-Zhang commented Oct 15, 2025

Description

This pull request introduces a new lightweight profiling utility for CLP query execution based on PyInstrument and applies it to certain query execution functions. The profiling decorator generates HTML flame graphs and text summaries for performance analysis, with context extraction for job and task IDs.

Currently, profiling can only be enabled/disabled by modifying the return value of is_profiling_enabled in profiling_utils.py. Per-component config options should be added in a future PR.

  • Added new profiling_utils.py module in clp_py_utils providing a profile decorator for both sync and async functions, automatic context extraction, and output of HTML and text profiling reports to a designated directory.
  • Added pyinstrument as a dependency in pyproject.toml to support profiling features.
  • Added @profile decorator to some query_scheduler.py functions.
  • Added @profile decorator to search and extract_stream task functions.

Profiling output is written to <CLP_LOGS_DIR>/profiles as two files for each execution of a profiled function that are labeled with job, task, and timestamp in the filename.

  • .txt file with call stack timing information:
image
  • Interactive .html file with call stack and flame graph visualization:
image

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Performed queries using the CLP package with profiling enabled and disabled.

Summary by CodeRabbit

  • New Features

    • Optional runtime profiling that can produce interactive HTML and plain-text summaries with timestamps and contextual job/task IDs. Works for synchronous and asynchronous operations and avoids conflicts when nested.
  • Chores

    • Profiling instrumentation added across scheduler and task workflows (no behavioural or interface changes).
    • Added pyinstrument dependency to enable profiling.

@Eden-D-Zhang Eden-D-Zhang requested a review from a team as a code owner October 15, 2025 19:22
Copy link
Contributor

coderabbitai bot commented Oct 15, 2025

Walkthrough

Adds a new profiling utilities module using pyinstrument, registers pyinstrument as a dependency, and applies a decorator to selected scheduler and executor tasks to capture profiling data (sync/async support, context extraction, nested-profiler handling, and report saving). No existing function signatures were changed.

Changes

Cohort / File(s) Summary
Profiling utilities module
components/clp-py-utils/clp_py_utils/profiling_utils.py
New decorator-based profiling utility supporting sync and async functions, extracting job_id/task_id via argument names and dot-paths, gating via a global enable flag, handling nested profilers, and saving HTML + TXT reports to CLP_LOGS_DIR/profiles with timestamps and metadata.
Dependency update
components/clp-py-utils/pyproject.toml
Added dependency: pyinstrument = "^5.1.1".
Executor task instrumentation
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py, components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
Imported profile and decorated Celery task functions (extract_stream, search) with section names ("extract_stream_task", "search_task"); decorators are additive and do not change signatures or control flow.
Scheduler instrumentation
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
Imported profile and decorated dispatch_query_job, acquire_reducer_for_job (async), and handle_finished_search_job (async) with section names and job_id_param="job.id" to capture job context.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant Decorator as profile(...)
  participant Target as Wrapped Function
  participant Profiler as pyinstrument.Profiler
  participant Storage as CLP_LOGS_DIR/profiles

  Caller->>Decorator: invoke wrapped function(...)
  alt profiling disabled
    Decorator->>Target: call directly
    Target-->>Decorator: result/exception
  else profiling enabled
    Decorator->>Decorator: extract job_id/task_id (args, dot-paths)
    Decorator->>Profiler: start(interval)
    note right of Profiler: If nested profiler active,\nskip profiling and continue
    Decorator->>Target: execute (sync or async)
    Target-->>Decorator: result/exception
    Decorator->>Profiler: stop()
    Decorator->>Storage: save HTML + TXT with section/job/task/timestamp
  end
  Decorator-->>Caller: return result or raise
Loading
sequenceDiagram
  autonumber
  participant Celery as Celery Worker
  participant Task as @profile + @app.task
  participant Profiler as pyinstrument.Profiler
  participant Storage as profiles/

  Celery->>Task: run search/extract_stream
  rect rgba(200,230,255,0.3)
    note left of Task: Profiling wrapper around task execution
    Task->>Profiler: start()
    Task->>Task: execute task logic
    Task->>Profiler: stop()
    Task->>Storage: write HTML and TXT reports
  end
  Task-->>Celery: result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "feat(clp-package): Add Python profiling decorator for internal query performance measurement." directly and accurately describes the main change in the changeset, which is the introduction of a new profiling decorator utility for CLP query execution. The title is clear, specific, and concise, mentioning both what is being added (a Python profiling decorator) and its purpose (internal query performance measurement). It avoids vague terminology and provides sufficient context that a teammate reviewing the commit history would understand the primary objective without needing to examine the full changeset details.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d164950 and cc85f93.

📒 Files selected for processing (5)
  • components/clp-py-utils/clp_py_utils/profiling_utils.py (1 hunks)
  • components/clp-py-utils/pyproject.toml (1 hunks)
  • components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (2 hunks)
  • components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (28-119)
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (28-119)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)
components/clp-py-utils/clp_py_utils/profiling_utils.py (1)
  • profile (28-119)
🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/profiling_utils.py

17-17: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


17-17: typing.Tuple is deprecated, use tuple instead

(UP035)


54-54: Missing return type annotation for private function async_wrapper

(ANN202)


54-54: Missing type annotation for *args

(ANN002)


54-54: Missing type annotation for **kwargs

(ANN003)


70-71: Logging statement uses f-string

(G004)


88-88: Missing return type annotation for private function sync_wrapper

(ANN202)


88-88: Missing type annotation for *args

(ANN002)


88-88: Missing type annotation for **kwargs

(ANN003)


104-105: Logging statement uses f-string

(G004)


185-185: Do not catch blind exception: Exception

(BLE001)


186-186: Logging statement uses f-string

(G004)


216-216: Logging statement uses f-string

(G004)


222-222: datetime.datetime.now() called without a tz argument

(DTZ005)


246-246: f-string without any placeholders

Remove extraneous f prefix

(F541)


257-259: Logging statement uses f-string

(G004)


263-263: Logging .exception(...) should be used instead of .error(..., exc_info=True)

(G201)


263-263: Logging statement uses f-string

(G004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (1)
components/clp-py-utils/pyproject.toml (1)

19-19: Verify runtime support for pyinstrument and packaging impact

  • pyinstrument includes native components; confirm it installs cleanly in your worker/scheduler images and doesn’t bloat slim runtimes unnecessarily.
  • If profiling is rarely enabled, consider making it an optional extra to keep minimal installs lean.

Comment on lines +20 to +24

logger = logging.getLogger(__name__)

F = TypeVar("F", bound=Callable[..., Any])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Type‑hints modernisation and stricter types (Ruff UP035, annotations)

  • Import Callable from collections.abc.
  • Prefer built‑in generic tuple[str, str] over typing.Tuple.

Apply this diff:

-from typing import Any, Callable, Optional, Tuple, TypeVar
+from typing import Any, Optional, TypeVar
+from collections.abc import Callable
@@
-) -> Tuple[str, str]:
+) -> tuple[str, str]:

Also applies to: 17-17, 128-128

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 17,
20-24 and 128, update type imports and tuple annotations: replace Callable
imported from typing with Callable from collections.abc, remove typing.Tuple
usage and use built-in generic tuple[str, str] where applicable, and adjust any
related imports (drop unused typing names) so the file uses modern annotations
consistent with Ruff UP035; ensure F = TypeVar("F", bound=Callable[..., Any])
still references the new Callable import.

Comment on lines +147 to +176
def extract_value(param_spec: str) -> str:
"""Extract value from parameter, supporting dot notation for attributes."""
if not param_spec:
return ""

# Split on '.' to handle attribute access
parts = param_spec.split(".")
param_name = parts[0]

# Find the parameter value
value = None
if param_name in kwargs:
value = kwargs[param_name]
elif param_name in param_names:
idx = param_names.index(param_name)
if idx < len(args):
value = args[idx]

if value is None:
return ""

# Navigate through attributes if dot notation was used
for attr_name in parts[1:]:
if hasattr(value, attr_name):
value = getattr(value, attr_name)
else:
return ""

return str(value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Optional: support dict-style context extraction

If callers pass dict-like objects, walk keys as well as attributes.

Apply this diff:

-            for attr_name in parts[1:]:
-                if hasattr(value, attr_name):
-                    value = getattr(value, attr_name)
-                else:
-                    return ""
+            for attr_name in parts[1:]:
+                if isinstance(value, dict) and attr_name in value:
+                    value = value[attr_name]
+                elif hasattr(value, attr_name):
+                    value = getattr(value, attr_name)
+                else:
+                    return ""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def extract_value(param_spec: str) -> str:
"""Extract value from parameter, supporting dot notation for attributes."""
if not param_spec:
return ""
# Split on '.' to handle attribute access
parts = param_spec.split(".")
param_name = parts[0]
# Find the parameter value
value = None
if param_name in kwargs:
value = kwargs[param_name]
elif param_name in param_names:
idx = param_names.index(param_name)
if idx < len(args):
value = args[idx]
if value is None:
return ""
# Navigate through attributes if dot notation was used
for attr_name in parts[1:]:
if hasattr(value, attr_name):
value = getattr(value, attr_name)
else:
return ""
return str(value)
def extract_value(param_spec: str) -> str:
"""Extract value from parameter, supporting dot notation for attributes."""
if not param_spec:
return ""
# Split on '.' to handle attribute access
parts = param_spec.split(".")
param_name = parts[0]
# Find the parameter value
value = None
if param_name in kwargs:
value = kwargs[param_name]
elif param_name in param_names:
idx = param_names.index(param_name)
if idx < len(args):
value = args[idx]
if value is None:
return ""
# Navigate through attributes if dot notation was used
- for attr_name in parts[1:]:
- if hasattr(value, attr_name):
- value = getattr(value, attr_name)
- else:
for attr_name in parts[1:]:
if isinstance(value, dict) and attr_name in value:
value = value[attr_name]
elif hasattr(value, attr_name):
value = getattr(value, attr_name)
else:
return ""
return str(value)

Comment on lines 191 to 199
def _is_profiling_enabled() -> bool:
"""
Check if profiling is enabled.
TODO: Add `CLPConfig` mechanism to enable/disable profiling for each component.
:return: False
"""
return False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Enable profiling via env and make interval tunable

Avoid code edits to toggle profiling. Read from env and allow interval override.

Apply this diff:

-PROFILING_INTERVAL = 0.001
+PROFILING_INTERVAL = float(os.getenv("CLP_PROFILING_INTERVAL", "0.005"))
@@
-def _is_profiling_enabled() -> bool:
+def _is_profiling_enabled() -> bool:
@@
-    :return: False
+    :return: True if CLP_ENABLE_PROFILING is truthy, else False
@@
-    return False
+    return os.getenv("CLP_ENABLE_PROFILING", "").lower() in {"1", "true", "yes", "on"}

Also applies to: 25-25

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 191-199
(and also apply same change at line 25), replace the hardcoded return False with
logic that reads an environment variable to enable profiling (e.g.
CLP_PROFILING_ENABLED or CLP_PROFILING) and returns True for common truthy
values ("1","true","yes") and False otherwise; also make the profiling interval
tunable by reading an environment variable (e.g. CLP_PROFILING_INTERVAL_SECONDS)
elsewhere where the interval is used (or add a helper _get_profiling_interval()
that reads the env and falls back to the current default), parsing ints safely
and using a sensible default on parse failure.

Comment on lines +233 to +235
output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
output_dir.mkdir(exist_ok=True, parents=True)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: handle missing CLP_LOGS_DIR to avoid crash when enabled

Path(os.getenv("CLP_LOGS_DIR")) raises a TypeError if the env var is unset. Provide a safe fallback and warn.

Apply this diff:

-        output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
+        logs_dir = os.getenv("CLP_LOGS_DIR")
+        if not logs_dir:
+            logger.warning("CLP_LOGS_DIR is not set; writing profiles to CWD ./profiles")
+            output_dir = Path.cwd() / "profiles"
+        else:
+            output_dir = Path(logs_dir) / "profiles"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
output_dir.mkdir(exist_ok=True, parents=True)
logs_dir = os.getenv("CLP_LOGS_DIR")
if not logs_dir:
logger.warning("CLP_LOGS_DIR is not set; writing profiles to CWD ./profiles")
output_dir = Path.cwd() / "profiles"
else:
output_dir = Path(logs_dir) / "profiles"
output_dir.mkdir(exist_ok=True, parents=True)
🤖 Prompt for AI Agents
components/clp-py-utils/clp_py_utils/profiling_utils.py around lines 233-235:
currently Path(os.getenv("CLP_LOGS_DIR")) will raise a TypeError if the env var
is unset; instead read the env var into a variable, check if it's None or empty,
log a warning (use the module logger or warnings.warn) and choose a safe
fallback directory (e.g. tempfile.mkdtemp(prefix="clp_logs_") or
os.getcwd()/".clp_logs"), then construct a Path from that string and call
output_dir.mkdir(exist_ok=True, parents=True); ensure the code uses the
checked/fallback path variable rather than calling Path on a possibly None
value.

Comment on lines +246 to +255
f.write(f"CLP Query Profiling Report (pyinstrument)\n")
f.write(f"Section: {section_name}\n")
if job_id:
f.write(f"Job ID: {job_id}\n")
if task_id:
f.write(f"Task ID: {task_id}\n")
f.write(f"Timestamp: {timestamp}\n")
f.write("=" * 80 + "\n\n")
f.write(profiler.output_text(unicode=True, color=False))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Logging/text nits: remove stray f, use logger.exception, timezone

  • Remove the stray f-string with no placeholders.
  • Prefer logger.exception over error(..., exc_info=True).
  • Consider timezone-aware timestamps or time.time() for filenames.

Apply this diff:

-            f.write(f"CLP Query Profiling Report (pyinstrument)\n")
+            f.write("CLP Query Profiling Report (pyinstrument)\n")
@@
-        logger.info(
+        logger.info(
             f"Profile saved: {section_name} "
             f"(duration={duration:.6f}s, samples={sample_count}) "
             f"HTML={html_path}, TXT={txt_path}"
         )
@@
-    except Exception as e:
-        logger.error(f"Failed to save profile for {section_name}: {e}", exc_info=True)
+    except Exception:
+        logger.exception("Failed to save profile for %s", section_name)

Optionally:

-        timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")
+        timestamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%d_%H%M%S_%fZ")

Also applies to: 256-260, 263-263, 222-222

🧰 Tools
🪛 Ruff (0.14.0)

246-246: f-string without any placeholders

Remove extraneous f prefix

(F541)

WorkerConfig,
)
from clp_py_utils.clp_logging import set_logging_level
from clp_py_utils.profiling_utils import profile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Decorator usage looks correct; add explicit context mapping and env check

  • Order is right (@profile wraps the function that Celery registers).
  • Consider passing job_id_param="job_id", task_id_param="task_id" for clarity.
  • Ensure CLP_LOGS_DIR is set in worker env; otherwise saving profiles will fail once enabled (see _save_profile).

Also applies to: 183-184

🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py
around lines 17 and 183-184, update the @profile decorator usage to pass
explicit context mapping by adding job_id_param="job_id" and
task_id_param="task_id" so the profiler knows which function args map to
job/task IDs, and add an environment check (e.g., verify CLP_LOGS_DIR is present
and non-empty in os.environ) before attempting to save profiles in _save_profile
to avoid failures when the worker env is missing CLP_LOGS_DIR.

WorkerConfig,
)
from clp_py_utils.clp_logging import set_logging_level
from clp_py_utils.profiling_utils import profile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Good instrumentation; consider explicit job/task mapping and env readiness

  • Order of decorators is correct.
  • Optionally pass job_id_param="job_id", task_id_param="task_id" to be explicit.
  • Ensure CLP_LOGS_DIR exists in the search worker environment when profiling is enabled.

Also applies to: 169-170

🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py
around line 15 (and also at lines 169-170), the profile decorator is used but
not explicit about which args map to job and task and the code assumes
CLP_LOGS_DIR exists; update the profile decorator calls to include
job_id_param="job_id", task_id_param="task_id" for clarity, and add an
environment-readiness check that ensures CLP_LOGS_DIR exists (create the
directory if missing or raise a clear error) before profiling is enabled so
profiling can write logs reliably.

)
from clp_py_utils.core import read_yaml_config_file
from clp_py_utils.decorators import exception_default_value
from clp_py_utils.profiling_utils import profile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Scheduler profiling: sane targets; add guardrails for async sections and interval

  • Targets are sensible; job_id_param="job.id" is correct.
  • For long‑lived async functions (acquire_reducer_for_job, handle_finished_search_job), consider:
    • Using a coarser interval (e.g., 5–10ms) to reduce overhead/trace size.
    • Making interval and enablement configurable via env to avoid code changes when toggling.

Also applies to: 547-555, 569-611, 898-902

🤖 Prompt for AI Agents
In
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py
around lines 49 (and also apply similar changes at 547-555, 569-611, 898-902),
the profiler is applied with defaults that are too fine-grained for long‑lived
async sections; update profiling to use a coarser sampling interval (e.g.,
5–10ms) and add guardrails: make both profiling enablement and interval
configurable via environment variables (e.g., JOB_SCHEDULER_PROFILING_ENABLED,
JOB_SCHEDULER_PROFILE_INTERVAL_MS) and wrap or conditionally apply the profiler
only when enabled so async functions like acquire_reducer_for_job and
handle_finished_search_job use the higher interval (or are skipped) to reduce
overhead and trace size.

Copy link
Member

@junhaoliao junhaoliao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not a formal review)

note we are migrating the build system from poetry to uv in #1405 so feel free to hold off on the dependency changes

mariadb = "~1.0.11"
mysql-connector-python = "^8.2.0"
pydantic = "^2.11.9"
pyinstrument = "^5.0.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we plan to include the profiling feature in the formal releases? if not, maybe we should put this into a

[dependency-groups]
dev = [
    ...
]

that said, the pyinstrument package is only ~270KB (https://pypi.org/project/pyinstrument/#files) so it shouldn't hurt to be bundled into the release.

Copy link
Member

@LinZhihao-723 LinZhihao-723 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a full review either. Just catch some errors when taking a glance.
It would be better if we can add some instructions in the PR to indicate:

  • How to turn on the profiler (seems like to tune the flag in the code level and rebuild the package)
  • A step-by-step setup for testing, and what are the expected outputs?



def profile(
section_name: Optional[str] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
section_name: Optional[str] = None,
section_name: str | None = None,

Do it for all occurrences.

Copy link
Contributor Author

@Eden-D-Zhang Eden-D-Zhang Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, are there cases where do we prefer Optional or should it always be | None?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
components/clp-py-utils/clp_py_utils/profiling_utils.py (6)

17-17: Modernize type imports per Python 3.9+ standards.

Use collections.abc.Callable instead of typing.Callable and built-in tuple instead of typing.Tuple.

Based on past review comments and Ruff UP035 hints.


68-68: Make nested-profiler detection more resilient.

The current substring match is case-sensitive and brittle. Consider a case-insensitive check covering broader phrasings.

Based on past review comments.

Also applies to: 100-100


185-192: Enable profiling via environment variable for better UX.

Currently, profiling requires code edits to enable. Consider reading from an environment variable (e.g., CLP_ENABLE_PROFILING) to allow runtime control without redeployment.

Apply this diff:

+import os
+
 def _is_profiling_enabled() -> bool:
     """
     Checks if profiling is enabled.
-    TODO: Add `CLPConfig` mechanism to enable/disable profiling for each component.
 
-    :return: Whether the profiler is enabled.
+    :return: True if CLP_ENABLE_PROFILING is truthy, else False.
     """
-    return False
+    return os.getenv("CLP_ENABLE_PROFILING", "").lower() in {"1", "true", "yes", "on"}

Based on past review comments.


25-25: Make profiling interval configurable.

Consider reading the interval from an environment variable to allow runtime tuning without code changes.

Apply this diff:

-PROFILING_INTERVAL = 0.001
+PROFILING_INTERVAL = float(os.getenv("CLP_PROFILING_INTERVAL", "0.001"))

Based on past review comments.


227-228: Critical: Handle missing CLP_LOGS_DIR to prevent crash.

Path(os.getenv("CLP_LOGS_DIR")) raises TypeError if the environment variable is unset. This will crash when profiling is enabled.

Apply this diff:

-        output_dir = Path(os.getenv("CLP_LOGS_DIR")) / "profiles"
-        output_dir.mkdir(exist_ok=True, parents=True)
+        logs_dir = os.getenv("CLP_LOGS_DIR")
+        if not logs_dir:
+            logger.warning("CLP_LOGS_DIR is not set; writing profiles to CWD ./profiles")
+            output_dir = Path.cwd() / "profiles"
+        else:
+            output_dir = Path(logs_dir) / "profiles"
+        output_dir.mkdir(exist_ok=True, parents=True)

Based on past review comments.


240-240: Address logging and text formatting nits.

Minor improvements for logging statements:

  1. Line 240: Remove unnecessary f prefix from string without placeholders.
  2. Line 257: Use logger.exception() instead of logger.error(..., exc_info=True).

Apply this diff:

-            f.write(f"CLP Query Profiling Report (pyinstrument)\n")
+            f.write("CLP Query Profiling Report (pyinstrument)\n")
 
-    except Exception as e:
-        logger.error(f"Failed to save profile for {section_name}: {e}", exc_info=True)
+    except Exception:
+        logger.exception("Failed to save profile for %s", section_name)

Based on past review comments and Ruff hints (F541, G201).

Also applies to: 257-257

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b013429 and a33a9c1.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/profiling_utils.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: clp-lint
components/clp-py-utils/clp_py_utils/profiling_utils.py

[error] 97-97: Black formatting check failed. 1 file would be reformatted (profiling_utils.py). Run 'black' to format the file.

🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/profiling_utils.py

17-17: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


17-17: typing.Tuple is deprecated, use tuple instead

(UP035)


54-54: Missing return type annotation for private function async_wrapper

(ANN202)


54-54: Missing type annotation for *args

(ANN002)


54-54: Missing type annotation for **kwargs

(ANN003)


70-71: Logging statement uses f-string

(G004)


86-86: Missing return type annotation for private function sync_wrapper

(ANN202)


86-86: Missing type annotation for *args

(ANN002)


86-86: Missing type annotation for **kwargs

(ANN003)


102-103: Logging statement uses f-string

(G004)


179-179: Do not catch blind exception: Exception

(BLE001)


180-180: Logging statement uses f-string

(G004)


210-210: Logging statement uses f-string

(G004)


216-216: datetime.datetime.now() called without a tz argument

(DTZ005)


240-240: f-string without any placeholders

Remove extraneous f prefix

(F541)


251-253: Logging statement uses f-string

(G004)


257-257: Logging .exception(...) should be used instead of .error(..., exc_info=True)

(G201)


257-257: Logging statement uses f-string

(G004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: rust-checks (macos-15)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
components/clp-py-utils/clp_py_utils/profiling_utils.py (6)

17-17: Modernize type imports per past feedback.

The past review already flagged this: import Callable from collections.abc and use built-in tuple instead of typing.Tuple.


70-71: F-string logging already flagged in past review.

Past comments already noted that f-strings in logging statements should be avoided (Ruff G004). Use lazy formatting with % or avoid f-strings in logger calls.

Also applies to: 102-102


178-179: Minor code quality issues already flagged.

  • Line 178: Catching blind Exception (Ruff BLE001) - acceptable for context extraction but flagged by static analysis
  • Line 179: F-string in logging (Ruff G004) - already noted in past comments

184-191: Environment-based enablement already suggested in past review.

A past comment already recommended reading CLP_ENABLE_PROFILING from environment to avoid code edits. The TODO indicates this will be addressed in a future PR.


226-226: CRITICAL: Missing CLP_LOGS_DIR will crash (already flagged).

Past review already identified this critical issue: Path(os.getenv("CLP_LOGS_DIR")) raises TypeError if the environment variable is unset. The suggested fix provides a fallback directory and warning.


209-209: Logging and text formatting nits already flagged.

Past comments already noted:

  • Line 209: f-string logging (Ruff G004)
  • Line 239: f-string without placeholders (Ruff F541)
  • Lines 250-252: f-string logging (Ruff G004)
  • Line 256: Use logger.exception instead of logger.error(..., exc_info=True) (Ruff G201)

Also applies to: 239-239, 250-252, 256-256

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a33a9c1 and 99d134b.

📒 Files selected for processing (1)
  • components/clp-py-utils/clp_py_utils/profiling_utils.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.0)
components/clp-py-utils/clp_py_utils/profiling_utils.py

17-17: Import from collections.abc instead: Callable

Import from collections.abc

(UP035)


17-17: typing.Tuple is deprecated, use tuple instead

(UP035)


54-54: Missing return type annotation for private function async_wrapper

(ANN202)


54-54: Missing type annotation for *args

(ANN002)


54-54: Missing type annotation for **kwargs

(ANN003)


70-71: Logging statement uses f-string

(G004)


86-86: Missing return type annotation for private function sync_wrapper

(ANN202)


86-86: Missing type annotation for *args

(ANN002)


86-86: Missing type annotation for **kwargs

(ANN003)


102-102: Logging statement uses f-string

(G004)


178-178: Do not catch blind exception: Exception

(BLE001)


179-179: Logging statement uses f-string

(G004)


209-209: Logging statement uses f-string

(G004)


215-215: datetime.datetime.now() called without a tz argument

(DTZ005)


239-239: f-string without any placeholders

Remove extraneous f prefix

(F541)


250-252: Logging statement uses f-string

(G004)


256-256: Logging .exception(...) should be used instead of .error(..., exc_info=True)

(G201)


256-256: Logging statement uses f-string

(G004)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: rust-checks (ubuntu-24.04)
  • GitHub Check: rust-checks (macos-15)
  • GitHub Check: rust-checks (ubuntu-22.04)
  • GitHub Check: lint-check (macos-15)


profiler = Profiler(interval=PROFILING_INTERVAL)
try:
profiler.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

CRITICAL: Black formatting failure blocks CI.

The pipeline reports a Black formatting check failed at this line. Run black components/clp-py-utils/clp_py_utils/profiling_utils.py locally to auto-format the file and commit the result.

🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/profiling_utils.py around line 97, Black
formatting failed at the call to profiler.start(); run the Black formatter on
this file (e.g., black components/clp-py-utils/clp_py_utils/profiling_utils.py),
ensure the file is auto-formatted and any resulting changes are committed, then
push the commit to satisfy the CI formatting check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants