Skip to content

bulk api filter label fix with testing logs#1875

Open
pinkygupta-hub wants to merge 1 commit intokruize:mvp_demofrom
pinkygupta-hub:bulk-api-label-filter-fix
Open

bulk api filter label fix with testing logs#1875
pinkygupta-hub wants to merge 1 commit intokruize:mvp_demofrom
pinkygupta-hub:bulk-api-label-filter-fix

Conversation

@pinkygupta-hub
Copy link
Copy Markdown
Contributor

@pinkygupta-hub pinkygupta-hub commented Apr 10, 2026

Description

Please describe the issue or feature and the summary of changes made to fix this.

Fixes # (issue)

Type of change

  • Bug fix
  • New feature
  • Docs update
  • Breaking change (What changes might users need to make in their application due to this PR?)
  • Requires DB changes

How has this been tested?

Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.

  • New Test X
  • Functional testsuite

Test Configuration

  • Kubernetes clusters tested on:

Checklist 🎯

  • Followed coding guidelines
  • Comments added
  • Dependent changes merged
  • Documentation updated
  • Tests added or updated

Additional information

Include any additional information such as links, test results, screenshots here

Summary by Sourcery

Handle bulk API label filters when constructing queries and add detailed logging around label-based filtering and experiment creation.

New Features:

  • Support passing label filters from bulk API requests into datasource query construction so they can override the unique key when present.

Enhancements:

  • Apply a unified effective label filter to namespace, workload, and container queries, falling back to the existing uniqueKey parameter when no labels are specified.
  • Improve logging around bulk label filtering, including input filters, constructed queries, and experiment creation metrics for easier debugging.

Signed-off-by: Pinky Gupta <pinkygupta@Pinkys-MacBook-Pro.local>
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Apr 10, 2026

Reviewer's Guide

Adjusts bulk API label filtering to prioritize explicit label filters from the request over the uniqueKey parameter and adds detailed logging around label-based filtering and experiment creation.

Sequence diagram for bulk API label filter handling in bulk job execution

sequenceDiagram
    actor User
    participant BulkJobManager
    participant DataSourceMetadataOperator
    participant DataSource

    User->>BulkJobManager: submitBulkRequest(BulkInput)
    BulkJobManager->>BulkJobManager: buildRegexFilters(filter)
    BulkJobManager->>BulkJobManager: getLabels(FilterWrapper) to build labels
    BulkJobManager->>BulkJobManager: resourceFilters.put(labels, labelFilterString)

    BulkJobManager->>DataSourceMetadataOperator: processQueriesAndPopulateDataSourceMetadataInfo(uniqueKey, includeResources, excludeResources)
    DataSourceMetadataOperator->>DataSourceMetadataOperator: labelFilter = includeResources.get(labels)
    DataSourceMetadataOperator->>DataSourceMetadataOperator: if labelFilter empty, use excludeResources.get(labels)
    DataSourceMetadataOperator->>DataSourceMetadataOperator: effectiveLabelFilter = labelFilter or uniqueKey
    DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to namespaceQuery
    DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to workloadQuery
    DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to containerQuery

    DataSourceMetadataOperator->>DataSource: fetchQueryResults(namespaceQuery, workloadQuery, containerQuery)
    DataSource-->>DataSourceMetadataOperator: metadataResults
    DataSourceMetadataOperator-->>BulkJobManager: DataSourceMetadataInfo

    BulkJobManager->>BulkJobManager: getExperimentMap(labelString, metadataInfo)
    BulkJobManager->>BulkJobManager: iterate clusters, namespaces, workloads, containers
    BulkJobManager->>BulkJobManager: createExperimentAPIObject(...) per container
    BulkJobManager-->>User: experiments created only for workloads matching label filter
Loading

Flow diagram for effective label filter selection and query modification

flowchart TD
    A[Start label filter processing] --> B[Read labelFilter from includeResources.labels]
    B --> C{labelFilter is empty?}
    C -->|Yes| D[Read labelFilter from excludeResources.labels]
    C -->|No| E[Keep labelFilter from includeResources]
    D --> F[Compute effectiveLabelFilter]
    E --> F[Compute effectiveLabelFilter]
    F --> G{effectiveLabelFilter is not empty?}
    G -->|Yes| H[Replace ADDITIONAL_LABEL in namespaceQuery with ,effectiveLabelFilter]
    H --> I[Replace ADDITIONAL_LABEL in workloadQuery with ,effectiveLabelFilter]
    I --> J[Replace ADDITIONAL_LABEL in containerQuery with ,effectiveLabelFilter]
    J --> K[Log final queries with applied label filter]
    G -->|No| L[Replace ADDITIONAL_LABEL in namespaceQuery with empty string]
    L --> M[Replace ADDITIONAL_LABEL in workloadQuery with empty string]
    M --> N[Replace ADDITIONAL_LABEL in containerQuery with empty string]
    N --> O[Log that no label filter is applied]
    K --> P[Proceed to fetch query results]
    O --> P[Proceed to fetch query results]
    P --> Q[End]
Loading

File-Level Changes

Change Details Files
Use explicit label filters from bulk API request when constructing datasource queries, falling back to uniqueKey only if no labels are provided, and enhance logging for query construction.
  • Extract a label filter string from includeResources["labels"] or excludeResources["labels"] and compute an effectiveLabelFilter that falls back to uniqueKey when labels are not provided.
  • Apply effectiveLabelFilter instead of uniqueKey when substituting KruizeConstants.KRUIZE_BULK_API.ADDITIONAL_LABEL in namespace, workload, and container queries, or remove the placeholder if no filter is available.
  • Replace generic query logging with more descriptive, prefixed logs that show raw label inputs, uniqueKey, the chosen effective label filter, and the final rendered queries.
src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java
Improve observability of bulk job experiment creation and label-based filtering behaviour in the analyzer worker.
  • Add info-level logs at the start and end of getExperimentMap to summarize processing, including workloads processed and experiments created, and show which label string is used for experiment naming.
  • Log each created experiment with cluster, namespace, workload, container, and experiment name to make it easier to correlate with label filters.
  • Extend buildRegexFilters to derive a labels entry in the resourceFilters map from BulkInput.Filter.labels using getLabels, and log both the derived label filter string and the full resourceFilters map for debugging.
src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java

Possibly linked issues

  • #(not specified in text): PR changes how label filters are built/applied in metadata queries, likely fixing the 422 datasource error seen in Bulk API.

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • The new [BULK API LABEL FILTER] logging is very verbose and uses INFO for what looks like diagnostic detail (including full queries and per-experiment logs); consider downgrading most of these to DEBUG or wrapping them with a debug flag to avoid log noise and potential performance overhead in production.
  • In buildRegexFilters, using double-brace initialization to construct a temporary BulkInput.FilterWrapper creates an unnecessary anonymous inner class; it would be clearer and cheaper to instantiate the wrapper normally, set include(filter), and then pass it to getLabels.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `[BULK API LABEL FILTER]` logging is very verbose and uses `INFO` for what looks like diagnostic detail (including full queries and per-experiment logs); consider downgrading most of these to `DEBUG` or wrapping them with a debug flag to avoid log noise and potential performance overhead in production.
- In `buildRegexFilters`, using double-brace initialization to construct a temporary `BulkInput.FilterWrapper` creates an unnecessary anonymous inner class; it would be clearer and cheaper to instantiate the wrapper normally, set `include(filter)`, and then pass it to `getLabels`.

## Individual Comments

### Comment 1
<location path="src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java" line_range="197-202" />
<code_context>
         MetadataProfile metadataProfile = MetadataProfileCollection.getInstance().getMetadataProfileCollection().get(metadataProfileName);

+        // Extract label filters from includeResources if present
+        String labelFilter = includeResources.getOrDefault("labels", "");
+        if (labelFilter.isEmpty()) {
+            labelFilter = excludeResources.getOrDefault("labels", "");
+        }
+        // Use labelFilter if provided, otherwise use uniqueKey
+        String effectiveLabelFilter = !labelFilter.isEmpty() ? labelFilter : uniqueKey;
+
+        LOGGER.info("[BULK API LABEL FILTER] Label filter from includeResources: {}", includeResources.getOrDefault("labels", "NOT_FOUND"));
</code_context>
<issue_to_address>
**question:** Clarify precedence between include/exclude label filters and uniqueKey.

Current behavior is `include.labels``exclude.labels``uniqueKey`: if `include.labels` is set, `exclude.labels` and `uniqueKey` are ignored; if `exclude.labels` is set, `uniqueKey` is ignored. If callers set both include and exclude labels, this precedence may be surprising. Either validate and reject conflicting combinations or document this precedence clearly in the API contract to avoid subtle misconfigurations.
</issue_to_address>

### Comment 2
<location path="src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java" line_range="514-522" />
<code_context>
         try {
             Map<String, CreateExperimentAPIObject> createExperimentAPIObjectMap = new HashMap<>();
             Collection<DataSource> dataSourceCollection = metadataInfo.getDatasources().values();
+            LOGGER.info("[BULK API LABEL FILTER] Starting to process metadata and create experiments");
+            LOGGER.info("[BULK API LABEL FILTER] Label string used for experiment naming: {}", labelString != null ? labelString : "NONE");
+            
</code_context>
<issue_to_address>
**suggestion (performance):** Consider lowering log level to avoid excessive INFO noise in production.

The per-experiment logs in the inner loop will generate a lot of INFO entries for large bulk jobs, increasing log volume and I/O. Consider moving these detailed logs (especially the per-experiment ones) to DEBUG/TRACE, keeping only a brief summary at INFO or gating the detailed logs behind a feature flag for troubleshooting.

Suggested implementation:

```java
            // High-level summary at INFO to avoid excessive noise
            LOGGER.info("[BULK API LABEL FILTER] Starting bulk metadata processing to create experiments");

            // Detailed configuration logged at DEBUG level for troubleshooting
            LOGGER.debug("[BULK API LABEL FILTER] Label string used for experiment naming: {}",
                    labelString != null ? labelString : "NONE");

```

From your comment, there are likely additional `LOGGER.info` calls inside the inner loops (e.g., per workload / per experiment) that will produce one log line per experiment. To fully implement your suggestion:

1. Locate all per-workload / per-experiment logs in this method (and related helper methods invoked inside the loop) and:
   - Change `LOGGER.info(...)` to `LOGGER.debug(...)` (or `LOGGER.trace(...)` if they are extremely detailed).
2. Optionally, keep a small number of aggregate logs at INFO, such as:
   - Total workloads processed.
   - Total experiments created.
3. If your codebase uses feature flags / config for verbose logging, consider wrapping the detailed DEBUG/TRACE logs in a conditional check (e.g., `if (loggingConfig.isBulkJobVerboseEnabled())`) so you can turn them on for troubleshooting without relying solely on global log level changes.
</issue_to_address>

### Comment 3
<location path="src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java" line_range="596-606" />
<code_context>
                     filter.getContainers().stream().map(String::trim).collect(Collectors.joining("|")) : "");
+            // Add label filters to the resource filters map
+            if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
+                String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{
+                    setInclude(filter);
+                }});
+                resourceFilters.put("labels", labelFilterString);
+                LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
+                LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
+            } else {
+                LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
</code_context>
<issue_to_address>
**suggestion:** Avoid double-brace initialization when creating FilterWrapper for getLabels.

This `new BulkInput.FilterWrapper() {{ setInclude(filter); }}` construct creates an anonymous subclass, adds overhead, and may hold an implicit reference to the enclosing instance. Prefer passing `BulkInput.Filter` directly to `getLabels` if feasible, or instantiate and configure `FilterWrapper` explicitly (e.g., `FilterWrapper wrapper = new FilterWrapper(); wrapper.setInclude(filter);`) to avoid double-brace initialization.

```suggestion
            // Add label filters to the resource filters map
            if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
                BulkInput.FilterWrapper filterWrapper = new BulkInput.FilterWrapper();
                filterWrapper.setInclude(filter);
                String labelFilterString = getLabels(filterWrapper);
                resourceFilters.put("labels", labelFilterString);
                LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
                LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
            } else {
                LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
            }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +197 to +202
String labelFilter = includeResources.getOrDefault("labels", "");
if (labelFilter.isEmpty()) {
labelFilter = excludeResources.getOrDefault("labels", "");
}
// Use labelFilter if provided, otherwise use uniqueKey
String effectiveLabelFilter = !labelFilter.isEmpty() ? labelFilter : uniqueKey;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Clarify precedence between include/exclude label filters and uniqueKey.

Current behavior is include.labelsexclude.labelsuniqueKey: if include.labels is set, exclude.labels and uniqueKey are ignored; if exclude.labels is set, uniqueKey is ignored. If callers set both include and exclude labels, this precedence may be surprising. Either validate and reject conflicting combinations or document this precedence clearly in the API contract to avoid subtle misconfigurations.

Comment on lines +514 to 522
LOGGER.info("[BULK API LABEL FILTER] Starting to process metadata and create experiments");
LOGGER.info("[BULK API LABEL FILTER] Label string used for experiment naming: {}", labelString != null ? labelString : "NONE");

int totalWorkloadsProcessed = 0;
int totalExperimentsCreated = 0;

for (DataSource ds : dataSourceCollection) {
HashMap<String, DataSourceCluster> clusterHashMap = ds.getClusters();
for (DataSourceCluster dsc : clusterHashMap.values()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): Consider lowering log level to avoid excessive INFO noise in production.

The per-experiment logs in the inner loop will generate a lot of INFO entries for large bulk jobs, increasing log volume and I/O. Consider moving these detailed logs (especially the per-experiment ones) to DEBUG/TRACE, keeping only a brief summary at INFO or gating the detailed logs behind a feature flag for troubleshooting.

Suggested implementation:

            // High-level summary at INFO to avoid excessive noise
            LOGGER.info("[BULK API LABEL FILTER] Starting bulk metadata processing to create experiments");

            // Detailed configuration logged at DEBUG level for troubleshooting
            LOGGER.debug("[BULK API LABEL FILTER] Label string used for experiment naming: {}",
                    labelString != null ? labelString : "NONE");

From your comment, there are likely additional LOGGER.info calls inside the inner loops (e.g., per workload / per experiment) that will produce one log line per experiment. To fully implement your suggestion:

  1. Locate all per-workload / per-experiment logs in this method (and related helper methods invoked inside the loop) and:
    • Change LOGGER.info(...) to LOGGER.debug(...) (or LOGGER.trace(...) if they are extremely detailed).
  2. Optionally, keep a small number of aggregate logs at INFO, such as:
    • Total workloads processed.
    • Total experiments created.
  3. If your codebase uses feature flags / config for verbose logging, consider wrapping the detailed DEBUG/TRACE logs in a conditional check (e.g., if (loggingConfig.isBulkJobVerboseEnabled())) so you can turn them on for troubleshooting without relying solely on global log level changes.

Comment on lines +596 to +606
// Add label filters to the resource filters map
if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{
setInclude(filter);
}});
resourceFilters.put("labels", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
} else {
LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Avoid double-brace initialization when creating FilterWrapper for getLabels.

This new BulkInput.FilterWrapper() {{ setInclude(filter); }} construct creates an anonymous subclass, adds overhead, and may hold an implicit reference to the enclosing instance. Prefer passing BulkInput.Filter directly to getLabels if feasible, or instantiate and configure FilterWrapper explicitly (e.g., FilterWrapper wrapper = new FilterWrapper(); wrapper.setInclude(filter);) to avoid double-brace initialization.

Suggested change
// Add label filters to the resource filters map
if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{
setInclude(filter);
}});
resourceFilters.put("labels", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
} else {
LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
}
// Add label filters to the resource filters map
if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
BulkInput.FilterWrapper filterWrapper = new BulkInput.FilterWrapper();
filterWrapper.setInclude(filter);
String labelFilterString = getLabels(filterWrapper);
resourceFilters.put("labels", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
} else {
LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant