bulk api filter label fix with testing logs by pinkygupta-hub · Pull Request #1875 · kruize/autotune

pinkygupta-hub · 2026-04-10T11:18:01Z

Description

Please describe the issue or feature and the summary of changes made to fix this.

Fixes # (issue)

Type of change

Bug fix
New feature
Docs update
Breaking change (What changes might users need to make in their application due to this PR?)
Requires DB changes

How has this been tested?

Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.

New Test X
Functional testsuite

Test Configuration

Kubernetes clusters tested on:

Checklist 🎯

Additional information

Include any additional information such as links, test results, screenshots here

Summary by Sourcery

Handle bulk API label filters when constructing queries and add detailed logging around label-based filtering and experiment creation.

New Features:

Support passing label filters from bulk API requests into datasource query construction so they can override the unique key when present.

Enhancements:

Apply a unified effective label filter to namespace, workload, and container queries, falling back to the existing uniqueKey parameter when no labels are specified.
Improve logging around bulk label filtering, including input filters, constructed queries, and experiment creation metrics for easier debugging.

Signed-off-by: Pinky Gupta <pinkygupta@Pinkys-MacBook-Pro.local>

sourcery-ai · 2026-04-10T11:18:19Z

Reviewer's Guide

Adjusts bulk API label filtering to prioritize explicit label filters from the request over the uniqueKey parameter and adds detailed logging around label-based filtering and experiment creation.

Sequence diagram for bulk API label filter handling in bulk job execution

sequenceDiagram
    actor User
    participant BulkJobManager
    participant DataSourceMetadataOperator
    participant DataSource

    User->>BulkJobManager: submitBulkRequest(BulkInput)
    BulkJobManager->>BulkJobManager: buildRegexFilters(filter)
    BulkJobManager->>BulkJobManager: getLabels(FilterWrapper) to build labels
    BulkJobManager->>BulkJobManager: resourceFilters.put(labels, labelFilterString)

    BulkJobManager->>DataSourceMetadataOperator: processQueriesAndPopulateDataSourceMetadataInfo(uniqueKey, includeResources, excludeResources)
    DataSourceMetadataOperator->>DataSourceMetadataOperator: labelFilter = includeResources.get(labels)
    DataSourceMetadataOperator->>DataSourceMetadataOperator: if labelFilter empty, use excludeResources.get(labels)
    DataSourceMetadataOperator->>DataSourceMetadataOperator: effectiveLabelFilter = labelFilter or uniqueKey
    DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to namespaceQuery
    DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to workloadQuery
    DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to containerQuery

    DataSourceMetadataOperator->>DataSource: fetchQueryResults(namespaceQuery, workloadQuery, containerQuery)
    DataSource-->>DataSourceMetadataOperator: metadataResults
    DataSourceMetadataOperator-->>BulkJobManager: DataSourceMetadataInfo

    BulkJobManager->>BulkJobManager: getExperimentMap(labelString, metadataInfo)
    BulkJobManager->>BulkJobManager: iterate clusters, namespaces, workloads, containers
    BulkJobManager->>BulkJobManager: createExperimentAPIObject(...) per container
    BulkJobManager-->>User: experiments created only for workloads matching label filter

Flow diagram for effective label filter selection and query modification

flowchart TD
    A[Start label filter processing] --> B[Read labelFilter from includeResources.labels]
    B --> C{labelFilter is empty?}
    C -->|Yes| D[Read labelFilter from excludeResources.labels]
    C -->|No| E[Keep labelFilter from includeResources]
    D --> F[Compute effectiveLabelFilter]
    E --> F[Compute effectiveLabelFilter]
    F --> G{effectiveLabelFilter is not empty?}
    G -->|Yes| H[Replace ADDITIONAL_LABEL in namespaceQuery with ,effectiveLabelFilter]
    H --> I[Replace ADDITIONAL_LABEL in workloadQuery with ,effectiveLabelFilter]
    I --> J[Replace ADDITIONAL_LABEL in containerQuery with ,effectiveLabelFilter]
    J --> K[Log final queries with applied label filter]
    G -->|No| L[Replace ADDITIONAL_LABEL in namespaceQuery with empty string]
    L --> M[Replace ADDITIONAL_LABEL in workloadQuery with empty string]
    M --> N[Replace ADDITIONAL_LABEL in containerQuery with empty string]
    N --> O[Log that no label filter is applied]
    K --> P[Proceed to fetch query results]
    O --> P[Proceed to fetch query results]
    P --> Q[End]

File-Level Changes

Change	Details	Files
Use explicit label filters from bulk API request when constructing datasource queries, falling back to uniqueKey only if no labels are provided, and enhance logging for query construction.	Extract a label filter string from includeResources["labels"] or excludeResources["labels"] and compute an effectiveLabelFilter that falls back to uniqueKey when labels are not provided. Apply effectiveLabelFilter instead of uniqueKey when substituting KruizeConstants.KRUIZE_BULK_API.ADDITIONAL_LABEL in namespace, workload, and container queries, or remove the placeholder if no filter is available. Replace generic query logging with more descriptive, prefixed logs that show raw label inputs, uniqueKey, the chosen effective label filter, and the final rendered queries.	`src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java`
Improve observability of bulk job experiment creation and label-based filtering behaviour in the analyzer worker.	Add info-level logs at the start and end of getExperimentMap to summarize processing, including workloads processed and experiments created, and show which label string is used for experiment naming. Log each created experiment with cluster, namespace, workload, container, and experiment name to make it easier to correlate with label filters. Extend buildRegexFilters to derive a labels entry in the resourceFilters map from BulkInput.Filter.labels using getLabels, and log both the derived label filter string and the full resourceFilters map for debugging.	`src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java`

Possibly linked issues

#(not specified in text): PR changes how label filters are built/applied in metadata queries, likely fixing the 422 datasource error seen in Bulk API.

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 3 issues, and left some high level feedback:

The new [BULK API LABEL FILTER] logging is very verbose and uses INFO for what looks like diagnostic detail (including full queries and per-experiment logs); consider downgrading most of these to DEBUG or wrapping them with a debug flag to avoid log noise and potential performance overhead in production.
In buildRegexFilters, using double-brace initialization to construct a temporary BulkInput.FilterWrapper creates an unnecessary anonymous inner class; it would be clearer and cheaper to instantiate the wrapper normally, set include(filter), and then pass it to getLabels.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new `[BULK API LABEL FILTER]` logging is very verbose and uses `INFO` for what looks like diagnostic detail (including full queries and per-experiment logs); consider downgrading most of these to `DEBUG` or wrapping them with a debug flag to avoid log noise and potential performance overhead in production.
- In `buildRegexFilters`, using double-brace initialization to construct a temporary `BulkInput.FilterWrapper` creates an unnecessary anonymous inner class; it would be clearer and cheaper to instantiate the wrapper normally, set `include(filter)`, and then pass it to `getLabels`.

## Individual Comments

### Comment 1
<location path="src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java" line_range="197-202" />
<code_context>
         MetadataProfile metadataProfile = MetadataProfileCollection.getInstance().getMetadataProfileCollection().get(metadataProfileName);

+        // Extract label filters from includeResources if present
+        String labelFilter = includeResources.getOrDefault("labels", "");
+        if (labelFilter.isEmpty()) {
+            labelFilter = excludeResources.getOrDefault("labels", "");
+        }
+        // Use labelFilter if provided, otherwise use uniqueKey
+        String effectiveLabelFilter = !labelFilter.isEmpty() ? labelFilter : uniqueKey;
+
+        LOGGER.info("[BULK API LABEL FILTER] Label filter from includeResources: {}", includeResources.getOrDefault("labels", "NOT_FOUND"));
</code_context>
<issue_to_address>
**question:** Clarify precedence between include/exclude label filters and uniqueKey.

Current behavior is `include.labels` → `exclude.labels` → `uniqueKey`: if `include.labels` is set, `exclude.labels` and `uniqueKey` are ignored; if `exclude.labels` is set, `uniqueKey` is ignored. If callers set both include and exclude labels, this precedence may be surprising. Either validate and reject conflicting combinations or document this precedence clearly in the API contract to avoid subtle misconfigurations.
</issue_to_address>

### Comment 2
<location path="src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java" line_range="514-522" />
<code_context>
         try {
             Map<String, CreateExperimentAPIObject> createExperimentAPIObjectMap = new HashMap<>();
             Collection<DataSource> dataSourceCollection = metadataInfo.getDatasources().values();
+            LOGGER.info("[BULK API LABEL FILTER] Starting to process metadata and create experiments");
+            LOGGER.info("[BULK API LABEL FILTER] Label string used for experiment naming: {}", labelString != null ? labelString : "NONE");
+            
</code_context>
<issue_to_address>
**suggestion (performance):** Consider lowering log level to avoid excessive INFO noise in production.

The per-experiment logs in the inner loop will generate a lot of INFO entries for large bulk jobs, increasing log volume and I/O. Consider moving these detailed logs (especially the per-experiment ones) to DEBUG/TRACE, keeping only a brief summary at INFO or gating the detailed logs behind a feature flag for troubleshooting.

Suggested implementation:

```java
            // High-level summary at INFO to avoid excessive noise
            LOGGER.info("[BULK API LABEL FILTER] Starting bulk metadata processing to create experiments");

            // Detailed configuration logged at DEBUG level for troubleshooting
            LOGGER.debug("[BULK API LABEL FILTER] Label string used for experiment naming: {}",
                    labelString != null ? labelString : "NONE");

```

From your comment, there are likely additional `LOGGER.info` calls inside the inner loops (e.g., per workload / per experiment) that will produce one log line per experiment. To fully implement your suggestion:

1. Locate all per-workload / per-experiment logs in this method (and related helper methods invoked inside the loop) and:
   - Change `LOGGER.info(...)` to `LOGGER.debug(...)` (or `LOGGER.trace(...)` if they are extremely detailed).
2. Optionally, keep a small number of aggregate logs at INFO, such as:
   - Total workloads processed.
   - Total experiments created.
3. If your codebase uses feature flags / config for verbose logging, consider wrapping the detailed DEBUG/TRACE logs in a conditional check (e.g., `if (loggingConfig.isBulkJobVerboseEnabled())`) so you can turn them on for troubleshooting without relying solely on global log level changes.
</issue_to_address>

### Comment 3
<location path="src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java" line_range="596-606" />
<code_context>
                     filter.getContainers().stream().map(String::trim).collect(Collectors.joining("|")) : "");
+            // Add label filters to the resource filters map
+            if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
+                String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{
+                    setInclude(filter);
+                }});
+                resourceFilters.put("labels", labelFilterString);
+                LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
+                LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
+            } else {
+                LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
</code_context>
<issue_to_address>
**suggestion:** Avoid double-brace initialization when creating FilterWrapper for getLabels.

This `new BulkInput.FilterWrapper() {{ setInclude(filter); }}` construct creates an anonymous subclass, adds overhead, and may hold an implicit reference to the enclosing instance. Prefer passing `BulkInput.Filter` directly to `getLabels` if feasible, or instantiate and configure `FilterWrapper` explicitly (e.g., `FilterWrapper wrapper = new FilterWrapper(); wrapper.setInclude(filter);`) to avoid double-brace initialization.

```suggestion
            // Add label filters to the resource filters map
            if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
                BulkInput.FilterWrapper filterWrapper = new BulkInput.FilterWrapper();
                filterWrapper.setInclude(filter);
                String labelFilterString = getLabels(filterWrapper);
                resourceFilters.put("labels", labelFilterString);
                LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
                LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
            } else {
                LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
            }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai · 2026-04-10T11:19:25Z

src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java

+        String labelFilter = includeResources.getOrDefault("labels", "");
+        if (labelFilter.isEmpty()) {
+            labelFilter = excludeResources.getOrDefault("labels", "");
+        }
+        // Use labelFilter if provided, otherwise use uniqueKey
+        String effectiveLabelFilter = !labelFilter.isEmpty() ? labelFilter : uniqueKey;


question: Clarify precedence between include/exclude label filters and uniqueKey.

Current behavior is include.labels → exclude.labels → uniqueKey: if include.labels is set, exclude.labels and uniqueKey are ignored; if exclude.labels is set, uniqueKey is ignored. If callers set both include and exclude labels, this precedence may be surprising. Either validate and reject conflicting combinations or document this precedence clearly in the API contract to avoid subtle misconfigurations.

sourcery-ai · 2026-04-10T11:19:25Z

src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java

+            LOGGER.info("[BULK API LABEL FILTER] Starting to process metadata and create experiments");
+            LOGGER.info("[BULK API LABEL FILTER] Label string used for experiment naming: {}", labelString != null ? labelString : "NONE");
+
+            int totalWorkloadsProcessed = 0;
+            int totalExperimentsCreated = 0;
+
            for (DataSource ds : dataSourceCollection) {
                HashMap<String, DataSourceCluster> clusterHashMap = ds.getClusters();
                for (DataSourceCluster dsc : clusterHashMap.values()) {


suggestion (performance): Consider lowering log level to avoid excessive INFO noise in production.

The per-experiment logs in the inner loop will generate a lot of INFO entries for large bulk jobs, increasing log volume and I/O. Consider moving these detailed logs (especially the per-experiment ones) to DEBUG/TRACE, keeping only a brief summary at INFO or gating the detailed logs behind a feature flag for troubleshooting.

Suggested implementation:

// High-level summary at INFO to avoid excessive noise LOGGER.info("[BULK API LABEL FILTER] Starting bulk metadata processing to create experiments"); // Detailed configuration logged at DEBUG level for troubleshooting LOGGER.debug("[BULK API LABEL FILTER] Label string used for experiment naming: {}", labelString != null ? labelString : "NONE");

From your comment, there are likely additional LOGGER.info calls inside the inner loops (e.g., per workload / per experiment) that will produce one log line per experiment. To fully implement your suggestion:

Locate all per-workload / per-experiment logs in this method (and related helper methods invoked inside the loop) and:

Change LOGGER.info(...) to LOGGER.debug(...) (or LOGGER.trace(...) if they are extremely detailed).

Optionally, keep a small number of aggregate logs at INFO, such as:

Total workloads processed.

Total experiments created.

If your codebase uses feature flags / config for verbose logging, consider wrapping the detailed DEBUG/TRACE logs in a conditional check (e.g., if (loggingConfig.isBulkJobVerboseEnabled())) so you can turn them on for troubleshooting without relying solely on global log level changes.

sourcery-ai · 2026-04-10T11:19:25Z

src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java

+            // Add label filters to the resource filters map
+            if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
+                String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{
+                    setInclude(filter);
+                }});
+                resourceFilters.put("labels", labelFilterString);
+                LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
+                LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
+            } else {
+                LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
+            }


suggestion: Avoid double-brace initialization when creating FilterWrapper for getLabels.

This new BulkInput.FilterWrapper() {{ setInclude(filter); }} construct creates an anonymous subclass, adds overhead, and may hold an implicit reference to the enclosing instance. Prefer passing BulkInput.Filter directly to getLabels if feasible, or instantiate and configure FilterWrapper explicitly (e.g., FilterWrapper wrapper = new FilterWrapper(); wrapper.setInclude(filter);) to avoid double-brace initialization.

Suggested change

// Add label filters to the resource filters map

if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {

String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{

setInclude(filter);

}});

resourceFilters.put("labels", labelFilterString);

LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);

LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());

} else {

LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");

}

// Add label filters to the resource filters map

if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {

BulkInput.FilterWrapper filterWrapper = new BulkInput.FilterWrapper();

filterWrapper.setInclude(filter);

String labelFilterString = getLabels(filterWrapper);

resourceFilters.put("labels", labelFilterString);

LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);

LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());

} else {

LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");

}

bulk api filter label fix with testing logs

30c9b2e

Signed-off-by: Pinky Gupta <pinkygupta@Pinkys-MacBook-Pro.local>

sourcery-ai bot reviewed Apr 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bulk api filter label fix with testing logs#1875

bulk api filter label fix with testing logs#1875
pinkygupta-hub wants to merge 1 commit intokruize:mvp_demofrom
pinkygupta-hub:bulk-api-label-filter-fix

pinkygupta-hub commented Apr 10, 2026 •

edited by sourcery-ai bot

Loading

Uh oh!

sourcery-ai bot commented Apr 10, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Uh oh!

sourcery-ai bot Apr 10, 2026

Uh oh!

sourcery-ai bot Apr 10, 2026

Uh oh!

sourcery-ai bot Apr 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

pinkygupta-hub commented Apr 10, 2026 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

How has this been tested?

Checklist 🎯

Additional information

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for bulk API label filter handling in bulk job execution

Flow diagram for effective label filter selection and query modification

File-Level Changes

Possibly linked issues

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pinkygupta-hub commented Apr 10, 2026 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Apr 10, 2026 •

edited

Loading