bulk api filter label fix with testing logs#1875
bulk api filter label fix with testing logs#1875pinkygupta-hub wants to merge 1 commit intokruize:mvp_demofrom
Conversation
Signed-off-by: Pinky Gupta <pinkygupta@Pinkys-MacBook-Pro.local>
Reviewer's GuideAdjusts bulk API label filtering to prioritize explicit label filters from the request over the uniqueKey parameter and adds detailed logging around label-based filtering and experiment creation. Sequence diagram for bulk API label filter handling in bulk job executionsequenceDiagram
actor User
participant BulkJobManager
participant DataSourceMetadataOperator
participant DataSource
User->>BulkJobManager: submitBulkRequest(BulkInput)
BulkJobManager->>BulkJobManager: buildRegexFilters(filter)
BulkJobManager->>BulkJobManager: getLabels(FilterWrapper) to build labels
BulkJobManager->>BulkJobManager: resourceFilters.put(labels, labelFilterString)
BulkJobManager->>DataSourceMetadataOperator: processQueriesAndPopulateDataSourceMetadataInfo(uniqueKey, includeResources, excludeResources)
DataSourceMetadataOperator->>DataSourceMetadataOperator: labelFilter = includeResources.get(labels)
DataSourceMetadataOperator->>DataSourceMetadataOperator: if labelFilter empty, use excludeResources.get(labels)
DataSourceMetadataOperator->>DataSourceMetadataOperator: effectiveLabelFilter = labelFilter or uniqueKey
DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to namespaceQuery
DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to workloadQuery
DataSourceMetadataOperator->>DataSourceMetadataOperator: apply effectiveLabelFilter to containerQuery
DataSourceMetadataOperator->>DataSource: fetchQueryResults(namespaceQuery, workloadQuery, containerQuery)
DataSource-->>DataSourceMetadataOperator: metadataResults
DataSourceMetadataOperator-->>BulkJobManager: DataSourceMetadataInfo
BulkJobManager->>BulkJobManager: getExperimentMap(labelString, metadataInfo)
BulkJobManager->>BulkJobManager: iterate clusters, namespaces, workloads, containers
BulkJobManager->>BulkJobManager: createExperimentAPIObject(...) per container
BulkJobManager-->>User: experiments created only for workloads matching label filter
Flow diagram for effective label filter selection and query modificationflowchart TD
A[Start label filter processing] --> B[Read labelFilter from includeResources.labels]
B --> C{labelFilter is empty?}
C -->|Yes| D[Read labelFilter from excludeResources.labels]
C -->|No| E[Keep labelFilter from includeResources]
D --> F[Compute effectiveLabelFilter]
E --> F[Compute effectiveLabelFilter]
F --> G{effectiveLabelFilter is not empty?}
G -->|Yes| H[Replace ADDITIONAL_LABEL in namespaceQuery with ,effectiveLabelFilter]
H --> I[Replace ADDITIONAL_LABEL in workloadQuery with ,effectiveLabelFilter]
I --> J[Replace ADDITIONAL_LABEL in containerQuery with ,effectiveLabelFilter]
J --> K[Log final queries with applied label filter]
G -->|No| L[Replace ADDITIONAL_LABEL in namespaceQuery with empty string]
L --> M[Replace ADDITIONAL_LABEL in workloadQuery with empty string]
M --> N[Replace ADDITIONAL_LABEL in containerQuery with empty string]
N --> O[Log that no label filter is applied]
K --> P[Proceed to fetch query results]
O --> P[Proceed to fetch query results]
P --> Q[End]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The new
[BULK API LABEL FILTER]logging is very verbose and usesINFOfor what looks like diagnostic detail (including full queries and per-experiment logs); consider downgrading most of these toDEBUGor wrapping them with a debug flag to avoid log noise and potential performance overhead in production. - In
buildRegexFilters, using double-brace initialization to construct a temporaryBulkInput.FilterWrappercreates an unnecessary anonymous inner class; it would be clearer and cheaper to instantiate the wrapper normally, setinclude(filter), and then pass it togetLabels.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `[BULK API LABEL FILTER]` logging is very verbose and uses `INFO` for what looks like diagnostic detail (including full queries and per-experiment logs); consider downgrading most of these to `DEBUG` or wrapping them with a debug flag to avoid log noise and potential performance overhead in production.
- In `buildRegexFilters`, using double-brace initialization to construct a temporary `BulkInput.FilterWrapper` creates an unnecessary anonymous inner class; it would be clearer and cheaper to instantiate the wrapper normally, set `include(filter)`, and then pass it to `getLabels`.
## Individual Comments
### Comment 1
<location path="src/main/java/com/autotune/common/datasource/DataSourceMetadataOperator.java" line_range="197-202" />
<code_context>
MetadataProfile metadataProfile = MetadataProfileCollection.getInstance().getMetadataProfileCollection().get(metadataProfileName);
+ // Extract label filters from includeResources if present
+ String labelFilter = includeResources.getOrDefault("labels", "");
+ if (labelFilter.isEmpty()) {
+ labelFilter = excludeResources.getOrDefault("labels", "");
+ }
+ // Use labelFilter if provided, otherwise use uniqueKey
+ String effectiveLabelFilter = !labelFilter.isEmpty() ? labelFilter : uniqueKey;
+
+ LOGGER.info("[BULK API LABEL FILTER] Label filter from includeResources: {}", includeResources.getOrDefault("labels", "NOT_FOUND"));
</code_context>
<issue_to_address>
**question:** Clarify precedence between include/exclude label filters and uniqueKey.
Current behavior is `include.labels` → `exclude.labels` → `uniqueKey`: if `include.labels` is set, `exclude.labels` and `uniqueKey` are ignored; if `exclude.labels` is set, `uniqueKey` is ignored. If callers set both include and exclude labels, this precedence may be surprising. Either validate and reject conflicting combinations or document this precedence clearly in the API contract to avoid subtle misconfigurations.
</issue_to_address>
### Comment 2
<location path="src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java" line_range="514-522" />
<code_context>
try {
Map<String, CreateExperimentAPIObject> createExperimentAPIObjectMap = new HashMap<>();
Collection<DataSource> dataSourceCollection = metadataInfo.getDatasources().values();
+ LOGGER.info("[BULK API LABEL FILTER] Starting to process metadata and create experiments");
+ LOGGER.info("[BULK API LABEL FILTER] Label string used for experiment naming: {}", labelString != null ? labelString : "NONE");
+
</code_context>
<issue_to_address>
**suggestion (performance):** Consider lowering log level to avoid excessive INFO noise in production.
The per-experiment logs in the inner loop will generate a lot of INFO entries for large bulk jobs, increasing log volume and I/O. Consider moving these detailed logs (especially the per-experiment ones) to DEBUG/TRACE, keeping only a brief summary at INFO or gating the detailed logs behind a feature flag for troubleshooting.
Suggested implementation:
```java
// High-level summary at INFO to avoid excessive noise
LOGGER.info("[BULK API LABEL FILTER] Starting bulk metadata processing to create experiments");
// Detailed configuration logged at DEBUG level for troubleshooting
LOGGER.debug("[BULK API LABEL FILTER] Label string used for experiment naming: {}",
labelString != null ? labelString : "NONE");
```
From your comment, there are likely additional `LOGGER.info` calls inside the inner loops (e.g., per workload / per experiment) that will produce one log line per experiment. To fully implement your suggestion:
1. Locate all per-workload / per-experiment logs in this method (and related helper methods invoked inside the loop) and:
- Change `LOGGER.info(...)` to `LOGGER.debug(...)` (or `LOGGER.trace(...)` if they are extremely detailed).
2. Optionally, keep a small number of aggregate logs at INFO, such as:
- Total workloads processed.
- Total experiments created.
3. If your codebase uses feature flags / config for verbose logging, consider wrapping the detailed DEBUG/TRACE logs in a conditional check (e.g., `if (loggingConfig.isBulkJobVerboseEnabled())`) so you can turn them on for troubleshooting without relying solely on global log level changes.
</issue_to_address>
### Comment 3
<location path="src/main/java/com/autotune/analyzer/workerimpl/BulkJobManager.java" line_range="596-606" />
<code_context>
filter.getContainers().stream().map(String::trim).collect(Collectors.joining("|")) : "");
+ // Add label filters to the resource filters map
+ if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
+ String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{
+ setInclude(filter);
+ }});
+ resourceFilters.put("labels", labelFilterString);
+ LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
+ LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
+ } else {
+ LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
</code_context>
<issue_to_address>
**suggestion:** Avoid double-brace initialization when creating FilterWrapper for getLabels.
This `new BulkInput.FilterWrapper() {{ setInclude(filter); }}` construct creates an anonymous subclass, adds overhead, and may hold an implicit reference to the enclosing instance. Prefer passing `BulkInput.Filter` directly to `getLabels` if feasible, or instantiate and configure `FilterWrapper` explicitly (e.g., `FilterWrapper wrapper = new FilterWrapper(); wrapper.setInclude(filter);`) to avoid double-brace initialization.
```suggestion
// Add label filters to the resource filters map
if (filter.getLabels() != null && !filter.getLabels().isEmpty()) {
BulkInput.FilterWrapper filterWrapper = new BulkInput.FilterWrapper();
filterWrapper.setInclude(filter);
String labelFilterString = getLabels(filterWrapper);
resourceFilters.put("labels", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString);
LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels());
} else {
LOGGER.info("[BULK API LABEL FILTER] No label filters found in request");
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| String labelFilter = includeResources.getOrDefault("labels", ""); | ||
| if (labelFilter.isEmpty()) { | ||
| labelFilter = excludeResources.getOrDefault("labels", ""); | ||
| } | ||
| // Use labelFilter if provided, otherwise use uniqueKey | ||
| String effectiveLabelFilter = !labelFilter.isEmpty() ? labelFilter : uniqueKey; |
There was a problem hiding this comment.
question: Clarify precedence between include/exclude label filters and uniqueKey.
Current behavior is include.labels → exclude.labels → uniqueKey: if include.labels is set, exclude.labels and uniqueKey are ignored; if exclude.labels is set, uniqueKey is ignored. If callers set both include and exclude labels, this precedence may be surprising. Either validate and reject conflicting combinations or document this precedence clearly in the API contract to avoid subtle misconfigurations.
| LOGGER.info("[BULK API LABEL FILTER] Starting to process metadata and create experiments"); | ||
| LOGGER.info("[BULK API LABEL FILTER] Label string used for experiment naming: {}", labelString != null ? labelString : "NONE"); | ||
|
|
||
| int totalWorkloadsProcessed = 0; | ||
| int totalExperimentsCreated = 0; | ||
|
|
||
| for (DataSource ds : dataSourceCollection) { | ||
| HashMap<String, DataSourceCluster> clusterHashMap = ds.getClusters(); | ||
| for (DataSourceCluster dsc : clusterHashMap.values()) { |
There was a problem hiding this comment.
suggestion (performance): Consider lowering log level to avoid excessive INFO noise in production.
The per-experiment logs in the inner loop will generate a lot of INFO entries for large bulk jobs, increasing log volume and I/O. Consider moving these detailed logs (especially the per-experiment ones) to DEBUG/TRACE, keeping only a brief summary at INFO or gating the detailed logs behind a feature flag for troubleshooting.
Suggested implementation:
// High-level summary at INFO to avoid excessive noise
LOGGER.info("[BULK API LABEL FILTER] Starting bulk metadata processing to create experiments");
// Detailed configuration logged at DEBUG level for troubleshooting
LOGGER.debug("[BULK API LABEL FILTER] Label string used for experiment naming: {}",
labelString != null ? labelString : "NONE");From your comment, there are likely additional LOGGER.info calls inside the inner loops (e.g., per workload / per experiment) that will produce one log line per experiment. To fully implement your suggestion:
- Locate all per-workload / per-experiment logs in this method (and related helper methods invoked inside the loop) and:
- Change
LOGGER.info(...)toLOGGER.debug(...)(orLOGGER.trace(...)if they are extremely detailed).
- Change
- Optionally, keep a small number of aggregate logs at INFO, such as:
- Total workloads processed.
- Total experiments created.
- If your codebase uses feature flags / config for verbose logging, consider wrapping the detailed DEBUG/TRACE logs in a conditional check (e.g.,
if (loggingConfig.isBulkJobVerboseEnabled())) so you can turn them on for troubleshooting without relying solely on global log level changes.
| // Add label filters to the resource filters map | ||
| if (filter.getLabels() != null && !filter.getLabels().isEmpty()) { | ||
| String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{ | ||
| setInclude(filter); | ||
| }}); | ||
| resourceFilters.put("labels", labelFilterString); | ||
| LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString); | ||
| LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels()); | ||
| } else { | ||
| LOGGER.info("[BULK API LABEL FILTER] No label filters found in request"); | ||
| } |
There was a problem hiding this comment.
suggestion: Avoid double-brace initialization when creating FilterWrapper for getLabels.
This new BulkInput.FilterWrapper() {{ setInclude(filter); }} construct creates an anonymous subclass, adds overhead, and may hold an implicit reference to the enclosing instance. Prefer passing BulkInput.Filter directly to getLabels if feasible, or instantiate and configure FilterWrapper explicitly (e.g., FilterWrapper wrapper = new FilterWrapper(); wrapper.setInclude(filter);) to avoid double-brace initialization.
| // Add label filters to the resource filters map | |
| if (filter.getLabels() != null && !filter.getLabels().isEmpty()) { | |
| String labelFilterString = getLabels(new BulkInput.FilterWrapper() {{ | |
| setInclude(filter); | |
| }}); | |
| resourceFilters.put("labels", labelFilterString); | |
| LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString); | |
| LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels()); | |
| } else { | |
| LOGGER.info("[BULK API LABEL FILTER] No label filters found in request"); | |
| } | |
| // Add label filters to the resource filters map | |
| if (filter.getLabels() != null && !filter.getLabels().isEmpty()) { | |
| BulkInput.FilterWrapper filterWrapper = new BulkInput.FilterWrapper(); | |
| filterWrapper.setInclude(filter); | |
| String labelFilterString = getLabels(filterWrapper); | |
| resourceFilters.put("labels", labelFilterString); | |
| LOGGER.info("[BULK API LABEL FILTER] Label filters extracted from request: {}", labelFilterString); | |
| LOGGER.info("[BULK API LABEL FILTER] Label filter map: {}", filter.getLabels()); | |
| } else { | |
| LOGGER.info("[BULK API LABEL FILTER] No label filters found in request"); | |
| } |
Description
Please describe the issue or feature and the summary of changes made to fix this.
Fixes # (issue)
Type of change
How has this been tested?
Please describe the tests that were run to verify your changes and steps to reproduce. Please specify any test configuration required.
Test Configuration
Checklist 🎯
Additional information
Include any additional information such as links, test results, screenshots here
Summary by Sourcery
Handle bulk API label filters when constructing queries and add detailed logging around label-based filtering and experiment creation.
New Features:
Enhancements: