Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/windowsperfcounters] Instance wildcard doesn't capture _Total instance #29054

Closed
pjanotti opened this issue Nov 8, 2023 · 6 comments · Fixed by #33692
Closed

[receiver/windowsperfcounters] Instance wildcard doesn't capture _Total instance #29054

pjanotti opened this issue Nov 8, 2023 · 6 comments · Fixed by #33692
Labels
bug Something isn't working never stale Issues marked with this label will be never staled and automatically removed pkg/winperfcounters receiver/windowsperfcounters

Comments

@pjanotti
Copy link
Contributor

pjanotti commented Nov 8, 2023

Component(s)

receiver/windowsperfcounters

What happened?

Description

When the instance is set to "*" the user wants to capture all instances of a counter like \Processor(*)\%Process Time. However, the wildcard doesn't capture the _Total instance.

Steps to Reproduce

Run the collector using a config like this one:

receivers:
  windowsperfcounters:
    metrics:
      processor.time:
        description: CPU active and idle time
        unit: "%"
        gauge:
    collection_interval: 30s
    perfcounters:
      - object: "Processor"
        instances:
           - "*"
        counters:
          - name: "% Processor Time"
            metric: processor.time
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [windowsperfcounters]
      exporters: [debug]

Expected Result

That the _Total instance is also captured when the wildcard is used.

Actual Result

The _Total instance is not captured.

If the user tries to workaround by forcing the capture by explicitly adding _Total to the instances, i.e.:

    perfcounters:
      - object: "Processor"
        instances: ["_Total", "*"]
        counters:
          - name: "% Processor Time"
            metric: processor.time

It still doesn't work because internally if the instances contain "" they are treated as if only "" was specified.

The way to workaround the issue is to create two receivers: one to capture _Total and one to capture the other instances, the later one using the wildcard.

Collector version

22d86de

Environment information

Environment

OS: Windows 10, Windows Server 2022
Compiler(if manually compiled): go1.21.1 windows/amd64

OpenTelemetry Collector configuration

receivers:
  windowsperfcounters:
    metrics:
      processor.time:
        description: CPU active and idle time
        unit: "%"
        gauge:
    collection_interval: 30s
    perfcounters:
      - object: "Processor"
        instances:
           - "*"
        counters:
          - name: "% Processor Time"
            metric: processor.time
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [windowsperfcounters]
      exporters: [debug]

Log output

No response

Additional context

No response

@pjanotti pjanotti added bug Something isn't working needs triage New item requiring triage labels Nov 8, 2023
Copy link
Contributor

github-actions bot commented Nov 8, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@pjanotti
Copy link
Contributor Author

pjanotti commented Nov 8, 2023

Spelunking the code I suspect that the root cause is that the _Total instance is not localizable. To capture such instances the code probably need to do the mentioned steps at the remarks section of the PdhAddEnglishCounter documentation.

@crobert-1
Copy link
Member

Spelunking the code I suspect that the root cause is that the _Total instance is not localizable. To capture such instances the code probably need to do the mentioned steps at the remarks section of the PdhAddEnglishCounter documentation.

Context

I'm not very familiar with this component, so I'm just going to leave some context for future reference. The scraper for this receiver creates a NewWatcher using the pkg/winperfcounters functionality for every combination of object, instances, and counter configuration. From here we can see how the config options are being used to form the query to PDH for the counters.

Possible Solution

All of the methods that @pjanotti has referenced are available to be used here, but the current setup does not work with what would be required. The following calls could be made here in the newPerfCounterMethod, but it will show the shortcoming we're running into:

localized_path, err := query.GetCounterPath(handle) // Calls the PdhGetCounterInfo function
// check err
expanded_paths, err := query.ExpandWildCardPath(localized_path) // calls PdhExpandWildCardPath
// check err
var all_handles []PDH_HCOUNTER
var all_counters []*perfCounter
for _, expanded_path := range expanded_paths {
    handle, err = query.AddCounterToQuery(path)
    // check err
    all_counters = append(all_counters, &perfCounter{ path: expanded_path, query: query, handle: handle })
}
return all_counters, nil

(Apologies if there are any obvious mistakes in the code above). When using the regex and expanding paths, the method needs to be able to return counters for each possible path. Existing functionality is only setup to return a single counter, even if the regex would match multiple.

I think this could be done by adding a new method to pkg/winperfcounters/watcher.go. Since this method could be used by other packages as well I think it makes more sense to add it to this package instead of the receiver itself. It looks like the IIS receiver has already handled something similar to this, but the solution was implemented internally so it can't be reused.

Question

Can anyone confirm if this has ever worked, if the * was ever able match all of the instances of a given metric as the README states? From what I can tell I don't know how this would have worked.

@crobert-1 crobert-1 added pkg/winperfcounters and removed needs triage New item requiring triage labels Nov 28, 2023
Copy link
Contributor

Pinging code owners for pkg/winperfcounters: @dashpole @Mrod1598 @BinaryFissionGames. See Adding Labels via Comments if you do not have permissions to add labels yourself.

Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jan 29, 2024
@crobert-1 crobert-1 added never stale Issues marked with this label will be never staled and automatically removed and removed Stale labels Jan 29, 2024
@pjanotti
Copy link
Contributor Author

pjanotti commented Jun 17, 2024

[EDIT 01: @crobert-1 @alxbl - I was using a counter for which _Total didn't make sense, however, the current method does capture the _Total instance and ends up dropping via code in the pkg/winperfcounters]

@crobert-1

Can anyone confirm if this has ever worked, if the * was ever able match all of the instances of a given metric as the README states?

I looked at the current code and as it is it can't capture the _Total instance by design:

  • The current code explicitly renames the _Total instance, see here, to an empty string. Moreover the code actually removes the _Total instance inside the same function, see here.

This makes sense because, as pointed out by @alxbl here, by including the aggregated instance there is the possibility of double counting for any aggregation happening after the receiver. Unfortunately, the name _Total is not standard and some counters use other names for their aggregated instances, e.g.: _Global_ for some .NET Framework counters.

It seems that it is better to leave the current behavior as it is and just document the special treatment that _Total receives plus the risk of double aggregation for other counters that don't use _Total as the instance name for their aggregation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working never stale Issues marked with this label will be never staled and automatically removed pkg/winperfcounters receiver/windowsperfcounters
Projects
None yet
2 participants