
Support explicitly removing attributes from an instrument API #3062

Open
asafm opened this issue Dec 28, 2022 · 27 comments
Labels
spec:metrics (Related to the specification/metrics directory), triage:deciding:community-feedback (Open to community discussion. If the community can provide sufficient reasoning, it may be accepted)

Comments

@asafm
Contributor

asafm commented Dec 28, 2022

What are you trying to achieve?

I would like the ability to remove previously reported Attributes from an instrument using the OTel API.

What did you expect to see?

The API specification should be changed by adding an operation named Remove to each instrument listed in the API specification.

Let's take Java API as an example. Today we have:

  void add(double value, Attributes attributes);
  void add(double value, Attributes attributes, Context context);

I would like to have

void remove(Attributes attributes)

This operation/method would remove any measurements accumulated in memory for the supplied Attributes, along with any reference to or usage of that Attributes instance in the in-memory storage.

Additional context.

The motivation would be best demonstrated in a real-world use case. I work on Apache Pulsar, which is a messaging platform (It's like Kafka + RabbitMQ). It is a distributed system composed of Brokers. Each broker is responsible for a set of topics out of the topics in the cluster (this is how the data is sharded). Topics have many instruments, meaning each topic has many metrics (i.e., Attributes in several instruments). A topic can move between brokers as part of automatic load balancing or a human-triggered move. Once a topic has moved from broker B1 to B2, we don't expect B1 to emit any metrics related to the moved topic, and we expect B2 to emit those metrics. For example, a topic can have a metric "broker.messaging.topic.storage_size", which counts the total size of messages stored for this topic in the persistent storage. If you can't remove the attributes associated with that topic from that instrument once the topic has been unloaded from B1, then once it is loaded into B2, that metric will be reported twice.
(Just for clarification, Apache Pulsar does not use OTel yet; we are currently exploring it and mapping existing gaps.)

Existing metric libraries in the Java ecosystem do support this type of removal:

  • Prometheus Java Client: as can be seen here

public void remove(String... labelValues) {
    children.remove(Arrays.asList(labelValues));
    initializeNoLabelsChild();
}

  • Micrometer: as can be seen here

public Meter remove(Meter meter) {
    return remove(meter.getId());
}

  • Dropwizard Metrics: as can be seen here

public boolean remove(String name) {
    final Metric metric = metrics.remove(name);
    if (metric != null) {
        onMetricRemoved(name, metric);
        return true;
    }
    return false;
}

This operation also reduces memory footprint since the software developer knows those attributes will never receive further measurements.

@asafm asafm added the spec:metrics Related to the specification/metrics directory label Dec 28, 2022
@asafm
Contributor Author

asafm commented Dec 29, 2022

I found out just now that it was also discussed in #2747, but I wanted to:

  1. Have an issue, not a PR, to discuss this first (When I searched for this topic, I searched for issues).
  2. Focus on remove, which is different IMO from the circuit breaker mechanism or auto purging.
  3. Describe a real-world example of why it's needed.

@joaopgrassi
Member

joaopgrassi commented Jan 9, 2023

I read the use case and I'm a bit confused: There's this part:

Once a topic has moved from broker B1 to B2, we don't expect B1 to emit any metrics related to the moved topic, and we expect B2 to emit those metrics

Makes sense to me. New broker starts reporting metrics for the topic. But then there's this:

If you can't remove the attributes associated with that topic from that instrument once the topic has been unloaded from B1, then once it is loaded into B2, that metric will be reported twice

How is it reported twice, if B1 does not emit metrics once the topic has moved to another broker?

And for the case of broker.messaging.topic.storage_size, wouldn't it be more accurate for both of them to be reported? Of course, once the topic moves to another broker, the new one will continue reporting, but I see that as expected? Also, which attributes would you want to remove in this particular case? The broker name? I see "count the total size of messages stored for this topic" as something that is broker agnostic, and you would want the total to be reported no matter which broker the topic is on, no?

Sorry if I missed something obvious, I'm just trying to understand the use case.

@asafm
Contributor Author

asafm commented Jan 10, 2023

How is it reported twice, if B1 does not emit metrics once the topic has moved to another broker?

Today that is not the case. But say I'm switching all Apache Pulsar metrics over to OTel: without the feature of removing attributes ("If you can't remove the attributes associated with that topic from that instrument once the topic has been unloaded from B1"), the following will happen: "once it is loaded into B2, that metric will be reported twice". I was making the case for why explicitly removing attributes from an instrument is crucial to the epic of moving Apache Pulsar to OTel.

And for the case of broker.messaging.topic.storage_size, wouldn't it be more accurate for both of them to be reported? Of course, once the topic moves to another broker, the new one will continue reporting, but I see that as expected? Also, which attributes would you want to remove in this particular case? The broker name? I see "count the total size of messages stored for this topic" as something that is broker agnostic, and you would want the total to be reported no matter which broker the topic is on, no?

That means I explained it poorly, so I'll add an example to explain better:

Say topic incoming-logs, part of the logs namespace, is hosted on broker01.

This means the following will be exported (using the Prometheus format here). I included two additional topics, before and after, so it doesn't look like this broker has only one topic.

pulsar_messaging_topic_storages_size{namespace="logs", topic="incoming-logs-special", cluster="test-cluster"} 1673341740 200
pulsar_messaging_topic_storages_size{namespace="logs", topic="incoming-logs", cluster="test-cluster"} 1673341740 1020
pulsar_messaging_topic_storages_size{namespace="logs", topic="outgoing-logs", cluster="test-cluster"} 1673341740 100

Once incoming-logs is unloaded from broker01 and loaded onto broker03, I expect the 2nd line above to stop being emitted in broker01's metrics scrape, and the following line to appear in broker03's metrics scrape:

pulsar_messaging_topic_storages_size{namespace="logs", topic="incoming-logs", cluster="test-cluster"} 1673341800 1020

On broker01, the following call (which I would like to have) would be executed upon unloading of this topic:

topicStorageSize.remove(topicAttr);

and the topic class would have a topicAttr field, initialized in the constructor:

topicAttr = Attributes.builder()
        .put("namespace", "logs")
        .put("topic", "incoming-logs")
        .build();
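
Put together, the desired usage on broker01 would look roughly like the sketch below. Note this is illustrative only: remove() does not exist in the OTel Java API today, and the instrument and method names are assumptions.

import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.LongCounter;
import io.opentelemetry.api.metrics.Meter;

class Topic {
    private final LongCounter storageSize;
    private final Attributes topicAttr;

    Topic(Meter meter) {
        storageSize = meter.counterBuilder("pulsar.messaging.topic.storage_size").build();
        topicAttr = Attributes.builder()
                .put("namespace", "logs")
                .put("topic", "incoming-logs")
                .build();
    }

    void onMessageStored(long sizeBytes) {
        storageSize.add(sizeBytes, topicAttr);
    }

    void onUnload() {
        // Hypothetical operation proposed in this issue: drop all state kept for
        // this attribute set so broker01 stops exporting the series.
        storageSize.remove(topicAttr);
    }
}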

@joaopgrassi
Member

joaopgrassi commented Jan 10, 2023

I see your point. I'm not very familiar with Prometheus scrapes, but I was thinking the timestamp would stay the same, or the value would be reset: something to indicate that this entry is 'stale' and didn't change since the last scrape, and should therefore be ignored.

@pirgeo
Member

pirgeo commented Jan 10, 2023

To confirm I got it right: Basically, you want to stop exporting the metric with the label for one broker, and instead want to start exporting it with the label for another broker. You need the first broker to not export that metric again since the work is now done elsewhere (on the other broker) and if you reported it once for the "old" (or no longer active) broker, and once for the "new" (the now active broker) you would have a duplication.

Coming from this:

pulsar_messaging_topic_storages_size{broker="broker01", cluster="test-cluster"} 1673341740 200

The thing you are trying to avoid is:

pulsar_messaging_topic_storages_size{broker="broker01", cluster="test-cluster"} 1673341800 200
pulsar_messaging_topic_storages_size{broker="broker03", cluster="test-cluster"} 1673341800 200

Because that would be double counting.

For a Gauge it would be possible to model this by setting the "old" gauge to 0, and setting the "new" gauge to the "old" gauge's previous value. So something like:

After the move:

pulsar_messaging_topic_storages_size{broker="broker01", cluster="test-cluster"} 1673341800 0
pulsar_messaging_topic_storages_size{broker="broker03", cluster="test-cluster"} 1673341800 200

But:

you would want to get rid of the one showing 0 as well to only have

pulsar_messaging_topic_storages_size{broker="broker03", cluster="test-cluster"} 1673341800 200

right?

@asafm
Contributor Author

asafm commented Jan 15, 2023

To confirm I got it right: Basically, you want to stop exporting the metric with the label for one broker, and instead want to start exporting it with the label for another broker. You need the first broker to not export that metric again since the work is now done elsewhere (on the other broker) and if you reported it once for the "old" (or no longer active) broker, and once for the "new" (the now active broker) you would have a duplication.

Coming from this:

pulsar_messaging_topic_storages_size{broker="broker01", cluster="test-cluster"} 1673341740 200

The thing you are trying to avoid is:

pulsar_messaging_topic_storages_size{broker="broker01", cluster="test-cluster"} 1673341800 200
pulsar_messaging_topic_storages_size{broker="broker03", cluster="test-cluster"} 1673341800 200

Because that would be double counting.

For a Gauge it would be possible to model this by setting the "old" gauge to 0, and setting the "new" gauge to the "old" gauge's previous value. So something like:

After the move:

pulsar_messaging_topic_storages_size{broker="broker01", cluster="test-cluster"} 1673341800 0
pulsar_messaging_topic_storages_size{broker="broker03", cluster="test-cluster"} 1673341800 200

But:

you would want to get rid of the one showing 0 as well to only have

pulsar_messaging_topic_storages_size{broker="broker03", cluster="test-cluster"} 1673341800 200

right?

Correct @pirgeo
Changing the value to 0 reports a completely different, wrong value: it implies the topic's storage is split across both brokers and that the part on broker01 has size 0, which is not the case.

@asafm
Contributor Author

asafm commented Jan 15, 2023

I see your point. I'm not very familiar with Prometheus scrapes, but I was thinking the timestamp would stay the same, or the value would be reset: something to indicate that this entry is 'stale' and didn't change since the last scrape, and should therefore be ignored.

@joaopgrassi Well, the timestamp must advance. In fact, Prometheus exporters usually don't set the timestamp on the exported text; it is the Prometheus server that adds the scrape timestamp during the scrape process itself, before persisting it to memory and the WAL.

To my knowledge, there isn't a mechanism to indicate that a value is no longer in use; hence the Prometheus Java client has the ability to remove an attribute set (called labels in Prometheus) from a given Collector:

public void remove(String... labelValues) {
    children.remove(Arrays.asList(labelValues));
    initializeNoLabelsChild();
}

@asafm
Contributor Author

asafm commented Jan 25, 2023

@pirgeo and @joaopgrassi - What is required to move this issue forward? I would like to discuss the remaining concerns and see how we can progress toward getting this into the specification; I just don't know the process yet.

@pirgeo
Member

pirgeo commented Jan 25, 2023

I think this is also related to #2711 (maybe even a possible duplicate?).

To me, it seems like this is not a controversial topic, just one that doesn't affect a lot of folks, and therefore the responses are a bit slow.

As for pushing the issue: You could try bringing it up in the next spec meeting (https://github.com/open-telemetry/community#specification-sigs, see "Specification: General") on Tuesday. Everyone is invited to add items to the agenda. If you do, I would also recommend joining the meeting (it is a public meeting) and stating your case there. It might be easier to get your point across there and to get a spotlight on this issue. Usually, there is a moderator making sure that all issues on the agenda get some air time, and are discussed!

@asafm
Contributor Author

asafm commented Jan 29, 2023

I think this is also related to #2711 (maybe even a possible duplicate?).

I think they are "sibling" issues :) That issues mentioned refers to SDK automatically detecting a certain attribute set is not being reported for x collection intervals, hence the SDK can decide to stop reporting it - implicit.
My issue is about being explicit - I know this attribute set is no longer relevant, hence I would like to stop reporting it explicitly.

Thanks for the pointer to the spec meeting. I attended once in August to ask a question, and @pirgeo, you are correct that I should join the next one. Thanks for the tip!

@dashpole
Contributor

One possible solution, which would not require changes to the current instrument API, would be to use the disappearance of an attribute set from an asynchronous instrument callback as a signal that that attribute set no longer exists. See #1891 (comment) and open-telemetry/opentelemetry-go#3605 for details. We could add Delete() to synchronous instruments at a later point if there is still a need.

@asafm
Contributor Author

asafm commented Feb 1, 2023

@dashpole Using asynchronous instruments to achieve this has two disadvantages: it's not explicit, and it's less robust (more code).
When someone reads the code, they won't understand it right away. Asynchronous instruments are great if you already maintain a certain counter in your code and would like to expose it; people reading the code can easily see why you chose the callback-registration solution. It makes less sense if it's something you count only for the purpose of reporting it as a metric, because then you are forced to declare a LongAdder (in Java) in your class, increment it directly, and expose it via a callback (an asynchronous instrument). A person looking at that code will check its usages and wonder: "Why is this variable declared here, reported via a callback, and not used for anything else? Why not a normal counter?" Then you are forced to write a comment explaining the missing functionality in OTel, link to this issue, etc.
The other way, writing it explicitly by having remove(attr) or delete(attr) on synchronous instruments, is very clear, explicit, and more robust for the case I was describing, IMO.
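
For concreteness, a minimal Java sketch of the workaround being described (assuming the current Java metrics API; class and metric names are illustrative). Note how the LongAdder exists only to feed the callback:

import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.Meter;
import io.opentelemetry.api.metrics.ObservableLongUpDownCounter;
import java.util.concurrent.atomic.LongAdder;

class TopicStorageMetric {
    // Exists only so the callback has something to read; a reader will wonder
    // why this isn't a plain synchronous counter.
    private final LongAdder storageSize = new LongAdder();
    private final ObservableLongUpDownCounter observable;
    private final Attributes topicAttr =
            Attributes.builder().put("topic", "incoming-logs").build();

    TopicStorageMetric(Meter meter) {
        observable = meter.upDownCounterBuilder("broker.messaging.topic.storage_size")
                .buildWithCallback(m -> m.record(storageSize.sum(), topicAttr));
    }

    void onMessageStored(long sizeBytes) { storageSize.add(sizeBytes); }

    void onUnload() {
        observable.close(); // unregister the callback; the series stops being exported
    }
}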

@dashpole
Contributor

dashpole commented Feb 1, 2023

@asafm Totally agree it isn't the ideal solution for your use-case.

I'm no java expert, but it should be possible to implement your desired synchronous API (an "instrument" with Add() and Delete()) using an async instrument. That would at least keep your code readable. In Go, I would do something like:

type myInt64Counter struct {
    // attribute.Set is comparable and can be used as a map key;
    // a []attribute.KeyValue slice cannot.
    values map[attribute.Set]int64
}

// Callback is what I can use when creating my async instrument
func (m *myInt64Counter) Callback(ctx context.Context, observer instrument.Int64Observer) error {
    // observe all of the values
    for set, value := range m.values {
        observer.Observe(value, set.ToSlice()...)
    }
    return nil
}

func (m *myInt64Counter) Add(ctx context.Context, incr int64, attrs ...attribute.KeyValue) {
    // increment values[attrs] by incr
    m.values[attribute.NewSet(attrs...)] += incr
}

func (m *myInt64Counter) Delete(ctx context.Context, attrs ...attribute.KeyValue) {
    delete(m.values, attribute.NewSet(attrs...))
}

My hope is that it would unblock you without needing an addition to the current instrumentation API, even if it isn't ideal.

@asafm
Contributor Author

asafm commented Feb 2, 2023

@dashpole Thanks. This is a good solution as a workaround.
My intention with this GitHub issue is to gather feedback on whether it's a good idea to modify the specification to include that functionality. Who do you think can help with feedback on this, so I can know whether it's possible to move forward?

@asafm
Contributor Author

asafm commented Feb 5, 2023

@dashpole Do you see any reason not to approve this proposal?

@dashpole
Contributor

dashpole commented Feb 6, 2023

I'm in favor of being able to remove attribute sets from synchronous instruments. In terms of the proposal, you've talked about what delete() will do, but a proposal also needs to cover what the behavior is if the deleted attribute set is re-observed with Add() later on. I've put forward suggestions for how I would like to see that handled in other linked issues.

One challenge is that SDKs currently have a lot of flexibility WRT how they deal with start times (e.g. the async cumulative guidelines) and resets. We would presumably be requiring SDKs to implement a reset mechanism by introducing delete(), as I can delete() and then add() the same attribute set to produce a reset.

Another challenge is that most APIs/SDKs have had stable releases already, and (depending on the language) might not be able to introduce an experimental function (i.e. they would have to wait for this new spec to reach stability).

Finally, spec changes need TC + spec approvers (which I am not currently a member of), and involvement from multiple SIGs to implement and prototype changes. TC members in particular are balancing a lot of priorities within the community, and may prefer to tackle other issues first.

@MrAlias
Contributor

MrAlias commented Feb 6, 2023

@asafm Totally agree it isn't the ideal solution for your use-case.

I'm no java expert, but it should be possible to implement your desired synchronous API (an "instrument" with Add() and Delete()) using an async instrument. That would at least keep your code readable. In Go, I would do something like:

type myInt64Counter struct {
    // attribute.Set is comparable and can be used as a map key;
    // a []attribute.KeyValue slice cannot.
    values map[attribute.Set]int64
}

// Callback is what I can use when creating my async instrument
func (m *myInt64Counter) Callback(ctx context.Context, observer instrument.Int64Observer) error {
    // observe all of the values
    for set, value := range m.values {
        observer.Observe(value, set.ToSlice()...)
    }
    return nil
}

func (m *myInt64Counter) Add(ctx context.Context, incr int64, attrs ...attribute.KeyValue) {
    // increment values[attrs] by incr
    m.values[attribute.NewSet(attrs...)] += incr
}

func (m *myInt64Counter) Delete(ctx context.Context, attrs ...attribute.KeyValue) {
    delete(m.values, attribute.NewSet(attrs...))
}

My hope is that it would unblock you without needing an addition to the current instrumentation API, even if it isn't ideal.

I really like this proposal. It means that if we clarify an under-defined area of the OTel specification, we could benefit from not only reducing overhead for async instruments, but also allowing the building of an extension that supports the requested feature.

@asafm
Contributor Author

asafm commented Feb 7, 2023

@dashpole Thanks a lot for the detailed response. I have several follow-up questions :)

I'm in favor of being able to remove attribute sets from synchronous instruments. In terms of the proposal, you've talked about what delete() will do, but a proposal also needs to cover what the behavior is if the deleted attribute set is re-observed with Add() later on. I've put forward suggestions for how I would like to see that handled in other linked issues.

You said you put suggestions on other linked issues; any chance you can paste those links?
You are correct that it didn't address that scenario. Before I add it, I need to understand: if a person explicitly called "delete", meaning all storage related to that specific attribute set has been deleted from memory, why can't I treat the next measurement recording for that instrument (whenever it happens) just like any other measurement recording for an attribute set never seen before?
I read the resets-and-gaps guidelines and couldn't clearly understand them. If someone deliberately called "delete" and then, after some time, started recording again (talking about synchronous instruments here), it should act as if measurement reporting started now (at the first reported measurement time), so startTime should be now(). I tried to understand why the guidelines mention using the stream start time, but I couldn't figure it out: if the stream restarts now(), you want to reflect that any cumulative value started its count from the moment the stream restarted, not from its original start point.

Another challenge is that most APIs/SDKs have had stable releases already, and (depending on the language) might not be able to introduce an experimental function (i.e. they would have to wait for this new spec to reach stability).

So you mean that even if this gets clarified, approved, and merged, some SDKs can't introduce an experimental function because they don't want to disrupt users? Doesn't marking it as experimental help in that case?

Finally, spec changes need TC + spec approvers (which I am not currently a member of), and involvement from multiple SIGs to implement and prototype changes. TC members in particular are balancing a lot of priorities within the community, and may prefer to tackle other issues first.

So basically this means - "this will take time", right?

One challenge is that SDKs currently have a lot of flexibility WRT how they deal with start times (e.g. the async cumulative guidelines) and resets. We would presumably be requiring SDKs to implement a reset mechanism by introducing delete(), as I can delete() and then add() the same attribute set to produce a reset.

I think this text is missing some context, or maybe the problem is with me, as I can't understand it all the way. Is there anyone you recommend I can get some help from on this?

So to move forward with this:

  1. Update the proposal to explain how this explicit reset should be reported if a new measurement is made with the same attribute set. This must be aligned with the guidelines linked above.
  2. Get feedback from the TC and spec approvers; anybody specific you recommend?

@asafm
Contributor Author

asafm commented Feb 9, 2023

@dashpole @MrAlias While writing the design, it suddenly dawned on me that I can't use the proposed workaround for Histograms, since they are synchronous only. So without this delete() proposal, I'm quite stuck regarding my use case (a topic moving from broker 1 to broker 2).

@dashpole
Contributor

dashpole commented Feb 9, 2023

You said you put suggestions on other linked issues; any chance you can paste those links?

#1891 (comment)

So you mean that even if this gets clarified, approved, and merged, some SDKs can't introduce an experimental function because they don't want to disrupt users? Doesn't marking it as experimental help in that case?

It really depends on the language.

So basically this means - "this will take time", right?

More or less. I'm also pointing out that I'm not the one you need to convince ;).

I think this text is missing some context, or maybe the problem is with me, as I can't understand it all the way. Is there anyone you recommend I can get some help from on this?

Your comment at the sig meeting "can't we just use the current time as the start time when it reappears" probably solves this. For an example of what I was trying to get at, it is currently valid to use the time at which the instrument was created (basically the process start time) as the start time of all cumulative metrics. After the addition of Delete(), that behavior would no longer be valid for SDKs. I would hope (and we will eventually need prototypes) that switching from that behavior to the one you suggest won't be hard.
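
To make that concrete with hypothetical timestamps: suppose an instrument is created at T0, delete() is called for an attribute set at T3, and add() for the same set happens at T5. An SDK that always uses the instrument creation time would report the post-reappearance cumulative point with start time T0, making the series look like it never reset; using T5 (the reappearance time) as the new start time signals the reset explicitly.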

@asafm
Contributor Author

asafm commented Feb 22, 2023

After the addition of Delete(), that behavior would no longer be valid for SDKs. I would hope (and we will eventually need prototypes) that switching from that behavior to the one you suggest won't be hard.

I read the resets-and-gaps section in the spec and also the supplementary guidelines examples. I quote the relevant section of the example here:

If all streams in the process share a start time, and the SDK is not required to remember all past streams: the thread restarts with zero sum. Receivers with reset detection are able to calculate a correct rate (except for frequent restarts relative to the collection interval), however the precise time of a reset will be unknown.

Let's assume we have two conditions met:

  • You were to delete an attribute set
  • The SDK decided it only uses the process start time
    In that case, once the stream resumes, it's valid that the start time of the initial point and all subsequent point to be the process start time.
    According to the example text above, it's perfectly valid, because receivers with reset detection can calculate the correct rate.

Also, both the resets-and-gaps section and the supplementary guidelines actually state quite explicitly what to do when the attribute set re-appears: if you use per-attribute-set start times, use the last collection time as the start time; if you use the process start time, you can keep using that.

If that is the case, what do you see as the "blocker" for adding Delete() to the API?

I have one issue I need some help resolving.
Suppose you have defined a synchronous UpDownCounter instrument, and a view with an attribute filter that removes one or more attributes.
Example:
Say I have "write-requests.in-flight", and I have two attributes I use to update that counter:
add({node=node02, region=us}, 2)
If I define a view with an attribute filter so I only keep region, then what effectively happens is:
add({region=us}, 2)

So in memory, I have the current number of in-flight requests per region.

When the user calls delete({region=us, node=node02}) because, for example, the node went down, the storage for that view doesn't have such attributes, as it only keeps one attribute set per region. So I'm left with an inconsistent state, where the number of in-flight requests for us is wrong.
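
For reference, here is a rough sketch of how such a view is configured in the Java SDK (assuming the SdkMeterProvider view API; names illustrative):

import io.opentelemetry.sdk.metrics.InstrumentSelector;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;
import io.opentelemetry.sdk.metrics.View;
import java.util.Set;

class ViewExample {
    static SdkMeterProvider build() {
        return SdkMeterProvider.builder()
                .registerView(
                        InstrumentSelector.builder().setName("write-requests.in-flight").build(),
                        // Keep only "region": node01 and node02 are aggregated into a
                        // single {region=us} series, so a later
                        // delete({region=us, node=node02}) has nothing to match.
                        View.builder().setAttributeFilter(Set.of("region")).build())
                .build();
    }
}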

@jack-berg @dashpole @jmacd what do you think?
Does this mean that, because of views, we'll never be able to remove attributes, be it explicitly or implicitly, to save memory on unused attributes? (Which is what #2711 is about.)

@dashpole
Contributor

Also, both the resets-and-gaps section and the supplementary guidelines actually state quite explicitly what to do when the attribute set re-appears: if you use per-attribute-set start times, use the last collection time as the start time; if you use the process start time, you can keep using that.

I would love for that to be the case. Where do you see that?

When the user calls delete({region=us, node=node02}) because, for example, the node went down, the storage for that view doesn't have such attributes, as it only keeps one attribute set per region. So I'm left with an inconsistent state, where the number of in-flight requests for us is wrong.

If using delta, I think delete() would be a no-op, as we are basically just aggregating Add() calls from each collect.

If using cumulative temporality, the only thing I can think of is that the counter would have to be reset in that case with a new start time and counter value.

@asafm
Contributor Author

asafm commented Mar 16, 2023

Also, both the resets-and-gaps section and the supplementary guidelines actually state quite explicitly what to do when the attribute set re-appears: if you use per-attribute-set start times, use the last collection time as the start time; if you use the process start time, you can keep using that.

I would love for that to be the case. Where do you see that?

Ok, so after reading the resets-and-gaps section 6-8 times (either it's written in a super complicated way, or I simply can't get it), you are correct. It is only written explicitly in the supplementary guidelines, and I quote:

Consider whether the SDK maintains individual timestamps for the individual stream, or just one per process. In this example, where a thread can die and start counting page faults from zero, the valid behaviors at T5 are:

  1. If all streams in the process share a start time, and the SDK is not required to remember all past streams: the thread restarts with zero sum. Receivers with reset detection are able to calculate a correct rate (except for frequent restarts relative to the collection interval), however the precise time of a reset will be unknown.
  2. If the SDK maintains per-stream start times, it signals to the receiver precisely when a stream started, making the first observation in a stream more useful for diagnostics. Receivers can perform overlap detection or duplicate suppression and do not require reset detection, in this case.

When the user calls delete({region=us, node=node02}) because, for example, the node went down, the storage for that view doesn't have such attributes, as it only keeps one attribute set per region. So I'm left with an inconsistent state, where the number of in-flight requests for us is wrong.

If using delta, I think delete() would be a no-op, as we are basically just aggregating Add() calls from each collect.

If using cumulative temporality, the only thing I can think of is that the counter would have to be reset in that case with a new start time and counter value.

Reset means you set the value to 0, but the application doesn't know that; it thinks the value is the current storage value, which is kept per region. Say you delete node02, and then a request finishes: it will do add(-1, {region=us, node=node02}), so now the UpDownCounter for {region=us} would be -1?
I don't think that's workable.

That's quite a problematic issue.

@dashpole
Contributor

Just getting back from leave.

Reset means you set the value to 0, but the application doesn't know that; it thinks the value is the current storage value, which is kept per region. Say you delete node02, and then a request finishes: it will do add(-1, {region=us, node=node02}), so now the UpDownCounter for {region=us} would be -1?
I don't think that's workable.

You should only delete an attribute when you know for sure that no more observations will be made with it, right? Any subsequent observations would be treated as "brand new" after a delete, so that behavior seems intended.

@asafm
Contributor Author

asafm commented Jun 26, 2023

@dashpole I'll start by re-pasting my problematic scenario from above:


I have one issue I need some help with to resolve.
Suppose you have defined a synchronous UpDownCounter instrument and a view with an attribute filter that removes one or more attributes.
Example:
Say I have write-requests.in-flight, and I have two attributes I use to update that up-down-counter:
add({node=node02, region=us}, 2)
If I define a view with an attribute filter so I only keep region, then what really happens is:
add({region=us}, 2)

So in memory, I have the current number of in-flight requests per region.

When the user calls delete({region=us, node=node02}) because, for example, the node went down, the storage for that view doesn't have such attributes, as it only keeps one attribute set per region. So I'm left with an inconsistent state, where the number of in-flight requests for us is wrong.


I know for sure that I won't report any more observations for node02 in region us.
The problem is that I only have the aggregated result of write-requests.in-flight, since that UpDownCounter keeps the data only at the region level due to the attribute filter defined in the view.
Say that counter is 3 for region=us: 1 is from node01 and 2 is from node02.
When you call delete({region=us, node=node02}), you don't have those attributes to delete, since you only keep data per region. So you have no idea that you need to subtract 2 and set the value for region=us to 1.
That's the big problem I have, @dashpole.

@gouthamve
Member

Hi, this is a fairly common use case, and the Prometheus SDK has a very elegant API for this, IMO.

One use-case:

In Mimir, we expose a number of metrics per tenant (active series, samples received, queries run, rules executed, etc.). New tenants are created and old tenants are cleaned up; this is a continuous process.

If we don't clean up stale tenant metrics, the exposed cardinality will keep growing, which is undesirable.


In the Prometheus SDK we achieve it by using a custom Collector: https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#Collector

For example, you can have a map of active label sets to values, and in the Collect call iterate through this map and emit the metrics. If we remove elements from this map, the metrics go away.
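
A rough Java analog of that pattern, using the Prometheus simpleclient's custom Collector (class and metric names here are illustrative):

import io.prometheus.client.Collector;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TenantCollector extends Collector {
    // tenant -> current value; removing a key drops the series on the next scrape
    private final Map<String, Double> activeTenants = new ConcurrentHashMap<>();

    void set(String tenant, double value) { activeTenants.put(tenant, value); }
    void remove(String tenant) { activeTenants.remove(tenant); }

    @Override
    public List<MetricFamilySamples> collect() {
        List<MetricFamilySamples.Sample> samples = new ArrayList<>();
        activeTenants.forEach((tenant, value) ->
                samples.add(new MetricFamilySamples.Sample(
                        "tenant_active_series", List.of("tenant"), List.of(tenant), value)));
        return List.of(new MetricFamilySamples(
                "tenant_active_series", Type.GAUGE, "Active series per tenant.", samples));
    }
}

Registering it once with new TenantCollector().register() makes every scrape reflect exactly the current map contents.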

@asafm
Contributor Author

asafm commented Apr 3, 2024

@gouthamve I'm not an expert on this (yet). I can tell you from what I know:

  1. You can partially achieve what you wrote using AsyncCounter, AsyncGauge, and AsyncUpDownCounter. You can keep the metrics wherever you want and report them via a callback. When you stop reporting them in the callback, they are effectively gone :) (see the sketch after this list).
  2. The Prometheus client SDK has a rather simple approach: each instrument is backed by a single storage (for a counter it's always a LongCounter). In OTel you have, for example, the concept of Views. They allow you to reshape the instrument: keep only 2 attributes instead of 5. Your code can record 5 attributes, but if the view says to select only 2 of them, the data will be aggregated to those two. You can even change the instrument's storage; for example, you can select last-value. Combine all of this and you'll find it's not easy to support removal.
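
A minimal Java sketch of point 1 (assuming the async gauge builder API; names illustrative):

import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.metrics.Meter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TenantGauges {
    // attribute set -> current value; removing an entry stops its reporting
    private final Map<Attributes, Long> values = new ConcurrentHashMap<>();

    TenantGauges(Meter meter) {
        meter.gaugeBuilder("tenant.active_series")
                .ofLongs()
                .buildWithCallback(m -> values.forEach((attrs, v) -> m.record(v, attrs)));
    }

    void set(Attributes attrs, long value) { values.put(attrs, value); }

    void remove(Attributes attrs) {
        values.remove(attrs); // the series disappears from the next collection on
    }
}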

I still want this a lot, since Histogram is not solvable this way (there is no async histogram). I personally haven't gotten around to solving this yet.
