Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/elasticsearch] Emit _doc_count for metric documents in OTel mode when data point attribute _doc_count is true #35348

Merged
merged 8 commits into from
Sep 25, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 27 additions & 2 deletions exporter/elasticsearchexporter/model.go
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,7 @@ type dataPoint interface {
Attributes() pcommon.Map
Value() (pcommon.Value, error)
DynamicTemplate(pmetric.Metric) string
DocCount() uint64
}

const (
Expand Down Expand Up @@ -284,8 +285,11 @@ func (m *encodeModel) upsertMetricDataPointValueOTelMode(documents map[uint32]ob
if err != nil {
return err
}

docCount := dp.DocCount()

// documents is per-resource. Therefore, there is no need to hash resource attributes
hash := metricOTelHash(dp, scope.Attributes(), metric.Unit())
hash := metricOTelHash(dp, scope.Attributes(), metric.Unit(), docCount)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need docCount as a hashing key? Maybe we could add the doc counts for existing hashes and not consider docCount as a hashing key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need docCount as a hashing key?

I think we'll need it. If 2 metrics with the same dimension have different doc counts, they cannot share the same document, as there is only 1 doc count in 1 document.

Maybe we could add the doc counts for existing hashes

Could you elaborate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 2 metrics with the same dimension have different doc counts

Having a hard time imagining how this could practically happen. Doesn't this mean that we would drop the metric anyway since they are duplicates w.r.t dimensions?

Maybe we could add the doc counts for existing hashes

I was thinking that if all attributes are same then merging them would be the way to go (merging histograms + doc count values) but honestly, I can't imagine a case when this could happen in practical scenarios.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this mean that we would drop the metric anyway since they are duplicates w.r.t dimensions?

Good point, that appears to be the case for TSDB. @felixbarny did mention something about hashing doc count, not sure if this case was considered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of that issue, I think it would be easier to map metrics named _doc_count to the top-level _doc_count field. We don't need a _doc_count for all metrics, and it makes it harder to group as many metrics to the same document as possible, because these metrics may differ just by their _doc_count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new implementation that is based on the _doc_count: true attribute, I'm also wondering whether we can remove the doc count from the hash. If there are multiple doc counts for a given doc, I think it's fine to chose one of them (for example the first, last, or max value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I forgot to make that change.

var (
document objmodel.Document
ok bool
Expand All @@ -296,6 +300,7 @@ func (m *encodeModel) upsertMetricDataPointValueOTelMode(documents map[uint32]ob
document.AddTimestamp("start_timestamp", dp.StartTimestamp())
}
document.AddString("unit", metric.Unit())
document.AddInt("_doc_count", int64(docCount))

m.encodeAttributesOTelMode(&document, dp.Attributes(), true)
m.encodeResourceOTelMode(&document, resource, resourceSchemaURL, true)
Expand Down Expand Up @@ -340,6 +345,10 @@ func (dp summaryDataPoint) DynamicTemplate(_ pmetric.Metric) string {
return "summary_metrics"
}

func (dp summaryDataPoint) DocCount() uint64 {
return dp.Count()
}

type exponentialHistogramDataPoint struct {
pmetric.ExponentialHistogramDataPoint
}
Expand Down Expand Up @@ -367,6 +376,10 @@ func (dp exponentialHistogramDataPoint) DynamicTemplate(_ pmetric.Metric) string
return "histogram"
}

func (dp exponentialHistogramDataPoint) DocCount() uint64 {
return dp.Count()
}

type histogramDataPoint struct {
pmetric.HistogramDataPoint
}
Expand All @@ -379,6 +392,10 @@ func (dp histogramDataPoint) DynamicTemplate(_ pmetric.Metric) string {
return "histogram"
}

func (dp histogramDataPoint) DocCount() uint64 {
return dp.HistogramDataPoint.Count()
}

func histogramToValue(dp pmetric.HistogramDataPoint) (pcommon.Value, error) {
// Histogram conversion function is from
// https://github.com/elastic/apm-data/blob/3b28495c3cbdc0902983134276eb114231730249/input/otlp/metrics.go#L277
Expand Down Expand Up @@ -475,6 +492,10 @@ func (dp numberDataPoint) DynamicTemplate(metric pmetric.Metric) string {
return ""
}

func (dp numberDataPoint) DocCount() uint64 {
return 1
}

var errInvalidNumberDataPoint = errors.New("invalid number data point")

func (m *encodeModel) encodeResourceOTelMode(document *objmodel.Document, resource pcommon.Resource, resourceSchemaURL string, stringifyArrayValues bool) {
Expand Down Expand Up @@ -830,7 +851,7 @@ func metricECSHash(timestamp pcommon.Timestamp, attributes pcommon.Map) uint32 {
return hasher.Sum32()
}

func metricOTelHash(dp dataPoint, scopeAttrs pcommon.Map, unit string) uint32 {
func metricOTelHash(dp dataPoint, scopeAttrs pcommon.Map, unit string, docCount uint64) uint32 {
hasher := fnv.New32a()

timestampBuf := make([]byte, 8)
Expand All @@ -842,6 +863,10 @@ func metricOTelHash(dp dataPoint, scopeAttrs pcommon.Map, unit string) uint32 {

hasher.Write([]byte(unit))

buf := make([]byte, 8)
binary.LittleEndian.PutUint64(buf, docCount)
hasher.Write(buf)

carsonip marked this conversation as resolved.
Show resolved Hide resolved
mapHashExcludeDataStreamAttr(hasher, scopeAttrs)
mapHashExcludeDataStreamAttr(hasher, dp.Attributes())

Expand Down
Loading