[RFC] Standardization of azure-eventhub Input Metadata Field #40561

Open

Abstract

Provide a brief summary of the RFC's purpose.

The goal of the RFC is to standardize the name and the content of the azure-eventhub field.

Introduction

Explain the background, context, and motivation for the proposal.

Since its inception five years ago, the azure-eventhub input has stored the "event hub metadata" (event hub name, consumer group, offset, and more with the input v2) in the azure field, which is of type object.

However, many integrations use the azure field as the root element for their own fields (e.g., azure.activitylogs). To keep the metadata alongside the actual data, these integrations usually rename the azure field that holds the metadata to azure-eventhub.

Here is an example:

{
    "azure-eventhub": {
      "sequence_number": 21916518,
      "partition_id": "1",
      "consumer_group": "$Default",
      "offset": 9955743838336,
      "eventhub": "mbranca815",
      "enqueued_time": "2024-08-20T09:10:01.486Z"
    }
}

Here are a few integrations that rename the azure field holding the metadata to azure-eventhub:

And others that do not rename the field:

  • application_gateway
  • firewall_logs
  • azure_functions
  • azure_frontdoor
  • azure_openai

The older integrations perform the rename azure > azure-eventhub, but the more recent integrations do not.

There are at least two practical problems here:

  • The input stores the metadata in a field that most integrations rename as the first step in the default pipeline.
  • Recent integrations do not rename the field, creating inconsistencies and potential conflicts.
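
To make the inconsistency concrete, here is a minimal Python sketch of what a consumer currently has to do to find the metadata in a document. The helper name `find_eventhub_metadata` is hypothetical, and the sketch assumes that integrations without the rename keep the metadata keys directly under azure, next to their own sub-objects.

```python
from typing import Any, Optional

# Metadata keys the input writes (field names as listed in this RFC).
METADATA_KEYS = {
    "eventhub",
    "consumer_group",
    "enqueued_time",
    "offset",
    "sequence_number",
    "partition_id",
    "partition_key",
}


def find_eventhub_metadata(doc: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the event hub metadata from wherever the document keeps it."""
    # Integrations that perform the rename: metadata lives in `azure-eventhub`.
    renamed = doc.get("azure-eventhub")
    if isinstance(renamed, dict):
        return renamed

    # Integrations without the rename (assumption): metadata keys sit
    # directly under `azure`, mixed with integration-specific fields.
    azure = doc.get("azure")
    if isinstance(azure, dict):
        found = {k: v for k, v in azure.items() if k in METADATA_KEYS}
        return found or None

    return None
```

With a single standard field, the second branch would no longer be needed.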

Proposal

Detail the proposed changes, including technical specifications, diagrams, and examples if necessary.

I suggest:

  1. Adopt the current de facto standard name azure-eventhub as the official metadata field name.
  2. Document all the existing field content.
  3. Change the input to store the metadata in the azure-eventhub field.
  4. Change the input to make the azure-eventhub field optional (enabled by default), so users can save storage if required.
  5. Make sure all existing integrations work with the azure-eventhub field.
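
Points 3 and 4 could look roughly like the following Python sketch. It is only an illustration of the intended behavior, not the input's actual code: the function `build_event` and the option name `include_metadata` are hypothetical.

```python
from typing import Any

# Proposed official name of the metadata field.
METADATA_FIELD = "azure-eventhub"


def build_event(
    message: str,
    metadata: dict[str, Any],
    include_metadata: bool = True,  # hypothetical option, enabled by default
) -> dict[str, Any]:
    """Build the event the input would publish for one event hub message."""
    event: dict[str, Any] = {"message": message}
    if include_metadata:
        # Store the metadata under the standard field instead of `azure`.
        event[METADATA_FIELD] = dict(metadata)
    return event
```

Keeping the option enabled by default preserves today's behavior for users who rely on the metadata.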

Existing field content

The metadata field contains the following information.

| Field | Description | Notes |
| --- | --- | --- |
| azure-eventhub.eventhub | Event hub name | |
| azure-eventhub.consumer_group | Name of the consumer group | |
| azure-eventhub.enqueued_time | Timestamp of the time the message was published on the event hub | |
| azure-eventhub.offset | Message offset in the event hub partition | |
| azure-eventhub.sequence_number | Message sequence number in the event hub partition | |
| azure-eventhub.partition_id | The partition ID of the message | since v2 |
| azure-eventhub.partition_key | The partition key of the message | since v2 (optional) |
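
As a sketch of how this content could be checked once documented, the Python snippet below validates a metadata object against the fields above. The value types are an assumption inferred from the example document earlier in this RFC (numeric offset and sequence number, string partition ID), and `validate_metadata` is a hypothetical helper.

```python
from typing import Any

# field name -> (expected Python type, required?)
# Types are assumed from the example document, not from a formal mapping.
SCHEMA: dict[str, tuple[type, bool]] = {
    "eventhub": (str, True),
    "consumer_group": (str, True),
    "enqueued_time": (str, True),
    "offset": (int, True),
    "sequence_number": (int, True),
    "partition_id": (str, False),   # only set by the input v2
    "partition_key": (str, False),  # only set by the input v2, optional
}


def validate_metadata(meta: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a metadata object (empty if none)."""
    problems = []
    for name, (expected, required) in SCHEMA.items():
        if name not in meta:
            if required:
                problems.append(f"missing field: {name}")
            continue
        if not isinstance(meta[name], expected):
            problems.append(f"unexpected type for {name}: {type(meta[name]).__name__}")
    for name in meta:
        if name not in SCHEMA:
            problems.append(f"unknown field: {name}")
    return problems
```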

Rationale

Justify the proposal by discussing the problem it solves and why this solution is chosen over alternatives.

Name

  • It is used by the majority of integrations.
  • It is backward compatible.
  • Since it's the same name as the input, the risk of conflicts is probably low.

If I could go back in time to when the input was created, with today's experience I would call this field something like azure_eventhub_metadata. However, azure-eventhub is good enough to represent the semantics.

Changing the field name would cause a breaking change that doesn't feel worth it, given the secondary role of the metadata field from the users' perspective.

Impact

Describe the expected impact on users, systems, and any potential side effects.

Since all integrations will use the azure-eventhub field, we expect a reduction in mapping conflicts from the azure field.

Security Considerations

Address any security implications of the proposal.

No security implications so far.

Backward Compatibility

Explain any effects on existing systems or versions.

We need to double-check whether the rename processor in the existing integrations works correctly when there is no azure field in the message.
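
The case to verify can be modeled with a small Python sketch. This is a simplified stand-in for a rename step, not the actual processor; the parameter names mirror the `field`, `target_field`, and `ignore_missing` options of the Elasticsearch rename processor, and whether each integration sets `ignore_missing` is exactly what needs checking.

```python
from typing import Any


def rename_field(
    doc: dict[str, Any],
    field: str,
    target_field: str,
    ignore_missing: bool = False,
) -> dict[str, Any]:
    """Simplified model of a rename step: move doc[field] to doc[target_field]."""
    if field not in doc:
        if ignore_missing:
            # Nothing to rename: leave the document untouched.
            return doc
        raise KeyError(f"field [{field}] doesn't exist")
    if target_field in doc:
        raise ValueError(f"field [{target_field}] already exists")
    renamed = {k: v for k, v in doc.items() if k != field}
    renamed[target_field] = doc[field]
    return renamed
```

In this model, a document that already carries azure-eventhub and no azure field passes through unchanged when `ignore_missing` is true, and fails when it is false.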

Implementation

Outline the steps needed for implementation, including timelines, milestones, and responsible parties.

Tasks

Conclusion

Summarize the key points and restate the importance of the proposal.

Key Points Summary

  • Proposal Purpose: Standardize the azure-eventhub field name and content across integrations for consistency.

  • Background: Historical inconsistencies arose as the azure field was renamed to azure-eventhub in various implementations, causing confusion.

  • Current Issues: Varied naming has led to difficulties in field mappings and increased conflict risks among older and newer integrations.

  • Proposed Changes: Adoption of azure-eventhub as the official field name, documentation of existing field content, making the field optional, ensuring backward compatibility.

  • Expected Impact: Reducing mapping conflicts and enhancing harmony across diverse integrations through standardization.

  • Implementation Steps: Clear plan for execution, including updates to input settings, adding rename processors, and documenting existing metadata.

Importance of the Proposal

  • Ensures consistency and clarity in handling Azure Event Hub metadata across integrations.
  • Addresses ongoing conflicts, improving ease of integration across the ecosystem.

References

List any external references or documents cited in the RFC.
