-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add geo fields into attribute registry #1116
base: main
Are you sure you want to change the base?
Conversation
Are there any comments from other maintainers regarding this new namespace? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to look into these further and think about changing the attributes to namespaces.
**Description:** <Describe what has changed.> This PR adds an initial implementation for the MaxMind GeoIP provider for later usage in the `geoipprocessor`. The processor is still a nop, as no provider can be configured yet. Providers configuration will be added in #33268. - Internal package for temporal attributes conventions reference (should be removed in favor of open-telemetry/semantic-conventions#1116): `processor/geoipprocessor/internal/convention/attributes.go` - [geoip2-golang](https://github.com/oschwald/geoip2-golang) package used as a client for data retrieval. See recommended MaxMind clients: https://dev.maxmind.com/geoip/docs/databases#api-clients **Link to tracking Issue:** <Issue number if applicable> #32663 **Testing:** <Describe what testing was performed and which tests were added.> Database files are generated before running the unit tests [using MaxMind writer.](https://github.com/maxmind/MaxMind-DB/tree/0a9c1aa26cd7b91bee2efe27e7d174797d8211a6/test-data#how-to-generate-test-data). The generation time of those files seems not to be an issue (0.189s): ``` $ go test -v === RUN TestInvalidNewProvider --- PASS: TestInvalidNewProvider (0.00s) === RUN TestProviderLocation === PAUSE TestProviderLocation === CONT TestProviderLocation === RUN TestProviderLocation/nil_IP_address === RUN TestProviderLocation/unsupported_database_type === RUN TestProviderLocation/no_IP_metadata_in_database === RUN TestProviderLocation/all_attributes_should_be_present_for_IPv4_using_GeoLite2-City_database === RUN TestProviderLocation/subset_attributes_for_IPv6_IP_using_GeoIP2-City_database --- PASS: TestProviderLocation (0.00s) --- PASS: TestProviderLocation/nil_IP_address (0.00s) --- PASS: TestProviderLocation/unsupported_database_type (0.00s) --- PASS: TestProviderLocation/no_IP_metadata_in_database (0.00s) --- PASS: TestProviderLocation/all_attributes_should_be_present_for_IPv4_using_GeoLite2-City_database (0.00s) --- PASS: TestProviderLocation/subset_attributes_for_IPv6_IP_using_GeoIP2-City_database (0.00s) PASS ok github.com/open-telemetry/opentelemetry-collector-contrib/processor/geoipprocessor/internal/provider/maxmindprovider 0.189s ``` Testing database files are removed on sucessful test execution. **Documentation:** <Describe the documentation added.> README.md file
Would it be possible to add a use case for those? Do we expect them to be on span/metrics attributes? If the goal is to record events, they may go into the payload and should not be defined in the registry then. |
In the OpenTelemetry collector we are adding a processor that would add those attributes to any trace, metric or log. The processor looks for an IP within the resource attributes and appends the related geolocation attributes. The component is still a nop (we are just adding support for multiple geolocation providers): https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/geoipprocessor Additional context: open-telemetry/opentelemetry-collector-contrib#32663 |
thanks for the clarification @rogercoll Looking into the collector: semantic-conventions/docs/general/attributes.md Lines 143 to 150 in 9adff43
Is you intention to enrich individual telemetry signals or resources? |
@lmolkova The intention of the GeoIP processor is to have a flexible processor that can derive geo information from an IP. What the source of the IP is (i.e. which attribue it's coming from) and what the destination (i.e. namespace) depends on the use case. For example, on web server access logs the geoIP processor can be used to derive the geo information for each individual request's source.ip, allowing endusers to visualize where (geographically) requests come from. But there could also be other use cases, that would require to read the IP from other attributes (either signal level or resource), etc. |
@AlexanderWert thanks for the context! So if I understand correctly the intention is to
I.e. we do need to define them as attributes and not event payload fields. Correct ? Looking only from the semconv perspective, it could be somewhat misleading. I can have more than one IP on the telemetry item (e.g. I assume embedding could be useful here and we could embed the geo namespace under related IPs? E.g. if I configure geo-processor for Effectively, Thoughts? |
@lmolkova you are correct, geo namespace is usually embedded into In ECS we don't use geo namespace as standalone one in the root of events |
Yes, that's absolutely right! I think we should consider it as a two steps process to add geo attributes to semconv:
BTW, @trisch-me what's the status with |
An option to embed not the full namespace but just particular fields (for example create |
Thanks again for the context @AlexanderWert and @trisch-me ! So maybe we can move forward by adding some disclaimer into the group brief like
? Also, it would be great if we could start embedding some common attributes under |
Geo fields can carry data about a specific location related to an event. | ||
This geolocation information can be derived from techniques such as Geo IP, or be user-supplied. | ||
attributes: | ||
- id: city.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would we have a dedicated field for village/town/county name?
Can we look into popular Maps API and see how this and other geo attributes would align with their abstractions?
E.g. Google Maps or Azure Maps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's either remove the attribute for now or consider more general term. Azure maps use municipality
, google uses locality
.
ChatGPT also prefers locality
Yes, the official term for the name of a city, town, village, or similar populated place is often referred to as a "locality" or "place name." In some contexts, it's also called a "geographic name" or "toponym."
If you’re working with geographic data, “locality” is commonly used to encompass all such divisions, but “place name” is frequently used in casual contexts.
Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>
related: #1228 |
Remove: - name - region.name (in favour of region.iso_code) - country.name (in favour of country.iso_code)
Failing CI seems to be unrelated:
|
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
I reckon this PR is still on review (unstale) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with some suggestions. The key part is geo.city.name
(which I'm suggesting to remove or rename to geo.locality.name
groups: | ||
- id: registry.geo | ||
type: attribute_group | ||
prefix: geo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we no longer use prefix in semconv - should now be part of each attribure name
prefix: geo |
(under their own namespace) SHOULD document what geo attributes describe in the scope of that convention. | ||
|
||
attributes: | ||
- id: city.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- id: city.name | |
- id: geo.city.name |
(and in other places)
Geo fields can carry data about a specific location related to an event. | ||
This geolocation information can be derived from techniques such as Geo IP, or be user-supplied. | ||
attributes: | ||
- id: city.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's either remove the attribute for now or consider more general term. Azure maps use municipality
, google uses locality
.
ChatGPT also prefers locality
Yes, the official term for the name of a city, town, village, or similar populated place is often referred to as a "locality" or "place name." In some contexts, it's also called a "geographic name" or "toponym."
If you’re working with geographic data, “locality” is commonly used to encompass all such divisions, but “place name” is frequently used in casual contexts.
Fixes #1033
Changes
Adds geo fields into the attribute registry into its own file.
This is another take on #351, which basically copied the fields into both
client
andserver
. Here I aim for adding the fields only into the registry and not using it yet. At the same time, we have open-telemetry/build-tools#240 with a few suggestions from @trisch-me.So this PR will add the fields into the registry - then once open-telemetry/build-tools#240 is implemented, we could use these geo fields by updating
client.yaml
andserver.yaml
and embed these fields into those.Merge requirement checklist
[chore]