User.id for authenticated user id #1104

Open
heyams opened this issue May 31, 2024 · 41 comments · May be fixed by #1456

@heyams
Contributor

heyams commented May 31, 2024

Area(s)

area:user

Is your change request related to a problem? Please describe.

enduser.id has been deprecated and replaced with user.id. #731

enduser.id had this old description:
image

user.id has this new description:
image

The new description is confusing now. Is it for authenticated user id or anonymous user id? What are your thoughts on creating a new attribute called user.anonymous_id?

Our telemetry solution tracks both authenticated user id and anonymous user id.

Describe the solution you'd like

  1. update the user.id description to make it clear that it's intended for authenticated user id.
  2. create a new attribute called user.anonymous_id for the anonymous user id.

Describe alternatives you've considered

n/a

Additional context

n/a

@trisch-me
Contributor

hey @heyams - the user namespace is not bound to the auth domain; it could be anything: a user in an operating system, a user in a database, etc. How did you use the enduser.id field before for authenticated and anonymous users?
Also, enduser.id previously held both client_id and username; now the user namespace has dedicated fields for them, i.e. id and name

@thompson-tomo

thompson-tomo commented Jun 3, 2024

What I would propose is that we introduce an additional property user.authenticated, which describes whether the user is logged in or not. Alternatively, we could add user.authenticationscheme, which could be anonymous, basicauth, openid

We could also add a user.authenticationprovider, i.e. Facebook, local, domain

Potentially we could also add in user.authorized which would be useful in the case where an action fails due to user lacking the authorization to complete the task/activity.

@trisch-me
Contributor

There was a discussion about an additional sub-namespace for user, such as user.auth.*, with the appropriate fields added there.
We have discussed a user.auth.domain field. Other fields could also be added under this sub-namespace if needed

@thompson-tomo

OK with the idea of sub-namespaces. How about we use this issue to track the discussion about implementing
user.auth.authenticated
to address the gap raised here, and we/I create a separate issue to track extending user.auth.* to include other useful aspects from the OIDC JWT.

@MSNev
Contributor

MSNev commented Jun 3, 2024

Previous discussions (from the client rum sig) #443

@thompson-tomo

OK, I see valid points in that discussion. How about we introduce the following:

  • user.auth.authenticated so that we can know if the user is authenticated
  • user.session so we can track an anonymous user and continue the session when they become authenticated.

For the device attributes how about we also introduce a session attribute?

@heyams
Contributor Author

heyams commented Jun 4, 2024

OK, I see valid points in that discussion. How about we introduce the following:

  • user.auth.authenticated so that we can know if the user is authenticated
  • user.session so we can track an anonymous user and continue the session when they become authenticated.

For the device attributes how about we also introduce a session attribute?

Sessions only last as long as the browser is open, and it's a different concept.
What do you think about @trisch-me's suggestion of having a sub-namespace under user?

user.auth.* => user.auth.authenticated can be used to track if it's authenticated, and then user.auth.id can be used for the authenticated user id, if user.auth.authenticated is false, same id can be used for anonymous user id?

alternatively, we can add user.authenticated boolean attribute, when this is true, user.id can be used for authenticated user id; otherwise, anonymous user id?

@trisch-me @MSNev thoughts on this?

@thompson-tomo

thompson-tomo commented Jun 4, 2024

Not a fan of user.authenticated, as it will be too limiting, especially if you want to track an anonymous user becoming an authenticated one. So I fully support implementing user.auth.authenticated, especially as it would enable us to see a user who has failed authentication.

To be honest, I don't see an issue with tracking the user's session via a user.session attribute rather than a user.id attribute. Scenarios would be:

  • If we are wanting to track a visitor to the site using the user.session attribute we would get all traces generated by that user and the hope is that when a user finishes their session they close their browser hence terminating the session.
  • If we are wanting to trace an anonymous user and all the traces generated over time. For me this would be better connected to the device and as such we should be using the device attributes. With this the device id would be long lived and would cater for scenarios where multiple users share the same device/browser hence using a user.id might prove to be inaccurate.

@atreat

atreat commented Jun 11, 2024

I think having a separate attribute for an anonymous user id makes sense. I'd keep user.id to be as concise as possible and formalize the description of this attribute to be specific to an authenticated user.

For an unauthenticated user, I'd recommend a separate attribute that we can try to name (user.anon_id, user.anonymous_id, user.transient_id, etc.). I'm not too particular on what this is called but prefer to keep it flat instead of a sub-namespace.


I think anonymous users are a feature that should be considered separate from user sessions because:

  1. It's possible that an anonymous user identifier is generated and is sticky, so we can follow that same user across multiple sessions.
  2. It's possible that an anonymous user eventually authenticates and identifies themselves. In this case a single session has telemetry that contains the anonymous identifier and the authenticated identifier. It'd be possible to create a new session when the authentication occurs, but that would prevent the opportunity to understand what led the user to authenticate.
  3. Depending on the application, you may have multiple anonymous users within a single session. If an application is running on a kiosk, it's possible that the application remains open while multiple users walk up, interact, and then walk away. It'd be up to the application developer to decide if they create a new session or create a new anonymous user id. This is a subjective decision by the developer, but I like that the convention of the anonymous identifier provides that flexibility.

@thompson-tomo

thompson-tomo commented Jun 12, 2024

In the case of following a "user" across multiple sessions how can we be fairly certain it is the same user? For instance a new user at the kiosk. Propose tracking the actual device as we are using some identifier stored on the device. The key thing for me is the context is bound to a device.

I feel that we should leave it to developers to decide what triggers the start of a new session, be it time based or a user clicking start on a kiosk. Perhaps we look at adding a convention such as device.session so we can track all traces which have occurred since the app was launched.

I think the approach of being able to trace all the activity coming from a device, drilling it down to a session & then taking it that step further to see the data related to an individual user. The final drill-down is to see what was done while authenticated, including who they are.

@atreat

atreat commented Jun 12, 2024

I think the approach of being able to trace all the activity coming from a device, drilling it down to a session & then taking it that step further to see the data related to an individual user.

Agree with this completely. I think having separate identifiers for the device and session provides value. I was just trying to provide examples where an unauthenticated user may not match up 1:1 with a device or a session. I'm hoping that a convention specific to an "unauthenticated user" gives app developers flexibility to model their telemetry to fit their use case.


In the case of following a "user" across multiple sessions how can we be fairly certain it is the same user? For instance a new user at the kiosk.

These should be thought of as separate examples. In the kiosk use case you would likely not make your unauthenticated user id sticky. In applications where it's more safe to assume that the user is the same (an app on a mobile phone), it would be more useful to persist a longer lived value for their identifier.

I would recommend a session identifier alongside an anonymous user id in both these examples. In kiosk mode, the application may decide to keep a session open for multiple customers. When in pocket mode, an application may decide to identify a potential customer for multiple sessions.
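
To make the kiosk versus phone distinction concrete, here is a minimal Python sketch. It assumes a hypothetical `AnonymousIdSource` helper and uses `user.anonymous_id`, one of the candidate names floated in this thread, purely as a placeholder; nothing here is an agreed convention. The `store` dict stands in for whatever persistence the app really has (cookie, local storage, preferences file).

```python
import uuid
from typing import Optional


class AnonymousIdSource:
    """Hands out an identifier for an unauthenticated user.

    sticky=True models the phone case (reuse one persisted value);
    sticky=False models the kiosk case (a new value for each visitor).
    """

    def __init__(self, sticky: bool, store: Optional[dict] = None):
        self.sticky = sticky
        # Hypothetical stand-in for a cookie jar or on-device storage.
        self.store = store if store is not None else {}

    def next_visitor(self) -> str:
        # Sticky sources return the persisted value when one exists.
        if self.sticky and "anon_id" in self.store:
            return self.store["anon_id"]
        anon_id = str(uuid.uuid4())
        if self.sticky:
            self.store["anon_id"] = anon_id
        return anon_id


def attributes(session_id: str, anon_id: str) -> dict:
    # "user.anonymous_id" is a placeholder name, not an agreed attribute.
    return {"session.id": session_id, "user.anonymous_id": anon_id}
```

With this shape, a kiosk can keep one session id while calling `next_visitor()` for each customer, and a phone app can start new sessions while the sticky value stays the same.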

@thompson-tomo

thompson-tomo commented Jun 13, 2024

So let me try to summarise the current state of where we are at, as I see it, in short form:

  • user.session a way in which we can see all traces from a user regardless of being authenticated or not
  • device.session a way in which we can see all traces from a device since launch or manually started a new session
  • user.id an id which points to a registered user (doesn't mean authenticated) and is shared across devices

Open question

  • Do we need to add an anonymous user id as proposed or is introducing a sticky device.id a better approach given the lack of certainty about the user remaining the same.

I am of the latter thought, especially if we can also release guidance on session tracking which includes the following examples at a minimum

  • Kiosk app
  • Mobile app
  • 100% anonymous web user
  • An authenticated web user for their entire journey including prior to logging in.

@MSNev
Contributor

MSNev commented Jun 14, 2024

@thompson-tomo

Do we need to add an anonymous user id as proposed or is introducing a sticky device.id a better approach given the lack of certainty about the user remaining the same.

No, the device.id is not the same as an anonymous user id, they are and need to be kept separate. The device.id is specific for the device that (one or more) users are using

@thompson-tomo

No, the device.id is not the same as an anonymous user id, they are and need to be kept separate. The device.id is specific for the device that (one or more) users are using

Yes I am aware that multiple unauthenticated users could use the one device.

The thing which I am questioning is why we call it anonymous user id, when the user can't come back, multiple people could be involved given that we have no reliable way of knowing when the user switches and instead I propose that we refer to it as user.session &/or device.session depending on the use case.

The key thing is allowing using a combination of fields depending on the use case to achieve maximum coverage and the seven scenarios described.

@heyams
Contributor Author

heyams commented Jul 1, 2024

No, the device.id is not the same as an anonymous user id, they are and need to be kept separate. The device.id is specific for the device that (one or more) users are using

Yes I am aware that multiple unauthenticated users could use the one device.

The thing which I am questioning is why we call it anonymous user id, when the user can't come back, multiple people could be involved given that we have no reliable way of knowing when the user switches and instead I propose that we refer to it as user.session &/or device.session depending on the use case.

The key thing is allowing using a combination of fields depending on the use case to achieve maximum coverage and the seven scenarios described.

As I mentioned earlier, a session only lasts as long as the browser is open, and it's a different concept.

@thompson-tomo

Yes, I am aware that a session lasts only as long as the browser/app is open. What I am failing to see is how an anonymous user id can be safely reused.

For me the tests should be:

  • If it is stored on the device, then for me it becomes device.id, as there can be no certainty that the same anonymous user is present
  • If the id is generated on app launch, then that is device.session, as you cannot know who is using the app/website
  • If the user is known, i.e. logged in, then it is user.session.

Based on the above logic, all the ids become complementary, each with a defined scope. Most importantly for me, it enables us to see all activity coming from a device during a session, which can be split based on the user.session, with those sessions in turn being split based on the authenticated user id
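
The three tests above can be sketched as a small lookup. This is only an illustration of the proposal in this comment: `device.session` and `user.session` are names suggested here, not existing attributes, and the `IdOrigin` enum is a hypothetical helper.

```python
from enum import Enum


class IdOrigin(Enum):
    """Where an identifier comes from, per the three tests in this comment."""
    STORED_ON_DEVICE = "stored_on_device"
    GENERATED_ON_LAUNCH = "generated_on_launch"
    KNOWN_USER = "known_user"


# Proposed attribute for each origin; names are proposals, not conventions.
PROPOSED_ATTRIBUTE = {
    IdOrigin.STORED_ON_DEVICE: "device.id",
    IdOrigin.GENERATED_ON_LAUNCH: "device.session",
    IdOrigin.KNOWN_USER: "user.session",
}


def attribute_for(origin: IdOrigin) -> str:
    """Return the attribute name this proposal would use for the given origin."""
    return PROPOSED_ATTRIBUTE[origin]
```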

@MSNev
Contributor

MSNev commented Jul 8, 2024

In the client space you can have

  • An anonymous session.id (which expires at different intervals; in a browser environment it IS maintained for a period of time AFTER the browser is closed). Multiple users CAN have the same session id (but it's not common)
    • General semantics on how the session is managed for OTel on clients is discussed here
    • As sessions are anonymous, they (may) also be linked across devices for things like multi-factor operations, where a user "starts" to log into an application on device A and continues on device B
  • An anonymous user id which identifies the current "user" across ALL sessions; this ID is persisted across browser instances
    • This is generally persisted as a cookie for 1 year, so the same anonymous id is used for the same user when the same cookie jar is used (cookies are not cleared, or in a shared computer environment if different users are physically logged into a computer). Note that in a browser environment we have zero access to identify any authenticated user
    • Another way to think of the anonymous user id is as a random (anonymous) identifier which does not and cannot be used to identify the "real" end user (directly).
  • A unique device.id, normally derived from the device instance and not always anonymous. When not specified by the application / runtime it is not available, and it is NOT the same as the user.
  • An authenticated user id (previously enduser.id, now user.id) where the organization has decided to explicitly identify the user. It is generally derived from authentication tokens, and the value stored is specific to the organization (user email, name or generic id). It is entirely up to the application / web page to determine the level of PII that they are willing to include, which CAN be used to identify the real end user.

So when there is a single user of a computer then (if provided)

  • the user.id (authenticated) would identify the same real user
  • the session.id will vary depending on usage, and is used to link the active period of usage across many page loads
  • the anonymous user id will always be the same (for the period of the cookie), also used for applications where the user doesn't actually log into the application ie. anonymous browsing of any home page for example, until the user "logs in" they are 100% anonymous.
  • the device.id would effectively identify the physical device that the user is currently using and would generally NOT change.

And for when multiple users are using the same (shared) hardware

  • the user.id (authenticated) would identify the same real user that is actually logged in (so if previous user logs out then this value would change)
  • the session.id will vary depending on usage,
    • if the "session tracking" cookie times out then this will be different
    • If in the "shared" environment the user "logs out" of the device and "logs in" as themselves then the session will always be different
    • However, if it is just different users sharing the same logged in environment then the same session.id would be reported for the same application
  • Similar for the anonymous user id
    • If they "log out" of the device and "log in" as a different user environment then it would be different between users, but the same as it was for all previous times this user has accessed a site
    • And if they really are sharing the same device environment then all users would be reported as the same anonymous user.id (but they will generally have different session id's)
  • the device.id will always be the same when run on the same device

So we SHOULD NOT confuse the concept of "users" and "sessions" as for client environments they can and are often different.

So the "user" attributes identify "who" is doing something (both anonymously and explicitly identified), while the "session" identifies "what" is occurring. It's therefore technically possible to identify, across a sequence of requests, how an end user is using a system, so it's possible to answer questions like

  • User A used the system for X pages, visiting A, B, C without closing the browser (or within a period of activity -- say within 30 minutes)
  • User A interacts with the system generally X times (sessions) a day for short periods
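
As a rough illustration of how the four identifiers described above behave on shared hardware, here is a minimal Python sketch. It assumes a plain dict as a stand-in for the browser cookie jar and uses `user.anonymous_id` as a placeholder name for the anonymous user id (no name has been agreed in this thread); `user.id` carries the authenticated id, as described in this comment.

```python
import uuid
from typing import Optional


def anonymous_id_for(cookie_jar: dict) -> str:
    """Return the random id kept in this cookie jar, creating it on first use."""
    return cookie_jar.setdefault("anon_user_id", str(uuid.uuid4()))


def identifiers(device_id: str, cookie_jar: dict, session_id: str,
                authenticated_id: Optional[str] = None) -> dict:
    """Build the identifier attributes for one piece of client telemetry."""
    attrs = {
        "device.id": device_id,      # stable for the physical device
        "session.id": session_id,    # varies with periods of activity
        # Placeholder attribute name; the thread has not settled on one.
        "user.anonymous_id": anonymous_id_for(cookie_jar),
    }
    if authenticated_id is not None:
        # Only populated when the application chooses to identify the user.
        attrs["user.id"] = authenticated_id
    return attrs
```

Two people sharing one logged-in OS environment share a cookie jar, so they report the same anonymous id but usually different session ids. Separate OS logins mean separate cookie jars and therefore different anonymous ids, while `device.id` stays the same throughout.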

@trisch-me
Contributor

related discussion about user.id
#1172

@trisch-me
Contributor

After reading all the discussions I am in favor of just having user.anonymous_id in addition to user.id
We should update the user.id description to say that it also represents the authenticated user if there is an auth context, because sometimes a user is just a user. For example, file.owner is just a user who has created that file. It doesn't have a direct auth connotation but might have an indirect one, i.e. the user was logged in while creating that file.

@jsuereth
Contributor

To add my discussion here from the meeting:

My concern is around the use of anonymous being ambiguous and possibly misleading.

This attribute means: "We don't know the identity of the user, so we invented an ID to track behavior, e.g. for RUM".

What this does NOT mean is "We have an anonymous identifier (removed personally identifying information)".

I'd prefer to phrase this in some way that makes it clear what's happening. E.g. "user.unknown_id", "anonymous_user.id", "user.unauth_id". I believe @MSNev had a good recommendation.

@heyams
Contributor Author

heyams commented Jul 22, 2024

I recognize the potential for confusion with the term anonymous; it might not be clear and could lead to misunderstandings. @trask has proposed holding a vote to help decide on a new name for this attribute.

The following options were proposed in today's semantic conventions SIG and after the discussion with my team at Microsoft:

❤️ user.pseudo_id
🎉 user.tracking_id
👍 user.unauth_id
🚀 user.auth_id for authenticated user id and then use user.id as any other user id including this anonymous user id
👀 user.anonymous_user_id
😕 user.anonymous_id

description: a consistent id to track a best-effort unique user regardless of the authentication state.

Note: I didn't add user.unknown_id because it can be a known user.

It's ok to vote for multiple options. Please vote.

@lmolkova
Contributor

lmolkova commented Jul 22, 2024

I think there are two problems leading to confusion:

  1. user is wider than website user. Attempting to add a generic attribute in user that's only applicable to browsers would be confusing for other types of users
  2. user.id|name|hash are not specific enough.

From the browser perspective, it sounds like the user login should be populated in user.name (?). The anonymized (hashed) value should be populated in user.hash (?), and then it's not clear how user.id would be used.

E.g. we can do:

user.name = lmolkova
user.hash = 864342fc7c9b552c2bea0513c9a47942 // md5("lmolkova")
user.id = 686f96e7-23d9-4c13-b5c0-7bc249d3f058 // guid recorded in my cookies

would it be helpful if we did this instead?

user.name = lmolkova
user.hash = 864342fc7c9b552c2bea0513c9a47942 // md5("lmolkova")
user.anonymous_id = 686f96e7-23d9-4c13-b5c0-7bc249d3f058 // guid recorded in my cookies
user.id = ? // nobody knows, probably same as user.name?

Yes, it'd make it more obvious for browser-specific case, but it would make things in user namespace even more confusing in general.


TL;DR: do we really need a new attribute? Can we reuse user.id for an anonymous user id in browsers?

I think it's the same option as "🚀 user.auth_id", but without introducing an attribute for authenticated user id - we have user.name for it.

It'd be great to have a md file for user in the context of browser/website that describes which user properties are applicable and how they should be populated.
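
For reference, the first layout above (user.name, user.hash, user.id) could be populated along these lines. This is a minimal sketch: the `browser_user_attributes` helper is hypothetical, and MD5 is used only because the example above uses it. MD5 is not a secure way to anonymize a login.

```python
import hashlib
import uuid


def browser_user_attributes(login: str, cookie_guid: str) -> dict:
    """Populate user.* the way the first example in this comment does."""
    return {
        "user.name": login,
        # Unsalted MD5 of the login, mirroring the md5("...") note above.
        "user.hash": hashlib.md5(login.encode("utf-8")).hexdigest(),
        # Random GUID that the site keeps in the visitor's cookies.
        "user.id": cookie_guid,
    }


attrs = browser_user_attributes("lmolkova", str(uuid.uuid4()))
```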

@MSNev
Contributor

MSNev commented Jul 22, 2024

@lmolkova It's not just the browser space, it's clients in general.
And there are scenarios where an application may wish to record both authenticated and unauthenticated id's of the system.

Generally, for the browser scenario (specifically Azure Monitor), there is the user.id (anonymous / random guid -- always present) and an optional user.auth_id (string) populated by whatever the application wants. Sometimes it's their email, sometimes it's their object id; I don't believe I've ever seen this as their name.

@lmolkova
Contributor

do we need to record login and some other id for authenticated user? i.e. why do we need user.auth_id if login was recorded in user.name ?

@MSNev
Contributor

MSNev commented Jul 23, 2024

do we need to record login and some other id for authenticated user? i.e. why do we need user.auth_id if login was recorded in user.name ?

Yes, some companies WANT to record the actual person who did the work for their internal auditing.
And it's not always their name.

Which is why I voted to "reclaim" user.id as the "random" identifier and introduce a user.auth_id so this could be used as required by the application. Failing that option, keeping user.id as the authenticated one and having a user.uaid for the random one would work.

@lmolkova
Contributor

lmolkova commented Jul 23, 2024

Yes, some companies WANT to record the actual person who did the work for their internal auditing.
And it's not always their name.

Would it be better if it was called user.login instead of name ? I.e. unique, but human-readable identifier

@MSNev
Contributor

MSNev commented Jul 23, 2024

Would it be better if it was called user.login instead of name ?

While the example I gave was "associated" with the authenticated details (object id or email), it's not necessarily (100%) the "login"; it could be anything. And just as @jsuereth doesn't like calling it anonymous, calling "some" (potentially) user-identifying id the "login" is also not correct...

What should be recorded in a field called "login"? Should it be the username they entered during initialization, their associated (primary) email address (what happens when they sign in with a phone number?), or some random OTP via a secondary (multifactor) device? Or even worse, they sign in with some 3rd party integration (Facebook, Google, Microsoft, etc.) and the app internally associates that id with an application "id" (like just a number)... So NO, I don't like using login as a term for this.

@lmolkova
Contributor

lmolkova commented Jul 23, 2024

ok, so from the browser perspective:

  • user.name is not quite applicable for web users - there could be multiple logins/ids for the same user and they are not really names.
  • user.id is confusing since there could be multiple ids (anonymous, authorized, hash, etc)

The point I'm making is that adding a new attribute to this namespace will make things even more confusing.


The things we need to record for browser users:

  • anonymous/persistent/guest/visitor id - always available or can be generated, does not change when user logs in. Not PII
  • authenticated user id - unique something identifying a specific user in the system - if user is authenticated it's the same no matter how user logged in. If user wasn't able to authenticate, it's not populated. Might be PII.
  • login/name/etc - something human-friendly, but not essential for RUM. Likely to be PII.

If we reuse the user namespace, I think the least confusing option would be to

  • remove user.id
  • add user.anonymous|guest|visitor.id and user.authenticated.id

An alternative would be to define attributes in a new/different namespace. E.g.:

  • anonymous|guest|visitor_user.id and authenticated_user.id
  • client.anonymous|visitor|guest_user.id and client.authenticated_user.id

@trisch-me
Contributor

The proposal from @heyams where we always have user.id and additionally user.auth_id seems more straightforward and applicable to different use cases. I would also like to propose creating a sub-namespace auth and putting the fields related to authentication there, so use user.auth.id instead of the original user.auth_id

@trask
Member

trask commented Jul 29, 2024

If we reuse the user namespace, I think the least confusing option would be to

  • remove user.id
  • add user.anonymous|guest|visitor.id and user.authenticated.id

I like this.

I think there was some concern about the term anonymous, so maybe

  • user.authenticated.id
  • and one of user.guest.id, user.visitor.id, user.pseudo.id, user.tracking.id

@mjwolf
Contributor

mjwolf commented Jul 30, 2024

If we reuse the user namespace, I think the least confusing option would be to

  • remove user.id
  • add user.anonymous|guest|visitor.id and user.authenticated.id

I think user.id needs to be kept for the OS user use case. User ID is a well-defined concept, without any other qualifiers.
For example, from the POSIX specification for getpwuid: https://pubs.opengroup.org/onlinepubs/9699919799/functions/getpwuid.html, this refers to "user id"/uid, many times without any further qualification on user.

For a more concrete example of a security use case, Falco alerts can have a field user.uid, defined as just "user ID". I think it would make sense to map this to user.id in the registry, there's no qualifier or other namespace that would really make sense.

@lmolkova
Contributor

lmolkova commented Jul 30, 2024

let's separate user.id conversation so we can make progress on user.authenticated.id since it seems we have a consensus there.

I believe my concerns on user.id are captured in #1172 - it has a very limited scope (OS user id), but a very generic name.

@heyams
Contributor Author

heyams commented Jul 31, 2024

@lmolkova if we agree on using user.authenticated.id, which new attribute should we use for the anonymous ID, considering the potential removal of user.id? @trisch-me raised a good point here.

What do you think about user.auth.id or having a sub-namespace under user, such as user.auth?

It seems we have reached a consensus via poll to have an authenticated user ID along with another attribute for a different ID:

image
image

Now, it's just a matter of naming it.

@lmolkova
Contributor

let's use user.id for unauthenticated for the time being. It may change as an outcome of #1172.

I suggest user.authenticated.id because we don't recommend abbreviations

## Name Abbreviation Guidelines
Abbreviations MAY be used when they are widely recognized and commonly used.
Examples include common technical abbreviations such as `IP`, `DB`, `CPU`,
`HTTP`, `URL`, or product names like `AWS`, `GCP`, `K8s`.
Abbreviations SHOULD be avoided if they are ambiguous, for example, when they apply
to multiple products or concepts.

auth is an abbreviation, it's ambiguous and can be read as authenticated, authorized, authentication, etc. Using authenticated is explicit and follows the guidelines.
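
Under this interim suggestion, the attributes on client telemetry might look like the sketch below. It assumes only what is stated in this comment: user.id for the unauthenticated id "for the time being" and user.authenticated.id for the authenticated one. The `user_attributes` helper is hypothetical, and both names may still change as an outcome of #1172.

```python
from typing import Optional


def user_attributes(visitor_id: str,
                    authenticated_id: Optional[str] = None) -> dict:
    """Interim mapping: user.id is always set, user.authenticated.id only after login."""
    attrs = {"user.id": visitor_id}
    if authenticated_id is not None:
        attrs["user.authenticated.id"] = authenticated_id
    return attrs
```

The visitor id stays the same before and after login, which keeps the pre-login activity linkable to the authenticated user.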

@trisch-me
Contributor

I would propose using authentication instead of authenticated. The latter implies an activity, as in "the user has been authenticated". But I would like to introduce other static attributes related to authentication, such as user.authentication.domain, which was skipped because we need an auth sub-namespace for it

@Zenithar

Zenithar commented Aug 19, 2024

👋 - I was shimming into this thread while looking for standard authentication span tag conventions. What about client authentication (workload authentication) vs user authentication (workforce authentication)?

Do you also register a client.authentication.* namespace? For example, the trust model in OAuth is based on client and user identities. Or I can have a workload authentication based on mTLS, transporting a user authentication context, such as it represents an on-behalf-of intent.

I would extend the authentication namespace to this. :

# String identifier to describe the authentication methods associated to the context
# Example: 
# user.authentication.methods = "pwd,mfa" (ref - https://www.rfc-editor.org/rfc/rfc8176#section-2) 
# client.authentication.methods = "mtls"
<identifiable>.authentication.methods = <string list> 

# String identifier to describe the subject identifier (aka user_id)
# Example: 
# user.authentication.identity.subject = "arn:aws:iam::123456789012:user/johndoe"
# client.authentication.identity.subject = "spiffe://example.org/ns/default/sa/default"
<identifiable>.authentication.identity.subject = <string>

# Pseudo-anonymised subject for privacy 
# Example: 
# user.authentication.identity.subject_hash = "5b8491046bd5db5e945654dcc60343b367f181cc642a449c150ddd42e1e4b880" # HEX(HMAC-SHA256($key, $subject)) 
<identifiable>.authentication.identity.subject_hash = <string>

With <identifiable> as a user or a client.

By the way, I would not recommend using user.authentication.id as id is too generic and lets the convention users store anything that could match their understanding. By doing this, convention users would be invited to use the identifiable object datastore ID (PK, Mongo ID, etc.) as a span tag, which will propagate technical implementation information. At the same time, the subject is not a datastore-dependent value holding all the necessary information to look up the associated identity.

Secondly, using an identity sub-namespace offers extension points that could be used according to the associated authentication.methods (mtls => public key fingerprint, client certificate fingerprint, client certificate SANs; private_jwt => public key fingerprint; etc.).
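
The subject_hash idea above, HEX(HMAC-SHA256($key, $subject)), can be sketched with the Python standard library. The attribute names follow the proposal in this comment and are not registered conventions; the helper names and the key are hypothetical.

```python
import hashlib
import hmac
from typing import List


def subject_hash(key: bytes, subject: str) -> str:
    """Hex-encoded HMAC-SHA256 of the subject under a secret key."""
    return hmac.new(key, subject.encode("utf-8"), hashlib.sha256).hexdigest()


def authentication_attributes(identifiable: str, methods: List[str],
                              subject: str, key: bytes) -> dict:
    """Build the proposed <identifiable>.authentication.* attributes.

    `identifiable` is "user" or "client", as in the proposal above.
    Only the keyed hash of the subject is emitted, not the subject itself.
    """
    prefix = f"{identifiable}.authentication"
    return {
        f"{prefix}.methods": ",".join(methods),
        f"{prefix}.identity.subject_hash": subject_hash(key, subject),
    }
```

Because the hash is keyed, the same subject maps to the same value only for holders of the same key, which is what makes it usable for correlation without exposing the raw identifier.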

@thompson-tomo

I would propose using authentication instead of authenticated. Latter implies an activity, as in user has been authenticated.

I agree with authentication: if the authentication has failed, this needs to be captured as well.

@heyams
Contributor Author

heyams commented Aug 28, 2024

👋 - I was shimming into this thread while looking for standard authentication span tag conventions. What about client authentication (workload authentication) vs user authentication (workforce authentication)?

Do you also register a client.authentication.* namespace? For example, the trust model in OAuth is based on client and user identities. Or I can have a workload authentication based on mTLS, transporting a user authentication context, such as it represents an on-behalf-of intent.

I would extend the authentication namespace to this. :

# String identifier to describe the authentication methods associated to the context
# Example: 
# user.authentication.methods = "pwd,mfa" (ref - https://www.rfc-editor.org/rfc/rfc8176#section-2) 
# client.authentication.methods = "mtls"
<identifiable>.authentication.methods = <string list> 

# String identifier to describe the subject identifier (aka user_id)
# Example: 
# user.authentication.identity.subject = "arn:aws:iam::123456789012:user/johndoe"
# client.authentication.identity.subject = "spiffe://example.org/ns/default/sa/default"
<identifiable>.authentication.identity.subject = <string>

# Pseudo-anonymised subject for privacy 
# Example: 
# user.authentication.identity.subject_hash = "5b8491046bd5db5e945654dcc60343b367f181cc642a449c150ddd42e1e4b880" # HEX(HMAC-SHA256($key, $subject)) 
<identifiable>.authentication.identity.subject_hash = <string>

With <identifiable> as a user or a client.
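
As a rough illustration of the scheme above, here is a minimal Python sketch that builds these attributes for a given `<identifiable>` and derives `subject_hash` as HEX(HMAC-SHA256($key, $subject)). The helper names (`subject_hash`, `authentication_attributes`), the `include_subject` switch, and the key are hypothetical, and the attribute names are only the ones proposed in this comment, not registered conventions.

```python
import hashlib
import hmac


def subject_hash(key: bytes, subject: str) -> str:
    """Return HEX(HMAC-SHA256(key, subject)) as a lowercase hex string."""
    return hmac.new(key, subject.encode("utf-8"), hashlib.sha256).hexdigest()


def authentication_attributes(identifiable, subject, methods, key, include_subject=False):
    """Build the proposed <identifiable>.authentication.* attributes.

    `identifiable` is "user" or "client". The raw subject is only emitted
    when `include_subject` is True, so the hash can be recorded on its own.
    """
    if identifiable not in ("user", "client"):
        raise ValueError("identifiable must be 'user' or 'client'")
    prefix = f"{identifiable}.authentication"
    attributes = {
        f"{prefix}.methods": ",".join(methods),
        f"{prefix}.identity.subject_hash": subject_hash(key, subject),
    }
    if include_subject:
        attributes[f"{prefix}.identity.subject"] = subject
    return attributes


# Hypothetical key; in practice it would come from a secret store.
attrs = authentication_attributes(
    "client",
    "spiffe://example.org/ns/default/sa/default",
    ["mtls"],
    key=b"example-key",
)
```

Using a keyed HMAC rather than a plain hash means the subject cannot be recovered by hashing a list of candidate identifiers without the key.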

By the way, I would not recommend using user.authentication.id: id is too generic and lets convention users store anything that matches their own understanding. It would invite them to use the datastore ID of the identifiable object (PK, Mongo ID, etc.) as a span tag, which propagates technical implementation details. By contrast, a subject is not datastore-dependent and holds all the information needed to look up the associated identity.

Secondly, using an identity sub-namespace offers extension points that could be used according to the associated authentication.methods (mtls => public key fingerprint, client certificate fingerprint, client certificate SANs; private_jwt => public key fingerprint; etc.).

👍

(Next Monday is a holiday in the U.S.)
I will share this at the Semconv SIG meeting on the Monday after next. Here are my findings so far:

Existing user namespaces in the semantic-conventions repo:

======================================
**What do we have currently**:

User: 
	-id
	-name
	-hash
	...
	
	
Enduser (deprecated due to ECS https://www.elastic.co/guide/en/ecs/current/ecs-user.html)
	-id
	-name

process.real_user.id
process.saved_user.id
process.user.id

======================================
**What do we want to accomplish**:

1. Clarify the `User` namespace: is it too broad? Should the user namespace be nested under other namespaces,
    e.g. `os.user, client.user, service.user, server.user, browser.user, db.user`?

2. Capture the end user in a different namespace:
	* app.user (an app can be a process, service, client, mobile app, or web app; what counts as an app?)
	* enduser (it's clear that this is for the end user, not for db, process, or service)
	
	app
		- name
		- user
			- id // maybe PII
			- name // not PII
			- anonymous_id // not PII
			- hash // not PII
			
	or 
	
	Enduser
		- id
		- name
		- anonymous_id
		- hash
3. `authentication` would be a sub-namespace under `<parent_namespace>.user`, e.g. `db.user.authentication`

Feel free to offer feedback or discuss it in the SIG meeting.

@trisch-me
Contributor

trisch-me commented Sep 30, 2024

Hey @heyams, thanks for the info. To answer your questions:
The user namespace should definitely be usable under other namespaces. We do this already in ECS. The question for me is whether we also want to allow the user namespace to be independent and used as a root namespace. To use this nesting we need the embed feature, which will be implemented in tooling.
I think the main problem here is whether we have generic enough fields for multiple use cases, and this is what we should solve first.

Regarding authentication, I'm in favor of having it under the user namespace and moving all related fields there.

@lmolkova
Contributor

Based on the discussion in Semconv SIG on 9/30:

  • We need a way to capture multiple users on one telemetry item. E.g. OS user that runs the process, identity used to authorize calls to remote services, the end-user the operation is performed for.
  • The plain user namespace is somewhat abstract - OTel semantic conventions should put user/identity-related information under another namespace. It can be done by defining domain-specific user-related attributes (manually or with some future automation that can embed namespace under another one).
  • Arguably, user tracking is a browser/mobile/etc. concern and should be defined in the corresponding namespace(s) only.
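
To make the first point concrete, here is a small Python sketch of one telemetry item carrying three distinct users as a flat attribute map. `process.user.id` is listed as an existing attribute earlier in this thread, `enduser.id` is the deprecated attribute under discussion, and `service.user.id` is only an illustrative name taken from the nesting examples above, not a registered attribute. The `namespaced` helper and all values are hypothetical.

```python
def namespaced(parent: str, user_fields: dict) -> dict:
    """Prefix generic user fields (id, name, ...) with a parent namespace."""
    return {f"{parent}.{field}": value for field, value in user_fields.items()}


# Three different users attached to the same telemetry item.
span_attributes = {}
span_attributes.update(namespaced("process.user", {"id": "1001"}))        # OS user running the process
span_attributes.update(namespaced("service.user", {"id": "svc-orders"}))  # identity used for remote calls (illustrative)
span_attributes.update(namespaced("enduser", {"id": "u-42"}))             # end user the operation is performed for

# With a single top-level namespace, each write overwrites the previous one.
flat_attributes = {}
for value in ("1001", "svc-orders", "u-42"):
    flat_attributes.update(namespaced("user", {"id": value}))
```

After this runs, `span_attributes` holds three separate keys, while `flat_attributes` holds only `user.id` with the last value written.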

Action items:

  • bring back deprecated enduser namespace (or define a new namespace that would represent browser/mobile end-user identity) (@heyams)
  • check top-level user namespace use-cases in ECS (@trisch-me)

We'll have another discussion on enduser attributes naming for tracking/anonymous id and authenticated id.

@trisch-me
Contributor

I have checked for ECS: within Elastic, our usual case is actually to use user at the root level, without an additional parent namespace. It also makes querying the data easier, since you can just search for user.* fields.

We do use user with a parent namespace in cases where it's ambiguous. For example, a process has multiple types of users, so each user gets its own name, i.e. process.real_user, process.saved_user, etc.

Or, in cases where one user (actor) performs an operation on another user (target), we need to namespace the users to distinguish them.

In most cases the context provides enough understanding of the type of user being referenced, making additional namespacing unnecessary. Using the same user namespace (or any other namespace) multiple times will be supported in embed through an alias field, such as `as`, set when the namespace is embedded.
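
The embed feature is not specified in this thread, so the following Python sketch only illustrates the naming outcome described above: a reusable `user` field set kept at the root, or embedded under a parent namespace with an alias so the same namespace can appear more than once. The `embed` function and its `alias` parameter (standing in for the `as` field) are assumptions for illustration, not the tooling's actual interface.

```python
USER_FIELDS = ("id", "name", "hash")  # fields listed for the user namespace in this thread


def embed(namespace, fields, parent=None, alias=None):
    """Return fully qualified attribute names for a namespace.

    With no parent, the namespace stays at the root (user.id).
    With a parent, it is nested under it (process.user.id).
    With an alias, the alias replaces the namespace name (process.real_user.id).
    """
    name = alias or namespace
    prefix = f"{parent}.{name}" if parent else name
    return [f"{prefix}.{field}" for field in fields]


root = embed("user", USER_FIELDS)
real = embed("user", USER_FIELDS, parent="process", alias="real_user")
saved = embed("user", USER_FIELDS, parent="process", alias="saved_user")
```

Here `root` keeps the plain `user.*` names that are easy to query, while `real` and `saved` reuse the same fields under `process.real_user.*` and `process.saved_user.*` without colliding.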

I was also thinking about the differences between the multiple domains. My suggestion is to make a comparison to understand where we have differences or unclear field usage. This would help us determine whether they are, in fact, not as different as we initially thought. As discussed during the meeting, it might turn out that users/instrumentation could simply skip fields that are not applicable to their specific use case.

@heyams heyams linked a pull request Oct 7, 2024 that will close this issue
Status: In Progress