Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Guidance requested: static SQL queries may contain sensitive values #436

Closed
dyladan opened this issue Oct 23, 2023 · 7 comments · Fixed by #976
Closed

Guidance requested: static SQL queries may contain sensitive values #436

dyladan opened this issue Oct 23, 2023 · 7 comments · Fixed by #976
Assignees

Comments

@dyladan
Copy link
Member

dyladan commented Oct 23, 2023

The current database semconv states the following:

Should be collected by default only if there is sanitization that excludes sensitive information.

source:

Should be collected by default only if there is sanitization that excludes sensitive information.

In JS, most of our instrumentations solve this by taking advantage of the parameterized API provided by SQL clients. For example, the user may call sql.query('SELECT * FROM mydb WHERE userid = ?', userId) or similar. In this case we collect the string 'SELECT * FROM mydb WHERE userid = ?'. This presents the following problems:

  1. It is still possible for the string in a parameterized query to contain some non-parameterized values, which may contain sensitive data. Without parsing the string, we have no way of knowing.
  2. If the query is not parameterized, we collect it directly and it may still contain sensitive data.

In at least some SIGs such as Java, they have handled this by parsing the SQL and removing values, but in JS this is difficult due to bundle size restrictions and lack of good parsers available. I suspect this may also affect other SIGs. What is the recommendation of the Semantic Conventions group?

@trask
Copy link
Member

trask commented Oct 23, 2023

this is tricky, I don't love any of these options:

  • disable statement capturing by default and make it opt-in (not a great user experience as most users want statement capture)
  • potentially capture sensitive data
    • I think this is a worse option for database statements compared to URLs. URLs have been getting (potentially) logged by http servers for a long time, so I think it is less surprising to developers that if they include sensitive data in their URLs that the sensitive data may get logged somewhere.
  • require implementing parsing/sanitization logic across all languages

@dyladan
Copy link
Member Author

dyladan commented Oct 23, 2023

of those options, I think (1) is the best? I'm not entirely solid in that opinion though. It would also help if there was a sanitization processor (there might be) for the collector. A half-step to (1) might be to collect by default but issue a warn log that the user has to explicitly disable in config? There are issues with that as well though because SDK logging is entirely disabled by default in JS and it isn't common to enable it unless there is a problem.

(3) is really tough. There are likely good parsers in some languages, but not all, and requiring them to be included as a part of the bundle is tough in resource constrained environments (which is common in at least JS).

@dyladan
Copy link
Member Author

dyladan commented Oct 23, 2023

We could also disable collection of static queries by default. In this case parameterized queries would be assumed to be safe, which potentially can still be untrue, but in most cases is probably a reasonable assumption.

@lmolkova
Copy link
Contributor

lmolkova commented Feb 9, 2024

Related: #877

@trask
Copy link
Member

trask commented Apr 24, 2024

currently the database semconv says that db.query.text (previously db.statement)

SHOULD be collected by default only if there is sanitization that excludes sensitive information.

I think what's missing that would help answer @dyladan's original question above is an explicit recommendation on whether to trust that parameterized queries have already been sanitized.

my initial thought is that capturing parameterized queries by default is probably a good trade-off between risk and reward, meaning

  • much lower risk of parameterized queries having sensitive data compared to non-parameterized queries
  • high reward of capturing parameterized queries, as parameterized queries are the recommended approach, and database spans without query text are much less useful

this might be an apt analogy in terms of risk / reward(?)

  • db parameterized query : url path
  • db non-parameterized query : url query string

@trask
Copy link
Member

trask commented May 1, 2024

@dyladan can you confirm that #976 would resolve this issue from your perspective? thanks

@dyladan
Copy link
Member Author

dyladan commented May 1, 2024

Yes I can confirm that it does

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging a pull request may close this issue.

4 participants