Skip to content

fix(query): various fixes for passwords and secrets queries #8017

Draft
cx-andre-pereira wants to merge 26 commits intoCheckmarx:masterfrom
cx-andre-pereira:Fixs_to_Passwords_and_Secrets
Draft

fix(query): various fixes for passwords and secrets queries #8017
cx-andre-pereira wants to merge 26 commits intoCheckmarx:masterfrom
cx-andre-pereira:Fixs_to_Passwords_and_Secrets

Conversation

@cx-andre-pereira
Copy link
Copy Markdown
Contributor

@cx-andre-pereira cx-andre-pereira commented Mar 27, 2026

Reason for Proposed Changes

  • This PR is an "expansion" on the fix(query): ensures 'passwords and secrets' queries do not flag fields in 'Proto' files (.proto)  #8013, where the focus was lower False Positives associated with ".proto" files. Here fixes for issues that arose during testing done to "passwords and secrets" queries in the aforementioned PR will be the focus.

  • As testing went on more and more small issues/improvements that could be fixed/added to the passwords and secrets regex rules and tests came up.

  • Every query and allow rule was tested individually to ensure there are test files for each and everyone, additionally every regex was altered in some way so as to improve functionality, intent clarity or both.

  • Since the changes are plentiful they will be exposed together with the proposed changes associated in the following section instead of exposing all issues at once here.

Proposed Changes

Test Files

  • Starting with the test files, all of them were updated to include a comment(s) at the start of the file or, in the case of JSON files, a "metadata"(s) field(s); these are used to state exactly what the test file should flag for or why it does not flag in the case of allow rules and negative tests. The format is relatively consistent for :

    • Positive-tests:
      • query_name - query_id positive-test then the lines that should flag will be marked as positiveX for each line starting with X=1 then 2,3...; in the case of JSON or samples where comments can't be added close to the target line (like pos14), the lines will be indicated as (line Z/Y...).
        • If there is more than a single relevant query/allow-rule the format will be positive-test - #n where n will start as 1 and increase for each query, then each line will be marked with the associated " - #n" value associated with the query that should flag in said line.
          • Some of the really short tests do not indicate target line in any way since the trigger line is obvious.
    • Negative-tests:
      • Here the format can either be just like the positive test but with the "negative" keyword and a new section between quotes with details on why the test does not flag:
        • query_name - query_id negative-test (<details on why test does not flag>)
      • OR:
        • Generic Negative Test - <details on why test does not flag>
      • Since often times there is no line explicitly close to a query's target regex, many negative tests do mention/have a comment on a specific line(s).
    • Allow-rule-tests (should be negative):
      • Finally for the allow rules a new field is added since they do not have unique uuids of their own:
        • query_name - query_id - <allow_rule_name> allow-rule-test
      • Note that allow rules applied to all queries have a "query_name" == "Global allow rule".
      • Target lines are marked as "negativeX" if the allow rule is the only relevant one for a given sample, otherwise the generic " - #n" referencing policy mentioned before is used.
  • The format might have some other nuances but they are, as is the format of the comments in its entirety, easily interpreted since it was intended as such.

  • Four tests were shortened since they were unnecessarily big: negative28, negative33, negative51 and negative53.

  • Eleven new negative tests were added (62-74); all of these tests are relative to allow rules that did not have a single valid test to ensure they are working properly.

  • Originally I was going to point out individually when and how each allow rule was seemingly added without any relevant test but just note that it mostly comes down to allow rules from the original "rewrite passwords and secrets query to use regex based strategy" commit, for those I made sure to check the original tests for a sample that might have been removed later on but that was never the case; or from commits where allow rules were added without touching the tests at all (like fix(passwords_and_secrets): add allow rules to fp results #6051).

  • Regardless the fact of the matter is that I did not remove a single test, all I did was add test that were missing in the current master branch in the first place so the test scope has only increased and now with assurance that every single rule has at least one test associated with it.


Common Regex Issues

  • Moving on to the actual regex, I will start by mentioning some of the most common issues I found, in the regex_rules git diff it is clearly nearly every regex suffered at least a slight change and that is mostly due to the following problems:

    • 1 - Unnecessary groups ( ). Since the golang regex engine does not support group referencing through its RE2 syntax engine, as stated in the documentation - backreference (NOT SUPPORTED)(for efficiency sake), there are only 2 cases where a group is necessary, if we want to catch full words but still have them be non mandatory like : "secret_?(key)?" or to have more than a single word be allowed in a given regex like: "...[:=]\\s*(write|read|none)", it can also serve both these purposes at once like so: "secret_?(key|value)?". Currently many of the regex use groups for seemingly no reason and so all such instances have been removed.

    • 2 - Ensuring "-" is not used as a meta character. In a lot of character classes ([abc...]), the "-" is used carelessly creating a range of characters to include when it seems very apparent it was meant to represent only itself. For all instances the range was short going only from the decimal value "43" to "46" on the ASCII table, but this meant the comma , number 44 character was being included implicitly. This could cause issues and FPs through future regex changes and so I made sure to backslash-escape every instance of a hyphen character that is inside a character class ensuring the intent is obvious, additionally I kept the comma , support so as to make sure no regression is triggered since it has been included in a lot of the regex for a while, even if unknowingly. Note that when included as the last character in a given character class like line 324 it does not create a range but if any character is added at the end of said character class the range would be set if not for this change.

    • 3 - Use of a (:|=) group instead of equivalent [:=] character class. This is a very simple change, I simply aligned the use of the [:=] character class over the equivalent group for every regex since the large majority already used the character class over the group it didn't make sense to have 2 functionally identical bits of the regex with different syntax.

    • 4 - Removed character class from [_]?. A lot of the regex allow for an optional underscore between key words but the character class around it is completely useless so I have simplified all instances of this sort to _?

    • 5 - Standardized the ['\"] character class to always follow that pattern and not be [\"'] for consistency sake once again.


Removed Allow Rules

  • Before going through regex that had changed unrelated to 1-5, I will explain the reasons for the complete removal of the following allow rules:
    • Avoiding CF AllowUsersToChangePassword

      • This allow rule is tested in the negative33 test. The test contains the test excerpt from when it was originally added as well as a new direct value attribution to false. The truth is that, as I understand it, the Avoiding TF resource access(Generic Password) and the Avoiding Boolean's(Global) cover all scenarios in which the removed allow rule would be useful.
    • Avoiding next_token Var (duplicate)

    • The most straight forward fix, the allow rule is simply duplicated as introduced in the original commit.

    • !Ref is a cloudFormation reference


Changes in Regex (excluding common issues)

  • For the final section the regex with changes unrelated to 1-5 will be detailed:

    • Generic Password - "Avoiding Terraform 'optional' statement" and "Avoiding Terraform 'try' statement" allow rules - In these regex I removed the enforced end of line$, it does not align with any of the other regex and made these 2 excessively restrictive.

    • Generic Secret - added a backslash-escape on the left square bracket "\\[" just for intent enforcement, the bracket is literal and this way that is more obvious, additionally the "entropies["group"]" field was updated to 2 instead of 3 since a group was removed from the regex.

    • Generic Secret - "Avoiding TF resource access" allow rule - updated to be inline with all other "Avoiding TF resource access" allow rules. (more details - as stated here they were supposed to be "identical" in the first place) This change is easier to see with the clean (non string) regex before : \s*([\.\)\]\$] vs after : \s*([\[\.\)\]\}\$], this is the section right before the |(:\s*null))|null) excerpt and is now aligned with the generic password equivalent rule.

    • Generic Secret - "Avoiding CloudFormation Parameters Descriptions" allow rule - changed the main regex from ([A-Za-z0-9/~^_!@&%()=?*${}+-.'\"]+) to .* this is aligned with the generic passwords "Avoiding description field" allow rule, assuming we want to ignore everything inside the description field no matter what even including literal quotes "'\"" then there should be no issue in simply using the dot wildcard and making the regex intent obvious.

    • Generic Secret - "Avoiding description field" - in this rule I changed the name itself to "Avoiding Secrets from Variable Interpolation" to better describe the scope of the rule, I found that it catches False Positives for samples unrelated to "Azure Key Vault" like the negative60.tf aws_msk_scram_secret_association resource sample (line39).

    • K8s Environment Variable Password - this query had character classes defined as [\"|'], clearly the intent was to have a group like (\"|'), it does not make sense to catch a literal vertical line | here, additionally the detectLineGroup field was updated after removing 3 unnecessary groups.

    • Slack Token and Generic Token - "Avoiding Slack Token" allow rule - both of these regex have the same issue as the last query, they use a character class like this xox[p|b|o|a], where it is now even more obvious the intent was to use a group with the vertical lines as alternation operators.

    • Generic Access Key - in this query I added a backslash-escape to the left square bracket [ for clearer intent just like before (this was done in other queries but I won't mention it anymore going forward)

    • Generic Private Key - moved the "specialMask" regex to be after the allowRules array just like all other queries.

    • Generic Token and Encryption Key - "Avoiding TF resource access" allow rules updated to be in line with the generic password and generic secret equivalents.

    • CloudFormation Secret Template - Fixed the id of this query so it is no longer identical to the Twilio API Key query, currently any scan will always flag for the API Key even if the target was flagged by the regex in this query.

    • Generic Password on YAML files when value in tuple - this query shows yet another misuse of the vertical line inside a character class: [\\n|\\r] and [^\\n|\\r] were fixed to [\\n\\r] and [^\\n\\r], as well as the specialMask regex.

  • Note all references to CloudFormation were adjusted to be written with that specific casing CloudFormation, since before we had things like Cloudformation,CLoudFormation and cloudFormation(for example the removed !Ref allow rule).

I submit this contribution under the Apache-2.0 license.

… stoping passwords and secrets flags on said files
… commented lines, new allow rules prevent proto files fields from flagging
…generic api key enforces trimmed line starting with 'access'
…Generic Token', improved 2 TF reousrce access rules, added missing positive test for 'Encryption Key' query, added samples (similar to neg28) in negative61 for TF resource access allow rule in 'Encryption Key' query (was also missing test)
…les individually, added tests for cases missing them, minor improvements to a lot of regex
@gitguardian
Copy link
Copy Markdown

gitguardian bot commented Mar 30, 2026

⚠️ GitGuardian has uncovered 7 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

Since your pull request originates from a forked repository, GitGuardian is not able to associate the secrets uncovered with secret incidents on your GitGuardian dashboard.
Skipping this check run and merging your pull request will create secret incidents on your GitGuardian dashboard.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
4266022 Triggered Generic Password 88eee32 assets/queries/common/passwords_and_secrets/test/positive38.yaml View secret
- - Generic High Entropy Secret 5d246c4 assets/queries/common/passwords_and_secrets/test/negative64.tf View secret
- - Generic Password 5d246c4 assets/queries/common/passwords_and_secrets/test/negative72.tf View secret
20161077 Triggered Generic High Entropy Secret 745e832 assets/queries/common/passwords_and_secrets/test/positive34.yaml View secret
- - Generic High Entropy Secret 5d246c4 assets/queries/common/passwords_and_secrets/test/negative66.json View secret
- - Generic Password 5d246c4 assets/queries/common/passwords_and_secrets/test/negative73.yaml View secret
4266006 Triggered Generic Password 5d246c4 assets/queries/common/passwords_and_secrets/test/negative32.yaml View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant