Replies: 7 comments 10 replies
-
Thanks @makubacki for putting this out there and it looks like we are close to having something that can run automatically for each PR, improve overall code and contribution quality, and bring some additional best to the projects in Tianocore. Since Monday September 5th is a US holiday the next Tools and CI meeting will be held on September 12th. I look forward to more discussion and progress in that forum. |
Beta Was this translation helpful? Give feedback.
-
To provide some updates to my original post, here's the initial set of queries from https://github.com/github/codeql I recommend as a starting point:
|
Beta Was this translation helpful? Give feedback.
-
To follow up on this point:
The CodeQL CLI can be used as follows to wrap around the edk2 build process ( codeql database create cpp-database --language=cpp --command="stuart_ci_build -c .pytool/CISettings.py -p MdeModulePkg -a IA32,X64 TOOL_CHAIN_TAG=VS2019 Target=DEBUG --clean --skip-post-build" --overwrite The following command can be used to generate a SARIF file (called codeql database analyze cpp-database codeql\cpp\ql\src\Security\CWE\CWE-457\ConditionallyUninitializedVariable.ql --format=sarifv2.1.0 --output=query-results.sarif SARIF logs can be read by log viewers such as the Sarif Viewer extension for VS Code. |
Beta Was this translation helpful? Give feedback.
-
Adoption of CodeQL in edk2This RFC proposes adoption of CodeQL as a static analysis tool used in the TianoCore edk2 project. IntroductionCodeQL is open source and free for open-source projects. It is maintained by GitHub and naturally has excellent integration with GitHub projects. CodeQL generates a "database" during the firmware build process that enables queries to run against that database. Many open-source queries are officially supported and comprise the vulnerability analysis performed against the database. These queries are maintained here - https://github.com/github/codeql. Queries are written in an object-oriented query language called QL. CodeQL provides:
LGTM is often paired with CodeQL so you may read about that in documentation. LGTM was acquired by GitHub a few years ago and all of the functionality has been moved to GitHub code scanning. You can read more about this process on the GitHub blog post. CodeQL is an actively maintained project. Here is a comparison of edk2 commit activity versus CodeQL for reference: Because CodeQL does maintain a strong open-source presence, the TianoCore community should be able to file issues and pull requests into the project. CodeQL Usage in edk2CodeQL provides the capability to debug the actual queries and for our (Tianocore) community to write our own queries and even contribute back to the upstream repo when appropriate. In other cases, we might choose to keep our own queries in a separate TianoCore repo or within a directory in the edk2 code tree. This is all part of CodeQL Scanning and this page has information concerning how to configure CodeQL scanning within a GitHub project such as edk2. Information on the particular topic of running additional custom queries is documented here in that page. In addition, CodeQL presents the flexibility to:
Dismissing CodeQL AlertsThe following documentation describes how to dismiss alerts:
CodeQL in Pull RequestsThe proposal is to enable CodeQL in a step-by-step fashion. The goal with this approach is to make steady progress enabling CodeQL to become more comprehensive and useful while not impacting day-to-day code contributions. Throughout the process described in this section, CodeQL Code Scanning will be a mandatory status check for edk2 pull requests. It is proposed to make no changes to the PR evaluation process already in place that only builds packages with changes in the pull request. Query Target ListThe first step is to define a target list of queries for edk2. While CodeQL has the ability run against several languages (including Python), I propose we enable C/C++ first and then return to evaluate Python analysis. I propose the following as an initial candidate list:
CodeQL query files (.ql files) contain metadata about the query. For example, cpp/conditionally-uninitialized-variable states the following about the query:
We can automatically include queries against these criteria using "query filters". For example, this could include any That would mean new queries checked into the CodeQL repo could cause unexpected build breaks. Since edk2 favors consistency in CI results, I propose we start with the fixed query set proposed at the top of this section.
Suggesting New QueriesIt is proposed new queries be enabled by sending an RFC to the TianoCore development mailing list (devel@edk2.groups.io) with the query link and justification for adopting the query in edk2. Anyone is welcome to suggest new queries. Enable One Query at a TimeEnabling the candidate list of queries immediately will trigger hundreds of alerts. Therefore, I recommend we enable one query at a time. The PR to enable each query can be accompanied by the code fixes for the query to pass. If a query is deemed fruitless during enabling testing, it can simply be rejected. The goal is to enable an effective set of queries that improve the codebase. As the list of enabled queries builds, the total CodeQL coverage will increase against active PRs. I recommend we start with a query that has a very low alert frequency to enable the initial GitHub Action and build momentum for enabling additional queries. It is proposed that each query be enabled in a "query staging branch" where the query is added to the edk2 query suite and each package contributes their patches to the branch to fix any issues necessary for the query to be successful. Once that branch is ready, it can be converted into a patch series that enables the query for the codebase in a single pull request. That being said, we can configure the severities that cause pull request failure. If we find a useful informational query, that could be enabled without impacting the pull request status result. As of this RFC, those queries are not proposed. edk2 Pull Request ExampleI put together a PR 3179: Test CodeQL that demonstrates a basic CodeQL GitHub action. You can see the CodeQL link in the PR Checks area (direct link to results): You can also see the run on the edk2/actions page: This initial test was mainly to ensure permissions were set up to allow the action to run and to settle other logistics in getting the action (first GitHub action in edk2) setup and working properly. edk2 Pull Request TimeWe can roughly expect CodeQL to take an hour to complete against a pull request. Since this would be the longest running task in a pull request in parallel to other jobs, pull requests should be expected to take around an hour to have all status checks finished. This is not considered an issue since:
CodeQL has the ability to ignore certain file paths and I recommend we leverage that to avoid running CodeQL on pull requests that only change the maintainers.txt file for example. CodeQL Scheduled BuildsWhile pull requests will only build packages deemed necessary based on the PR evaluation heuristics used today, the plan is to have a 24 scheduled build that will build all of the packages and place the results in GitHub Code Scanning. We will use this process to understand if there any differences discovered. Based on the results, we might choose to adjust our strategy for running CodeQL in pull requests to prevent future discrepancies. If differences are present, it is proposed maintainers of the affected package resolve those issues. CodeQL LocallyThe CodeQL CLI can be used as follows to wrap around the edk2 build process ( codeql database create cpp-database --language=cpp --command="stuart_ci_build -c .pytool/CISettings.py -p MdeModulePkg -a IA32,X64 TOOL_CHAIN_TAG=VS2019 Target=DEBUG --clean --skip-post-build" --overwrite The following command can be used to generate a SARIF file (called codeql database analyze cpp-database codeql\cpp\ql\src\Security\CWE\CWE-457\ConditionallyUninitializedVariable.ql --format=sarifv2.1.0 --output=query-results.sarif SARIF logs can be read by log viewers such as the Sarif Viewer extension for VS Code. Once an edk2 query suite (.qls) file is created, that same file can be used for pull requests and by developers locally using the CodeQL CLI to run the same set of queries with the |
Beta Was this translation helpful? Give feedback.
-
Quick status update: The changes are ready to enable CodeQL for edk2. The plan is to start with a single query that has all alerts suppressed. From a results standpoint, this will be similar to not running a query but allow us to determine the GitHub action is working as expected before enabling additional queries. Unfortunately, this has led to results with an occasional filesystem error on Ubuntu 20.04.5 LTS images. We've also reproduced the issue locally. The failure rate is too high to push the CodeQL change. A bug has been filed here for now. I'll update again when there's new info. |
Beta Was this translation helpful? Give feedback.
-
cpp/conditionally-uninitialized-variable fix here: https://github.com/Erich-McMillan/edk2/tree/users/erich/codeql-cve457 Moving onto cpp/overflow-buffer |
Beta Was this translation helpful? Give feedback.
-
To close this thread. I am adding a wiki page. v1 patch is here: https://edk2.groups.io/g/devel/message/96437 Should be merged soon and then CodeQL information for the project will be tracked there. |
Beta Was this translation helpful? Give feedback.
-
Hello, I am working on integrating CodeQL with edk2 and I would like to have a conversation about vulnerability/static analysis and CodeQL.
About CodeQL
CodeQL is open source and free for open-source projects. It is maintained by GitHub and naturally has excellent integration with GitHub projects. CodeQL generates a "database" during the firmware build process that enables queries to run against that database. Many open-source queries are officially supported and comprise the vulnerability analysis performed against the database. These queries are maintained here - https://github.com/github/codeql.
Queries are written in an object-oriented query language called QL. CodeQL provides a command-line (CLI) interface, a VS Code extension to help write queries and run queries, GitHub action support for repo integration via code scanning, in addition to other features.
LGTM is often paired with CodeQL so you may read about that in documentation. LGTM was acquired by GitHub a few years ago and all of the functionality has been moved to GitHub code scanning. You can read more about this process on the GitHub blog post.
CodeQL and edk2
CodeQL provides the capability to debug queries themselves and for our (Tianocore) community to write our own queries and even contribute back to the upstream repo when appropriate. In addition, it presents the flexibility to build databases locally, relatively quick test queries locally for a fast feedback loop, and the ability customize the files and queries used in the edk2 project on a project-specific basis.
edk2 Integration Example
I put together a PR 3179: Test CodeQL that demonstrates a basic CodeQL GitHub action. You can see the CodeQL link in the PR Checks area (direct link to results):
You can also see the run on the edk2/actions page:
This initial test was mainly to flush out permissions and other logistics in getting the action (first GitHub action in edk2) setup and working properly.
Next Steps
I agreed to present CodeQL at the upcoming Tianocore Tools, CI, and code base construction meeting so I'm hoping to collect feedback and here and include that in the meeting discussion.
Others I've seen interested in similar topics in the past: @felixpgithub, @mdkinney, @bcran, @heatd, @spbrogan
Beta Was this translation helpful? Give feedback.
All reactions