first cut cve pipeline and refactor fetcher #2

Open
wants to merge 18 commits into
base: add_structure

Scanteianu
Owner

@Scanteianu Scanteianu commented Nov 22, 2023

This PR is an MVP/POC-level VDR (Vulnerability Disclosure Report) creation pipeline for Temurin.

The finished report is in data/vdr.json.
Data downloaded from NIST and OJVG is also saved in the data directory, sometimes as intermediate representations, to allow offline testing and report re-creation when the original site is unavailable.

  • tools to fetch OJVG reports and parse them into JSON, which then gets converted to CycloneDX vulnerabilities
  • tools to talk to the NIST API to fetch extra information about CVEs when available
  • unit tests for the tools
  • pipeline files to create a VDR from the NIST/OJVG websites
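
As a rough illustration of the NIST lookup mentioned in the list above, the sketch below queries the public NVD CVE API 2.0 for a single CVE id. The names `nvd_url` and `fetch_nvd_cve` are hypothetical and are not taken from this PR, and returning `None` on failure is an assumed convention; the actual code in cve_reporter may look different.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Public NVD CVE API 2.0 endpoint.
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"


def nvd_url(cve_id):
    """Build the query URL for one CVE id (hypothetical helper)."""
    return NVD_API + "?" + urllib.parse.urlencode({"cveId": cve_id})


def fetch_nvd_cve(cve_id, timeout=30):
    """Return the parsed NVD response, or None if the request fails.

    Returning None is an assumed convention: it lets the caller fall back
    to OJVG-only data when NIST is unreachable or denies access.
    """
    try:
        with urllib.request.urlopen(nvd_url(cve_id), timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, TimeoutError, ValueError):
        # HTTPError (e.g. 403) is a subclass of URLError; a malformed
        # body raises a ValueError subclass from the json module.
        return None
```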

Things which are not yet here:

  • CVE deduplication in case a CVE shows up twice
  • better parsing of NIST affected versions, and decoupling from Oracle JDK affected versions
  • backup plans for when data is not found or when NIST access is denied
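
One possible shape for the missing deduplication step, as a minimal sketch: it assumes each CVE is held as a plain dict with an "id" key and an optional "versions" list. Those field names and the function `merge_cve_records` are hypothetical, not part of this PR.

```python
def merge_cve_records(records):
    """Collapse records that share an "id", keeping first-seen order.

    Assumes each record is a dict with an "id" key and an optional
    "versions" list (hypothetical field names).
    """
    merged = {}
    for record in records:
        existing = merged.get(record["id"])
        if existing is None:
            # Copy so the caller's dicts and lists are not mutated.
            merged[record["id"]] = {
                **record,
                "versions": list(record.get("versions", [])),
            }
            continue
        # Union the affected versions without reordering or duplicating.
        for version in record.get("versions", []):
            if version not in existing["versions"]:
                existing["versions"].append(version)
        # Fill in fields that the first record left empty.
        for key, value in record.items():
            if key != "versions" and not existing.get(key):
                existing[key] = value
    return list(merged.values())
```

The first occurrence wins for any field that is already populated, so a later duplicate can add information but never overwrite it.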

Code organization:

  • individual components for reaching out to APIs and interpreting responses are in the cve_reporter package
  • pipelines are in the top level directory
  • tests are in the tests directory

@tellison tellison left a comment

Wondering if, in the long term, we would be better served by an intermediate representation of the vulnerabilities before building a BOM model. It would let us visualize and debug the incoming data (OpenJDK list, NIST data, affected ranges, etc.) before finally combining them into a single BOM for consumption.

cvePipeline.py Outdated

bom = report.get_base_bom()
#todo: take date as arg or figure out other way to seed
vulns = fetch_vulnerabilities.fetch_cves('2023-01-17')

I'd love to see the fetching of CVEs result in a serialized JSON file. That way we can study it to check this part of the pipeline (which is highly likely to be affected by external OpenJDK website changes) is working as expected.

The pipeline can continue, and eventually gain the option of running from the serialized fetched-vulnerabilities file, so we don't have to run the full pipeline each time, or we can run from a patched file, etc.
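
A minimal sketch of that idea, assuming the fetch step returns plain JSON-serializable data (dicts and lists rather than BOM objects). The helper name `load_or_fetch` and its `refresh` flag are hypothetical.

```python
import json
from pathlib import Path


def load_or_fetch(cache_path, fetch, refresh=False):
    """Return cached data if present, else call fetch() and cache the result.

    `fetch` is any zero-argument callable returning JSON-serializable data.
    Pass refresh=True to ignore an existing file and hit the website again.
    """
    path = Path(cache_path)
    if path.exists() and not refresh:
        # Run from the serialized (possibly hand-patched) file.
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and indentation keep the file diff-friendly for review.
    path.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")
    return data
```

In the pipeline this might be called as `load_or_fetch("data/ojvg_cves.json", lambda: fetch_vulnerabilities.fetch_cves("2023-01-17"))`, where the file name is made up for illustration and `fetch_cves` would first need to return plain data.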

Owner Author

@Scanteianu Scanteianu Jan 24, 2024

There is now an intermediate JSON representation, which we can dump to a file.

Comment on lines 66 to 83
        affects = BomTarget(
            ref=component
        )
        for v in affected_versions:
            affects.versions.add(v)
        vuln = Vulnerability(
            id=id,
            source=VulnerabilitySource(name="National Vulnerability Database", url=link),
            #todo: dummy date
            published=datetime.fromisoformat(date),
            updated=datetime.fromisoformat(date),
            description="",
            recommendation=""
        )
        vuln.affects.add(affects)
        vulnerabilities.append(vuln)
        print(vuln)
    return vulnerabilities

Do you want to build a BOM here, or just return a simple data structure that captures the info scraped from the website?

If BOMs provide all you need then fine, but simplify where you can at this stage IMHO.
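
For illustration, a "simple data structure" for the scraped data could be a small dataclass like the sketch below. The class name `ScrapedCve` and the helper `to_json_ready` are hypothetical; the fields simply mirror the values the snippet above passes into `Vulnerability` and `BomTarget`.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ScrapedCve:
    """Plain record of what one advisory page says about one CVE."""
    id: str
    link: str
    published: str  # ISO date string, kept as text so it serializes directly
    affected_versions: List[str] = field(default_factory=list)
    description: str = ""
    recommendation: str = ""


def to_json_ready(cves):
    """Convert records to plain dicts suitable for json.dump."""
    return [asdict(cve) for cve in cves]
```

Keeping this layer free of CycloneDX types means the scraper can be tested and inspected on its own, with the BOM built from these records in a later step.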

    for metrics in cve["metrics"]["cvssMetricV31"]:
        #todo: do we need recommendations from NIST as well?
        relevant = {}
        relevant["source"] = metrics["source"]
Owner Author

Hi @tellison, is this the kind of intermediate data structure you were thinking of for representing the data before populating the BOM itself? (I know this is on the NIST side; I can eventually move the OJVG side to something similar as well.)
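
To make the question concrete, here is a hedged sketch of such a structure for the CVSS data. It assumes the NVD API 2.0 layout, in which each `cvssMetricV31` entry carries a `source` and a nested `cvssData` object; the function name `extract_cvss_v31` and the output keys are hypothetical, not the PR's own.

```python
def extract_cvss_v31(cve):
    """Return one small dict per CVSS v3.1 rating attached to a CVE."""
    ratings = []
    # A CVE may have no v3.1 metrics at all, so avoid direct indexing.
    for metrics in cve.get("metrics", {}).get("cvssMetricV31", []):
        data = metrics.get("cvssData", {})
        ratings.append({
            "source": metrics.get("source"),
            "score": data.get("baseScore"),
            "severity": data.get("baseSeverity"),
            "vector": data.get("vectorString"),
        })
    return ratings
```

Using `.get` with defaults means older CVEs that only have v2 or v3.0 metrics yield an empty list instead of a `KeyError`.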

    resp_dict["description"] = description
    resp_dict["versions"] = extract_versions(cve["configurations"])
    return resp_dict
def extract_versions(cve_configs):
Owner Author

@tellison I think this is dubious, but I don't have a better way of finding an affected version, and I'm open to suggestions. I've basically parsed out the Oracle JDK version by hand, minus the update (we can special-case that), and I'm assuming it maps 1:1 to OpenJDK. There's code to extract it from OJVG, but they publish it at the top of the webpage, and a webpage can contain multiple CVEs, so I'm not sure that's the best place to get the information.
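
For discussion, one way to pull versions out of the NVD `configurations` block is sketched below. It is not the PR's `extract_versions`: the name `extract_jdk_versions` is hypothetical, it assumes the NVD API 2.0 layout (`nodes` containing `cpeMatch` entries with a CPE 2.3 `criteria` string), and it ignores version ranges such as `versionEndIncluding`.

```python
def extract_jdk_versions(cve_configs, vendor="oracle", product="jdk"):
    """Collect version strings from vulnerable CPE matches for one product."""
    versions = []
    for config in cve_configs:
        for node in config.get("nodes", []):
            for match in node.get("cpeMatch", []):
                if not match.get("vulnerable"):
                    continue
                # cpe:2.3:<part>:<vendor>:<product>:<version>:<update>:...
                # Naive split: does not handle escaped colons in CPE fields.
                parts = match.get("criteria", "").split(":")
                if len(parts) < 7 or parts[3] != vendor or parts[4] != product:
                    continue
                version, update = parts[5], parts[6]
                # Keep the update visible so it can be special-cased later.
                label = version if update in ("*", "-") else f"{version}:{update}"
                if label not in versions:
                    versions.append(label)
    return versions
```

Filtering on vendor and product keeps entries for other products (for example GraalVM) out of the result, but the 1:1 mapping from Oracle JDK versions to OpenJDK remains an assumption either way.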
