first cut cve pipeline and refactor fetcher #2
base: add_structure
Conversation
I'm wondering whether, in the long term, you would be better served by an intermediate representation of the vulnerabilities before you build a BOM model. It would let us visualize and debug the incoming data (OpenJDK list, NIST data, affected ranges, etc.) before you finally combine them into a single BOM for consumption.
cvePipeline.py
Outdated
bom = report.get_base_bom()
# todo: take date as arg or figure out other way to seed
vulns = fetch_vulnerabilities.fetch_cves('2023-01-17')
I'd love to see the fetching of CVEs result in a serialized JSON file. That way we can study it to check this part of the pipeline (which is highly likely to be affected by external OpenJDK website changes) is working as expected.
The pipeline can then continue as before, and eventually offer the option of running from the serialized file of fetched vulnerabilities, so we don't have to run the full pipeline each time, or so we can run from a patched file, etc.
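To make the suggestion concrete, here is a minimal sketch of a fetch step that writes its result to a JSON file and reuses that file on later runs. The function name `load_or_fetch`, the `refresh` flag, and the idea of passing the fetcher in as a callable are all illustrative assumptions, not code from this PR; it also assumes the fetched data is plain dicts and lists rather than BOM objects.

```python
import json
from pathlib import Path


def load_or_fetch(cache_path, fetch, refresh=False):
    """Return fetched vulnerabilities, reusing a serialized copy when present.

    `fetch` is any zero-argument callable returning JSON-serializable data
    (a hypothetical stand-in for the real fetch step).
    """
    cache_path = Path(cache_path)
    if cache_path.exists() and not refresh:
        # Run from the serialized (possibly hand-patched) file.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    # Indented, key-sorted output keeps the file easy to read and to diff.
    cache_path.write_text(
        json.dumps(data, indent=2, sort_keys=True), encoding="utf-8"
    )
    return data
```

With something like this, a run that passes `refresh=True` re-scrapes the site, while a default run works from whatever file is on disk, including one edited by hand for debugging.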
There is now an intermediate JSON representation which we can dump to a file.
cvereporter/fetch_vulnerabilities.py
Outdated
affects = BomTarget(
    ref=component
)
for v in affected_versions:
    affects.versions.add(v)
vuln = Vulnerability(
    id=id,
    source=VulnerabilitySource(name="National Vulnerability Database", url=link),
    # todo: dummy date
    published=datetime.fromisoformat(date),
    updated=datetime.fromisoformat(date),
    description="",
    recommendation=""
)
vuln.affects.add(affects)
vulnerabilities.append(vuln)
print(vuln)
return vulnerabilities
Do you want to build a BOM here, or just return a simple data structure that captures the info scraped from the website?
If BOMs provide all you need then fine, but simplify where you can at this stage IMHO.
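As an illustration of the "simple data structure" option, a plain dataclass could carry the scraped fields until the BOM is built. This is only a sketch: the class name `ScrapedCve` and its field names are hypothetical, chosen to mirror the variables visible in the snippet above (`id`, `link`, `date`, `component`, `affected_versions`), and the values are placeholders.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ScrapedCve:
    """Plain record of one scraped entry (hypothetical name and fields)."""
    id: str
    link: str
    date: str  # ISO date string
    component: str
    affected_versions: List[str] = field(default_factory=list)


record = ScrapedCve(
    id="CVE-0000-0000",  # placeholder, not a real CVE
    link="https://example.invalid/advisory",
    date="2023-01-17",
    component="example-component",
    affected_versions=["17.0.5", "11.0.17"],
)
as_dict = asdict(record)  # plain dict, ready for json.dumps
```

A record like this has no dependency on the BOM library, so it can be printed, compared in tests, or serialized without constructing `Vulnerability` objects first.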
for metrics in cve["metrics"]["cvssMetricV31"]:
    # todo: do we need recommendations from NIST as well?
    relevant = {}
    relevant["source"] = metrics["source"]
Hi @tellison, is this the kind of intermediate data structure you were thinking about for representing the data before populating it into the BOM itself? (I know this is on the NIST side; I can eventually move the OJVG side to something similar as well.)
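For discussion, the NIST-side intermediate structure could look roughly like the sketch below: one small dict per CVSS v3.1 entry. Only `metrics`, `cvssMetricV31`, and `source` appear in the snippet above; the `cvssData` sub-keys used here (`baseScore`, `baseSeverity`, `vectorString`) are my assumption about the NVD response layout and should be checked against a real response. The function name and output keys are hypothetical.

```python
def extract_cvss_v31(cve):
    """Collect one small dict per CVSS v3.1 metric entry in an NVD-style record."""
    results = []
    # .get() with defaults tolerates records that carry no v3.1 metrics at all.
    for metrics in cve.get("metrics", {}).get("cvssMetricV31", []):
        cvss = metrics.get("cvssData", {})  # assumed key, see lead-in
        results.append({
            "source": metrics.get("source"),
            "base_score": cvss.get("baseScore"),
            "base_severity": cvss.get("baseSeverity"),
            "vector": cvss.get("vectorString"),
        })
    return results
```

Returning a list of plain dicts keeps this step easy to dump to JSON and inspect before anything is mapped onto BOM types.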
    resp_dict["description"] = description
    resp_dict["versions"] = extract_versions(cve["configurations"])
    return resp_dict

def extract_versions(cve_configs):
@tellison I think this is dubious, but I don't have a better way of finding the affected versions, and I'm open to suggestions here. I've basically parsed out the Oracle JDK version by hand, minus the update (though we can special-case that), and I'm assuming it is 1:1 with OpenJDK. There's code to extract versions from OJVG, but OJVG publishes them at the top of the webpage, and a webpage can contain multiple CVEs, so I'm not sure that's the best place to get this information.
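To show the kind of parsing being described, here is a rough sketch that pulls version strings out of CPE entries in the `configurations` block. It is not the `extract_versions` from this PR. It assumes each configuration holds `nodes`, each node holds `cpeMatch` entries, and each entry has a `vulnerable` flag and a CPE 2.3 string under `criteria`; those key names, the function name, and the `oracle`/`jdk` defaults are assumptions to verify against real NIST data. Splitting on `:` is deliberately naive and ignores escaped colons.

```python
def versions_from_configs(cve_configs, vendor="oracle", product="jdk"):
    """Return sorted version strings for vulnerable CPE entries of one product."""
    versions = set()
    for config in cve_configs:
        for node in config.get("nodes", []):
            for match in node.get("cpeMatch", []):
                if not match.get("vulnerable", False):
                    continue
                # cpe:2.3:<part>:<vendor>:<product>:<version>:<update>:...
                parts = match.get("criteria", "").split(":")
                if len(parts) < 7 or parts[:2] != ["cpe", "2.3"]:
                    continue
                if parts[3] != vendor or parts[4] != product:
                    continue
                # "*" and "-" are CPE wildcards, not concrete versions.
                if parts[5] not in ("*", "-"):
                    versions.add(parts[5])  # the update field (parts[6]) is dropped
    return sorted(versions)
```

This keeps the "minus update" behaviour described above by reading only the version field, and it leaves the question of mapping Oracle JDK versions to OpenJDK untouched.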
This PR is an MVP/POC-level VDR creation pipeline for Temurin.
The finished report is in data/vdr.json.
Data downloaded from NIST and OJVG is also saved in the data directory, sometimes as intermediate representations, to allow offline testing and report re-creation when the original site is unavailable.
Things which are not yet here:
Code organization:
cve_reporter package
tests directory