-
-
Notifications
You must be signed in to change notification settings - Fork 234
Collect Mozilla #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collect Mozilla #393
Changes from all commits
e374aea
cdbb989
27423e7
c0faf67
59b420a
d1f8315
7aad267
df9786d
d7ced8b
05d93b2
aed84bc
c60ab41
58bcdb2
5997b26
e355706
62a2789
e6d652c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,3 +17,5 @@ lxml>=4.6.4 | |
gunicorn>=20.1.0 | ||
django-environ==0.4.5 | ||
defusedxml==0.7.1 | ||
|
||
Markdown==3.3.4 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
import re | ||
from typing import List | ||
from typing import Set | ||
|
||
import yaml | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
from bs4 import BeautifulSoup | ||
from markdown import markdown | ||
from packageurl import PackageURL | ||
|
||
from vulnerabilities.importer import Advisory | ||
from vulnerabilities.importer import GitImporter | ||
from vulnerabilities.importer import Reference | ||
from vulnerabilities.importer import VulnerabilitySeverity | ||
from vulnerabilities.helpers import is_cve | ||
from vulnerabilities.helpers import split_markdown_front_matter | ||
from vulnerabilities.severity_systems import SCORING_SYSTEMS | ||
|
||
REPOSITORY = "mozilla/foundation-security-advisories" | ||
MFSA_FILENAME_RE = re.compile(r"mfsa(\d{4}-\d{2,3})\.(md|yml)$") | ||
pombredanne marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
class MozillaImporter(GitImporter): | ||
def __enter__(self): | ||
super(MozillaImporter, self).__enter__() | ||
|
||
if not getattr(self, "_added_files", None): | ||
self._added_files, self._updated_files = self.file_changes( | ||
recursive=True, subdir="announce" | ||
) | ||
|
||
def updated_advisories(self) -> Set[Advisory]: | ||
files = self._updated_files.union(self._added_files) | ||
files = [ | ||
f for f in files if f.endswith(".md") or f.endswith(".yml") | ||
] # skip irrelevant files | ||
|
||
advisories = [] | ||
for path in files: | ||
advisories.extend(to_advisories(path)) | ||
|
||
return self.batch_advisories(advisories) | ||
|
||
|
||
def to_advisories(path: str) -> List[Advisory]: | ||
""" | ||
Convert a file to corresponding advisories. | ||
This calls proper method to handle yml/md files. | ||
""" | ||
mfsa_id = mfsa_id_from_filename(path) | ||
if not mfsa_id: | ||
return [] | ||
|
||
with open(path) as lines: | ||
if path.endswith(".md"): | ||
return get_advisories_from_md(mfsa_id, lines) | ||
if path.endswith(".yml"): | ||
return get_advisories_from_yml(mfsa_id, lines) | ||
|
||
return [] | ||
|
||
|
||
def get_advisories_from_yml(mfsa_id, lines) -> List[Advisory]: | ||
pombredanne marked this conversation as resolved.
Show resolved
Hide resolved
|
||
advisories = [] | ||
data = yaml.safe_load(lines) | ||
data["mfsa_id"] = mfsa_id | ||
|
||
fixed_package_urls = get_package_urls(data.get("fixed_in")) | ||
references = get_yml_references(data) | ||
|
||
if not data.get("advisories"): | ||
return [] | ||
|
||
for cve, advisory in data["advisories"].items(): | ||
# These may contain HTML tags | ||
summary = BeautifulSoup(advisory.get("description", ""), features="lxml").get_text() | ||
|
||
advisories.append( | ||
Advisory( | ||
summary=summary, | ||
vulnerability_id=cve if is_cve(cve) else "", | ||
impacted_package_urls=[], | ||
resolved_package_urls=fixed_package_urls, | ||
references=references, | ||
) | ||
) | ||
|
||
return advisories | ||
|
||
|
||
def get_advisories_from_md(mfsa_id, lines) -> List[Advisory]: | ||
yamltext, mdtext = split_markdown_front_matter(lines.read()) | ||
data = yaml.safe_load(yamltext) | ||
pombredanne marked this conversation as resolved.
Show resolved
Hide resolved
|
||
data["mfsa_id"] = mfsa_id | ||
|
||
fixed_package_urls = get_package_urls(data.get("fixed_in")) | ||
references = get_yml_references(data) | ||
cves = re.findall(r"CVE-\d+-\d+", yamltext + mdtext, re.IGNORECASE) | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
for cve in cves: | ||
references.append( | ||
Reference( | ||
reference_id=cve, | ||
url=f"https://cve.mitre.org/cgi-bin/cvename.cgi?name={cve}", | ||
) | ||
) | ||
|
||
description = html_get_p_under_h3(markdown(mdtext), "description") | ||
|
||
return [ | ||
Advisory( | ||
summary=description, | ||
vulnerability_id="", | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
impacted_package_urls=[], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How can you get no impacted packages and fixed packages? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Upstream doesn't provide with a list of impacted package. We might consider all packages until this package as impacted but I'm not sure about this. There are many other importers that do this too. Eg: https://github.com/nexB/vulnerablecode/blob/f254b0d4ac54b70c648055a7e8eda16c05dce0f9/vulnerabilities/importers/alpine_linux.py#L194 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good point. We need a ticket though, as this may need to be interpreted as "all versions before this version are vulnerable" and it warrant some research and discussions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ouch, is this going to be redundant now, just like the suse_backport importer. We now have a improved notion of fixed/resolved/patched package now. Due to #436 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would like to merge this sooner than later. ... See also #449 (comment) |
||
resolved_package_urls=fixed_package_urls, | ||
references=references, | ||
) | ||
] | ||
|
||
|
||
def html_get_p_under_h3(html, h3: str): | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
soup = BeautifulSoup(html, features="lxml") | ||
h3tag = soup.find("h3", text=lambda txt: txt.lower() == h3) | ||
p = "" | ||
if h3tag: | ||
for tag in h3tag.next_siblings: | ||
if tag.name: | ||
if tag.name != "p": | ||
break | ||
p += tag.get_text() | ||
return p | ||
|
||
|
||
def mfsa_id_from_filename(filename): | ||
match = MFSA_FILENAME_RE.search(filename) | ||
if match: | ||
return "mfsa" + match.group(1) | ||
|
||
return None | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
def get_package_urls(pkgs: List[str]) -> List[PackageURL]: | ||
package_urls = [ | ||
PackageURL( | ||
type="mozilla", | ||
pombredanne marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# pkg is of the form "Firefox ESR 1.21" or "Thunderbird 2.21" | ||
name=pkg.rsplit(None, 1)[0], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this split and the one below are rather cryptic, magic and likely error prone, please use a plain loop rather than a list comprehension and try to find a more readable and explicit way to do things. partition and rpartition are usually more robust and easier to grok. Also avoid doing the multiple time the same op (here for the name and version) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've updated it to use a for loop. I've used rsplit to be consistent with the upstream. This is how it looks now: def get_package_urls(pkgs: List[str]) -> List[PackageURL]:
for pkg in pkgs:
if pkg:
# pkg is of the form "Firefox ESR 1.21" or "Thunderbird 2.21"
name, version = pkg.rsplit(None, 1)
package_urls = [
PackageURL(
type="mozilla",
name=name,
version=version,
)
]
return package_urls
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. rpartition is a more robust splitter than rsplit. Your rsplit will fail when there are no spaces
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there's no space, name, _, version = pkg.rpartition(' ')
if not name:
name,version = version,name There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this works fine. Or you could do this slightly more explicit:
|
||
version=pkg.rsplit(None, 1)[1], | ||
) | ||
for pkg in pkgs | ||
if pkg | ||
] | ||
return package_urls | ||
|
||
|
||
def get_yml_references(data: any) -> List[Reference]: | ||
""" | ||
Returns a list of references | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Currently only considers the given mfsa as a reference | ||
""" | ||
# FIXME: Needs improvement | ||
# Should we add 'bugs' section in references too? | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# Should we add 'impact'/severity of CVE in references too? | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# If yes, then fix alpine_linux importer as well | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# Otherwise, do we need severity field for adversary as well? | ||
|
||
severities = ["critical", "high", "medium", "low", "none"] | ||
severity = "none" | ||
if data.get("impact"): | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
impact = data.get("impact").lower() | ||
for s in severities: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not sure I get why you do things this way. What would be data examples? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are all unique impacts
I did that to make look like cvssv3.1_qr but now as we have generic_textual (#415) as well, perhaps I can just put the entire impact line over there. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just noticed that entries like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is at best an artistic scale.... I would like to see what it looks like for recent values, say no less than 5 years old as this will weed out old things like FF 2.0 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's create a scoring system for mozilla using these values:
in https://github.com/nexB/vulnerablecode/blob/main/vulnerabilities/severity_systems.py |
||
if s in impact: | ||
severity = s | ||
break | ||
|
||
return [ | ||
Reference( | ||
reference_id=data["mfsa_id"], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Create a variable for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added the Is there something wrong with this approach? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why pass down a mapping when you can pass named arguments? def get_reference(mfsa_id, impact):
"""
Return a reference for tan mfsa and is impact value.
"""
known_impacts = set(["critical", "high", "medium", "low", "none"])
impact = impact or ''
impact = impact.lower()
severity = known_impacts.get(impact)
severities=[]
if severity:
severities = [VulnerabilitySeverity(scoring_systems["generic_textual"], severity)]
return Reference(
reference_id=mfsa_id,
url=f"https://www.mozilla.org/en-US/security/advisories/{mfsa_id}",
severities=severities,
) Also do not return a list when you return a single value. I appreciate planning for the future, but that's usually not needed. We can always refactor later if needed. And add other impacts that may be funky to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks. def get_yml_references(mfsa_id, impact, advisories) -> List[Reference]:
"""
Return a list of references
Considers the following as references:
- mfsa
- cve
- bugzilla or direct reference
"""
known_impacts = {
"severe": "critical",
"critical": "critical",
"critical (local)": "critical",
"high": "high",
"moderate": "moderate",
"medium": "moderate",
"moderate (on a multiuser computer)": "moderate",
"low": "low",
"critical (firefox 2.0 not affected in default configuration)": "critical",
"low (critical in gecko 1.9.1 and earlier)": "low",
"moderate to critical": "moderate",
}
impact = impact or ''
impact = impact.lower()
severity = known_impacts.get(impact)
severities = []
if severity:
severities=[VulnerabilitySeverity(scoring_systems["generic_textual"], severity)]
else:
warnings.warn(f'{mfsa_id}: unknown severity "{severity}"')
# Add mfsa as reference
references = [
Reference(
reference_id=mfsa_id,
url=f"https://www.mozilla.org/en-US/security/advisories/{mfsa_id}",
severities=severities,
)
]
advisories = advisories or {}
for advisory in advisories:
# Add advisory cve as reference
if is_cve(advisory):
references.append(
Reference(
reference_id=advisory,
url=f"https://cve.mitre.org/cgi-bin/cvename.cgi?name={advisory}",
)
)
advisory_data = advisories[advisory]
# Some description contain cve reference too, eg mfsa2017-12, mfsa2019-01. Add them
# FIXME: Replace after https://github.com/nexB/vulnerablecode/pull/439 is merged
cves = set(re.findall(r"CVE-\d+-\d+", advisory_data["description"]))
for cve in cves:
references.append(
Reference(
reference_id=cve,
url=f"https://cve.mitre.org/cgi-bin/cvename.cgi?name={cve}",
)
)
# Add "bugs" as reference
bugs = advisory_data.get("bugs", {})
for bug in bugs:
# bug_ids could be a single int like 1584216 or a str like 1584216,1584218
urls = str(bug.get("url", "")).split(",")
for url in urls:
url = url.strip()
if url[:4].lower() == "http":
references.append(Reference(url=url))
else:
references.append(
Reference(
reference_id=url,
url=f"https://bugzilla.mozilla.org/show_bug.cgi?id={url}",
)
)
return references
I would be needing the references list as now I've also added the bugs and cve itself as a reference. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have you pushed these updates? |
||
url="https://www.mozilla.org/en-US/security/advisories/{}".format(data["mfsa_id"]), | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
severities=[VulnerabilitySeverity(scoring_systems["generic_textual"], severity)], | ||
) | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
import os | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you add add some true unit tests too for each of your functions? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. Replacing these with true unit tests There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I meant add unit tests, not replace these entirely. Both are useful |
||
import shutil | ||
import tempfile | ||
import zipfile | ||
from unittest.mock import patch | ||
|
||
from django.test import TestCase | ||
|
||
from vulnerabilities import models | ||
from vulnerabilities.import_runner import ImportRunner | ||
from vulnerabilities.importers.npm import categorize_versions | ||
|
||
BASE_DIR = os.path.dirname(os.path.abspath(__file__)) | ||
TEST_DATA = os.path.join(BASE_DIR, "test_data/") | ||
|
||
|
||
@patch("vulnerabilities.importers.MozillaImporter._update_from_remote") | ||
class MozillaImportTest(TestCase): | ||
|
||
tempdir = None | ||
|
||
@classmethod | ||
def setUpClass(cls) -> None: | ||
cls.tempdir = tempfile.mkdtemp() | ||
zip_path = os.path.join(TEST_DATA, "mozilla.zip") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please avoid zip fixtures unless really needed, e.g. extremely rarely if ever There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was inspired by npm and rust test cases. Removing it now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we have zip in npm and rust we need to replace them too then... Do you mind to create a ticket for this> There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Created #442 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @Hritik14 npm and rust are for git ... yours are just plain text, so please do not zip here. |
||
|
||
with zipfile.ZipFile(zip_path, "r") as zip_ref: | ||
zip_ref.extractall(cls.tempdir) | ||
|
||
cls.importer = models.Importer.objects.create( | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
name="mozilla_unittests", | ||
license="", | ||
last_run=None, | ||
data_source="MozillaImporter", | ||
data_source_cfg={ | ||
"repository_url": "https://example.git", | ||
"working_directory": os.path.join(cls.tempdir, "mozilla_test"), | ||
"create_working_directory": False, | ||
"remove_working_directory": False, | ||
}, | ||
) | ||
|
||
@classmethod | ||
def tearDownClass(cls) -> None: | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
# Make sure no requests for unexpected package names have been made during the tests. | ||
shutil.rmtree(cls.tempdir) | ||
|
||
def test_import(self, _): | ||
runner = ImportRunner(self.importer, 100) | ||
|
||
# Remove if we don't need set_api in MozillaImporter | ||
# with patch("vulnerabilities.importers.MozillaImporter.versions", new=MOCK_VERSION_API): | ||
# with patch("vulnerabilities.importers.MozillaImporter.set_api"): | ||
# runner.run() | ||
runner.run() | ||
|
||
assert models.Vulnerability.objects.count() == 9 | ||
assert models.VulnerabilityReference.objects.count() == 10 | ||
assert models.VulnerabilitySeverity.objects.count() == 9 | ||
assert models.PackageRelatedVulnerability.objects.filter(is_vulnerable=False).count() == 16 | ||
|
||
assert models.Package.objects.count() == 12 | ||
|
||
self.assert_for_package("Firefox ESR", "mfsa2021-06", "78.7.1") | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self.assert_for_package("Firefox ESR", "mfsa2021-04", "78.7", "CVE-2021-23953") | ||
self.assert_for_package("Firefox for Android", "mfsa2021-01", "84.1.3", "CVE-2020-16044") | ||
self.assert_for_package("Thunderbird", "mfsa2014-30", "24.4") | ||
self.assert_for_package("Thunderbird", "mfsa2014-30", "24.4") | ||
self.assert_for_package("Mozilla Suite", "mfsa2005-29", "1.7.6") | ||
|
||
def assert_for_package( | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self, | ||
package_name, | ||
mfsa_id, | ||
resolved_version, | ||
vulnerability_id=None, | ||
impacted_version=None, | ||
): | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
vuln = None | ||
|
||
pkg = models.Package.objects.get(name=package_name, version=resolved_version) | ||
vuln = pkg.vulnerabilities.first() | ||
|
||
if vulnerability_id: | ||
assert vuln.vulnerability_id == vulnerability_id | ||
|
||
ref_url = f"https://www.mozilla.org/en-US/security/advisories/{mfsa_id}" | ||
assert models.VulnerabilityReference.objects.get(url=ref_url, vulnerability=vuln) | ||
|
||
assert models.PackageRelatedVulnerability.objects.filter( | ||
package=pkg, vulnerability=vuln, is_vulnerable=False | ||
) | ||
|
||
|
||
def test_categorize_versions_ranges(): | ||
# Populate if impacted version is filled | ||
pass | ||
Hritik14 marked this conversation as resolved.
Show resolved
Hide resolved
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this instead?
For instance:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have not been using
saneyaml
anywhere in the project. If that is preferred, I can add it to requirements and we can start using that but I don't see any specific reason to do so here.Regarding
rpartition
. We cannot have that either as a markdown is allowed to have a"\n---"\n
.Consider this
The official validator by mozilla parses it something like this
https://github.com/mozilla/foundation-security-advisories/blob/d43d09d204ab5da014e83b7d1743df289cefee92/check_advisories.py#L183-L208
Further, as it is a helper, we cannot have it mozilla specific. All it should do is to split the front matter and markdown.
I could have something like this
which prints
but doesn't look any better to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit more readable may be?
We could very much reuse it as-is as well. This is under an MPL license so we would have to do it in an orderly fashion with proper license tracking and code separation though.
If we do not copy it, we cannot reuse code from it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is required by multiple importers, let's move this to it's own PR. Opened #443