
Collect Mattermost #397


Merged: 7 commits merged into aboutcode-org:main on Feb 8, 2022

Conversation

@Hritik14 (Collaborator) commented Mar 22, 2021

The PR will be up for review by Mar 24.
Fixes #224

EDIT:
The code is not very elegant, but neither is the Mattermost advisory page.
They have been trying to make advisories more formatted and standardized, but
for the time being the current script should hold, hopefully in the future
too.

Remaining tasks

  • [ ] Troubleshoot duplicate entries
    - As there is no vulnerability id on the page,
    every time the importer is run it duplicates all
    the vulnerabilities.
  • [ ] Write tests

Signed-off-by: Hritik Vijay hritikxx8@gmail.com

@Hritik14 Hritik14 changed the title [WIP] Collect Mattermost Collect Mattermost Mar 25, 2021
@Hritik14 Hritik14 marked this pull request as ready for review March 25, 2021 01:37
@sbs2001 (Collaborator) commented Mar 25, 2021

@Hritik14 you mean there are vulnerabilities with MMSA-xxxxx but no CVE ids? In that case we can set vulnerability_id to the MMSA id and evolve the models as necessary.

@Hritik14 (Collaborator, Author)

@sbs2001 As vulnerability_id is by definition designed only for CVE and automated vulcoid ids, adding MMSA-xxxx will raise errors here:
https://github.com/nexB/vulnerablecode/blob/bfa005ece230ada9032c3c0b4a7bab7b59673e0b/vulnerabilities/data_source.py#L92-L94

Further, even if we stretch the is_cve check, Mattermost has had advisory ids like 5.15.0.1 or na, which would surely not fit our current model.
Even worse, some MMSA-xxx are repeated.
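
For illustration, here is a minimal sketch of the kind of id check being referenced; the actual validation lives in data_source.py, and the names below are assumptions, not the real code:

```python
# Hypothetical sketch, not the actual VulnerableCode validation code.
import re

CVE_REGEX = re.compile(r"^CVE-\d{4}-\d{4,}$")

def is_cve(vuln_id: str) -> bool:
    """Accept only well-formed CVE ids such as CVE-2021-12345."""
    return bool(CVE_REGEX.match(vuln_id))

# Mattermost advisory ids would all be rejected by a check like this:
for candidate in ("MMSA-xxxx", "5.15.0.1", "na", "CVE-2021-12345"):
    print(candidate, is_cve(candidate))
```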

@Hritik14 (Collaborator, Author)

@sbs2001 any update?
As far as I can tell, this problem would exist in all the HTTP-based importers. Currently there's no way to know whether a particular vulnerability is new if it doesn't contain an id.
As the following function already receives the advisory
https://github.com/nexB/vulnerablecode/blob/adf86fdc2a42f93a19f71f40c2ba7350d9d7c220/vulnerabilities/import_runner.py#L233-L245
Maybe we could assign the vulcoid based not on the current time but on a hash of the given advisory (a rough sketch of the idea follows below). This would also improve consistency across multiple installations and would solve our duplication problem as well.
In any case, imho this appears to be an issue of its own and can be overlooked here for now.
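
A minimal sketch of that hash-based vulcoid idea, assuming an advisory with impacted/fixed package URLs and a summary (the field names and the VULCOID format here are assumptions, not the actual Advisory model):

```python
# Hypothetical sketch of a content-derived vulcoid; not the current implementation.
import hashlib

def content_based_vulcoid(impacted_purls, fixed_purls, summary):
    """Derive a deterministic vulcoid from advisory content instead of a timestamp."""
    # Sort so the id does not depend on the order packages were parsed in.
    payload = "|".join(sorted(impacted_purls) + sorted(fixed_purls) + [summary])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return f"VULCOID-{digest}"

# The same advisory content always yields the same id, on any installation,
# so re-running the importer would not create duplicates.
vulcoid = content_based_vulcoid(
    ["pkg:github/mattermost/mattermost-server@5.14.0"],
    ["pkg:github/mattermost/mattermost-server@5.15.0"],
    "example advisory description",
)
```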

@Hritik14 Hritik14 force-pushed the collect_mattermost branch from 3e914a1 to ec4ee48 Compare March 29, 2021 05:54
@Hritik14 (Collaborator, Author)

Rebased.
@sbs2001 any update?

@pombredanne (Member)

Some comments, as you are raising interesting issues:

  1. if there is no CVE, we want to use a reference for the MMSA-xxx
  2. it is OK for these to be duplicated then since this is only a reference
  3. we need a way to avoid creating dupes on our side in these cases

@Hritik14 (Collaborator, Author)

@pombredanne

  1. Yes, I'm not dropping them completely; they go into the references.
  2. By duplicated, I meant every time you run the importer, the Vulnerabilities get duplicated. As there is no id for them, the vulcoid is generated fresh for all of them with different timestamps. If we could use the hashing technique I mentioned above, we could avoid the repetition.
  3. Same as 2. Hashing the advisories to create a vulcoid.

@pombredanne (Member) commented Mar 29, 2021

2. By duplicated, I meant every time you run the importer, the Vulnerabilities get duplicated. As there is no id for them, the vulcoid is generated fresh for all of them with different timestamps. If we could use the hashing technique I mentioned above, we could avoid the repetition.

Rather than hashing (which is in some way a shortcut and an optimization), what would be a way to search for a possible pre-existing vulnerability to avoid creating a dupe? What would be the (existing) attributes and data that could be used to uniquely identify a vulnerability in this case?

@Hritik14 (Collaborator, Author)

which is in some way a shortcut and an optimization

What's wrong with an optimization? Using the advisory hash as the vulcoid would ensure consistency across multiple installations and avoid duplicates in the same installation as well.

What would be the (existing) attributes and data that could be used to uniquely identify a vulnerability in this case?

We could use the impacted packages, the package it was fixed in and the description to uniquely identify a vulnerability. Most of it is present in the Advisory.

https://github.com/nexB/vulnerablecode/blob/3688d89a88014f2e04475fda0179362e9ab54eae/vulnerabilities/data_source.py#L75-L90

Also, I think this deserves its own place in the issue tracker.

@pombredanne (Member)

which is in some way a shortcut and an optimization

What's wrong with an optimization? Using the advisory hash as the vulcoid would ensure consistency across multiple installations and avoid duplicates in the same installation as well.

An optimization may hide the real thing when applied too early. Here, what I am trying to get from you is which attributes you would use to determine equality and identity. See also https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize

The other problem with using a content-based identifier (such as a hash of some fields' content as a vulcoid) is that the id MAY change as the content changes, and at the worst possible time: say you use a hash(impacted packages, fixed package, description) and initially there is no fixed package... later you come back and there is one now, but the id would have changed with this change, hence you would no longer be able to use this hash-based id for lookup just when you would need it to add the fixed package to an existing vulnerability.
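
For illustration, a tiny sketch of that failure mode, reusing the hypothetical content_based_vulcoid from the earlier sketch:

```python
# Hypothetical illustration: a content-derived id shifts when the content shifts.
before = content_based_vulcoid(
    ["pkg:github/mattermost/mattermost-server@5.14.0"],
    [],  # no fixed package known yet
    "example advisory description",
)
after = content_based_vulcoid(
    ["pkg:github/mattermost/mattermost-server@5.14.0"],
    ["pkg:github/mattermost/mattermost-server@5.15.0"],  # fix published later
    "example advisory description",
)
assert before != after  # the id changed exactly when we want to update the existing record
```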

We could use the impacted packages, the package it was fixed in and the description to uniquely identify a vulnerability. Most of it is present in the Advisory.

Therefore you do not really need a hash for now; you can instead just use this set of items for identity: impacted packages, the package it was fixed in, and the description. And you can be flexible about using all of these or only some in some cases.
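
A minimal sketch of what such a field-based lookup could look like, in plain Python over already-imported data (the dictionary keys are assumptions, not the actual VulnerableCode models):

```python
# Hypothetical sketch of identity by fields rather than by a hashed id.
def find_existing(advisory, known_vulnerabilities):
    """Return a pre-existing vulnerability matching the advisory's identifying fields, if any."""
    for vuln in known_vulnerabilities:
        if (
            set(vuln["impacted"]) == set(advisory["impacted"])
            and set(vuln["fixed"]) == set(advisory["fixed"])
            and vuln["description"] == advisory["description"]
        ):
            return vuln
    return None  # no match: the advisory describes a new vulnerability
```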

@Hritik14 (Collaborator, Author)

Yes, you're right. We should come to the optimization after developing a proper structure.

you can instead just use this set of items for identity: impacted packages, the package it was fixed in, and the description. And you can be flexible about using all of these or only some in some cases.

Considering this, if a change happens in the advisory, say in the fixed packages, then when we re-encounter the same advisory we'd have two choices (a rough sketch of the decision follows below):

  1. Treat it as completely new, as some of the identifying keys have changed.
  2. Update a previous one if we have a sufficiently high degree of match.

In the second case, we would need to define what a sufficiently high match is. That could be done on the importer side, since what changed could be either the description (which is expected to change) or the impacted packages. If both change and we don't have a non-vulcoid vulnerability id, we must create a new one (case 1).
Also, if we store the URL of the vulnerability source as well, we could add that to what uniquely identifies a vulnerability.
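
A rough sketch of that decision, using the same hypothetical field names as above:

```python
# Hypothetical update-or-create decision for a re-encountered advisory.
def reconcile(advisory, existing):
    same_description = advisory["description"] == existing["description"]
    same_impacted = set(advisory["impacted"]) == set(existing["impacted"])

    if same_description or same_impacted:
        # Enough identifying keys still match: update the existing record,
        # e.g. to add newly published fixed packages (case 2).
        return "update"
    # Both identifying keys changed and there is no non-vulcoid id to anchor on:
    # treat this as a brand-new vulnerability (case 1).
    return "create"
```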

@sbs2001 (Collaborator) left a review comment

You weren't kidding when you said the code is not very elegant.

@sbs2001 (Collaborator) commented Mar 31, 2021

@pombredanne

what would be a way to search for a possible pre-existing vulnerability to avoid creating a dupe?

A crude approach
https://github.com/sbs2001/vulnerablecode/blob/8717f5f22552821ca170326328c343e6c16aa147/vulnerabilities/import_runner.py#L276

Hritik14 added 4 commits April 2, 2021 01:09
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
The code is not very elegant but so is the mattermost advisory page.
They have been trying to make advisories more formatted and standard but
at the time being, the current script should hold, hopefully in future
too.

Remaining tasks
	- [] Troubleshoot duplicate entries
		- As there is no vulnerability id  on the page,
		  everytime the importer is run it duplicates all
		  the vulnerabilities.
	- [] Write tests

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@Hritik14 Hritik14 force-pushed the collect_mattermost branch from 13d4421 to 590bb91 Compare April 1, 2021 19:40
@sbs2001 (Collaborator) commented Apr 2, 2021

@Hritik14 don't worry about duplicate vulcoids for now.

@pombredanne (Member)

@Hritik14 do you mind rebasing?

@pombredanne (Member)

@Hritik14 do you mind rebasing or pulling to resolve the conflict?

@pombredanne (Member)

We would also need a test to merge this.

@pombredanne pombredanne added this to the v30.0 milestone Feb 2, 2022
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@pombredanne (Member) left a review comment

Let's merge this first and fix it later.
Please open an issue to track that.

@pombredanne pombredanne dismissed sbs2001’s stale review February 8, 2022 10:32

We are merging first and will fix issues afterward.

@Hritik14 Hritik14 merged commit dd3c0e2 into aboutcode-org:main Feb 8, 2022
Successfully merging this pull request may close these issues.

Collect Mattermost security updates