Where do I start: Confidence
Confidence details the degree of certainty of a given observation. For instance:
- I am 80% confident that on 2015-03-20T00:00:01Z example.com is dropping malware
- I am 90% confident in partner-1's observation that http://example.com/1.html on 2015-03-20T00:01:01Z was being used as a phishing url
- I am 100% confident that tinyurl.com was observed in a piece of unsolicited commercial email (eg: spam)
One of the primary use cases for confidence is the generation of threat intelligence feeds. For example, you may want to generate a de-duplicated feed of indicators seen within the last seven days with a confidence of 3.5 or higher, to be used in a network sensor. While judging confidence may be subjective, there's one simple pattern that can narrow down the answer rather quickly:
- would you trust the data author with root access on your firewall to block something? if no, it's not a 3 or higher.
- is there a better than 50/50 chance (a coin flip) that there's something suspect about the data? if yes, it's a 2 or higher. if no, it's less than a 1 and almost does not matter.
From there, you can very easily get to a 3 or 4 depending on your risk tolerance. With the WDIS Feeds concept, whitelists are used to help further reduce the risk of blocking something like google.com. Given that, a 3 or 4 is generally OK as long as the feeds are extremely specific about the risk (eg: ipv4|ipv6 addresses have a port-list, protocol and timestamp associated with them).
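The feed-generation use case above (de-duplicated, last seven days, confidence of 3.5 or higher, whitelist applied) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the observation fields (indicator, confidence, lasttime) and the whitelist contents are illustrative, not a fixed schema or API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical observation records; field names are illustrative assumptions.
observations = [
    {'indicator': 'example.com', 'confidence': 3.75,
     'lasttime': '2015-03-20T00:00:01Z'},
    {'indicator': 'example.com', 'confidence': 4.0,
     'lasttime': '2015-03-19T12:00:00Z'},
    {'indicator': 'tinyurl.com', 'confidence': 2.0,
     'lasttime': '2015-03-18T08:30:00Z'},
]

WHITELIST = {'google.com'}  # illustrative; a real deployment would use a fuller list


def build_feed(obs, min_confidence=3.5, days=7, now=None):
    """De-duplicated feed of indicators seen in the last `days` days
    with confidence >= `min_confidence`, minus whitelisted entries."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    feed = {}
    for o in obs:
        seen = datetime.strptime(
            o['lasttime'], '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)
        if o['confidence'] < min_confidence or seen < cutoff:
            continue
        if o['indicator'] in WHITELIST:
            continue
        # de-duplicate on the indicator value, keeping the highest-confidence record
        current = feed.get(o['indicator'])
        if current is None or o['confidence'] > current['confidence']:
            feed[o['indicator']] = o
    return list(feed.values())


for entry in build_feed(observations,
                        now=datetime(2015, 3, 21, tzinfo=timezone.utc)):
    print(entry['indicator'], entry['confidence'])
```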
The confidence scale itself breaks down roughly as follows:

- 4: highly vetted data by known, trusted security professionals
  - vetting relationship has been consistent for more than 2 years
  - very specific data (eg: ip+port+protocol, or a specific url, or a malware hash)
  - can typically be used via traffic mitigation processes (null-routing, firewall DROP, etc) with very little risk of collateral damage
- 3: vetted data by known, trusted security professionals
  - data that has been vetted by a human or a set of known and proven processes
  - vetting relationship has been consistent and in place for at least 1 year
  - data feed has been observed for at least a year
  - data should be highly specific (eg: ports/protocols, prefixes should be as narrow as possible)
  - can typically be used via traffic mitigation processes (null-routing, firewall DROP, etc) with very little risk of collateral damage
- 2: semi-vetted data by a security professional or trusted analytics process
  - data that has undergone some machine or human vetting (eg: checked against a whitelist automatically)
  - could be leveraged in traffic mitigation processes (eg: dns sink-holing); carries a slight risk of collateral damage, but one still heavily mitigated by the native whitelisting process
- 1: machine generated or enumerated data
  - some feeds might fall into this category if the author is lazy, or trying to cram too much into the feed
  - examples might include a domains list where the author simply takes a botnet urls list and posts just the domains as a feed
  - carries risk when used in automatic mitigation processes
- 0: machine generated / enumerated data
  - examples include:
    - auto-enumerated name-servers from domains
    - infrastructure resolved from domain data
  - carries significant risk when used in automatic mitigation processes
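As a rough illustration of how these tiers might translate into automated handling, here is a small sketch; the thresholds follow the descriptions above, while the action names are assumptions rather than a prescribed policy.

```python
def suggested_handling(confidence):
    """Map a 0-4 confidence value to a rough handling suggestion,
    following the tier descriptions above (assumed mapping, not policy)."""
    if confidence >= 3:
        # highly vetted / vetted data: very little risk of collateral damage
        return 'traffic mitigation (null-route, firewall DROP)'
    if confidence >= 2:
        # semi-vetted data: slight risk, mitigated by whitelisting
        return 'dns sink-holing'
    if confidence >= 1:
        # machine generated or enumerated data: risky to block automatically
        return 'alert / manual review'
    # enumerated infrastructure: significant risk in automatic mitigation
    return 'context / enrichment only'


for c in (4, 3.5, 2, 1, 0):
    print(c, suggested_handling(c))
```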