ParsingFeeds
CIF ships with many Open-source Intelligence (OSINT) feeds preconfigured, and additional feeds are expected to be added to that set. To add a feed of your own, read the tutorial on how to create a new feed config file.
CIF ships with a utility named cif-smrt, which has two primary capabilities: fetching and parsing. It can fetch files over http(s) or from the local file system, and it can parse them with the following built-in parsers: regex, json, xml, rss, html, text, cif.
cif-smrt runs as a service that processes every configuration file with the extension .yml found in /etc/cif/rules/default/. It is configured to run hourly with a random start delay (default: 30 minutes).
cif-smrt feed configuration files are written in YAML.
Every parameter can be set either as a Global parameter (under defaults) or as a Feed parameter (under a named feed). If a parameter is specified in both places, the Feed parameter supersedes the Global parameter.
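The precedence rule above can be pictured as a simple dictionary merge. The following is a minimal sketch only: `effective_feed_config` is a hypothetical helper written for illustration and is not part of cif-smrt, and the sample values are borrowed from the template configuration.

```python
# Hypothetical helper for illustration; cif-smrt itself is not implemented this way.
def effective_feed_config(defaults, feed):
    """Return the feed's settings layered over the global defaults."""
    merged = dict(defaults)  # start from the Global parameters
    merged.update(feed)      # Feed parameters supersede Global ones on conflict
    return merged


defaults = {"provider": "feeds.example.com", "tlp": "amber", "confidence": 75}
feed = {"remote": "https://feeds.example.com/scanners.csv", "confidence": 85}

config = effective_feed_config(defaults, feed)
print(config["confidence"])  # the Feed value wins
print(config["tlp"])         # inherited from the Global parameters
```

Copying `defaults` before updating keeps the Global parameters unchanged, so the same defaults can be layered under every feed in the file.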

    # this is a template cif-smrt configuration file. the purpose of this file
    # is to copy it to a newly named file and edit it as needed
    #
    # cp /etc/cif/rules/example/regex_example.yml /etc/cif/rules/default/filename.yml

    # parser: instruct cif-smrt to use which type of parser
    #   values: csv, pipe, regex, json, delim, rss, xml, html, text
    parser: regex

    # values within defaults apply to all feeds
    defaults:
      # provider: short name of the source, normally the fqdn of the source URL
      provider: feeds.example.com
      # altid_tlp: traffic light protocol (TLP) of the alternate ID
      #   (red, amber, green, white)
      altid_tlp: amber
      # tlp: traffic light protocol (TLP) of the observable
      #   (red, amber, green, white)
      tlp: amber
      # confidence: confidence in the observable (65,75,85,95)
      confidence: 75

    # values within the friendly name apply only to that feed
    feeds:
      # friendly name for feed
      regex_example:
        # remote: URL or filepath on host to feed source
        remote: https://feeds.example.com/scanners.csv
        # pattern: regex pattern to parse and capture the feed data
        pattern: '^(\S+),(\S+)$'
        # values: captured groups in the regex
        values:
          - observable
          - lasttime
        # tags: tag(s) describing the data (https://goo.gl/OCK8yc)
        tags:
          - scanner
          - suspicious
        # application: application associated with the identified port
        #   (ssh, smtp, http, imap, ftp, sip, vnc, irc)
        application: ssh
        # portlist: port or a hyphen-separated range of ports
        #   (22, 25, 6667-7000)
        portlist: 22
        # protocol: (tcp, udp)
        protocol: tcp
        # description: text description of the observable
        description: 'hosts seen scanning ssh servers'

| Parameter Name | Values | Description | Required |
|---|---|---|---|
| parser | <string> | regex, csv, html, pipe, rss, delim, json, xml, text | no [default: regex] |
| pattern | <string> | Perl regex with capturing | no |
| values | <string> | Used with pattern and map | no |
| provider | <string> | Friendly name of entity providing the feed | yes |
| remote | <string> | http(s) URL of feed | yes |
| confidence | <int> | See Confidence | yes |
| tags | <string> | See Tags | yes |
| description | <string> | Text description | no |
| group | <string> | everyone, staff, admin | yes |
| tlp | <string> | white, green, amber, red | no |
| altid | <string> | usually a URL pointing to the original data point (as a reference ID) | no |
| altid_tlp | <string> | white, green, amber, red | no |

    parser: regex
    defaults:
      tlp: amber
      provider: 'dshield.org'
      tags: scanner
    feeds:
      scanners:
        remote: http://feeds.dshield.org/block.txt
        confidence: 75
        pattern: ^(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\t\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b\t(\d+)
        values:
          - observable
          - mask

Parameter Name | Values | Description |
---|---|---|
pattern | <string> | a regex string whose capture groups split up each line of the feed |
values | - <value> | YAML list (one `- <value>` entry per capture group, in order) naming the field each regex-extracted value maps to |
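To make the pairing of `pattern` and `values` concrete, here is a minimal Python sketch that applies the pattern from the dshield example to one line and zips the captures with the value names. It only illustrates the mapping: the table describes the pattern as a Perl regex, so cif-smrt's own matching may differ in detail, and the sample line and the `parse_line` helper are invented for this example.

```python
import re

# Pattern and value names copied from the dshield example configuration.
PATTERN = re.compile(
    r"^(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\t"
    r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b\t(\d+)"
)
VALUES = ["observable", "mask"]


def parse_line(line):
    """Map each capture group to the value name in the same position."""
    match = PATTERN.search(line)
    if match is None:
        return None  # lines that do not match (e.g. comments) yield nothing
    return dict(zip(VALUES, match.groups()))


# Hypothetical feed line: start address, end address, mask, extra column.
record = parse_line("192.0.2.0\t192.0.2.255\t24\t1234")
print(record)
```

Only the first address and the mask are wrapped in parentheses, so the end address is matched but not captured, which is why two value names are enough.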

    parser: delim
    defaults:
      confidence: 85
      tlp: amber
      provider: malwaredomains.com
    feeds:
      domains:
        remote: http://mirror3.malwaredomains.com/files/domains.zip
        pattern: '[\t|\f]'
        values:
          - null
          - null
          - observable
          - description
          - provider
          - null
        tags:
          - exploit
          - malware

Parameter Name | Values | Description |
---|---|---|
delimiter | <string> | a pseudo-regex that splits each line of the feed into columns |
values | - <value> | YAML list (one `- <value>` entry per column, in order) naming the field each parsed column maps to |
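The column mapping in the delim example can be sketched in Python as follows. This is an illustration under stated assumptions, not cif-smrt's implementation: it assumes that a `null` entry in `values` means "discard this column", and the sample line and `parse_line` helper are made up.

```python
import re

# Character class from the example: splits on a tab, a pipe, or a form feed.
DELIMITER = re.compile(r"[\t|\f]")

# YAML null becomes None; assumed here to mean "ignore this column".
VALUES = [None, None, "observable", "description", "provider", None]


def parse_line(line):
    """Split one line into columns and keep only the named ones."""
    columns = DELIMITER.split(line.rstrip("\n"))
    return {name: col for name, col in zip(VALUES, columns) if name is not None}


# Hypothetical line with two leading empty columns and a trailing date column.
record = parse_line("\t\tbad.example.com\tmalware\tsource.example\t20200101\n")
print(record)
```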

    parser: xml
    defaults:
      confidence: 50
      tlp: amber
      provider: gist.githubusercontent.com-giovino
    feeds:
      domains:
        remote: https://gist.githubusercontent.com/giovino/3584e069cfe0c04cb5ab/raw/481bf543dfbd6cc523778312a03b6f5d3f99ba21/gistfile1.xml
        node: root
        map:
          - assessment
          - address
        values:
          - tags
          - observable

Parameter Name | Values | Description |
---|---|---|
map | - <value> | YAML list of the XML elements to read |
values | - <value> | YAML list of the fields that receive the contents of the corresponding XML elements |
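One way to read the `node`, `map` and `values` settings together is sketched below with Python's standard `xml.etree.ElementTree`. The document structure is an assumption: the sample XML is invented (the referenced gist is not reproduced here), and the sketch assumes each `node` element is one record whose child elements are the names listed in `map`.

```python
import xml.etree.ElementTree as ET

# Settings copied from the xml example configuration.
NODE = "root"
MAP = ["assessment", "address"]
VALUES = ["tags", "observable"]

# Hypothetical document; the real feed layout may differ.
SAMPLE = """<feed>
  <root><assessment>malware</assessment><address>bad.example.com</address></root>
  <root><assessment>botnet</assessment><address>192.0.2.10</address></root>
</feed>"""


def parse_xml(text):
    """Build one record per NODE element, renaming MAP elements to VALUES."""
    records = []
    for node in ET.fromstring(text).iter(NODE):
        records.append(
            {value: node.findtext(element) for element, value in zip(MAP, VALUES)}
        )
    return records


records = parse_xml(SAMPLE)
print(records)
```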

    parser: json
    defaults:
      provider: phishtank.com
      tlp: amber
      application:
        - http
        - https
      confidence: 85
      tags: phishing
      protocol: tcp
      remote: http://data.phishtank.com/data/online-valid.json.gz
      altid_tlp: green
    feeds:
      urls:
        otype: url
        map:
          - submission_time
          - url
          - target
          - phish_detail_url
          - details
        values:
          - lasttime
          - observable
          - description
          - altid
          - additional_data

Parameter Name | Values | Description |
---|---|---|
map | - <value> | YAML list of the JSON keys to read |
values | - <value> | YAML list of the fields that receive the values of the corresponding JSON keys |
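The key-to-field renaming in the json example can be sketched as below. The `map` and `values` lists are copied from the configuration; everything else is assumed for illustration, including the shape of the input (a top-level array of objects), the sample record, and the `parse_json` helper.

```python
import json

# Lists copied from the json example configuration, paired by position.
MAP = ["submission_time", "url", "target", "phish_detail_url", "details"]
VALUES = ["lasttime", "observable", "description", "altid", "additional_data"]

# Hypothetical feed content: one made-up entry in a top-level array.
SAMPLE = json.dumps([
    {
        "submission_time": "2015-01-01T00:00:00+00:00",
        "url": "http://phish.example.com/login",
        "target": "Example Bank",
        "phish_detail_url": "http://feeds.example.com/detail/1",
        "details": [{"ip_address": "192.0.2.20"}],
        "unmapped_key": "ignored",
    }
])


def parse_json(text):
    """Rename each MAP key to the VALUES name in the same position."""
    return [
        {value: entry.get(key) for key, value in zip(MAP, VALUES)}
        for entry in json.loads(text)
    ]


records = parse_json(SAMPLE)
print(records[0]["observable"])
```

Keys that are not listed in `map` are simply left out of the resulting record in this sketch.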
Additional example feed configuration files can be found here.
Parameter Name | Values | Description | Queryable | Required |
---|---|---|---|---|
adata | <string> | Additional data - string, json, csv | no | |
altid | <string> | usually a url pointing to the original data point (as a reference id) | no | no |
altid_tlp | <string> | white, green, amber, red | no | no |
application | <string> | ? | yes | no |
asn | <string> | Autonomous System Number | yes | no |
asn_desc | <string> | Autonomous System Description | no | no |
cc | <string> | Two Letter Country Code | yes | no |
citycode | <string> | ? | no | no |
confidence | <int> | See Confidence | yes | ? |
content | ? | ? | ? | no |
description | <string> | Text description | yes | no |
disabled | <string> | Values: true, false | no | no |
end | <int> | ? | no | no |
firsttime | ? | ? | yes | ? |
group | ? | ? | yes | ? |
header | ? | ? | no | no |
ignore | ? | ? | no | ? |
lasttime | ? | ? | yes | ? |
latitude | double | ? | no | ? |
longitude | double | ? | no | ? |
limit | ? | ? | no | ? |
map | ? | ? | no | ? |
mask | ? | ? | no | ? |
metrocode | ? | ? | no | ? |
node | <string> | XML node | no | no |
null | ? | ? | no | ? |
observable | <string> | IPv4, IPv6, FQDN, URI, Hash, Email address, Binary | yes | yes |
otype | <string> | IPv4, IPv6, FQDN, URI, Hash, Email address, Binary | yes | no |
parser | <string> | default (?), csv, html, pipe, rss, delim, json, xml, text | no | ? |
password | ? | ? | no | ? |
pattern | <string> | Perl regex with capturing | no | no |
peers | ? | ? | no | ? |
portlist | <int> | 22 or 80,443 or 6660-7000 | yes | no |
prefix | ? | ? | no | ? |
protocol | <int> <string> | 1,6,17 or icmp, tcp, udp | no | no |
provider | <string> | Friendly name of entity providing the feed | yes | yes |
rank | ? | ? | no | ? |
rdata | ? | ? | yes | ? |
reference | ? | ? | no | ? |
related | ? | ? | no | ? |
remote | <string> | http(s) URL of feed | no | yes? |
reporttime | ? | ? | yes | ? |
rir | ? | ? | no | ? |
skip | <string> | Regex pattern of lines to skip (/^<word>/) | no | no |
start | <int> | ? | no | no |
store_content | <int> | 0 = no, 1 = yes; used with text parsing to choose whether the line of text is stored as additional data | no | no |
subdivision | ? | ? | no | ? |
tags | <string> | See Tags | yes | yes |
timezone | ? | ? | no | ? |
title | ? | ? | no | ? |
tlp | <string> | white, green, amber, red | no | no |
username | ? | ? | ? | ? |
values | <string> | Used with pattern and map | no | no |
? | ? | ? | ? | ? |

    $ /opt/cif/bin/cif-smrt -h
    Usage: /opt/cif/bin/cif-smrt [OPTIONS] [-D status|start|stop|restart|reload]

    Options:
        -C, --config=FILE       specify cofiguration file, default: /etc/cif/cif-smrt.yml
        -d, --debug             turn on debugging (max verbosity)
        -v+, --verbosity        turn up verbosity
        -h, --help              this message
        -r, --rule=STRING       specify a rule or a rules directory, default: /etc/cif/rules/default
        -f, --feed=STRING       specify a feed (within a rule)
        -R, --remote=STRING     specify a remote to connect to, default http://localhost:5000
        -T, --token=STRING      specify a default token/apikey to use
        --not-before=STRING     specify a time to begin processing the data "[today|yesterday|X days ago]"
        --limit=INT             limit parsing to a subset of records (useful for debugging)
        --proxy                 specify a proxy address for cif-smrt to use in fetching feeds
        --https-proxy           specify a proxy for cif-smrt to use for feeds hosted on https

    Daemon Options:
        -D, --daemon            run as daemon
        -u, --user              run daemon as user, default: cif
        -g, --group             run daemon as group, default: cif
        -p, --pid               pidfile location, default: /var/run/smrt.pid
        --randomstart           random start delay, default: 30 min
        --interval              runtime interval, default: 60 min
        --testmode              run now, overrides randomstart
        --logfile:              logfile location, default: /var/log/cif-smrt.log
        --logging:              turn on logging [to file]

    Notification Options:
        --notify:               turn on notification, default: off.
        --notify-to:            default: root@localhost
        --notify-from:          default: cif
        --notify-subj:          default: [cif-smrt] ERROR
        --notify-level:         default: error

    Advanced Options:
        -M, --meta              apply metadata processors, default: 0
        -c, --clean             clear cache
        -P, --cache             cache location, default /var/smrt/cache

    Examples:
        /opt/cif/bin/cif-smrt -C /etc/cif/cif-smrt.yml
        /opt/cif/bin/cif-smrt -C /etc/cif/cif-smrt.yml -p /var/run/smrt.pid -D start
        /opt/cif/bin/cif-smrt -r /etc/cif/rules/default -D start

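When testing a newly written rule file, it can help to run a single feed with a record limit and debugging turned on. The sketch below only assembles such a command line from flags listed in the help output above (`-r`, `-f`, `--limit`, `-d`); `build_smrt_command` is a hypothetical helper, the rule path and feed name are taken from the template configuration, and the sketch assumes `-f` takes its argument the same way `-r` does in the examples. It prints the command rather than running it.

```python
import shlex

SMRT = "/opt/cif/bin/cif-smrt"  # path as shown in the help output


def build_smrt_command(rule, feed=None, limit=None, debug=False):
    """Assemble a cif-smrt argument list for a one-off test run."""
    argv = [SMRT, "-r", rule]
    if feed is not None:
        argv += ["-f", feed]                   # restrict the run to one feed in the rule
    if limit is not None:
        argv.append(f"--limit={int(limit)}")   # parse only a subset of records
    if debug:
        argv.append("-d")                      # maximum verbosity
    return argv


command = build_smrt_command(
    "/etc/cif/rules/default/filename.yml", feed="regex_example", limit=10, debug=True
)
print(shlex.join(command))
```

Building the arguments as a list keeps each value as a separate argument, so the same list can be passed to `subprocess.run` without shell quoting concerns.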