This repository has been archived by the owner on May 23, 2019. It is now read-only.

ParsingFeeds

Gabriel Iovino edited this page May 11, 2016 · 66 revisions

Introduction

CIF ships with many Open-source Intelligence (OSINT) feeds preconfigured, and more are expected to be added over time. To add your own, read the tutorial on how to create a new feed config file.

Cif-smrt

CIF ships with a utility named cif-smrt, which has two primary capabilities: fetching and parsing. It can fetch files over http(s) or from the local file system, and it can parse them with the following built-in parsers: regex, json, xml, rss, html, text, cif.

cif-smrt runs as a service and processes every configuration file with a .yml extension found in /etc/cif/rules/default/. It is configured to run hourly with a random 30-minute offset.

File Syntax

Feed configuration files for cif-smrt are written in YAML.

File Format

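Every parameter can be set either as a Global parameter (under defaults, where it applies to all feeds) or as a Feed parameter (under a named feed, where it applies only to that feed). If a parameter is specified in both places, the Feed parameter supersedes the Global parameter.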

# this is a template cif-smrt configuration file. the purpose of this file
# is to copy it to a newly named file and edit it as needed
#
# cp /etc/cif/rules/example/regex_example.yml /etc/cif/rules/default/filename.yml

# parser: instruct cif-smrt to use which type of parser
#   values: csv, pipe, regex, json, delim, rss, xml, html, text
parser: regex

# values within defaults apply to all feeds
defaults:

  # provider: short name of the source, normally the fqdn of the source URL
  provider: feeds.example.com

  # altid_tlp: traffic light protocol (TLP) of the alternate id
  #   (red, amber, green, white)
  altid_tlp: amber

  # tlp: traffic light protocol (TLP) of the observable
  #   (red, amber, green, white)
  tlp: amber

  # confidence: confidence in the observable (65,75,85,95)
  confidence: 75

# values within the friendly name apply only to that feed
feeds:
  # friendly name for feed
  regex_example:

    # remote: URL or filepath on host to feed source
    remote: https://feeds.example.com/scanners.csv

    # pattern: regex pattern to parse and capture the feed data
    pattern: '^(\S+),(\S+)$'

    # values: captured groups in the regex
    values:
      - observable
      - lasttime

    # tags: tag(s) describing the data (https://goo.gl/OCK8yc)
    tags:
      - scanner
      - suspicious

    # application: application associated with the identified port 
    #  (ssh, smtp, http, imap, ftp, sip, vnc, irc)
    application: ssh

    # portlist: port or a hyphen-separated range of ports
    #  (22, 25, 6667-7000)
    portlist: 22

    # protocol: (tcp, udp)
    protocol: tcp

    # description: text description of the observable
    description: 'hosts seen scanning ssh servers'
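
The precedence rule between Global and Feed parameters can be illustrated with a minimal Python sketch. This is not cif-smrt's own code: the function name `effective_feed_config` and the `confidence: 85` override are hypothetical, and the rule is represented as a plain dictionary as it would look after the YAML file has been loaded.

```python
def effective_feed_config(rule, feed_name):
    """Return the parameters that apply to one feed.

    Global parameters (the `defaults` section) are copied first, then the
    feed's own parameters are laid on top, so a Feed parameter supersedes
    a Global parameter of the same name.
    """
    merged = dict(rule.get("defaults") or {})
    merged.update(rule["feeds"][feed_name])
    return merged


# A trimmed version of the template above, already loaded into a dict.
rule = {
    "defaults": {
        "provider": "feeds.example.com",
        "tlp": "amber",
        "confidence": 75,
    },
    "feeds": {
        "regex_example": {
            "remote": "https://feeds.example.com/scanners.csv",
            "confidence": 85,  # hypothetical override, not in the template
        }
    },
}

config = effective_feed_config(rule, "regex_example")
print(config)
```

Here `tlp` and `provider` come from the Global section, while `confidence` takes the feed-level value.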

Common Parameters

| Parameter Name | Values | Description | Required |
|---|---|---|---|
| parser | <string> | regex, csv, html, pipe, rss, delim, json, text | no [default: regex] |
| pattern | <string> | Perl regex with capturing | no |
| values | <string> | Used with pattern or map | no |
| provider | <string> | Friendly name of entity providing the feed | yes |
| remote | <string> | http(s) URL of feed | yes |
| confidence | <int> | See Confidence | yes |
| tags | <string> | See Tags | yes |
| description | <string> | Text description | no |
| group | <string> | everyone, staff, admin | yes |
| tlp | <string> | white, green, amber, red | no |
| altid | <string> | usually a URL pointing to the original data point (as a reference id) | no |
| altid_tlp | <string> | white, green, amber, red | no |

Text Files

parser: regex
defaults:
  tlp: amber
  provider: 'dshield.org'
  tags: scanner

feeds:
  scanners:
    remote: http://feeds.dshield.org/block.txt
    confidence: 75
    pattern: ^(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)\t\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b\t(\d+)
    values:
      - observable
      - mask

| Parameter Name | Values | Description |
|---|---|---|
| pattern | <string> | a regex string that splits up a line of the feed |
| values | - <value> | nested series entry indicator that maps to the regex extracted values |
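
To make the relationship between `pattern` and `values` concrete, here is a minimal Python sketch that applies the pattern from the example above to one line and pairs each captured group with the name at the same position in `values`. It is an illustration only, not cif-smrt's implementation: cif-smrt uses Perl regular expressions, the helper name `parse_line` is hypothetical, and the sample line is made up rather than taken from the real feed.

```python
import re

# Pattern and values copied from the example configuration above.
PATTERN = re.compile(
    r"^(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)"
    r"\t\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"
    r"\t(\d+)"
)
VALUES = ["observable", "mask"]


def parse_line(line):
    """Return a dict of named values, or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    # The first captured group maps to the first entry in values, and so on.
    return dict(zip(VALUES, match.groups()))


# Made-up sample line: start address, end address, mask, then other columns.
sample = "192.0.2.0\t192.0.2.255\t24\t1234\tEXAMPLE-NET"
record = parse_line(sample)
comment = parse_line("# this line does not match the pattern")
print(record)
```

The second address in the line is matched but not captured, so only two values are produced.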

Delimited Text Files

parser: delim
defaults:
  confidence: 85
  tlp: amber
  provider: malwaredomains.com

feeds:
  domains:
    remote: http://mirror3.malwaredomains.com/files/domains.zip
    pattern: '[\t|\f]'
    values:
      - null
      - null
      - observable
      - description
      - provider
      - null
    tags:
      - exploit
      - malware

| Parameter Name | Values | Description |
|---|---|---|
| delimiter | <string> | a pseudo-regex that splits up the feed |
| values | - <value> | nested series entry indicator that maps to the parsed columns |
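
The column mapping can be sketched in Python as follows. This is an illustration under stated assumptions, not cif-smrt's code: it assumes that a `null` entry in `values` means the column at that position is discarded, the helper name `parse_line` is hypothetical, and the sample line is invented.

```python
import re

# Splitting pattern and column names copied from the example above.
# The character class matches a tab, a literal pipe, or a form feed.
DELIMITER = re.compile(r"[\t|\f]")
# YAML `null` entries are represented as None.
VALUES = [None, None, "observable", "description", "provider", None]


def parse_line(line):
    """Split one line into columns and keep only the named ones."""
    columns = DELIMITER.split(line.rstrip("\n"))
    return {
        name: column
        for name, column in zip(VALUES, columns)
        if name is not None  # assumption: null columns are dropped
    }


# Invented sample line with two leading empty columns.
sample = "\t\tbad.example.com\tmalware\tsource.example.org\t20160511\n"
record = parse_line(sample)
print(record)
```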

XML Files

parser: xml
defaults:
  confidence: 50
  tlp: amber
  provider: gist.githubusercontent.com-giovino

feeds:
  domains:
    remote: https://gist.githubusercontent.com/giovino/3584e069cfe0c04cb5ab/raw/481bf543dfbd6cc523778312a03b6f5d3f99ba21/gistfile1.xml
    node: root
    map:
      - assessment
      - address
    values:
      - tags
      - observable

| Parameter Name | Values | Description |
|---|---|---|
| map | - <value> | nested series entry indicator of xml elements |
| values | - <value> | nested series entry indicator of xml element contents |
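
The pairing of `map` and `values` might work along the lines of the Python sketch below. Several things here are assumptions rather than documented behaviour: that every element named by `node` is treated as one record, that the elements listed in `map` are its direct children, and that the content of each is stored under the name at the same position in `values`. The sample document and the helper name `parse_xml` are hypothetical; the real gist contents are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Settings copied from the example configuration above.
NODE = "root"
MAP = ["assessment", "address"]
VALUES = ["tags", "observable"]

# Hypothetical sample document, not the actual feed.
SAMPLE = """<feed>
  <root><assessment>malware</assessment><address>bad.example.com</address></root>
  <root><assessment>botnet</assessment><address>c2.example.net</address></root>
</feed>"""


def parse_xml(text):
    """Build one record per NODE element from its mapped child elements."""
    document = ET.fromstring(text)
    records = []
    for element in document.iter(NODE):
        record = {}
        for tag, field in zip(MAP, VALUES):
            content = element.findtext(tag)
            if content is not None:
                record[field] = content.strip()
        records.append(record)
    return records


records = parse_xml(SAMPLE)
print(records)
```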

JSON Files

parser: json
defaults:
  provider: phishtank.com
  tlp: amber
  application:
    - http
    - https
  confidence: 85
  tags: phishing
  protocol: tcp
  remote: http://data.phishtank.com/data/online-valid.json.gz
  altid_tlp: green

feeds:
  urls:
    otype: url
    map:
      - submission_time
      - url
      - target
      - phish_detail_url
      - details
    values:
      - lasttime
      - observable
      - description
      - altid
      - additional_data

| Parameter Name | Values | Description |
|---|---|---|
| map | - <value> | nested series entry indicator of json keys |
| values | - <value> | nested series entry indicator of json values |
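
As a rough illustration of how `map` and `values` line up for JSON, the Python sketch below renames the keys listed in `map` to the names at the same positions in `values`. It assumes the feed is a JSON array of objects; the helper name `parse_json` and the sample entry are hypothetical, and this is not cif-smrt's own code.

```python
import json

# Key and value names copied from the example configuration above.
MAP = ["submission_time", "url", "target", "phish_detail_url", "details"]
VALUES = ["lasttime", "observable", "description", "altid", "additional_data"]


def parse_json(text):
    """Turn each JSON object into a record keyed by the names in VALUES."""
    records = []
    for entry in json.loads(text):
        record = {
            field: entry[key]
            for key, field in zip(MAP, VALUES)
            if key in entry  # keys missing from an entry are skipped
        }
        records.append(record)
    return records


# Invented sample entry; real feed entries may carry more keys.
SAMPLE = json.dumps([
    {
        "submission_time": "2016-05-11T12:00:00+00:00",
        "url": "http://phish.example.com/login",
        "target": "Example Bank",
        "phish_detail_url": "http://tracker.example.org/detail/1",
        "details": [{"ip_address": "192.0.2.10"}],
        "unmapped_key": "ignored",
    }
])

records = parse_json(SAMPLE)
print(records[0]["observable"])
```

Keys that are not listed in `map`, such as `unmapped_key` in the sample, do not appear in the resulting record.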

More examples

Additional example feed configuration files can be found here.

Appendix

All Parameters

| Parameter Name | Values | Description | Queryable | Required |
|---|---|---|---|---|
| adata | <string> | Additional data - string, json, csv | no | |
| altid | <string> | usually a URL pointing to the original data point (as a reference id) | no | no |
| altid_tlp | <string> | white, green, amber, red | no | no |
| application | <string> | ? | yes | no |
| asn | <string> | Autonomous System Number | yes | no |
| asn_desc | <string> | Autonomous System Description | no | no |
| cc | <string> | Two Letter Country Code | yes | no |
| citycode | <string> | ? | no | no |
| confidence | <int> | See Confidence | yes | ? |
| content | ? | ? | ? | no |
| description | <string> | Text description | yes | no |
| disabled | <string> | Values: true, false | no | no |
| end | <int> | ? | no | no |
| firsttime | ? | ? | yes | ? |
| group | ? | ? | yes | ? |
| header | ? | ? | no | no |
| ignore | ? | ? | no | ? |
| lasttime | ? | ? | yes | ? |
| latitude | double | ? | no | ? |
| longitude | double | ? | no | ? |
| limit | ? | ? | no | ? |
| map | ? | ? | no | ? |
| mask | ? | ? | no | ? |
| metrocode | ? | ? | no | ? |
| node | <string> | XML node | no | no |
| null | ? | ? | no | ? |
| observable | <string> | IPv4, IPv6, FQDN, URI, Hash, Email address, Binary | yes | yes |
| otype | <string> | IPv4, IPv6, FQDN, URI, Hash, Email address, Binary | yes | no |
| parser | <string> | default (?), csv, html, pipe, rss, delim, json, text | no | ? |
| password | ? | ? | no | ? |
| pattern | <string> | Perl regex with capturing | no | no |
| peers | ? | ? | no | ? |
| portlist | <int> | 22 or 80,443 or 6660-7000 | yes | no |
| prefix | ? | ? | no | ? |
| protocol | <int>, <string> | 1,6,17 or icmp, tcp, udp | no | no |
| provider | <string> | Friendly name of entity providing the feed | yes | yes |
| rank | ? | ? | no | ? |
| rdata | ? | ? | yes | ? |
| reference | ? | ? | no | ? |
| related | ? | ? | no | ? |
| remote | <string> | http(s) URL of feed | no | yes? |
| reporttime | ? | ? | yes | ? |
| rir | ? | ? | no | ? |
| skip | <string> | Regex pattern of lines to skip (/^<word>/) | no | no |
| start | <int> | ? | no | no |
| store_content | <int> | 0 = no, 1 = yes; used with text parsing to store the line of text as additional data | no | no |
| subdivision | ? | ? | no | ? |
| timezone | ? | ? | no | ? |
| title | ? | ? | no | ? |
| tlp | <string> | white, green, amber, red | no | no |
| username | ? | ? | ? | ? |
| values | <string> | Used with pattern or map | no | no |
| ? | ? | ? | ? | ? |

cif-smrt usage documentation

$ /opt/cif/bin/cif-smrt -h

Usage: /opt/cif/bin/cif-smrt [OPTIONS] [-D status|start|stop|restart|reload]

 Options:
    -C,  --config=FILE       specify cofiguration file, default: /etc/cif/cif-smrt.yml
    -d,  --debug             turn on debugging (max verbosity)
    -v+, --verbosity         turn up verbosity
    -h,  --help              this message
     
    -r, --rule=STRING       specify a rule or a rules directory, default: /etc/cif/rules/default
    -f, --feed=STRING       specify a feed (within a rule)
    -R, --remote=STRING     specify a remote to connect to, default http://localhost:5000
    -T, --token=STRING      specify a default token/apikey to use
    --not-before=STRING     specify a time to begin processing the data "[today|yesterday|X days ago]"
    
    --limit=INT             limit parsing to a subset of records (useful for debugging)
    
    --proxy                 specify a proxy address for cif-smrt to use in fetching feeds
    --https-proxy           specify a proxy for cif-smrt to use for feeds hosted on https
    
 Daemon Options:
    -D, --daemon            run as daemon
    -u, --user              run daemon as user, default: cif
    -g, --group             run daemon as group, default: cif
    -p, --pid               pidfile location, default: /var/run/smrt.pid
    
    --randomstart           random start delay, default: 30 min
    --interval              runtime interval, default: 60 min
    
    --testmode              run now, overrides randomstart
    
    --logfile:              logfile location, default: /var/log/cif-smrt.log
    --logging:              turn on logging [to file]
    
 Notification Options:
    --notify:               turn on notification, default: off.
    --notify-to:            default: root@localhost
    --notify-from:          default: cif
    --notify-subj:          default: [cif-smrt] ERROR
    --notify-level:         default: error
    
 Advanced Options:
    -M, --meta              apply metadata processors, default: 0
    -c, --clean             clear cache
    -P, --cache             cache location, default /var/smrt/cache

 Examples:
    /opt/cif/bin/cif-smrt -C /etc/cif/cif-smrt.yml
    /opt/cif/bin/cif-smrt -C /etc/cif/cif-smrt.yml -p /var/run/smrt.pid -D start
    /opt/cif/bin/cif-smrt -r /etc/cif/rules/default -D start