CIF Architecture Overview
This page presents an overview of the CIF architecture and explains how data moves through the system.
- How CIF fetches, parses and normalizes data
- How CIF post-processes data
- How CIF stores data
- How the CIF API allows data to be queried and submitted
- How CIF applies permissions to data
- How CIF produces feeds of data
                                               cif-worker
                                                 ^   +
                                                 |   |
                                                ZMQ-PUB
                                                 |   |
                                                 +   v
cif-smrt +---> apache2 <---> cif-starman <---> cif-router
                 ^ +                              +  ^
                 | |                              |  |
                HTTP                              HTTP
                 | |                              |  |
                 + v                              v  +
               client                        elasticsearch
cif-smrt is a service that runs every hour, with a random start time within a thirty-minute window. It uses the configuration files found in /etc/cif/rules/default as instructions that specify what to download, how to parse it, and how to normalize it.
- cif-smrt uses LWP::UserAgent to fetch the data
- cif-smrt uses RegEx, HTML::TableExtract, JSON::XS, XML::RSS, String::Tokenizer, and XML::LibXML to parse the data
- cif-smrt normalizes the data to a JSON data structure
- cif-smrt submits the JSON data structure to the CIF RESTful API
/etc/cif/rules/default/*.cfg
   +
   |
   |
   |
   v
cif-smrt +---> apache2
                  +
                  |
                  |
                  |
                  v
              cif-router
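The parse and normalize steps above can be sketched roughly as follows. This is a minimal illustration in Python, not cif-smrt's actual Perl implementation: the sample feed, the regular expression, the function names, and the record field names (`observable`, `otype`, `provider`, `tags`, `confidence`) are all assumptions made for the example. The fetch and the HTTP submission are left out so the sketch runs offline.

```python
import json
import re

# Hypothetical feed body: one IPv4 address per line, '#' starts a comment.
SAMPLE_FEED = """\
# example blocklist
192.0.2.10
198.51.100.7  # scanner
not-an-address
"""

# Matches an IPv4-looking value at the start of a line (no range checking).
IPV4_LINE = re.compile(r"^\s*((?:\d{1,3}\.){3}\d{1,3})\b")


def parse_feed(text):
    """Yield each IPv4-looking value found at the start of a line."""
    for line in text.splitlines():
        match = IPV4_LINE.match(line)
        if match:
            yield match.group(1)


def normalize(value, provider, tags, confidence):
    """Map a parsed value onto a flat record (field names are illustrative)."""
    return {
        "observable": value,
        "otype": "ipv4",
        "provider": provider,
        "tags": list(tags),
        "confidence": confidence,
    }


def build_submission(text, provider="feeds.example.org",
                     tags=("scanner",), confidence=65):
    """Parse a feed body and return the JSON payload that would be submitted."""
    records = [normalize(v, provider, tags, confidence)
               for v in parse_feed(text)]
    return json.dumps(records, indent=2)


if __name__ == "__main__":
    print(build_submission(SAMPLE_FEED))
```

In this sketch the provider, tags, and confidence play the role of the per-feed settings that a rule file would supply; the JSON string is what would be sent on to the API.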
cif-worker is responsible for the post-processing of data. CIF ships with four post-processors:
- UrlResolver - extracts the FQDN from a URL
- Resolver - resolves DNS records from an FQDN
- Spamhaus - queries Spamhaus
- BGPWhitelist - creates whitelisted CIDR ranges from IP addresses resolved from FQDNs tagged as "whitelist"
https://example.com/evil.htm +---> cif-worker
                                       +
                                       |
                                       |
                                       v
cif-router <--------------------+ example.com [lower confidence]
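The UrlResolver step shown in the diagram can be approximated as below. This is a hedged sketch in Python rather than the shipped post-processor: the function name, the record fields, and the size of the confidence reduction are assumptions. The source only states that the derived FQDN carries a lower confidence than the URL it came from.

```python
from urllib.parse import urlsplit


def url_to_fqdn_record(record, confidence_penalty=10):
    """Derive an FQDN record from a URL record, at reduced confidence.

    Returns None when the input is not a URL record or has no host part.
    The penalty of 10 is an arbitrary value chosen for illustration.
    """
    if record.get("otype") != "url":
        return None
    host = urlsplit(record["observable"]).hostname
    if not host:
        return None
    derived = dict(record)  # copy, so the original record is left untouched
    derived["observable"] = host
    derived["otype"] = "fqdn"
    derived["confidence"] = max(0, record["confidence"] - confidence_penalty)
    return derived


if __name__ == "__main__":
    url_record = {
        "observable": "https://example.com/evil.htm",
        "otype": "url",
        "confidence": 85,
        "tags": ["malware"],
    }
    print(url_to_fqdn_record(url_record))
```

The derived record would then be handed back to cif-router like any other submission. URLs whose host is an IP literal are not treated specially here.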
CIF uses ElasticSearch for its data warehouse. ElasticSearch is a JSON document store where every field is indexed and searchable.
CIF uses Mojo::Base and Apache as the core for its RESTful API (PSGI). The CIF API sits on top of the ElasticSearch API, enforcing things like:
- User Permissions
- Data Limits
network +--> client +--> apache2 <--> cif-starman <--> cif-router
CIF stamps each record with a group ID. CIF tokens (API keys) are associated with groups and carry read and write attributes. The CIF API ensures that a user (API key) can retrieve only the data it has been given read access to, and it restricts which users can write to the CIF data store.
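The group and token checks described above might look like the following. This is an illustrative Python model only: the `Token` class, its fields, and the helper names are hypothetical and do not reflect how the CIF API stores or evaluates tokens internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical API key: a set of groups plus read/write attributes."""
    key: str
    groups: frozenset
    read: bool = False
    write: bool = False


def readable(token, records):
    """Return only the records whose group the token may read."""
    if not token.read:
        return []
    return [r for r in records if r["group"] in token.groups]


def can_write(token, record):
    """A write is allowed only for write-enabled tokens in the record's group."""
    return token.write and record["group"] in token.groups


if __name__ == "__main__":
    store = [
        {"observable": "192.0.2.10", "group": "everyone"},
        {"observable": "198.51.100.7", "group": "partners"},
    ]
    analyst = Token("abc123", frozenset({"everyone"}), read=True)
    print(readable(analyst, store))      # only the "everyone" record
    print(can_write(analyst, store[0]))  # read-only token, so False
```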
The CIF SDK (client) is responsible for generating CIF feeds. The primary attributes of a feed are:
- Filtered by observable type (ipv4, fqdn, url, ipv6, email)
- De-duplicated or aggregated by observable
- Whitelisting data-sets applied
The CIF client queries the CIF server to retrieve an overly broad data set and then reduces that data set by the attributes above before returning the data to the user.
Note: In an all-in-one CIF server where the CIF client is on the CIF server, all the processing is completed on a single host. In a distributed environment, the CIF client is able to reduce load on the CIF server by processing data on a separate client host.
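The client-side reduction described above can be sketched as a single pass over the broad result set. This is a simplified Python illustration, not the CIF SDK's code: the function name and record fields are assumptions, whitelisting is applied only to IP observables via CIDR ranges, and de-duplication keeps the first record seen for each observable.

```python
import ipaddress


def build_feed(records, otype, whitelist_cidrs=()):
    """Filter by observable type, drop whitelisted IPs, de-duplicate."""
    networks = [ipaddress.ip_network(cidr) for cidr in whitelist_cidrs]
    seen = set()
    feed = []
    for record in records:
        # 1. Keep only the requested observable type.
        if record["otype"] != otype:
            continue
        value = record["observable"]
        # 2. Apply the whitelist (IP observables only in this sketch).
        if otype in ("ipv4", "ipv6"):
            address = ipaddress.ip_address(value)
            if any(address in network for network in networks):
                continue
        # 3. De-duplicate by observable, keeping the first occurrence.
        if value in seen:
            continue
        seen.add(value)
        feed.append(record)
    return feed


if __name__ == "__main__":
    broad_result = [
        {"observable": "192.0.2.10", "otype": "ipv4"},
        {"observable": "192.0.2.10", "otype": "ipv4"},
        {"observable": "198.51.100.7", "otype": "ipv4"},
        {"observable": "example.com", "otype": "fqdn"},
    ]
    for row in build_feed(broad_result, "ipv4", ["198.51.100.0/24"]):
        print(row["observable"])
```

Because all three steps run on whatever host calls the function, the same code serves both the all-in-one and the distributed layout mentioned in the note.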