forked from arkime/arkime

Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.

nakominosu/arkime

Moloch is an open source, large scale IPv4 PCAP capturing, indexing and database
system.  A simple web interface is provided for PCAP browsing, searching, and
exporting.  APIs are exposed that allow PCAP data and JSON formatted session
data to be downloaded directly.  Simple security is implemented by using HTTPS
and HTTP digest password support.  Moloch is not meant to replace IDS engines but
instead to work alongside them to store and index all the network traffic in standard
PCAP format, providing fast access.  Moloch is built to be deployed across many
machines and can scale to handle multiple gigabits/sec of traffic.



IMPATIENT - Use ./easybutton-singlehost.sh to build a complete single machine 
moloch system.  Good for a demo and can be used as a starting point for a
real production deployment.



The Moloch system is composed of 3 components

1) capture - A single threaded C application that runs per network interface.  
             It is possible to run multiple capture processes per machine if
             there are multiple interfaces to monitor.
2) viewer - A node.js application that runs per capture machine and handles
            the web interface and transfer of PCAP files.
3) elasticsearch - The database/search technology powering Moloch


Wiki/FAQ - https://github.com/aol/moloch/wiki


Hardware requirements - Moloch is built to run across many machines for large
deployments.  What follows are rough guidelines for folks capturing large amounts of
data at high bit rates; tailor them to your situation.  It is not
recommended to run the capture and elasticsearch processes on the same machines.

1) Moloch capture/viewer machines
   * One dedicated management network interface and CPU for OS
   * For each network interface being monitored recommend ~10G of memory 
     and another dedicated CPU
   * If running suricata or another IDS add an additional two (2) CPUs per interface, 
     and an additional 5G memory (or more depending on IDS requirements)
   * Disk space to store the PCAP files.  Recommend at least 10T, xfs 
     (with inode64 option set in fstab), raid 5, at least 4 spindles
   * If networks are highly utilized and running IDS then CPU affinity is required
   * disable swap by removing it from fstab
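
The capture/viewer guidelines above can be turned into a quick estimate.  This is
only a rough sketch: the function name is made up for illustration, and it assumes
the extra 5G of IDS memory is needed per monitored interface, which the guideline
does not spell out.

```shell
#!/bin/sh
# capture_sizing INTERFACES [IDS]   (hypothetical helper, not part of Moloch)
#   INTERFACES: number of monitored network interfaces on the machine
#   IDS:        1 if an IDS such as suricata runs next to capture, else 0
# Prints "cpus=<n> memory_gb=<n>" for the capture and IDS processes.
capture_sizing() {
    ifaces=$1
    ids=${2:-0}
    cpus=$((1 + ifaces))              # 1 CPU for the OS, 1 per monitored interface
    mem=$((10 * ifaces))              # ~10G of memory per monitored interface
    if [ "$ids" -eq 1 ]; then
        cpus=$((cpus + 2 * ifaces))   # 2 extra CPUs per interface for the IDS
        mem=$((mem + 5 * ifaces))     # assumption: the extra 5G is per interface
    fi
    echo "cpus=$cpus memory_gb=$mem"
}

# Two monitored interfaces with an IDS running
capture_sizing 2 1
```

Memory for the OS itself and disk space are not included, so treat the numbers
as a lower bound.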

2) elasticsearch machines - some black magic here
   * 1/3 * Number_Highly_Utilized_Interfaces * Number_of_Days_of_History is a ROUGH
     guideline for number of elasticsearch instances (nodes) required. 
     (Example: 1/3 * 8 interfaces * 7 days = ~19 nodes)
   * Each elasticsearch node should have ~30G-40G memory (20G - 30G [no more!] for 
     the java process, at least 10G for the OS disk cache)
   * Can have multiple nodes per machine (Example: a 64G machine can have 2 ES nodes, each with 22G for the
     java process and 10G saved for the disk cache)
   * disable swap by removing it from fstab
   * obviously the more nodes, the faster responses will be
   * can always add more nodes, but hard to remove nodes
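
The elasticsearch rules of thumb above are simple enough to script.  A minimal
sketch follows; both function names are hypothetical.  The node estimate is rounded
up to a whole node (56/3 is about 18.7, so 8 interfaces and 7 days gives 19), and
that rounding choice is an assumption, since the guideline itself is only rough.

```shell
#!/bin/sh
# es_nodes_needed INTERFACES DAYS
#   1/3 * highly utilized interfaces * days of history, rounded up
es_nodes_needed() {
    echo $(( ($1 * $2 + 2) / 3 ))
}

# es_nodes_per_machine MACHINE_MEM_GB HEAP_GB
#   Each node needs its java heap plus ~10G of OS disk cache
es_nodes_per_machine() {
    echo $(( $1 / ($2 + 10) ))
}

es_nodes_needed 8 7          # 8 interfaces, 7 days of history
es_nodes_per_machine 64 22   # 64G machine, 22G java heap per node
```

Dividing the first number by the second gives a rough machine count for a
given hardware model.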


Example config for monitoring 8 GigE highly utilized networks for 7 days
- capture/viewer machines - 8 x PenguinComputing Relion 4724 with 48G of memory and 40T of disk, running Moloch and suricata
- elasticsearch machines - 10 x HP DL380-G7 with 64G of memory and 2T of disk running 2 nodes each


Note about security
* Elasticsearch provides no security, so iptables should be used allowing only 
  Moloch machines to talk to the elasticsearch machines (9200-920N) and for them
  to mesh connect (9300-930N)
* Moloch machines should be locked down, however they need to talk to each other (8005),
  to the elasticsearch machines (9200 - 920N), and the web interface needs to be open (8005)
* Moloch viewer should be configured to use SSL, easiest to use a single cert 
  with multiple DNs, make sure you protect the cert on the filesystem
* It is possible to set up a Moloch viewer on a machine that doesn't capture any 
  data that gateways all requests
* A shared password in the config file is used to encrypt password hashes AND 
  for inter Moloch communication, this means you should protect the config file.  
  (Encrypted password hashes are used so a new password hash can not be
  inserted into elasticsearch directly in case elasticsearch isn't secured :)
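
As an illustration of the first point, the sketch below prints iptables rules for
one elasticsearch machine instead of applying them, so they can be reviewed first.
The function name and the example addresses are hypothetical, and it assumes each
machine runs a fixed number of nodes on consecutive ports starting at 9200 and 9300.

```shell
#!/bin/sh
# es_firewall_rules NODES_PER_MACHINE "MOLOCH_HOSTS" "ES_HOSTS"
#   Prints (does not run) iptables commands for an elasticsearch machine.
es_firewall_rules() {
    nodes=$1
    http_range="9200:$((9200 + nodes - 1))"   # ports used by the Moloch machines
    mesh_range="9300:$((9300 + nodes - 1))"   # ports used between ES nodes
    for host in $2; do
        echo "iptables -A INPUT -p tcp -s $host --dport $http_range -j ACCEPT"
    done
    for host in $3; do
        echo "iptables -A INPUT -p tcp -s $host --dport $mesh_range -j ACCEPT"
    done
    # Everything else aimed at the elasticsearch ports is dropped
    echo "iptables -A INPUT -p tcp --dport $http_range -j DROP"
    echo "iptables -A INPUT -p tcp --dport $mesh_range -j DROP"
}

# 2 nodes per machine, two Moloch machines, two elasticsearch machines
es_firewall_rules 2 "10.0.0.11 10.0.0.12" "10.0.0.21 10.0.0.22"
```

Rule order matters here because the ACCEPT lines must come before the DROP lines.
Local access, for example from elasticsearch-head, may need extra rules.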




Building and Installing

Moloch is a complex system to build and install.  The following are rough
guidelines, updates are always welcome.


Installing Elasticsearch - Tested with 0.19.9
1) Prep the elasticsearch machines by increasing max file descriptors.  On CentOS and others this is done 
   by adding the following to bottom of /etc/security/limits.conf
* hard nofile 128000
* soft nofile 128000
2) If this is a dedicated machine disable swap by commenting out the swap lines in /etc/fstab
   and either reboot or use the swapoff command
3) Download elasticsearch from http://www.elasticsearch.org/download/ - at this time
   all development is done with 0.19.9 http://www.elasticsearch.org/download/2012/08/23/0.19.9.html
4) Uncompress
5) Install bigdesk and elasticsearch-head BEFORE pushing to all machines
     cd elasticsearch-*
     bin/plugin -install mobz/elasticsearch-head
     bin/plugin -install lukas-vlcek/bigdesk
6) Create/Modify elasticsearch.yml and push to all machines.  See moloch/db/elasticsearch.yml.sample
   - set cluster.name to something unique
   - set node.name to "${ES_HOSTNAME}" 
   - set node.max_local_storage_nodes to number of nodes per machine
   - add "index.cache.field.type: soft"
   - set path.data and path.logs
   - set gateway.type: local
   - set gateway.recover_after_nodes to the number of nodes you will run
   - set gateway.expected_nodes: to the number of nodes you will run
   - disable zen.ping.multicast
   - enable zen.ping.unicast and set the list of hosts
7) Create an elasticsearch launch script or use one of the ones out there.  See moloch/db/runes.sh.sample for a simple one
     Make sure you call "ulimit -a" first 
     set ES_HEAP_SIZE=20G (or whatever number you are using, less than 32G) 
     set JAVA_OPTS="-XX:+UseCompressedOops"
     set ES_HOSTNAME to `hostname -s`
8) Start the cluster, waiting ~5s between starting each node
9) Use elasticsearch-head to look at your cluster and make sure it is GREEN
10) Inside the moloch/db directory edit the sessions.json file and run the "init.sh A_ES_HOSTNAME" script
11) Check elasticsearch-head again and make sure it is GREEN and now you should see some of the indexes
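
To make step 6 concrete, here is a rough sketch that prints an elasticsearch.yml
with the settings listed there.  It is not the shipped elasticsearch.yml.sample.
The function name and the path.data/path.logs values are placeholders, and the
zen.ping settings are written with the full discovery.zen.ping.* key names.

```shell
#!/bin/sh
# es_config CLUSTER NODES_PER_MACHINE TOTAL_NODES "HOST1 HOST2 ..."
#   Prints an elasticsearch.yml sketch to stdout (hypothetical helper).
es_config() {
    # Turn "es1 es2" into: es1", "es2  (quotes are completed in the template)
    hosts=$(echo $4 | sed 's/ /", "/g')
    cat <<EOF
cluster.name: $1
node.name: "\${ES_HOSTNAME}"
node.max_local_storage_nodes: $2
index.cache.field.type: soft
path.data: /data/elasticsearch
path.logs: /var/log/elasticsearch
gateway.type: local
gateway.recover_after_nodes: $3
gateway.expected_nodes: $3
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["$hosts"]
EOF
}

# 2 nodes per machine, 20 nodes in total, two hosts in the unicast list
es_config moloch-demo 2 20 "es1 es2"
```

Redirect the output to a file, review it, and push that file to all machines.
ES_HOSTNAME is left unexpanded on purpose so the launch script can set it.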



Building Capture
1) Install the standard prerequisite packages
Centos: yum install pcre pcre-devel libuuid-devel pkgconfig flex bison gcc-c++ zlib-devel e2fsprogs-devel openssl-devel file-devel
Ubuntu: apt-get install libpcre3-dev uuid-dev libmagic-dev pkg-config g++ flex bison zlib1g-dev libffi-dev gettext libgeoip-dev
2) Building capture can be a pain because of OS versions.
2a) try ./easybutton-build.sh  which will download all the following, compile them statically, and run the local 
    configure script.
2b) Or if you want to build it yourself, or use some already installed packages, here are the pieces you need
* glib-2 version 2.22 or higher (2.22 is recommended for static builds) - http://ftp.gnome.org/pub/gnome/sources/glib
    wget http://ftp.acc.umu.se/pub/gnome/sources/glib/2.22/glib-2.22.5.tar.gz
    ./configure --disable-xattr --disable-selinux --enable-static
* yara - 1.6 or higher  - http://yara-project.googlecode.com
    wget http://yara-project.googlecode.com/files/yara-1.6.tar.gz
    ./configure --enable-static
* MaxMind GeoIP - (The OS version may be recent enough) - http://www.maxmind.com/app/c
    wget http://www.maxmind.com/download/geoip/api/c/GeoIP-1.4.8.tar.gz
    libtoolize -f # Only some platforms need this
    ./configure --enable-static
* libpcap - 1.1 or higher, most OS versions are older - http://www.tcpdump.org/#latest-release
    wget http://www.tcpdump.org/release/libpcap-1.3.0.tar.gz
    ./configure --disable-libglib
* libnids - 1.24 or higher
    wget http://downloads.sourceforge.net/project/libnids/libnids/1.24/libnids-1.24.tar.gz
    tar zxvf libnids-1.24.tar.gz
    cd libnids-1.24
    ./configure --disable-libnet --disable-glib2
3) run configure, optionally use the --with-* directives to use static libraries from build directories
4) make
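
When reusing already installed packages for 2b, it helps to check them against the
minimum versions listed above before running configure.  A minimal sketch, assuming
a sort that supports -V (GNU coreutils does); the function names and the example
version numbers are made up for illustration.

```shell
#!/bin/sh
# version_ge HAVE NEED - succeeds when version HAVE is at least version NEED
version_ge() {
    # The smaller of the two versions sorts first; HAVE is new enough
    # exactly when that smaller one is NEED.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME HAVE NEED - prints a one line verdict
check_min() {
    if version_ge "$2" "$3"; then
        echo "$1 $2: ok (need >= $3)"
    else
        echo "$1 $2: too old (need >= $3)"
    fi
}

# Example installed versions; on a real machine something like
# "pkg-config --modversion glib-2.0" can supply the HAVE value.
check_min glib    2.22.5 2.22
check_min yara    1.6    1.6
check_min libpcap 1.0.0  1.1
check_min libnids 1.24   1.24
```

Anything reported as too old is a candidate for a static build from source as
described above.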


Building Viewer
1) You'll need python 2.6 or higher, if still using Centos 5.x install a
   parallel version of python from the EPEL repository.  Make sure python2.6 is in
   your path first!
2) Install Node.js at least version 0.8.1
2a) From prebuilt binaries - see the platform specific instructions here
       https://github.com/joyent/node/wiki/Installing-Node.js-via-package-manager
2b) From source  download, build, install nodejs - http://nodejs.org/dist/v0.8.3/node-v0.8.3.tar.gz
3) In the moloch/viewer directory run "npm install"


Configuration
1) Make sure you download the GeoIP files.  The free versions are available at
   http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
   http://www.maxmind.com/download/geoip/database/asnum/GeoIPASNum.dat.gz
2) Edit the config.ini file
3) In the viewer directory run the addUser.js to add users, use -admin if you want admin users 
   that can edit users from the web site.  This is a good test if elasticsearch and config.ini are right
      node addUser.js <userid> "<Friendly Name>" <password>
4) Edit the db/daily.sh script, and set that up in a cron on one machine


Running - If you made it this far, you are awesome!
On each capture machine you need to run at least one moloch-capture and one moloch-viewer.  
I like using good old inittab for this

  m1:2345:respawn:/home/moloch/capture/run.sh
  v1:2345:respawn:/home/moloch/viewer/run.sh

See sample versions in moloch/capture/run.sh.sample and moloch/viewer/run.sh.sample


Now point your browser at any Moloch viewer - https://hostname:port

Wiki with FAQ is available at https://github.com/aol/moloch/wiki
