SysUsage is a system monitoring and alarm reporting tool. It can generate historical graph views of CPU, memory, IO, network and disk usage, and very much more.

darold/sysusage

NAME
    SysUsage v5.7 - System Monitoring Tool

DESCRIPTION
    SysUsage is a tool used to continuously monitor a system and generate
    daily/weekly/monthly/yearly graphical reports using rrdtool and sar.

FEATURES
    SysUsage generates graphical reports on all system activity information.
    Its periodic reports let you keep track of the machine's activity over
    its lifetime and are a great help for performance analysis and resource
    management.

    SysUsage can be run periodically, from a 10-second cycle in daemon mode
    to 1 minute or more using crond.

    SysUsage can be run from a central server that executes the sysusage
    Perl script remotely over ssh, so that collected data is stored in this
    central place. You then have just one place where rrdtool and related
    Perl modules need to be installed, and just one place where
    sysusagegraph or sysusagejqgraph needs to be executed.

  CPUs
            - CPUs distribution usage (user, nice, system).
            - CPUs global usage (total cpu used, iowait).
            - CPUs virtualized usage (steal, guest).

  Memory
            - Memory usage (with and without cache).
            - Swap usage (with and without cache).
            - Amount of memory needed for current workload.
            - POSIX shared memory.
            - Hugepages utilisation.
            - Active versus inactive memory.
            - Dirty memory that needs to be written to disk.

  I/O
            - Context switches per second.
            - Interrupts per second.
            - Page swapping.
            - Page I/O stats.
            - I/O request stats.
            - I/O block stats.

  Network
            - TCP connections per second.
            - TCP segments per second.
            - Number of sockets in use (Total, TCP and UDP).
            - Number of sockets in TIME_WAIT state.
            - Active network interface usage.
            - Active network interface bad packets, drops, collisions.

  Devices
            - CPU time for I/O on device.
            - Read/Write sectors on device.
            - Disk throughput on device.
            - I/O workload on device.
            - Times for I/O requests issued to device.
            - Hard drive temperature if your hardware supports it (with hddtemp).
            - MotherBoard/CPU/Remote temperature reported by sensors or sar.
            - Fan RPM reported by sensors.

  Files
            - Number of open files.
            - Number of files in a queue directory.
            - Disk space used on mounted partition.

  Process
            - Load average.
            - Processes created per second.
            - Number of running processes (ex: sendmail, httpd, oracle, etc.).
            - Number of running threads (ex: mysqld, amarok, etc.).
            - Number of tasks blocked waiting for I/O.

  Notification
    You can receive mail or Nagios notifications when monitored values fall
    outside the max/min threshold values, for all types of monitoring.

  Plugins
    With SysUsage you can create your own monitoring plugins. Any script or
    program can be embedded in SysUsage provided that it returns up to 3
    numeric values. The graph title and labels are defined in the
    configuration file.

  Remote call
    SysUsage can be installed and run on a central server that stores
    statistics data by periodically calling sysusage on remote hosts using
    SSH. This central place is also in charge of rendering the HTML pages
    and graphs for all hosts. This simplifies the SysUsage installation on
    remote hosts, which only require sysstat and rsysusage.

REQUIREMENT
  rrdtool
    You need to install rrdtool. Most distributions provide a dedicated
    package for rrdtool. On CentOS/RedHat distributions, use the following
    command:

            yum install rrdtool rrdtool-perl

    On Debian/Ubuntu distributions, use the command:

            apt-get install rrdtool librrds-perl

    The sources can be found here:

            http://people.ee.ethz.ch/~oetiker/

    If you compile from sources and want to use the RRDs perl module
    embedded with it, you must use the following command to compile:

            make site-perl-install

    This installation is optional if sysusage is installed on a remote host.

  sysstat
    You also need sar to collect statistics. Sar is part of the sysstat
    package. For RPM-like distributions:

            yum install sysstat

    and for Debian-like distributions:

            apt-get install sysstat

    The sources can be found here:

            http://freshmeat.net/projects/sysstat/

    If you plan to use threshold notification you must have Net::SMTP
    installed.

            yum install perl-Net-SMTP-SSL

    or

            apt-get install libnet-smtp-ssl-perl

    Sources can be found on CPAN (https://metacpan.org/pod/Net::SMTP).

  Perl modules
    SysUsage can be run in a central place to collect remote sysusage
    statistics using ssh. The remote calls are processed simultaneously
    using fork with the Proc::Queue Perl module.

    If you plan to use sysusagegraph instead of sysusagejqgraph you will
    also need the GD and GD::Graph3D Perl modules. Note that the use of GD
    and GD::Graph is deprecated and sysusagegraph will be removed in the
    next major release (6.0).

    All these modules are available from CPAN (https://metacpan.org/) and
    should at least be installed on the central server. On remote hosts this
    is optional and depends on whether you want to run SysUsage on each
    server or by ssh from a central place.

  Nagios nsca client (optional)
    If you want to send messages to Nagios you need to install
    nsca-2.7.2.tar.gz or a more recent version. You can get it here:

            http://sourceforge.net/projects/nagios/files/

  hddtemp and sensors (optional)
    If you want to monitor your hard drive temperature you must install a
    small utility called hddtemp. You can download it from
    http://download.savannah.gnu.org/releases/hddtemp/. Run it to see if
    your hard drive has a temperature sensor.

    You can also use sensors to monitor your CPU temperature and fan speed.
    If your hardware supports it, run sensors-detect and load the required
    kernel modules at boot time.

INSTALLATION
  Quick install
    Simply run the following commands:

            perl Makefile.PL
            make && make install

    By default it will copy the Perl programs into /usr/local/sysusage/bin
    and the HTML output will be written to /var/www/htdocs/sysusage/. The
    configuration file is /usr/local/sysusage/etc/sysusage.cfg and all RRD
    Berkeley DB databases from rrdtool will be saved under
    /usr/local/sysusage/rrdfiles.

    If you plan to run sysusage on different servers from a central place
    you may just want to install the rsysusage Perl script on remote hosts.
    To do so, proceed as follows:

            perl Makefile.PL REMOTE=1
            make && make install

    It will copy only the rsysusage script into /usr/local/sysusage/bin and
    the configuration file under /usr/local/sysusage/etc/sysusage.cfg. The
    RRD data directory will be created under /usr/local/sysusage/rrdfiles,
    but just to hold the *.cnt files that track the count of alert attempts
    when a threshold is exceeded.

  Custom install
    You can override all install paths with the following Makefile.PL
    arguments. Here are the default values:

            BINDIR=/usr/local/sysusage/bin
            CONFDIR=/usr/local/sysusage/etc
            PIDDIR=/usr/local/sysusage/etc
            BASEDIR=/usr/local/sysusage/rrdfiles
            PLUGINDIR=/usr/local/sysusage/plugins
            HTMLDIR=/var/www/htdocs/sysusage
            MANDIR=/usr/local/sysusage/doc
            DOCDIR=/usr/local/sysusage/doc
            REMOTE=

    For example, on a RedHat system you may prefer to install SysUsage like
    this:

            perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
                    BASEDIR=/var/lib/sysusage HTMLDIR=/var/www/html/sysusage \
                    MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage

    If you are installing sysusage on a host that will be called by ssh from
    a central place, you may want to install just what is necessary and
    nothing more:

            perl Makefile.PL BINDIR=/usr/bin CONFDIR=/etc PIDDIR=/var/run \
                    MANDIR=/usr/man/man1 DOCDIR=/usr/share/doc/sysusage \
                    REMOTE=1

    This will install only the rsysusage Perl script, the configuration file
    and the documentation, so that you don't need to install extra Perl
    modules and other graphics-related things.

  Package/binary install
    In the packaging/ directory you will find all scripts to build RPM,
    SlackBuild and Debian packages. See the README in this directory to
    learn how to build these packages.

USAGE
    SysUsage consists of two main Perl scripts, sysusage and sysusagegraph.
    Once you have correctly installed and configured SysUsage, the best way
    to execute them is by setting up a cron job. If you prefer JavaScript
    graphs instead of GD::Graph images, use sysusagejqgraph, which is based
    on the jqplot JavaScript library. This is the recommended script, as use
    of GD::Graph through sysusagegraph is deprecated.

  sysusage
    The sysusage script is responsible for collecting system information at
    a given interval and storing it in rrdtool database files.

    As it is very fast you can set the running interval to 1 minute. This is
    the default polling interval used in the configuration and graph
    reports. If you change this interval you must also change it in the
    configuration file, otherwise your graphs will be wrong. See the
    INTERVAL configuration directive.

    Here is how I use it with a default installation:

            */1 * * * * /usr/local/sysusage/bin/sysusage > /dev/null 2>&1

  rsysusage
    This script does the same things as the sysusage Perl script, but
    instead of storing collected data in files it dumps it to the standard
    output. This script is used instead of the sysusage Perl script through
    an ssh call from a central server, where the local sysusage stores the
    statistics retrieved from multiple servers.

            /usr/local/sysusage/bin/rsysusage -r remote_hostname

    Where 'remote_hostname' is the hostname given in the [REMOTE ...]
    configuration section.

  sysusagegraph (deprecated) / sysusagejqgraph
    The Perl script sysusagegraph is used to draw PNG graphs and write HTML
    files. As it knows the polling interval given in the configuration
    file, it can be run at any time. I used to run it every five minutes,
    but running it every hour or less often gives the same result.

            */5 * * * * /usr/local/sysusage/bin/sysusagegraph > /dev/null 2>&1

    Since release v4.0 of SysUsage there is a jQuery plotting replacement
    for rrdGraph that only writes HTML files with all the JavaScript code
    needed for the client browser to draw the graphs. To enable this
    feature you just have to use sysusagejqgraph instead.

            */5 * * * * /usr/local/sysusage/bin/sysusagejqgraph > /dev/null 2>&1

    There are some additional resources, JavaScript libraries and CSS files,
    to install. The SysUsage installer will do the job for you. This removes
    the requirement for the GD, GD::Graph and GD::Graph3D Perl modules.

  sysusage.cfg
    If you have changed the default installation path (/usr/local/sysusage)
    you may need to give these scripts the path to the configuration file as
    a command line argument using the -c option. To see which arguments can
    be passed, use the -h or --help option.

    Note that since version 3.0 the default configuration path in these
    scripts is set during installation, so you may no longer need to edit
    these scripts or give the path of the configuration file as a command
    line argument.

    See the CONFIGURATION chapter for more information on how to configure
    your system monitoring.

  Daemon mode
    Crond is good for scheduling, but not at intervals under one minute. If
    you want to monitor your system at an interval shorter than a minute you
    may want to run sysusage in daemon mode. To do that, set INTERVAL to the
    desired value in the configuration file and the DAEMON directive to 1.

  Debug mode
    Sometimes things don't appear as you expected. The best way to see
    what's going wrong is to run sysusage in debug mode. This mode lets you
    see all values extracted from sar and other tools. Use the --debug
    option for that; this mode prevents sysusage from storing data in the
    rrdfiles. Command:

            /usr/local/sysusage/bin/sysusage --debug

    Please run this command and check the result before sending a bug report.

  Output
    Once sysusage and sysusagegraph have been running for a few cycles, open
    your favorite browser and take a look at the output directory. By
    default:

            http://my.server.dom/sysusage/

    If you use a special URI and/or port, remember to modify the URL
    configuration directive; without that, the web interface will not work.

CONFIGURATION
    During installation a default configuration file sysusage.cfg is
    generated. The default settings are good enough to report essential
    information about your system, but if you want to monitor some
    processes, queue directories or devices you must edit this file by hand.

    Here is the format of the configuration file and all directives. There
    are three sections: the first sets the general parameters of the
    application, the second sets the parameters related to SMTP or Nagios
    notification when a threshold is exceeded, and the last configures all
    types of system information you may want to monitor.

    Full sample of configuration file:

            [GENERAL]
            DEBUG       = 0
            DATA_DIR    = /usr/local/sysusage/rrdfiles
            PID_DIR     = /usr/local/sysusage/etc
            DEST_DIR    = /var/www/htdocs/sysusage
            SAR_BIN     = /usr/bin/sar
            UPTIME      = /usr/bin/uptime
            HOSTNAME    = /bin/hostname
            INTERVAL    = 60
            SKIP        = 12:00/14:00 20:00/06:00
            HDDTEMP_BIN = /usr/local/sbin/hddtemp
            SENSORS_BIN = /usr/bin/sensors
            DAEMON      = 0
            GRAPH_WIDTH = 550
            GRAPH_HEIGHT= 200
            FLAMING     = 0
            HIRES       = 0
            LINE_SIZE   = 2
            PROC_QSIZE  = 4
            RESRC_URL   =
            SSH_BIN     = /usr/bin/ssh
            SSH_OPTION  = -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
            SSH_USER    =
            SSH_IDENTITY=


            [ALARM]
            WARN_MODE   = 0
            ALARM_PROG  = /usr/local/sysusage/bin/sysusagewarn
            SMTP        = localhost
            FROM        = root@localhost
            TO          = root@localhost
            NAGIOS      = /usr/local/nagios/bin/submit_check_result
            UPPER_LEVEL = 1
            LOWER_LEVEL = 2
            URL         =

            [MONITOR]
            load:threshold_max_value
            blocked:threshold_max_value
            cpu:threshold_max_value
            cswch:threshold_max_value
            intr:threshold_max_value
            mem:threshold_max_value
            dirty:threshold_max_value
            swap:threshold_max_value
            work:threshold_max_value
            share:threshold_max_value
            sock:threshold_max_value
            socktw:threshold_max_value
            io:threshold_max_value
            file:threshold_max_value
            page:threshold_max_value
            pcrea:threshold_max_value
            pswap:threshold_max_value
            net:threshold_max_value
            tcp:threshold_max_value
            err:threshold_max_value
            disk:threshold_max_value
            proc:proc_name:threshold_max_value:threshold_min_value
            tproc:proc_name:threshold_max_value:threshold_min_value
            queue:path_queue_dir:threshold_max_value
            hddtemp:device:threshold_max_value
            dev:device(alias):threshold_max_value
            dev:device(alias):rpm_speed:raid_type:nb_disk
            work:threshold_max_value
            sensors:pattern:threshold_max_value
            temp:device:threshold_max_value
            fan:device:threshold_max_value
            huge:threshold_max_value

            [PLUGIN testplug]
            title:Sysage Test plugin
            menu:Database
            enable:no
            program:/usr/local/sysusage/plugins/plugin-sample.pl
            minThreshold:0
            maxThreshold:10
            verticalLabel:Number of seconds
            label1:Total seconds
            label2:
            label3:
            legend1:seconds
            legend2:
            legend3:
            remote:yes

            [REMOTE hostname1]
            enable:no
            ssh_user:monitor
            ssh_identity:/home/monitor/.ssh/id_rsa
            #ssh_options: -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey
            #ssh_command:
            remote_sysusage:/usr/local/sysusage/bin/rsysusage

            #[GROUP Web Servers]
            #hostname1
            #hostname2

  Section GENERAL
    DEBUG = 0|1
        This option is used to set debug mode. If set to 1 then sysusage and
        sysusagegraph just show what they do but don't create or send
        anything.

    DATA_DIR = /path/to/rrdfiles
        This option is used to set the output directory for all RRDTOOL
        databases.

    PID_DIR = /path/to/piddir
        sysusage and sysusagegraph use a file to store the pid of the
        running process to prevent simultaneous runs.

    DEST_DIR = /path/to/html_output
        Set the path to the directory where all HTML and graph files should
        be created.

    SAR_BIN = /path/to/sar_binary
        sysusage uses sar, part of the sysstat distribution, to grab system
        information, so it needs to know where it is.

    UPTIME = /path/to/uptime_binary
        sysusagegraph reports the current uptime of the system using the
        uptime command. This directive sets the path to the uptime binary.

    HOSTNAME = /path/to/hostname_binary
        All scripts of the SysUsage distribution need to know the name of
        the host. They use the hostname command for that.

    INTERVAL = pull_interval_in_second
        All RRDTOOL inputs use the given interval in seconds to store
        monitored values. Graph construction also uses this interval to
        render things properly. By default SysUsage uses an interval of 60
        seconds to have a better statistic report. You can change this but
        it's not recommended. If you change it, adjust your crontab to the
        same value. This value must be between 10 and 300 seconds. If you
        want an interval under one minute you must use daemon mode to run
        sysusage. See DAEMON below.

    SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
        You can define here time ranges during which monitoring is not
        performed. The value is a list of begin_time/end_time pairs
        separated by spaces or tabs. Let's say you don't want to monitor the
        host during the night for some good reason; you can write it like
        this: 20:00/06:00
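
        The range syntax can be illustrated with a minimal Python sketch.
        It is not taken from the SysUsage sources: the function names are
        hypothetical, and it assumes that a range whose end is earlier than
        its start (such as 20:00/06:00) wraps past midnight and that the end
        time itself is excluded.

```python
from datetime import time


def parse_hhmm(text):
    """Parse an 'HH:MM' string into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_skip_window(now, skip):
    """Return True if 'now' (HH:MM) falls inside any range of a SKIP value.

    Assumption: a range ending before it starts wraps past midnight,
    and the end time is not part of the range.
    """
    current = parse_hhmm(now)
    for window in skip.split():  # ranges are separated by spaces or tabs
        begin_text, end_text = window.split("/")
        begin, end = parse_hhmm(begin_text), parse_hhmm(end_text)
        if begin <= end:
            if begin <= current < end:
                return True
        elif current >= begin or current < end:  # range wraps past midnight
            return True
    return False


print(in_skip_window("23:15", "12:00/14:00 20:00/06:00"))  # True
print(in_skip_window("09:30", "12:00/14:00 20:00/06:00"))  # False
```

        Checking a few sample times this way before editing the
        configuration can help confirm that an overnight range covers the
        hours you expect.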

    HDDTEMP_BIN = /path/to/hddtemp_binary
        You can monitor your hard drive temperature if you have installed
        the hddtemp utility. SysUsage needs to know the path to the hddtemp
        binary.

    SENSORS_BIN = /path/to/sensors_binary
        You can monitor your device temperature if you have installed the
        lm_sensors utility. SysUsage needs to know the path to the sensors
        binary.

    DAEMON = 0 | 1
        You can monitor your system below the crond limit of 1 minute by
        running sysusage in daemon mode with an INTERVAL between 10 and 60
        seconds.

    GRAPH_WIDTH and GRAPH_HEIGHT
        These are useful if you want to resize the graphs. Default is a
        width of 550 pixels and a height of 200.

    FLAMING
        This is for fun: if you want a random flaming effect on graphs with
        only one dataset, set this directive to 1. Disabled by default. Not
        used with the jQuery graph renderer.

    HIRES
        Allows the addition of an hourly graph for finer granularity of the
        data. This is disabled by default. Set it to any integer from 1 to
        23 (inclusive) to show data from the past N hours to now. Not used
        with the jQuery graph renderer, as the JavaScript library allows you
        to zoom into the resolution you want.

    LINE_SIZE
        By default the graph line size is 1; if you want graphs with a
        thicker line set it to 2. This is an rrd graph limitation (1 or 2).
        Not used with the jQuery graph renderer.

    PROC_QSIZE
        Number of simultaneous remote sysusage call processes that should be
        run. Default is 4 but it can be up to 15 or more depending on the
        hardware configuration. One per core is the lowest value you should
        consider.

    RESRC_URL
        Images, JavaScript and CSS resources are by default searched for in
        the DEST_DIR directory, so that in the HTML view they all stay in
        the current main directory. You may want to place these resources in
        another directory or another location. Using this directive you can
        set any FQDN, absolute or relative URL for these resources.

    SSH_IDENTITY
        Used to set the default identity file to connect to all remote hosts
        without a password. If undefined, sysusage will use the ssh system
        default value. You may want to use the default value unless you know
        exactly what you are doing.

    SSH_OPTION
        Used to set the default ssh options. The default corresponds to
        passwordless authentication:

                -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey

        with a five-second connection timeout. You may want to increase
        this timeout on very slow network links.

        Do not change this value unless you know exactly what you are
        doing.

    SSH_BIN
        Path to the ssh command is set here at install time.

    SSH_USER
        Used to define the default ssh user that will be used to connect to
        all remote hosts.

  Section ALARM
    WARN_MODE = 0|1
        Used to disable/enable alert messages when a threshold is exceeded.

    ALARM_PROG = /path/to/sysusagewarn
        Used to set the path to the external program responsible for sending
        alarm messages. You can change it to your own; just take a look at
        the sysusagewarn usage to see what command line options are used by
        sysusage.

    SMTP = smtp.server.net
        Name or IP address of the SMTP server to contact. Default is none =>
        no SMTP message is sent.

    FROM = sender@localhost
        Sender email address to use in the SMTP message.

    TO = destination@localhost
        Destination email address where the alarm message will be sent.

    NAGIOS = /usr/local/nagios/bin/submit_check_result
        Path to the external nsca program used to send check messages to
        Nagios. Setting this will activate Nagios check reports. See the end
        of this file for how to configure Nagios.

    UPPER_LEVEL = 1
        Nagios check level to send when a high threshold limit is reached.
        Default is 1 => WARNING.

    LOWER_LEVEL = 2
        Nagios check level to send when a low threshold limit is reached.
        Default is 2 => CRITICAL.

    URL = Url of Sysusage report
        Used to override the default URL of the SysUsage report,
        http://host.dom/sysusage/, especially if you have a special port or
        a different path. Example:
        http://hostname.domain:9080/Reports/Sysusage/

    SKIP = HH:MM/HH:MM HH:MM/HH:MM ...
        You can define here time ranges during which alarm notices are not
        sent. The value is a list of begin_time/end_time pairs separated by
        spaces or tabs. Let's say you don't want to receive notices during
        the night for some good reason; you can write it like this:
        20:00/06:00

  Section MONITOR
    This section has two different formats. The first one is used to specify
    most of the monitoring targets:

            type:threshold_max

    or

            type:threshold_max(attempt)

    type
        Type of system information you may want to monitor. It can take
        around 30 different values:

                load   => monitor load average
                blocked=> monitor task blocked waiting for I/O
                cpu    => monitor each cpu(s) user/nice/system usage
                       => monitor each cpu(s) total/iowait usage
                       => monitor each cpu(s) steal/guest usage
                cpuall => monitor global cpu(s) statistics
                cswch  => monitor context switches usage
                intr   => monitor number of interrupt per second
                mem    => monitor memory usage
                dirty  => monitor active/inactive/dirty memory
                share  => monitor POSIX shared memory usage (/dev/shm)
                swap   => monitor swap usage
                work   => monitor amount of memory needed for current workload
                sock   => monitor number of open socket
                socktw => monitor number of socket in TIME_WAIT state
                io     => monitor I/O request and block usage
                page   => monitor I/O page usage
                pswap  => monitor I/O page swap usage
                pcrea  => monitor number of process created per second
                proc   => monitor number of running process
                tproc  => monitor number of running thread
                file   => monitor number of open file
                queue  => monitor number of files in queue
                net    => monitor I/O network bytes on all network interfaces
                err    => monitor bad packet, drop and collision on interfaces
                tcp    => monitor number of tcp connection and segment
                disk   => monitor disk space usage
                dev    => monitor percentage of CPU time per device
                       => monitor average request queue length
                       => monitor I/O sectors read and write to device
                       => monitor time spent in queue (await)
                       => monitor time spent in servicing (svctm)
                sensors=> monitor fan and device temperature using sensors command
                hddtemp=> monitor disk drive temperature
                temp   => monitor device temperature using sar
                fan    => monitor fan rotation using sar
                huge   => monitor size of hugepages utilisation

        Note: the 'cpu' target monitoring type will report all statistics
        per cpu. This can represent a lot of information if you have several
        cpus. To limit statistics to the total cpu only, you must replace
        the default 'cpu' target with 'cpuall' in your configuration file.

    threshold_max
        This is the maximum threshold value. Any value equal to or greater
        than this one will generate an SMTP and/or Nagios alert if you have
        enabled it.

    attempt
        You can delay the call to the alarm program when a threshold is
        exceeded by specifying the number of consecutive exceeding attempts
        before the command is called. Just specify the number of attempts in
        parentheses right after the min and/or max threshold value. This
        setting is optional for both threshold values and the default is to
        send the alarm immediately.
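
        As a rough illustration of the threshold_max(attempt) notation, the
        following Python sketch parses such a field and reports at which
        samples an alarm would be raised. The function names are
        hypothetical and the logic is an assumption, not the SysUsage
        implementation: in particular it assumes the counter of consecutive
        exceeding samples is reset as soon as one sample falls back under
        the threshold.

```python
import re

# Matches "90" or "90(3)": a numeric threshold and an optional attempt count.
THRESHOLD_RE = re.compile(r"^(?P<value>\d+(?:\.\d+)?)(?:\((?P<attempt>\d+)\))?$")


def parse_threshold(field):
    """Split 'value(attempt)' into (value, attempt); attempt defaults to 1."""
    match = THRESHOLD_RE.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognized threshold field: {field!r}")
    return float(match["value"]), int(match["attempt"] or 1)


def alarm_cycles(samples, field):
    """Return the indexes of the samples at which an alarm would be raised.

    Assumption: the consecutive counter restarts from zero whenever a
    sample is below the maximum threshold.
    """
    threshold, attempts = parse_threshold(field)
    consecutive = 0
    fired = []
    for index, sample in enumerate(samples):
        consecutive = consecutive + 1 if sample >= threshold else 0
        if consecutive >= attempts:
            fired.append(index)
    return fired


print(parse_threshold("90(3)"))                          # (90.0, 3)
print(alarm_cycles([95, 96, 50, 91, 92, 93], "90(3)"))   # [5]
```

        With an attempt count of 3, the two early peaks are ignored and only
        the third consecutive exceeding sample triggers the alarm.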

    Special cases
        There's a special case for 'disk' usage monitoring that allows the
        exclusion of some mount points. This is useful if you have hard
        links or some special devices you don't need to monitor. In the
        format below, exclusion is a semicolon (;) separated list of mount
        points to exclude from monitoring.

                disk:ThresholdMax:exclusion

        Ex: disk:90:/home/mondo_image;/home/smb_mountpoint

        You can use regexp in your excluded path.
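
        To preview which mount points an exclusion list would leave
        monitored, a small Python sketch can help. The helper names are
        hypothetical, and it assumes each exclusion is applied as an
        unanchored regular expression search on the mount point path, which
        may differ from what SysUsage does internally. The threshold is kept
        as raw text.

```python
import re


def parse_disk_directive(line):
    """Split 'disk:ThresholdMax:exclusion' into (threshold_text, [patterns])."""
    kind, threshold, *rest = line.strip().split(":", 2)
    if kind != "disk":
        raise ValueError(f"not a disk directive: {line!r}")
    patterns = [p for p in rest[0].split(";") if p] if rest else []
    return threshold, patterns


def monitored_mounts(mounts, patterns):
    """Keep the mount points that match none of the exclusion regexps.

    Assumption: patterns are unanchored, so '/home/smb_.*' also matches
    any longer path that contains it.
    """
    compiled = [re.compile(p) for p in patterns]
    return [m for m in mounts if not any(rx.search(m) for rx in compiled)]


threshold, patterns = parse_disk_directive("disk:90:/home/mondo_image;/home/smb_.*")
mounts = ["/", "/home", "/home/mondo_image", "/home/smb_mountpoint"]
print(threshold, monitored_mounts(mounts, patterns))  # 90 ['/', '/home']
```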

        The other directive with a special syntax is 'dev'. It is
        constructed as follows:

                dev:device(alias):rpm_speed:raid_type:nb_disk

        where device is sda, sdb or any device name (without the /dev/), and
        the alias in parentheses is the name that must be displayed in the
        user interface instead of the device name. For example:

                dev:sdc(ASM disk1):
                dev:sdb(/data):

        If you plan to use the I/O workload report, SysUsage needs to know
        the speed of the disk (RPM), the raid type (0,1,5,10) and the number
        of disks in the raid array to calculate the IOPS. For example, if we
        have 7200 RPM disks with 2 disks in raid 1, we will write something
        like this:

                dev:sdc(ASM disk1):7200:1:2

        I/O workload is the relation between TPS (transfers per second) and
        IOPS (I/O operations per second) of a device. If the tps returned by
        sysstat reaches the maximum theoretical IOPS, your storage subsystem
        is saturated. Here is the equation to calculate the maximum
        theoretical IOPS:

                d = number of disks
                dIOPS = IOPS per disk
                %r = % of read workload
                %w = % of write workload
                F = raid factor

                IOPS = (d * dIOPS) / (%r + (F * %w))

        This gives the theoretical maximum IOPS for a RAID set (excluding
        caching, of course). To compute it, take the product of the number
        of disks and the IOPS per disk, divided by the sum of the %read
        workload and the product of the raid factor and the %write workload.
        %read and %write are calculated from the following equations:

                %r = rd_sec / (rd_sec + wr_sec);
                %w = wr_sec / (rd_sec + wr_sec);
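
        The equations above can be worked through with a short Python
        sketch. The figures are purely hypothetical: this document does not
        state how SysUsage derives the IOPS per disk from the RPM or the
        raid factor from the raid type, so both are passed in as plain
        inputs. Expressing the workload as a percentage of the maximum is
        also an assumption made for the example.

```python
def max_theoretical_iops(disks, disk_iops, raid_factor, rd_sec, wr_sec):
    """IOPS = (d * dIOPS) / (%r + (F * %w)), with %r and %w as fractions."""
    total = rd_sec + wr_sec
    if total == 0:
        raise ValueError("no read or write activity to derive the workload mix")
    read_share = rd_sec / total    # %r = rd_sec / (rd_sec + wr_sec)
    write_share = wr_sec / total   # %w = wr_sec / (rd_sec + wr_sec)
    return (disks * disk_iops) / (read_share + raid_factor * write_share)


def io_workload_percent(tps, max_iops):
    """Express the measured tps as a percentage of the theoretical maximum."""
    return 100.0 * tps / max_iops


# Hypothetical figures: 2 disks at 75 IOPS each, raid factor 2,
# 300 sectors read and 100 sectors written per second.
limit = max_theoretical_iops(disks=2, disk_iops=75, raid_factor=2,
                             rd_sec=300, wr_sec=100)
print(limit)                           # 120.0
print(io_workload_percent(90, limit))  # 75.0
```

        Here the read share is 0.75 and the write share 0.25, so the
        denominator is 0.75 + 2 * 0.25 = 1.25 and the limit is 150 / 1.25 =
        120 IOPS. A measured tps of 90 would then sit at 75% of that limit.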

        This IOPS monitoring is built following Nick Anderson's excellent
        article, Analyzing I/O performance in Linux.

    The second format is used to monitor running processes, hard drive
    temperature or queue directories. It has the following format:

            type:target:threshold_max_value:threshold_min_value

    or

            type:target:threshold_max_value(attempt):threshold_min_value(attempt)

    type
        Type of system information you may want to monitor. It can take
        these different values:

                load, cpu, cswch, intr, mem, swap, work, share, sock, socktw, io, file,
                page, pcrea, pswap, net, tcp, err, disk, proc, tproc, queue, hddtemp,
                dev, work, sensors, temp, fan, huge, blocked, dirty

    target
        If type is 'proc' or 'tproc', target represents the name of the
        process to monitor. You can use a regexp as target to match exactly
        the required process. The number of running processes is obtained
        with the system command line:

                ps -e -o command | grep -E "target" | grep -v grep | wc -l

        so you can replace the word target by the regexp to match and see if
        it returns the right number of processes.

        The number of running threads is obtained with the system command
        line:

                ps -eL -o command | grep -E "target" | grep -v grep | wc -l
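
        The counting logic of these pipelines can be mimicked offline with a
        small Python sketch, for instance to try a pattern against a saved
        process list. The function name and the sample data are
        hypothetical, and Python's re syntax is only assumed to be close
        enough to grep -E for the pattern being tested; the two dialects
        differ in some corner cases.

```python
import re


def count_matching(commands, target):
    """Count command lines matching 'target', ignoring lines containing grep.

    Mirrors: grep -E "target" | grep -v grep | wc -l
    """
    pattern = re.compile(target)
    return sum(
        1 for command in commands
        if pattern.search(command) and "grep" not in command
    )


# Hypothetical output of "ps -e -o command", one command line per entry.
ps_output = [
    "/usr/sbin/httpd -DFOREGROUND",
    "/usr/sbin/httpd -DFOREGROUND",
    "sendmail: accepting connections",
    "grep -E httpd",
]
print(count_matching(ps_output, r"httpd"))       # 2
print(count_matching(ps_output, r"^sendmail:"))  # 1
```

        Anchoring the pattern, as in the second call, is one way to avoid
        counting unrelated commands that merely mention the process name.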

        If type is 'queue', target is the full path of the directory to
        monitor. Sysusage counts the regular files found in the target
        directory and does not descend into subdirectories.
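
        The counting rule can be sketched in Python as follows. This is an
        illustration only: the function name is hypothetical, and the
        choice not to count symbolic links as regular files is an
        assumption, since the behaviour of sysusage on symbolic links is
        not described here.

```python
import os


def count_queue_files(directory):
    """Count regular files directly inside directory, without recursion."""
    with os.scandir(directory) as entries:
        # Subdirectories are skipped, so files below them are never counted.
        return sum(1 for entry in entries if entry.is_file(follow_symlinks=False))
```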

        If type is 'hddtemp', target is the hard drive device to monitor,
        e.g. /dev/sda. You can try it with the following command line:

                hddtemp -n /dev/sda

        This should return the current temperature detected on the hard
        drive.

        If type is 'dev', target is the device name to monitor, e.g. sda.
        Do not prefix it with /dev/, this will not work. You may want to
        change the device name shown in the graphic menu; this is possible
        by adding a device alias enclosed in parentheses.

        For example, let's say you are monitoring some EMCpower SAN
        devices. Using sar, the reported devices are dev120-48 and
        dev120-64. First find which partitions are mapped to these devices
        (by reading /proc/partitions). In this example the devices are
        mounted as /cache1 and /cache2, so we want to see these mount
        points instead of the device numbers in the graphical menu. The
        following lines in your sysusage.cfg file will do the job:

                dev:dev120-48(/cache1):90
                dev:dev120-64(/cache2):97

        The threshold_max value is the maximum percentage of CPU used for
        this device before an alarm is sent.
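
        To make the alias syntax concrete, here is a small Python sketch
        that splits such a target into the device name and the label to
        display. It is not the parser used by sysusage: the function name
        is hypothetical, and falling back to the device name when no alias
        is given is an assumption.

```python
import re

# A device name, optionally followed by an alias enclosed in parentheses.
_TARGET_RE = re.compile(r"^(?P<device>[^()]+?)(?:\((?P<alias>[^()]*)\))?$")


def split_device_alias(target):
    """Split 'dev120-48(/cache1)' into ('dev120-48', '/cache1')."""
    match = _TARGET_RE.match(target.strip())
    if match is None:
        raise ValueError(f"unexpected target syntax: {target!r}")
    device = match.group("device")
    # Assumption: without an alias the device name itself is the label.
    return device, match.group("alias") or device
```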

        If type is 'sensors', target is the pattern to match to obtain
        temperature or fan speed information in the sensors program output.
        See chapter SENSORS for more information.

        If type is 'temp' or 'fan', target is the device number reported by
        sar to obtain temperature or fan speed information. To find which
        device number to use, see the result of the command: sar -m ALL 1 1

    threshold_max
        This is the maximum threshold value. Any value equal to or greater
        than this one will generate an SMTP and/or Nagios alert if you have
        enabled it.

    threshold_min
        This is the minimum threshold value. Any value equal to or lower
        than this one will generate an SMTP and/or Nagios alert if you have
        enabled it. A min threshold should normally only be used with the
        'proc' and 'tproc' monitoring types. If you set it to 0 you will be
        warned when any of the monitored processes is down.

    attempt
        You can delay the call to the alarm program when a threshold is
        exceeded by specifying the number of consecutive attempts that must
        exceed it before the command is called. Just specify the number of
        attempts in parentheses right after the min and/or max threshold
        value. This setting is optional for both threshold values and the
        default is to send the alarm immediately.

        For example a load average monitoring defined like this

                load:12(3)

        will send an alarm when the system load average exceeds 12 on three
        consecutive attempts at the defined interval. If the interval is 60
        seconds, the alarm will be sent up to 180 seconds after the first
        time the threshold is exceeded.
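
        The attempt logic can be sketched in Python as follows. This is an
        illustration and not the code used by sysusage: the function names
        are hypothetical, only the max threshold case is shown, and the
        sketch assumes that the alarm keeps firing on every later cycle
        for as long as the value stays at or above the threshold.

```python
import re

# A threshold value optionally followed by an attempt count, e.g. "12(3)".
_THRESHOLD_RE = re.compile(r"^(\d+(?:\.\d+)?)(?:\((\d+)\))?$")


def parse_threshold(token):
    """Return (value, attempts); attempts defaults to 1 (alarm immediately)."""
    match = _THRESHOLD_RE.match(token.strip())
    if match is None:
        raise ValueError(f"unexpected threshold syntax: {token!r}")
    attempts = int(match.group(2)) if match.group(2) else 1
    return float(match.group(1)), attempts


def alarm_cycles(samples, limit, attempts):
    """Return the indexes of the cycles at which a max alarm would be sent."""
    fired = []
    streak = 0
    for index, value in enumerate(samples):
        if value >= limit:
            streak += 1
            if streak >= attempts:
                fired.append(index)
        else:
            # One value under the threshold resets the consecutive count.
            streak = 0
    return fired
```

        With load:12(3), the samples 13, 14, 5, 12, 13, 15 raise the first
        alarm on the sixth cycle only, because the value 5 resets the
        count.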

  Section PLUGIN
    This part enables the use of custom plugins. You can call any program
    or script provided that it returns up to 3 numbers separated by a space
    character. See the plugins/ directory for sample scripts.

    This section must include a name composed of any alphanumeric character
    that will be used to create the target file, for example:

            [PLUGIN testplug1] or [PLUGIN testplug2]

    The section allows the following configuration directives. They are
    composed of a directive name followed by ':' or '=' and a value.

    enable
        Is used to temporarily disable the plugin monitoring. Default is
        'yes' (enabled). To disable it, write enable:no

    program
        Is used to set the path to the program or script to execute as
        plugin. This program must print to STDOUT 1 to 3 numbers separated
        by a space character, depending on the number of values you want to
        report. So each plugin can have 1, 2 or 3 graphed data series.

    title
        Is used to set the title of the report page and the index link.
        Default is set to "Sysusage plugin".

    menu
        Is used to store the plugin under a submenu of the plugins menu.
        Default is to store plugin under the "Others" submenu.

    maxthreshold
        This is the maximum threshold value. Any value equal to or greater
        than this one will generate an SMTP and/or Nagios alert if you have
        enabled it.

    minthreshold
        This is the minimum threshold value. Any value equal to or lower
        than this one will generate an SMTP and/or Nagios alert if you have
        enabled it.

    verticallabel
        This is used to set the vertical label of the graph.

    label1, label2, label3
        Are used to show a legend for each graphed data, label1 is for the
        first returned value, label2 for the second and label3 for the last.
        If you just have one value returned just omit the other labels.

    legend1, legend2, legend3
        These are used to set the units for the Current, Avg and Max
        values.

    remote
        This directive must be set to 'no' to prevent execution of the
        plugin program by an ssh call to sysusage in a remote context. This
        directive is activated by default ('yes').
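
    As an illustration of the plugin contract described above (1 to 3
    numbers separated by a space on STDOUT), here is a minimal plugin
    sketch in Python that reports the three system load averages. It is
    not one of the sample scripts shipped in plugins/; the function names
    and the two-decimal output format are assumptions.

```python
import os
import sys


def format_plugin_output(values):
    """Format 1 to 3 numbers as the space separated line a plugin prints."""
    values = list(values)
    if not 1 <= len(values) <= 3:
        raise ValueError("a plugin must report between 1 and 3 numbers")
    return " ".join(f"{float(value):.2f}" for value in values)


def main():
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    print(format_plugin_output(os.getloadavg()))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

    A matching section could look like this, where the section name and
    the script path are hypothetical:

            [PLUGIN loadavg]
            program=/usr/local/sysusage/plugins/loadavg.py
            title=Load average
            verticallabel=Load
            label1=1 min
            label2=5 min
            label3=15 min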

  Section REMOTE
    This part allows sysusage to be run on remote hosts from a central
    server. It uses ssh to execute sysusage on the destination host with
    the -r option, which forces sysusage not to write anything to local
    data files but to print all results to stdout. As sysusage is run from
    a cron job or in daemon mode, it cannot authenticate interactively to
    the remote host, so you must give an ssh user and an identity file with
    the corresponding configuration options.

    This section must include the name or the ip address of the remote host
    that will be used to create the target data directory, for example:

            [REMOTE hostname] or [REMOTE host.domain.dom] or [REMOTE 192.168.1.14]

    The section allows the following configuration directives. They are
    composed of a directive name followed by ':' or '=' and a value.

    Once you have installed sysusage on all remote hosts and exchanged the
    SSH keys between the central host and all remote hosts, most of the
    time you just have to set the ssh_user directive to have it working.
    Use the remote_sysusage directive if the sysusage perl script is not
    installed in the same place as on the central server.

  Section GROUP
    This section allows you to group remote host reports under a common
    group name in the index page. Remote hosts will be ordered following
    their parent groups. The name of the group can be any string and the
    values in the section must be a list of remote servers defined in the
    REMOTE sections.

    For example if you are monitoring a cluster of web and database servers
    you can use the following declaration:

            [GROUP Web Servers]
            webhost1
            webhost2
            webhost3

            [GROUP Database Servers]
            dbhost1
            dbhost2

    Of course the webhostN and dbhostN hosts must be declared in REMOTE
    sections.

    The following directives apply to REMOTE sections:

    enable
        Is used to enable or disable the remote host monitoring. Default
        is 'yes' (enabled). Set 'enable=no' to disable it.

    ssh_user
        Used to define the ssh user allowed to connect to the remote host.
        By default the value of the SSH_USER configuration option in the
        GENERAL section is used.

    ssh_identity
        Used to set the identity file used to connect to the remote host
        without a password. By default the value of the SSH_IDENTITY
        configuration option in the GENERAL section is used. Usually this
        is the private key that you have generated using ssh-keygen, most
        of the time the file $HOME/.ssh/id_rsa. You may want to use the
        default value unless you know exactly what you are doing.

    ssh_options
        Used to override the default ssh options, which are:

                -o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey

        The default options are set in the SSH_OPTIONS configuration option
        in the GENERAL section. You may want to use the default value
        unless you know exactly what you are doing.

    ssh_command
        You can override the complete ssh command using this directive.
        This will replace the ssh command, the ssh options, the ssh user
        and the host part. The sysusage remote command will not be
        replaced. You may want to use the default value unless you know
        exactly what you are doing.

    remote_sysusage
        Use it to set the path to the rsysusage command that must be used
        on the remote host. SysUsage will automatically add the -r option
        to trigger the remote execution mode.
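
    To show how the REMOTE directives fit together, the following Python
    sketch assembles an ssh command line from them. It is only a guess at
    the general shape of the call: the function name, the use of the -i
    option for the identity file and the order of the arguments are
    assumptions, and the command really built by sysusage may differ.

```python
import shlex

# Default ssh options quoted in the ssh_options directive description.
DEFAULT_SSH_OPTIONS = (
    "-o ConnectTimeout=5 -o PreferredAuthentications=hostbased,publickey"
)


def build_remote_command(host, ssh_user, remote_sysusage,
                         ssh_identity=None, ssh_options=DEFAULT_SSH_OPTIONS):
    """Return the argument list of a remote sysusage call through ssh."""
    argv = ["ssh"] + shlex.split(ssh_options)
    if ssh_identity:
        argv += ["-i", ssh_identity]
    argv.append(f"{ssh_user}@{host}")
    # The -r option asks for results on stdout instead of local data files.
    argv += [remote_sysusage, "-r"]
    return argv
```

    The resulting list can be passed to subprocess.run() to try the
    connection by hand before enabling the section.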

THRESHOLD NOTIFICATION
  SMTP alert
    Sysusage uses an external perl script to send SMTP alerts and/or Nagios
    checks when a max or min threshold is reached. This program is named
    sysusagewarn. All options of the configuration file in section [ALARM]
    are used by sysusage to call this program. If they are correctly set
    you don't have to take care of the parameters given to this program. If
    you want to use this program outside sysusage, here are the command
    line options it understands:

            Usage: sysusagewarn -t subject -c current_value -v threshold_value
                            [-s smtp_srv] [-f from] [-d to] [-b hostname_prog]

            -t subject : Subject of the alarm
            -c value   : Current value monitored by sysusage
            -v value   : Threshold value used.
            -s host    : SMTP server name or ip where to send email.
            -f from    : Sender email address of the alarm message.
            -d to      : Destination address of the alarm message.
            -b path    : Path to program hostname. Default is /bin/hostname
            -n path    : Path to Nagios program submit_check_result. Default none. 
            -l value   : Alarm level (0=OK,1=WARNING,2=CRITICAL). Default: 1. 
            -r service : Nagios service name to used. Must be any sysusage type of
                         monitoring defined in the configuration file.
            -u url     : Url to HTML sysusage output to include in email.
                         Default: http://hostname.domain/sysusage/
            -h         : Output this message and exit
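
    If you call sysusagewarn from your own tooling, the options listed
    above can be assembled as in this Python sketch. The helper name and
    its parameter names are hypothetical; only the option letters come
    from the usage message, and the program is assumed to be reachable as
    sysusagewarn in the PATH.

```python
def build_warn_command(subject, current, threshold, smtp=None, sender=None,
                       recipient=None, level=None, service=None,
                       program="sysusagewarn"):
    """Return the argument list of a sysusagewarn call."""
    argv = [program, "-t", subject, "-c", str(current), "-v", str(threshold)]
    optional = (
        ("-s", smtp),       # SMTP server name or ip
        ("-f", sender),     # sender email address
        ("-d", recipient),  # destination address
        ("-l", level),      # alarm level: 0=OK, 1=WARNING, 2=CRITICAL
        ("-r", service),    # Nagios service name
    )
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    return argv
```

    The returned list is suitable for subprocess.run().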

  NAGIOS alert
    SysUsage sends check messages to Nagios through an external command
    (submit_check_result). So you need to create the host in Nagios and
    associate with it all the sysusage services that you want to monitor.
    The service names correspond to the types of monitoring. For example,
    if you have enabled the alarm on memory usage, the service sent is
    'mem'. There are also special cases for types of monitoring with
    multiple instances, like network monitoring. You need to create one
    service per instance. For example type 'net' will have 'net_eth0' and
    'net_lo', and more if you have more network interfaces. To see whether
    your sysusage alarm messages are well understood by Nagios, take a look
    at the nagios.log file (default to /usr/local/nagios/var/nagios.log).

    To automatically clear an alarm reported to Nagios, SysUsage sends an
    OK request each time it runs if everything is correct for the monitored
    type.
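
    The naming rule for Nagios services can be written down as a short
    Python sketch, useful for listing the services to declare on the
    Nagios side. The function name is hypothetical, and the rule is
    inferred only from the 'mem' and 'net_eth0' examples above.

```python
def nagios_service_names(monitor_type, instances=()):
    """Return the Nagios service names to declare for a monitoring type."""
    if not instances:
        # Single instance types use the type name itself, e.g. 'mem'.
        return [monitor_type]
    # Multiple instance types get one service per instance, e.g. 'net_eth0'.
    return [f"{monitor_type}_{name}" for name in instances]
```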

SENSORS
    Monitoring of sensors output is based on regexps. To make this clear,
    here is an example.

    Sensors output on my server:

            adt7463-i2c-0-2d
            Adapter: SMBus I801 adapter at 1480
            V1.5:        +3.23 V  (min =  +0.00 V, max =  +3.32 V)
            VCore:       +1.24 V  (min =  +1.10 V, max =  +1.49 V)
            V3.3:        +3.33 V  (min =  +2.80 V, max =  +3.78 V)
            V5:          +4.99 V  (min =  +4.25 V, max =  +5.75 V)
            V12:         +0.11 V  (min =  +0.00 V, max = +15.94 V)
            CPU_Fan:       0 RPM  (min =    0 RPM)
            fan2:       10671 RPM  (min = 8095 RPM)
            fan3:          0 RPM  (min =    0 RPM)
            fan4:          0 RPM  (min =    0 RPM)
            CPU Temp:    +69.5 C  (low  =  +2.0 C, high = +91.0 C)
            Board Temp:  +32.5 C  (low  =  +2.0 C, high = +83.0 C)
            Remote Temp: +31.2 C  (low  =  +2.0 C, high = +58.0 C)
            cpu0_vid:   +1.338 V

            adt7463-i2c-0-2e
            Adapter: SMBus I801 adapter at 1480
            V1.5:        +3.21 V  (min =  +0.00 V, max =  +3.32 V)
            VCore:       +1.28 V  (min =  +1.10 V, max =  +1.49 V)
            V3.3:        +3.32 V  (min =  +2.80 V, max =  +3.78 V)
            V5:          +4.95 V  (min =  +0.00 V, max =  +6.64 V)
            V12:         +0.11 V  (min =  +0.00 V, max = +15.94 V)
            CPU_Fan:    10843 RPM  (min = 8095 RPM)
            fan2:          0 RPM  (min =    0 RPM)
            fan3:       9642 RPM  (min = 8095 RPM)
            fan4:          0 RPM  (min =    0 RPM)
            CPU Temp:    +57.2 C  (low  =  +2.0 C, high = +91.0 C)
            Board Temp:  +35.2 C  (low  =  +2.0 C, high = +91.0 C)
            Remote Temp: +35.8 C  (low  =  +2.0 C, high = +58.0 C)
            cpu0_vid:   +1.338 V

    Depending on the sensors kernel modules loaded, you could have more or
    less output than that. To monitor all the temperature sensors on my
    server I need to add the following lines into sysusage.cfg:

            sensors:CPU Temp:75
            sensors:Board Temp:45
            sensors:Remote Temp:45

    This will create 3 graphs: one based on lines matching 'CPU Temp',
    another with lines matching 'Board Temp' and the last with lines
    matching 'Remote Temp'. As I have 2 CPUs, each graph will show 2
    values. You cannot report more than 3 values per graph, this is hard
    coded into sysusage. So if you have more CPUs you will not see more
    than 3 values. Here it will send an alarm when the temperature exceeds
    the given values (75, 45, 45).

    To monitor fan speed, I just add lines like this in the configuration
    file:

            sensors:fan2:11000:8095
            sensors:fan3:11000:8095

    This will create 2 graphs, for fan 2 and fan 3, with an alarm sent when
    the speed exceeds 11000 RPM or is lower than 8095 RPM.
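
    The matching described above can be tried out with the following
    Python sketch, run here on a shortened copy of the server output. It
    is not the code used by sysusage: the function name is hypothetical,
    and taking the first number found after the colon as the reading is
    an assumption about how the value is extracted.

```python
import re

# Shortened copy of the sensors output shown above (two chips).
SAMPLE = """\
adt7463-i2c-0-2d
fan2:       10671 RPM  (min = 8095 RPM)
CPU Temp:    +69.5 C  (low  =  +2.0 C, high = +91.0 C)

adt7463-i2c-0-2e
fan2:          0 RPM  (min =    0 RPM)
CPU Temp:    +57.2 C  (low  =  +2.0 C, high = +91.0 C)
"""

_NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?")


def sensor_values(sensors_output, pattern, limit=3):
    """Return the readings of the lines matching pattern, at most limit."""
    regex = re.compile(pattern)
    values = []
    for line in sensors_output.splitlines():
        label, colon, reading = line.partition(":")
        if not colon or not regex.search(line):
            continue
        number = _NUMBER_RE.search(reading)
        if number:
            values.append(float(number.group()))
    # No more than 3 values are reported per graph.
    return values[:limit]


if __name__ == "__main__":
    print(sensor_values(SAMPLE, "CPU Temp"))
    print(sensor_values(SAMPLE, "fan2"))
```

    On this sample the 'CPU Temp' pattern yields 69.5 and 57.2, one value
    per chip, and the 'fan2' pattern yields 10671 and 0.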

    On my personal computer (/etc/sysconfig/lm_sensors => modprobe coretemp)
    sensors output is:

            coretemp-isa-0000
            Adapter: ISA adapter
            Core 0:      +53.0 C  (high = +78.0 C, crit = +100.0 C)

            coretemp-isa-0001
            Adapter: ISA adapter
            Core 1:      +50.0 C  (high = +78.0 C, crit = +100.0 C)

    To monitor CPU temperature, I just add this line in my sysusage.cfg:

            sensors:Core:70

    This will generate a graph with 2 graphed data for Core 0 and Core 1.

    Now that sysstat sar natively reports device temperature and fan speed
    you don't need sensors anymore. Type 'temp' can be used instead, and
    type 'fan' for the fan speed. The target of these types is the device
    number. See sar -m TEMP or sar -m FAN to find which device number to
    monitor.

BUGS / FEATURE REQUEST
    Please report any bugs, remarks and feature requests using the Github
    interface at https://github.com/darold/sysusage/ or send a mail to the
    author.

LICENSE
    Copyright (C) 2003-2018 Gilles Darold

    This program is free software; you can redistribute it and/or modify it
    under the terms of the GNU General Public License as published by the
    Free Software Foundation; either version 3 of the License, or any later
    version.

    This program is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
    Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

AUTHOR
    Gilles Darold <gilles _|_At_|_ darold _|_DoT_|_ net>

ACKNOWLEDGMENT
    I want to thank all the people who helped to build this tool, with a
    very special thanks to Marat Dyatko for the web design contribution.
