# Flow data to Kafka streaming

This script aggregates flow data and streams the results to a Kafka topic in CBOR format. It can easily be changed to use a different format or to send different data. Treat it as an example of what is possible and modify it to your needs.

It has been tested with Flowmon version 12.3.

## Prerequisites

To run the script you need to create a Python virtual environment on your Flowmon appliance and install the required libraries, which are not present on the Flowmon system. Start by creating the environment:

```shell
python3 -m venv kafka
```

Here `kafka` is the name of the virtual environment; if you use a different name, change it in the script as well. Next, change into the directory of the virtual environment created above (the script also needs to be placed in this folder), activate it, and install the libraries:

```shell
cd kafka
source bin/activate
pip3 install kafka-python
pip3 install cbor
```

## Using the script

The script is designed to run every five minutes; you can add it to the Flowmon user's crontab by editing it with `crontab -e`. It keeps its last processed timestamp in a file called `last`, which is created if it does not exist.
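
A crontab entry could look like the following sketch. The script file name `kafka-stream.py`, the home directory path, and the argument values are assumptions here; adjust them to your setup:

```
*/5 * * * * cd /home/flowmon/kafka && ./bin/python3 kafka-stream.py -i broker.example.com -p 9092 -t flows
```

Running the interpreter from the virtual environment's `bin` directory ensures the installed `kafka-python` and `cbor` libraries are found.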

When you test the script multiple times, you need to delete this file between runs: the script rounds the current time down to the previous 5-minute interval and uses that timestamp to run the analysis with the nfdump console command.
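
The rounding can be sketched as follows. This is a minimal illustration; the function name and timestamp handling are assumptions, not necessarily what the script does internally:

```python
from datetime import datetime

def round_down_5min(now: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute interval,
    matching the naming scheme of the nfdump capture files."""
    return now.replace(minute=now.minute - now.minute % 5,
                       second=0, microsecond=0)

print(round_down_5min(datetime(2023, 11, 30, 11, 32, 17)))  # 2023-11-30 11:30:00
```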

The command that performs the aggregation is defined in the function `get_data`:

```python
command = f"/usr/local/bin/nfdump -M /data/nfsen/profiles-data/live/'127-0-0-1_p3000:127-0-0-1_p2055' -r {timestamp} -A 'dstctry' -o 'fmt:%ts,%dcc,%td,%pkt,%byt,%pps,%bps,%fl' -6 --no-scale-number"
```
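
A sketch of how `get_data` might execute the command and keep only the aggregated records is shown below. The helper `filter_records` and the use of `subprocess` are illustration-only assumptions, not necessarily the script's actual implementation:

```python
import subprocess

def filter_records(stdout: str) -> list[str]:
    # An aggregated record in this output format has exactly 8
    # comma-separated fields; header, summary and timing lines do not.
    return [line for line in stdout.splitlines() if line.count(",") == 7]

def get_data(timestamp: str) -> list[str]:
    command = (
        "/usr/local/bin/nfdump -M "
        "/data/nfsen/profiles-data/live/'127-0-0-1_p3000:127-0-0-1_p2055' "
        f"-r {timestamp} -A 'dstctry' "
        "-o 'fmt:%ts,%dcc,%td,%pkt,%byt,%pps,%bps,%fl' -6 --no-scale-number"
    )
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, check=True)
    return filter_records(result.stdout)
```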

The result in the SSH command line when running this command could look like the following:

```
Date first seen          Dst Ctry Duration Packets     Bytes  pps      bps Flows
2023-11-30 11:29:22.585, 203, 302.210, 760, 58348, 2, 1544, 196
2023-11-30 11:29:39.261, 826, 271.966, 55, 4541, 0, 133, 15
2023-11-30 11:30:08.502, 372, 227.322, 189, 81984, 0, 2885, 13
2023-11-30 11:30:54.374, 250, 150.388, 351, 195125, 2, 10379, 22
2023-11-30 11:27:04.546, 840, 468.700, 4592, 1172714, 9, 20016, 1486
2023-11-30 11:30:06.511, 276, 200.593, 84, 10942, 0, 436, 5
2023-11-30 11:32:00.974, 100, 0.000, 1, 76, 0, 0, 1
2023-11-30 11:25:03.508, 0, 594.975, 829590, 893719438, 1394, 12016900, 14558
2023-11-30 11:29:44.087, 528, 297.434, 676, 204732, 2, 5506, 42
Summary: total flows: 16338, total bytes: 895447900, total packets: 836298, avg bps: 12040141, avg pps: 1405, avg bpp: 1070
Time window: 2023-11-30 11:25:03 - 2023-11-30 11:35:00
Total flows processed: 16338, Blocks skipped: 0, Bytes read: 5883516
Sys: 0.028s flows/second: 569427.0 Wall: 0.010s flows/second: 1603966.2
```

The easiest way to get the aggregation command is to run the query in the Monitoring Center Analysis until you get the results you are after. Do not forget to select all the fields you want to use for aggregation, filter the data (if needed), select the proper output format, and limit the number of results to those interesting for you. Also select the right profile and the channels you want to get the data from.

Once you click the black terminal window icon, it will give you the statistics command. It will look like the example above, so you can replace the command between the quotes. Just change `-R` to `-r {timestamp}` as in the example, so that the timestamp of the analyzed data changes with each run.
| 98 | + |
| 99 | +When you modify the command, you would need to modify the function |
| 100 | +process\_records as the record would have a different format based on |
| 101 | +the output you have selected. |
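
For the output format shown above, the parsing could be sketched like this. The field names and the helper `parse_record` are assumptions for illustration; adapt them to the fields you actually selected:

```python
FIELDS = ("first_seen", "dst_country", "duration", "packets",
          "bytes", "pps", "bps", "flows")

def parse_record(line: str) -> dict:
    """Split one aggregated nfdump CSV line into named fields."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(FIELDS, values))

record = parse_record("2023-11-30 11:29:22.585, 203, 302.210, 760, 58348, 2, 1544, 196")
print(record["dst_country"], record["flows"])  # 203 196
```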
| 102 | + |
| 103 | +The script supports three arguments. |
| 104 | + |
| 105 | +\-i HOST, --host HOST IP address/hostname of the bootstrap server |
| 106 | + |
| 107 | +\-p PORT, --port PORT Port of the running boostrap server |
| 108 | + |
| 109 | +\-t TOPIC, --topic TOPIC Kafka topic to stream |
| 110 | + |

There is a log file in the script folder (by default `kafka/kafka-stream.log`) which can help with troubleshooting. The configured Kafka bootstrap server must accept connections from an external IP so the script can connect and send data to the specified topic.