Recognizes duplicated NetFlow records. Based on data generated with bwNetFlow/flowpipeline.
This tool filters your flows based on:
- protocol
- port-combination
- similarity of flow vectors ("packets/bytes/?" tuple)
- IP addresses (destination addresses): checks whether they are equal
You need flows from two different sources.
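The filter criteria above can be sketched roughly as follows. This is only an illustration, not the code in main.py: the helper names and the field names "Proto", "SrcPort", "DstPort" and "DstAddr" are assumptions, and the `tolerance` parameter is hypothetical. Only "id", "Bytes" and "Packets" are taken from the vector used by this tool.

```python
from collections import defaultdict

# Assumed field names: "Proto", "SrcPort", "DstPort", "DstAddr".
# "id", "Bytes" and "Packets" are the fields this tool's vector uses.
def flow_key(flow):
    """Key a flow by protocol, port combination and destination address."""
    return (flow["Proto"], flow["SrcPort"], flow["DstPort"], flow["DstAddr"])

def candidate_duplicates(flows_a, flows_b, tolerance=0):
    """Return (a, b) pairs with equal keys and similar byte/packet counts."""
    by_key = defaultdict(list)
    for flow in flows_b:
        by_key[flow_key(flow)].append(flow)

    pairs = []
    for a in flows_a:
        for b in by_key.get(flow_key(a), []):
            if (abs(a["Bytes"] - b["Bytes"]) <= tolerance
                    and abs(a["Packets"] - b["Packets"]) <= tolerance):
                pairs.append((a, b))
    return pairs

source_a = [
    {"id": 1, "Proto": 17, "SrcPort": 5000, "DstPort": 53,
     "DstAddr": "10.0.0.1", "Bytes": 120, "Packets": 2},
    {"id": 2, "Proto": 17, "SrcPort": 5001, "DstPort": 53,
     "DstAddr": "10.0.0.1", "Bytes": 300, "Packets": 4},
]
source_b = [
    {"id": 7, "Proto": 17, "SrcPort": 5000, "DstPort": 53,
     "DstAddr": "10.0.0.1", "Bytes": 120, "Packets": 2},
    {"id": 8, "Proto": 17, "SrcPort": 5001, "DstPort": 53,
     "DstAddr": "10.0.0.2", "Bytes": 300, "Packets": 4},
]

duplicates = candidate_duplicates(source_a, source_b)
print([(a["id"], b["id"]) for a, b in duplicates])  # [(1, 7)]
```

Flows 2 and 8 are not paired because their destination addresses differ, even though their byte and packet counts match.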
Install the requirements for this program.
pip install -r requirements.txt
Select the source files of your flows in JSON format.
python3 main.py -1 sourceA.json -2 sourceB.json
-h show help
-1 source file of flows in JSON format
-2 source file of flows in JSON format
-o target file for duplicate flows in JSON format (if omitted, no output is written)
-p protocol filter, tcp or udp | udp is the default
-v whether to plot the results, True/False | True is the default
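For readers who want to see how such options could be declared, here is a minimal sketch with the standard argparse module. It is not taken from main.py: `build_parser`, `str_to_bool` and the `dest` names are assumptions chosen for illustration.

```python
import argparse

def str_to_bool(value):
    """Interpret 'True'/'true' as True, anything else as False."""
    return value.lower() == "true"

def build_parser():
    # argparse adds -h/--help automatically.
    parser = argparse.ArgumentParser(
        description="Find duplicated flows in two JSON flow dumps.")
    parser.add_argument("-1", dest="source_a", required=True,
                        help="source file of flows in JSON format")
    parser.add_argument("-2", dest="source_b", required=True,
                        help="source file of flows in JSON format")
    parser.add_argument("-o", dest="output", default=None,
                        help="target file for duplicate flows (optional)")
    parser.add_argument("-p", dest="protocol", choices=["tcp", "udp"],
                        default="udp", help="protocol filter")
    parser.add_argument("-v", dest="plot", type=str_to_bool, default=True,
                        help="plot the results, True or False")
    return parser

args = build_parser().parse_args(
    ["-1", "jsonData/b.json", "-2", "jsonData/a.json", "-p", "tcp",
     "-v", "True", "-o", "outfile.json"])
print(args.protocol, args.plot, args.output)
```

Because the parser declares options that look like negative numbers (`-1`, `-2`), argparse treats them as options, not as numeric values.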
# example: read the flows of two dumps, analyse tcp, plot the result and save the duplicates in outfile.json
$ python3 main.py -1 jsonData/b.json -2 jsonData/a.json -p tcp -v True -o outfile.json
# returns all port combinations of tcp flows
tcpPortList = tcpPort.returnPortList()
# generates the vector from the fields defined in distance.py
def convertToVector(self, netflow):
    return (netflow["id"], netflow["Bytes"], netflow["Packets"], netflow["TimeReceived"])
# you can choose between euclidean, manhattan or cosine (works poorly) similarity in main.py
distanceOfFlowsObj = distanceOfFlowsObj.euclidean(flowList)
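As a reminder of what the three measures compute, here is a small self-contained sketch. It is not the code from distance.py: the function names are illustrative, and it assumes the vectors hold only the numeric components (Bytes, Packets, TimeReceived) without the id.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of the absolute component differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def cosine_distance(u, v):
    """1 - cosine similarity; compares direction only, not magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # convention chosen here for zero vectors
    return 1.0 - dot / (norm_u * norm_v)

# Assumed layout: (Bytes, Packets, TimeReceived)
flow_a = (120, 2, 1000)
flow_b = (123, 6, 1000)

print(euclidean(flow_a, flow_b))   # 5.0
print(manhattan(flow_a, flow_b))   # 7
print(cosine_distance((1, 2, 3), (2, 4, 6)))  # approximately 0
```

The last line hints at one plausible reason why cosine similarity performs poorly here: a flow whose components are all twice as large points in the same direction and therefore looks identical.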
# to plot the ids of the flows, change the following line in plotDist.py
self.ax.text(xs[i], ys[i], zs[i], netflows[i]["loc"])
# to
self.ax.text(xs[i], ys[i], zs[i], netflows[i]["loc"]+" [" + str(netflows[i]["id"])+"]")
# to analyse only a few flows, you can use the following lines in main.py
# the program will then iterate through only 15 port combinations
if breakUpCounter == 15:
    break
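An alternative to the counter, shown only as a sketch: `itertools.islice` cuts the iteration off after a fixed number of items. The helper name and the shape of the port list (tuples of source and destination port) are assumptions, not taken from main.py.

```python
from itertools import islice

def first_port_combinations(port_list, limit=15):
    """Return at most `limit` port combinations from the start of the list."""
    return list(islice(port_list, limit))

# Assumed shape: one (source port, destination port) tuple per combination.
port_list = [(src, 443) for src in range(50000, 50100)]

subset = first_port_combinations(port_list)
print(len(subset))  # 15
```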