Commit 627d92a

Update README.md
1 parent ef398f7 commit 627d92a

1 file changed: 94 additions & 1 deletion

README.md
@@ -1,2 +1,95 @@
# Python-Project
Part of the CS241 and CS244 courses in my 2nd year of Engineering at IIT Guwahati.
Description of Australian Defence Force Academy-Linux Dataset (ADFA-LD):
1) The dataset was generated on a local Linux server running Ubuntu 11.04, offering a variety
of functions such as file sharing, database, remote access and web server.
2) Six types of attack occur in ADFA-LD: two brute-force password guessing
attempts on the open ports enabled by FTP and SSH respectively, an unauthorised attempt to
create a new user with root privileges by encoding a malicious payload into a normal
executable, the uploads of Java and Linux executable Meterpreter payloads for the remote
compromise of a target host, and the compromise and privilege escalation using the C100
webshell. These types are termed Hydra-FTP, Hydra-SSH, Adduser, Java-Meterpreter,
Meterpreter and Webshell respectively. You can find these attacks inside the folder
“Attack_Data_Master”.
3) 833 and 4373 normal traces are generated for training and validation respectively, over a
period during which no attacks occur against the host and legitimate application activities
ranging from web browsing to document writing are operated as usual. These training and
validation traces can be found in the “Training_Data_Master” and “Validation_Data_Master”
folders, respectively.
Assignment Task:
1) Split the Attack data of each category (Hydra-FTP, Hydra-SSH, Adduser, Java-Meterpreter,
Meterpreter and Webshell) into 70% training data and 30% test data. For instance, there
are 10 folders in the “Adduser” attack. Therefore, 7 of these folders are to be used for training
and 3 folders are to be used for testing.
2) For the Normal data, files in the “Training_Data_Master” folder are to be used as training data
and files in the “Validation_Data_Master” folder are to be used as test data.
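The 70/30 split described in step 1 could be sketched as follows. This is only an illustration under stated assumptions: the folder names `Adduser_1` to `Adduser_10`, the helper name `split_folders`, the fixed seed and the decision to shuffle before splitting are not part of the assignment, which only fixes the proportions.

```python
import random

def split_folders(folders, train_fraction=0.7, seed=0):
    """Shuffle folder names reproducibly, then cut them into train and test lists."""
    shuffled = sorted(folders)              # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)   # private RNG: the global random state is untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical folder names standing in for the 10 Adduser folders.
adduser_folders = [f"Adduser_{i}" for i in range(1, 11)]
train, test = split_folders(adduser_folders)
print(len(train), len(test))  # prints: 7 3
```

Splitting at the folder level, rather than at the file level, keeps all traces of one attack run on the same side of the split.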
3) Write a Python script to find the frequency of occurrence of all unique 3-gram, 5-gram
and 7-gram system call sequences in the training data for both Attack data (across all
categories of attack) and Normal data. For example, consider the following trace file
corresponding to the Adduser attack.
265 168 168 265 168 168 168 265 168 265 168 168 . . .
Your script to list all 3-grams should produce the following output:
265 168 168 -->3
168 168 265 -->2
168 265 168 -->3
168 168 168 -->1
265 168 265 -->1
NOTE: To save time you can concatenate all training files for a particular class of
attack and then run your script on the concatenated file instead of running it individually on
each file.
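One possible way to count the n-grams of step 3 is a sliding window over the list of system calls. The names `count_ngrams` and `count_ngrams_in_traces` are hypothetical. The second helper is an assumption about how multiple files might be handled: it counts each trace separately and sums the results, which avoids counting n-grams that straddle the boundary between two concatenated traces.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every contiguous window of n system calls in one trace."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def count_ngrams_in_traces(traces, n):
    """Sum per-trace counts so that no n-gram spans two different traces."""
    total = Counter()
    for tokens in traces:
        total.update(count_ngrams(tokens, n))
    return total

# The twelve system calls shown in the Adduser example trace.
trace = "265 168 168 265 168 168 168 265 168 265 168 168".split()
counts = count_ngrams(trace, 3)
for gram, freq in counts.items():
    print(" ".join(gram), "-->", freq)
```

Calling the same function with `n=5` or `n=7` gives the other two n-gram sizes. A trace shorter than `n` simply contributes no n-grams.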
4) Perform the same task on the files in “Training_Data_Master” to obtain all the unique
3-grams, 5-grams and 7-grams.
5) Once you have obtained the frequencies of all the unique n-grams in the training data,
use the top 30% of n-grams with the highest frequency to create a dataset. For instance,
consider the following results for Adduser data (1st File):
('240', '102', '221') 7
('204', '203', '5') 2
('195', '199', '60') 1
('5', '197', '45') 1
('5', '195', '5') 12
('6', '220', '4') 1
('191', '5', '133') 9
('13', '45', '5') 2
('60', '5', '197') 4
('3', '142', '7') 2
Hydra-FTP data (2nd File):
('3', '142', '7') 11
('219', '311', '240') 4
('240', '13', '240') 1
('33', '168', '146') 2
('6', '168', '102') 3
('5', '197', '45') 1
('5', '195', '5') 2
('3', '91', '5') 8
('42', '120', '197') 1
('174', '54', '5') 2
('6', '63', '6') 18
Normal training data (3rd File):
('195', '10', '41') 1
('3', '142', '7') 3
('91', '240', '196') 2
('5', '195', '5') 2
('3', '102', '7') 17
('3', '195', '195') 14
('4', '78', '240') 1
('33', '195', '192') 2
('5', '197', '45') 15
('199', '45', '192') 1
The top 30% of 3-grams with the highest frequencies in the Adduser, Hydra-FTP and Normal
data are [('5', '195', '5'), ('191', '5', '133'), ('240', '102', '221')],
[('6', '63', '6'), ('3', '142', '7'), ('3', '91', '5')] and
[('3', '102', '7'), ('5', '197', '45'), ('3', '195', '195')], respectively. Designate ('5', '195', '5') as
feature 1 (F1), ('191', '5', '133') as feature 2 (F2), ... and ('3', '195', '195') as F9. Then, the generated
dataset should have 9 features and one class label (Adduser, Hydra-FTP, Normal), with each feature
holding the frequency of occurrence of one of these nine 3-grams. For instance, for the 1st and 3rd Files,
the generated data should be
Freq of F1, Freq of F2, ..., Freq of F9 -----> 12, 9, 7, 0, 2, 0, 0, 1, 0, Adduser
Freq of F1, Freq of F2, ..., Freq of F9 -----> 2, 0, 0, 0, 3, 0, 17, 15, 14, Normal
This will be the final training data which will be used to train various classifiers.
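The feature selection of step 5 can be sketched with the three example tables above. All function names here (`parse_table`, `top_percent`, `build_features`, `to_row`) are hypothetical, and two details are assumptions rather than stated requirements: the 30% cut-off is rounded down (consistent with the Hydra-FTP table, where 11 entries yield 3 features), and ties are broken by whichever n-gram was seen first.

```python
from collections import Counter

def parse_table(text):
    """Turn lines of 'a b c freq' into a Counter keyed by 3-gram tuples."""
    counts = Counter()
    for line in text.strip().splitlines():
        *gram, freq = line.split()
        counts[tuple(gram)] = int(freq)
    return counts

def top_percent(counts, percent=30):
    """Keep the most frequent n-grams, rounding the cut-off down (assumption)."""
    keep = len(counts) * percent // 100
    return [gram for gram, _ in counts.most_common(keep)]

def build_features(per_class_counts, percent=30):
    """Concatenate each class's top n-grams, skipping any already selected."""
    features = []
    for counts in per_class_counts.values():
        for gram in top_percent(counts, percent):
            if gram not in features:
                features.append(gram)
    return features

def to_row(counts, features, label):
    """Frequency of each feature n-gram (0 if absent), followed by the class label."""
    return [counts[gram] for gram in features] + [label]

# The three example tables from the text, one "a b c freq" entry per line.
per_class = {
    "Adduser": parse_table("""
        240 102 221 7
        204 203 5 2
        195 199 60 1
        5 197 45 1
        5 195 5 12
        6 220 4 1
        191 5 133 9
        13 45 5 2
        60 5 197 4
        3 142 7 2
    """),
    "Hydra-FTP": parse_table("""
        3 142 7 11
        219 311 240 4
        240 13 240 1
        33 168 146 2
        6 168 102 3
        5 197 45 1
        5 195 5 2
        3 91 5 8
        42 120 197 1
        174 54 5 2
        6 63 6 18
    """),
    "Normal": parse_table("""
        195 10 41 1
        3 142 7 3
        91 240 196 2
        5 195 5 2
        3 102 7 17
        3 195 195 14
        4 78 240 1
        33 195 192 2
        5 197 45 15
        199 45 192 1
    """),
}

features = build_features(per_class)  # F1..F9, in class order
rows = [to_row(counts, features, label) for label, counts in per_class.items()]
for row in rows:
    print(row)
```

For the Adduser and Normal tables this reproduces the two rows given in the text. Because `features` is computed once from the training data, the same list can be passed to `to_row` for test files, so that training and test rows share identical columns.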
6) Apply the same procedure to generate the test dataset from the test files of the attack data
(for all attack types) and the normal files in “Validation_Data_Master”, using the top
30% of 3-grams with the highest frequencies obtained during the training phase. The
classifier model developed during the training phase will finally be validated on the test
dataset.
NOTE: You can refer to the paper available at http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6743952
for further reference on the ADFA-LD dataset.
