The code, data, and tools for our paper: "From a Global Perspective: Highly Applicable and Robust IPv4-IPv6 Address Association Method via Port Fingerprint"🤓
Before using this method, install the following packages:
numpy 1.26.4+
scikit-learn 1.5.1+
tqdm 4.66.5+
PyTorch 2.6.0
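To confirm the environment before running anything, a small check like the one below can help. It is only a sketch: the distribution names (for example `torch` for PyTorch) are the usual PyPI names rather than something this repository specifies, and the PyTorch version is treated as a minimum, which is an assumption.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above. Treating "2.6.0" as a minimum
# for PyTorch is an assumption; the list gives it without a "+".
REQUIRED = {
    "numpy": "1.26.4",
    "scikit-learn": "1.5.1",
    "tqdm": "4.66.5",
    "torch": "2.6.0",
}


def parse(v):
    """Turn a string such as '2.6.0+cu124' into (2, 6, 0)."""
    nums = []
    for piece in v.split("+")[0].split(".")[:3]:
        m = re.match(r"\d+", piece)
        nums.append(int(m.group()) if m else 0)
    while len(nums) < 3:  # pad '1.26' to (1, 26, 0)
        nums.append(0)
    return tuple(nums)


def check(required=REQUIRED):
    """Return {package: 'ok' | 'missing' | 'too old (x.y.z)'}."""
    report = {}
    for name, minimum in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse(installed) >= parse(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check().items():
        print(f"{name}: {status}")
```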
(1). We provide fine-tuned DStack-Tokenizer and DStack-BERT models (in the "model" folder), ready to use for IPv4-IPv6 address association tasks. If you want to use them as they are, please skip to step 3 ✅.
If you want to fine-tune these two models yourself, please follow steps 1-3 in order~
(2). Module 1, "Port Detection and Service Information Acquisition", can be implemented directly with the Censys API, so its code is not shown in detail. Here we only describe how to reproduce the three core modules (DStack-Tokenizer creation, DStack-BERT fine-tuning, port fingerprint construction and similarity calculation).
We build a service information corpus from the training set data.
Please replace the input data with your corpus and run the following code:
cd code
python DStack-Tokenizer.py
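For orientation, the idea of deriving a vocabulary from a service information corpus can be sketched as below. This is not the algorithm in `DStack-Tokenizer.py`: the whitespace tokenization, the `min_freq` threshold, the BERT-style special tokens, and the corpus line format are all assumptions made for illustration.

```python
from collections import Counter

# BERT-style special tokens; assumed here, not taken from the repository.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(corpus_lines, min_freq=2):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.lower().split())
    # Most frequent first; ties broken alphabetically for determinism.
    kept = sorted(
        (tok for tok, c in counts.items() if c >= min_freq),
        key=lambda tok: (-counts[tok], tok),
    )
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + kept)}


def encode(line, vocab):
    """Convert one line to ids, wrapping it in [CLS] ... [SEP]."""
    unk = vocab["[UNK]"]
    ids = [vocab.get(tok, unk) for tok in line.lower().split()]
    return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]


# Hypothetical corpus lines: "<port> <protocol> <product> <version>".
corpus = [
    "22 ssh OpenSSH 8.9",
    "80 http nginx 1.18.0",
    "443 http nginx 1.18.0",
]
vocab = build_vocab(corpus)
print(encode("8080 http nginx 1.18.0", vocab))
```

Tokens below the frequency threshold, such as an unseen port number, fall back to `[UNK]`; a subword tokenizer would instead split them into smaller known pieces.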
This part includes two steps: step 2-1, constructing training sample pairs through data augmentation, and step 2-2, fine-tuning the BERT model with Simple Sentence Contrast Learning (SimSCL).
We construct training sample pairs through data augmentation of the service information corpus. Please replace our corpus ("train_info_month4.txt") with your own corpus and run the following code:
cd code
python data_augment.py
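As a rough picture of what "sample pairs through data augmentation" can mean, the sketch below builds two randomly perturbed views of each corpus line. Random token dropout is an assumed augmentation chosen for illustration; the strategy actually used by `data_augment.py` may differ, and `augment`, `make_pairs`, and `drop_rate` are hypothetical names.

```python
import random


def augment(line, rng, drop_rate=0.15):
    """Return a copy of the line with some tokens randomly dropped."""
    tokens = line.split()
    if not tokens:
        return line
    kept = [tok for tok in tokens if rng.random() >= drop_rate]
    if not kept:  # never return an empty view
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def make_pairs(lines, seed=0, drop_rate=0.15):
    """Build one positive pair (two augmented views) per non-empty line."""
    rng = random.Random(seed)  # seeded so the pairs are reproducible
    return [
        (augment(line, rng, drop_rate), augment(line, rng, drop_rate))
        for line in lines
        if line.strip()
    ]


# Hypothetical corpus lines; real input would come from your corpus file.
pairs = make_pairs(["22 ssh OpenSSH 8.9", "80 http nginx 1.18.0"])
for view_a, view_b in pairs:
    print(repr(view_a), "<->", repr(view_b))
```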
Please replace the tokenizer and training corpus with your own (hyperparameters can also be modified as needed), and then run the following code:
cd code
python SimSCL.py
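The contrastive objective behind this kind of fine-tuning can be illustrated with a small NumPy example. It assumes an in-batch InfoNCE loss of the kind used in SimCSE-style training; the exact SimSCL objective is defined by the paper and `SimSCL.py`, and the `temperature` value here is only a placeholder.

```python
import numpy as np


def info_nce(view_a, view_b, temperature=0.05):
    """In-batch contrastive loss: row i of view_a should match row i of view_b.

    Both inputs have shape (batch, dim). Every other row in the batch
    acts as a negative example.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature               # cosine similarities, (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))    # cross-entropy on the diagonal


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))                  # stand-in sentence embeddings
noisy = emb + 0.01 * rng.normal(size=emb.shape)

print("aligned pairs :", info_nce(emb, noisy))
print("shuffled pairs:", info_nce(emb, noisy[::-1]))
```

The loss is low when each embedding is closest to its own augmented view and rises when the pairing is broken, which is the signal the fine-tuning step optimizes.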
This part includes two steps: port fingerprint construction and similarity calculation. You can run both with a single command! 🚀
cd code
python SimFig.py
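Conceptually, this step turns the per-port service embeddings of an address into one fingerprint and then compares fingerprints across address families. The sketch below assumes mean pooling and cosine similarity; `SimFig.py` may construct and compare fingerprints differently, and the function names, toy vectors, and documentation-range addresses are all hypothetical.

```python
import numpy as np


def fingerprint(port_embeddings):
    """Average the per-port vectors of one address into a single vector."""
    ports = sorted(port_embeddings)  # fixed order for reproducibility
    return np.mean([port_embeddings[p] for p in ports], axis=0)


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def best_match(v4_fp, v6_fps):
    """Return the IPv6 candidate whose fingerprint is most similar."""
    scores = {addr: cosine(v4_fp, fp) for addr, fp in v6_fps.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3-dimensional embeddings keyed by port; real ones would come from DStack-BERT.
v4 = fingerprint({22: np.array([1.0, 0.0, 0.0]), 80: np.array([0.0, 1.0, 0.0])})
candidates = {
    "2001:db8::1": fingerprint({22: np.array([1.0, 0.0, 0.0]),
                                80: np.array([0.0, 1.0, 0.0])}),
    "2001:db8::2": fingerprint({443: np.array([0.0, 0.0, 1.0])}),
}
match, score = best_match(v4, candidates)
print(f"192.0.2.1 -> {match} (similarity {score:.3f})")
```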
Important
Due to our data usage agreement with Censys, we are unable to provide our detection results for each target IP, such as port open status and service information. We recommend that users first apply for data access permissions from Censys or other institutions, and then use our method after obtaining relevant data.