AIS Data Analysis is a data science project focused on the analysis of Automatic Identification System (AIS) telemetry collected in the maritime area surrounding the Port of Ancona, Italy (2020–2023).
The project aims to explore, visualize, classify, and predict vessel behaviors using advanced data analysis techniques, geospatial tools, and machine learning models.
Key goals of the project include:
- Cleaning and structuring AIS data for efficient analysis
- Generating statistical and geospatial visualizations
- Performing vessel trajectory clustering
- Training classification models for vessel type prediction
- Forecasting future positions and navigation features
To install the necessary dependencies, run:
pip install -r requirements.txt
Create the following directories for your dataset:
mkdir -p "dataset/AIS_Dataset"
mkdir -p "dataset/AIS_Dataset_csv"
Place the files named ais_stat_data_{year}.csv into the dataset/AIS_Dataset directory.
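The directory setup above can also be done from Python, which makes it easy to check that the raw files are in place before running the setup scripts. This is a minimal sketch, not part of the project: the function name `prepare_dataset_dirs` is hypothetical, and it assumes one raw file per year for 2020 through 2023.

```python
from pathlib import Path

# Assumption: one raw file per year in the 2020-2023 collection period.
YEARS = range(2020, 2024)


def prepare_dataset_dirs(root=".", years=YEARS):
    """Create the dataset directories and return the raw files still missing."""
    raw_dir = Path(root) / "dataset" / "AIS_Dataset"
    csv_dir = Path(root) / "dataset" / "AIS_Dataset_csv"
    for directory in (raw_dir, csv_dir):
        # Same effect as `mkdir -p`: no error if the directory already exists.
        directory.mkdir(parents=True, exist_ok=True)
    expected = [raw_dir / f"ais_stat_data_{year}.csv" for year in years]
    return [path for path in expected if not path.is_file()]
```

Calling `prepare_dataset_dirs()` from the repository root returns an empty list once every expected file has been copied into dataset/AIS_Dataset.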
Run the following script to organize and prepare the CSV files:
python setupCSV.py
Run the following script to organize and prepare the vessel track CSV files, selecting the year with the --year option:
python setupTracks.py --year 2020
Run a preliminary analysis of the dataset using:
python analyzer_1.py
The project workflow is organized into several phases and types of analysis:
- setupCSV.py: Imports raw AIS files and generates cleaned CSVs with standardized columns.
- setupTracks.py: Extracts and normalizes track trajectories from AIS records for plotting.
- setupSplitUnder10.py: Filters out tracks with fewer than 10 points.
- setupClassification.py: Builds a structured dataset ready for classification models.
- analyzer_1.py: Calculates descriptive statistics (e.g., average speed, counts) and generates basic plots (histograms, scatter plots).
- analyzer_2.0.1.py: Creates interactive maps and heatmaps from geographic data, using contextily, folium, and geopandas to overlay the data on real-world maps.
- clustering_1.0.1.py, clustering_1.0.2.py, clustering_1.0.3.py: Apply clustering algorithms (K-Means, DBSCAN, HDBSCAN) to group similar trajectories.
- classification_1.0.1.py, classification_1.0.2.py, classification_1.0.3.py: Implement machine learning models (Random Forest, SVM, XGBoost) to classify vessel types or routes.
- prediction_9.0.1.py: Uses regression models (Linear Regression, LSTM on time sequences) to predict future position and speed.
- analyzer_bearing.py: Calculates bearing angles and trajectory-based features to enrich datasets.
- counter_1.py: Aggregates and analyzes parameters (e.g., vessel count by time period, vessel type distribution, temporal statistics).
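Two of the steps listed above, dropping tracks with fewer than 10 points and computing bearing angles, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in setupSplitUnder10.py or analyzer_bearing.py: the record keys `track_id`, `lat`, and `lon` and all function names are hypothetical, and the bearing is the standard initial great-circle bearing between consecutive points.

```python
import math
from collections import defaultdict

MIN_POINTS = 10  # threshold taken from the setupSplitUnder10.py description


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    # atan2 returns (-180, 180]; shift into the compass range [0, 360).
    return (math.degrees(math.atan2(east, north)) + 360.0) % 360.0


def group_tracks(records):
    """Group position records by track, preserving input order within a track."""
    tracks = defaultdict(list)
    for rec in records:
        # Hypothetical field names; adapt them to the actual CSV columns.
        tracks[rec["track_id"]].append((rec["lat"], rec["lon"]))
    return dict(tracks)


def drop_short_tracks(tracks, min_points=MIN_POINTS):
    """Keep only tracks that have at least min_points positions."""
    return {tid: pts for tid, pts in tracks.items() if len(pts) >= min_points}


def track_bearings(points):
    """Bearing of each segment between consecutive (lat, lon) points."""
    return [initial_bearing(*a, *b) for a, b in zip(points, points[1:])]
```

A track with n points yields n - 1 bearings, so the per-segment values would need padding or alignment before being joined back onto the original records as a feature column.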
This project is licensed under the MIT License.
See the LICENSE file for full details.
Authors:
- Micol Zazzarini
- Andrea Fiorani
- Antonio Antonini
Developed at Università Politecnica delle Marche, Department of Information Engineering.
