Trajectory representation learning for similarity search of AIS data
MoCo-AIS is a contrastive learning framework designed to produce discriminative, embedding-based representations of vessel trajectories for large-scale similarity search. The framework integrates domain-specific trajectory augmentations, dual-stream encoders, and momentum contrastive learning to capture both geometric and semantic structure in AIS data.
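The core of momentum contrastive learning is that the key encoder is never updated by backpropagation; it tracks the query encoder as an exponential moving average. A minimal PyTorch sketch of that update (illustrative only; see `model/moco.py` for the actual implementation):

```python
import copy

import torch
import torch.nn as nn

def momentum_update(query_encoder: nn.Module, key_encoder: nn.Module, m: float = 0.999) -> None:
    """EMA update: key <- m * key + (1 - m) * query, parameter by parameter."""
    with torch.no_grad():
        for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
            k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)

# Usage: the key encoder starts as a copy of the query encoder,
# then drifts slowly toward it after every training step.
query_enc = nn.Linear(4, 8)
key_enc = copy.deepcopy(query_enc)
momentum_update(query_enc, key_enc, m=0.99)
```

With a large momentum (e.g. 0.999), the key encoder changes slowly, which keeps the negatives in the contrastive queue consistent across batches.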
```
moco-ais/
│
├── model/                    # Encoder, projection head, and model components
│   ├── encoder.py
│   └── moco.py
│
├── utils/                    # Shared utility functions (I/O, spatial tools, logging)
│   ├── utils.py
│   └── tool_funcs.py
│
├── base/                     # Baselines, including t2vec and TrajCL
│   ├── trajcl.py
│   ├── trajcl_utils.py
│   ├── config_trajcl.py
│   ├── t2vec.py
│   ├── t2vec_loss.py
│   └── similarity_metrics.py # Distance-based methods: Hausdorff and DTW
│
├── img/                      # Figures for README or paper
├── grid/                     # H3/grid tokenizer and spatial indexing utilities for the TrajCL baseline
│
├── preprocessing.ipynb       # Raw AIS → cleaned trajectories
├── preprocessing2.ipynb      # Additional preprocessing utilities / region-specific operations
│
├── config.py                 # Global configuration and hyperparameter definitions
├── data_loader.py            # Dataset loading, padding masks, batching logic
├── train.py                  # Main MoCo-AIS training script
├── test_.py                  # Test script: embeddings and distance matrices
│
├── evaluate.py               # Retrieval evaluation: Recall@K, ranking, metrics
├── compute_hit_rate.py       # Hit rate for top-K retrieval experiments
│
├── compute_dist_mat.py       # Precompute distance matrices with DTW and Hausdorff
├── t2vec_pipeline.ipynb      # t2vec baseline pipeline for comparison
├── trajcl_pipeline.ipynb     # TrajCL baseline pipeline for comparison
│
├── visualize_embeddings.py   # 2D UMAP/t-SNE embedding visualization
├── visualize_loss.py         # Training/validation loss visualization
│
├── requirements.txt          # Required packages
├── README.md                 # Project documentation
└── .gitignore
```
Since the AIS data used in our experiments contain restricted or proprietary vessel information, we are unable to release the original datasets. To support reproducibility, we provide a publicly available alternative sourced from the U.S. Marine Cadastre AIS archive, preprocessed into a compact SQLite format (download link). This public dataset can be used directly with the notebook preprocessing3.ipynb to reproduce our preprocessing and trajectory-generation pipeline.

MoCo-AIS requires Python 3.9–3.12.

```bash
python -m venv mocoais_env
source mocoais_env/bin/activate   # Linux/macOS
mocoais_env\Scripts\activate      # Windows
pip install --upgrade pip
pip install -r requirements.txt
```
With the Marine Cadastre data in AISdb SQLite format, first run preprocessing3.ipynb to produce paired .lat and .lon files for training, validation, and testing.
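For reference, pairing the .lat and .lon files back into trajectories can be as simple as the sketch below. It assumes each line holds one trajectory's space-separated coordinate values; check data_loader.py for the exact format MoCo-AIS expects.

```python
from pathlib import Path

def load_trajectories(lat_path: str, lon_path: str):
    """Pair a .lat and a .lon file line by line into [(lat, lon), ...] trajectories.

    Assumes one trajectory per line, coordinates space-separated
    (an assumption for illustration; see data_loader.py for the real format).
    """
    lat_lines = Path(lat_path).read_text().splitlines()
    lon_lines = Path(lon_path).read_text().splitlines()
    assert len(lat_lines) == len(lon_lines), "lat/lon files must align line-for-line"
    trajectories = []
    for lat_line, lon_line in zip(lat_lines, lon_lines):
        lats = [float(v) for v in lat_line.split()]
        lons = [float(v) for v in lon_line.split()]
        trajectories.append(list(zip(lats, lons)))
    return trajectories
```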
Once the .lat and .lon files are prepared, edit config.py so the data paths are correct, and define the directories for saving embedding distance matrices and model checkpoints:

```python
data = "<YOUR DATA DIRECTORY>"
savedir = "<DISTANCE MATRIX DIRECTORY>"
checkpoint = "<MODEL CHECKPOINT DIRECTORY>"
```
To select the encoder for MoCo-AIS, modify:

```python
encoder_type = "transformer"  # options: transformer, gru, lstm, tcn
```

Other hyperparameters are also customizable in the config file.
For model training, run:

```bash
python train.py
```
During training, the loss and time usage for each epoch are recorded. You may want to log this information to a file for later analysis.
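For example, a small helper (hypothetical, not part of the repo) that appends per-epoch statistics to a CSV file:

```python
import csv

def log_epoch(log_path, epoch, train_loss, val_loss, seconds):
    """Append one epoch's losses and wall-clock time to a CSV log (illustrative helper)."""
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        if f.tell() == 0:  # new/empty file: write a header first
            writer.writerow(["epoch", "train_loss", "val_loss", "seconds"])
        writer.writerow([epoch, train_loss, val_loss, seconds])
```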
Upon completion, the best model checkpoint is saved. Run the test script to report the test loss and infer the embeddings used to build the trajectory distance matrix:

```bash
python test_.py
```
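Building the distance matrix from embeddings is a standard pairwise computation; a numpy sketch of the idea (test_.py may normalize or batch the computation differently):

```python
import numpy as np

def embedding_distance_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between L2-normalized trajectory embeddings.

    embeddings: (n, d) array, one row per trajectory.
    Returns an (n, n) symmetric distance matrix with a zero diagonal.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sq = np.sum(z ** 2, axis=1)
    # ||zi - zj||^2 = ||zi||^2 + ||zj||^2 - 2 zi.zj, clipped at 0 for numerical safety
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    return np.sqrt(np.maximum(d2, 0.0))
```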
Evaluation requires the trajectory distance matrix produced above. To compute the mean rank of similarity retrievals, run:

```bash
python evaluate.py
```
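One common way to compute mean rank from a query-by-database distance matrix is sketched below, assuming query i's ground-truth match is database item i (evaluate.py may define ground truth differently):

```python
import numpy as np

def mean_rank(dist_matrix: np.ndarray) -> float:
    """Mean rank of the ground-truth match, assumed to sit on the diagonal."""
    ranks = []
    for i, row in enumerate(dist_matrix):
        # rank 1 means the true match is the single nearest item
        rank = 1 + np.sum(row < row[i])
        ranks.append(rank)
    return float(np.mean(ranks))
```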
The hit-rate evaluation requires both the embedding distance matrix and a distance matrix computed with one of the distance-based metrics. Specify which metric to compare against in config.py, then run:

```bash
python compute_hit_rate.py
```
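A top-K hit rate here typically measures the overlap between each query's K nearest neighbours under the embedding distances and under the distance-based metric. A sketch of that definition, assuming zero self-distances on the diagonal (compute_hit_rate.py may differ in details such as self-exclusion):

```python
import numpy as np

def top_k_hit_rate(embed_dist: np.ndarray, metric_dist: np.ndarray, k: int = 10) -> float:
    """Average fraction of the metric's top-k neighbours recovered by the embedding top-k."""
    n = embed_dist.shape[0]
    hits = 0.0
    for i in range(n):
        # index 0 after argsort is the query itself (self-distance 0), so skip it
        emb_top = set(np.argsort(embed_dist[i])[1:k + 1])
        met_top = set(np.argsort(metric_dist[i])[1:k + 1])
        hits += len(emb_top & met_top) / k
    return hits / n
```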
Mean rank and rank-percentage retrieval performance with MoCo-AIS encoders (Transformer, GRU, LSTM, TCN) and distance-based metrics (Hausdorff, DTW):

| Metric | Transformer | GRU | LSTM | TCN | Hausdorff | DTW |
|---|---|---|---|---|---|---|
| Mean Rank | 3.041 | 9.414 | 2.110 | 6.502 | 1.293 | 1.019 |
| Rank Percentage (%) | 0.052 | 0.161 | 0.036 | 0.111 | 0.110 | 0.087 |
| Time | 1.73 s | 11.03 s | 10.65 s | 1.81 s | 0.3183 h | 5.650 h |

Note that times are in seconds for the MoCo-AIS encoders but in hours for the distance-based metrics.
To compute the distance-based metrics, set the metric name in compute_dist_mat.py, then run:

```bash
python compute_dist_mat.py
```
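For reference, the two metrics used here are dynamic time warping (DTW) and the symmetric Hausdorff distance. Minimal numpy implementations of the textbook definitions are sketched below; base/similarity_metrics.py is the authoritative version in this repo.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) DTW distance between two (n, 2) trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between two point sets of shape (n, 2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # all pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```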
Two baseline pipeline notebooks, t2vec_pipeline.ipynb and trajcl_pipeline.ipynb, are provided in the main directory. After adjusting the data paths as needed, run each notebook to generate trajectory embeddings and the corresponding similarity matrices.
For t2vec, evaluation is carried out directly within t2vec_pipeline.ipynb because its embedding file format differs.
For TrajCL, follow the same performance-evaluation steps as MoCo-AIS (see above) after completing trajcl_pipeline.ipynb.
(to be presented soon)