This project was implemented for academic purposes by Vasilis Kordalis and Aristeidis Oikonomou.
The aim of the project is to process bulk data: rows of latitude, longitude and timestamp values in a CSV file, which together form parts of bus trajectories.
We start with a file containing the trajectories of several buses (data_sets/train_set.csv).
After some light preprocessing of that file, a new file containing the routes is created (results/First_Group_of_Data/trips.csv).
This file is then cleaned of corrupted data, and the result is written to a new file (results/Clean_Routes/tripsClean.csv).
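The grouping and cleaning steps could look roughly like the sketch below. It is a minimal illustration, not the project's code: the trip key (for example vehicle plus JourneyPatternId), the row layout, and the cleaning rule (reject trips with fewer than two points or with a jump of more than 2 km between consecutive points) are all assumptions, since the criteria for "corrupted" data are not spelled out above. The helper names `haversine_km`, `group_into_trips` and `is_clean` are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def group_into_trips(rows):
    """Group (trip_key, timestamp, lat, lon) rows into time-ordered trips.

    trip_key is a hypothetical identifier (e.g. vehicle + journey pattern)."""
    trips = defaultdict(list)
    for key, ts, lat, lon in rows:
        trips[key].append((ts, lat, lon))
    # Sorting the (timestamp, lat, lon) tuples orders each trip by time.
    return {key: sorted(points) for key, points in trips.items()}

def is_clean(trip, max_jump_km=2.0, min_points=2):
    """Hypothetical rule: reject trips that are too short or that contain
    an implausible jump between two consecutive points."""
    if len(trip) < min_points:
        return False
    for prev, cur in zip(trip, trip[1:]):
        # prev[1:] and cur[1:] are the (lat, lon) parts of each point.
        if haversine_km(prev[1:], cur[1:]) > max_jump_km:
            return False
    return True
```

Usage: `trips = group_into_trips(rows)` followed by `{k: t for k, t in trips.items() if is_clean(t)}` keeps only the trips that pass the rule; the surviving trips would then be written out to tripsClean.csv.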
After that, the following steps are performed:
Visualization of the first 5 JourneyPatternId values in tripsClean.csv
k-Nearest Neighbors search on tripsClean.csv, with the results written to a new file (data_sets/test_set_a1.csv)
Search for the top 5 matching routes in tripsClean.csv using the LCS (Longest Common Subsequence) method, with the results written to a new file (data_sets/test_set_a2.csv)
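The two search steps can be sketched as follows, with each trip represented as a list of (lat, lon) points labelled by its JourneyPatternId. This is an illustration under stated assumptions: the distance used for k-NN is not specified above, so the sketch assumes Dynamic Time Warping (DTW) over haversine distances; for LCS, two points are assumed to "match" when they lie within a hypothetical threshold `eps_km`. All function names are hypothetical.

```python
import math
from collections import Counter

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def dtw(a, b):
    """DTW distance between two point sequences (assumed k-NN metric)."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = haversine_km(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def knn_predict(query, labelled, k=5):
    """Majority label among the k (label, trip) pairs closest to query."""
    nearest = sorted(labelled, key=lambda item: dtw(query, item[1]))[:k]
    return Counter(label for label, _ in nearest).most_common(1)[0][0]

def lcs_length(a, b, eps_km=0.2):
    """Longest Common Subsequence of two trajectories, where two points
    count as equal when they are within eps_km (hypothetical threshold)."""
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if haversine_km(a[i - 1], b[j - 1]) <= eps_km:
                m[i][j] = m[i - 1][j - 1] + 1
            else:
                m[i][j] = max(m[i - 1][j], m[i][j - 1])
    return m[len(a)][len(b)]

def top_lcs_matches(query, labelled, n=5):
    """The n (label, LCS length) pairs with the longest common subsequence."""
    scored = [(label, lcs_length(query, trip)) for label, trip in labelled]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:n]
```

`top_lcs_matches(query, labelled, n=5)` returns the five routes sharing the longest common subsequence with the query, and `knn_predict(query, labelled, k=5)` returns the JourneyPatternId voted by the five nearest trips. Both comparisons are quadratic in trajectory length, which becomes noticeable on bulk data.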
Classification comes next. For this purpose, the following features are extracted:
Grid Sequence (the maximum lat-lon values are extracted and a grid is built from them, with a fixed width and a fixed height for each cell; each point of the dataset is then mapped to the grid cell that contains it)
Start of bus route (the grid cell where the bus route starts)
End of bus route (the grid cell where the bus route ends)
Length of the route (the number of grid cells in the bus route's grid sequence)
Grid axis route thickness (how thick the route is along the horizontal and vertical axes of the grid)
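The feature list above can be sketched as follows, assuming points are (lat, lon) pairs and cell sizes are given in degrees. Several choices here are assumptions rather than the project's documented behavior: the grid is anchored at the minimum lat-lon corner and extends to the maximum values, consecutive points falling in the same cell are collapsed into one entry of the grid sequence, and "thickness" is taken as the number of columns and rows spanned by the route. `make_grid`, `to_cell` and `grid_features` are hypothetical names.

```python
import math

def make_grid(all_points, cell_w, cell_h):
    """Grid over the bounding box of all (lat, lon) points.

    cell_w is the cell width in degrees of longitude, cell_h the height in
    degrees of latitude. Anchoring at the min corner is an assumption."""
    lats = [lat for lat, _ in all_points]
    lons = [lon for _, lon in all_points]
    min_lat, min_lon = min(lats), min(lons)
    return {
        "min_lat": min_lat,
        "min_lon": min_lon,
        "cell_w": cell_w,
        "cell_h": cell_h,
        # The max values determine how many rows/columns the grid needs.
        "n_rows": math.floor((max(lats) - min_lat) / cell_h) + 1,
        "n_cols": math.floor((max(lons) - min_lon) / cell_w) + 1,
    }

def to_cell(point, grid):
    """(row, col) of the grid cell containing a (lat, lon) point."""
    lat, lon = point
    row = math.floor((lat - grid["min_lat"]) / grid["cell_h"])
    col = math.floor((lon - grid["min_lon"]) / grid["cell_w"])
    return row, col

def grid_features(route, grid):
    """Grid sequence, start/end cell, length and thickness of one route."""
    seq = []
    for point in route:
        cell = to_cell(point, grid)
        if not seq or seq[-1] != cell:  # assumption: collapse repeated cells
            seq.append(cell)
    rows = [r for r, _ in seq]
    cols = [c for _, c in seq]
    return {
        "grid_sequence": seq,
        "start": seq[0],
        "end": seq[-1],
        "length": len(seq),
        # assumption: (columns spanned, rows spanned) = (horizontal, vertical)
        "thickness": (max(cols) - min(cols) + 1, max(rows) - min(rows) + 1),
    }
```

For classification, these values would typically be flattened into one feature vector per route; how the variable-length grid sequence is encoded is left open here.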