
billkord/Python-Data-Mining


This project was implemented for academic purposes by Vasilis Kordalis and Aristeidis Oikonomou.

Python-Data-Mining

The aim of this project is to process bulk trajectory data. The data are latitude-longitude-timestamp rows in a CSV file, and each row is part of a bus trajectory.

Description

We start with a file containing the trajectories of different buses (data_sets/train_set.csv).

After a quick edit of that file, a new file containing the routes is created (results/First_Group_of_Data/trips.csv).

This file is then cleaned of corrupted data, producing a new file (results/Clean_Routes/tripsClean.csv).

After that, several steps follow:

  • Visualize the first 5 JourneyPatternId values from tripsClean.csv
  • Find the k-nearest neighbors in tripsClean.csv and write the results to a new file (data_sets/test_set_a1.csv)
  • Find the first 5 matching routes in tripsClean.csv using the LCS method and write the results to a new file (data_sets/test_set_a2.csv)
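The LCS matching step can be illustrated with a minimal sketch. It assumes that LCS means longest common subsequence, that two points "match" when they lie within a distance threshold, and that routes are lists of (lat, lon) pairs. The function names, the `eps_m` threshold and the haversine distance are illustrative choices, not taken from the repository.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, used by the haversine formula


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def lcs_length(a, b, eps_m=200.0):
    """Longest common subsequence length of two trajectories.

    Two points count as equal when they are at most eps_m metres apart
    (an assumed matching rule). Uses the standard dynamic programme with
    two rows of the table kept in memory.
    """
    prev = [0] * (len(b) + 1)
    for p in a:
        curr = [0]
        for j, q in enumerate(b, start=1):
            if haversine_m(p, q) <= eps_m:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def top_matches(query, routes, k=5, eps_m=200.0):
    """Return the k route ids with the longest common subsequence to query.

    routes is a dict mapping a route id to its list of (lat, lon) points.
    """
    scored = [(lcs_length(query, pts, eps_m), rid) for rid, pts in routes.items()]
    scored.sort(key=lambda s: -s[0])  # stable sort: ties keep insertion order
    return [(rid, score) for score, rid in scored[:k]]
```

Calling `top_matches(query, routes, k=5)` returns up to five `(route_id, score)` pairs, ordered by the number of matched points. The quadratic cost per pair of routes is acceptable for a sketch but may need pruning on a large file.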

  • Classification follows. For this purpose, some features are extracted:

  • A grid sequence (the extreme lat-lon values are extracted and a grid is laid out over them using a fixed width and a fixed height for each cell; the points of the dataset are then mapped to the cells of that grid)
  • Start of the bus route (the grid cell where the route starts)
  • End of the bus route (the grid cell where the route ends)
  • Length of the route (the number of grid cells in the bus route)
  • Grid-axis route thickness (how far the route points spread along the horizontal and vertical axes)
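
The features listed above can be sketched as follows. This is one possible reading, not the repository's implementation: it assumes the grid is bounded by the minimum and maximum lat-lon values, that consecutive points falling in the same cell are collapsed into one entry, and that "thickness" is the number of distinct rows and columns a route touches. All function and key names are hypothetical.

```python
from math import ceil


def make_grid(min_lat, max_lat, min_lon, max_lon, cell_h, cell_w):
    """Describe a grid over the bounding box, with cells cell_h x cell_w degrees."""
    return {
        "min_lat": min_lat,
        "min_lon": min_lon,
        "cell_h": cell_h,
        "cell_w": cell_w,
        "n_rows": max(1, ceil((max_lat - min_lat) / cell_h)),
        "n_cols": max(1, ceil((max_lon - min_lon) / cell_w)),
    }


def cell_of(grid, lat, lon):
    """Map a point to its (row, col) cell, clamping points on or past the edges."""
    row = int((lat - grid["min_lat"]) / grid["cell_h"])
    col = int((lon - grid["min_lon"]) / grid["cell_w"])
    row = min(max(row, 0), grid["n_rows"] - 1)
    col = min(max(col, 0), grid["n_cols"] - 1)
    return (row, col)


def grid_sequence(grid, points):
    """Sequence of cells visited by a route, with consecutive repeats collapsed."""
    seq = []
    for lat, lon in points:
        cell = cell_of(grid, lat, lon)
        if not seq or seq[-1] != cell:
            seq.append(cell)
    return seq


def route_features(grid, points):
    """Build the feature dict for one non-empty route of (lat, lon) points."""
    seq = grid_sequence(grid, points)
    return {
        "sequence": seq,
        "start": seq[0],
        "end": seq[-1],
        "length": len(seq),
        "row_span": len({r for r, _ in seq}),  # vertical thickness (assumed meaning)
        "col_span": len({c for _, c in seq}),  # horizontal thickness (assumed meaning)
    }
```

The cell size controls the trade-off: smaller cells give longer, more distinctive sequences but make the features more sensitive to GPS noise.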