Implementation of the "Text Classification using String Kernels" publication by Lodhi et al. (KTH DD2434 Project)

muggin/string-kernels

String Kernels

Implementation of the "Text Classification using String Kernels" publication by Lodhi et al. The code is written mainly in Python, with some parts moved to Cython for performance. The final report can be found here.

This project was carried out as part of the DD2434 "Advanced Machine Learning" course at KTH Royal Institute of Technology.

Contributors

Data

Files in the data directory:

  • train_data and test_data - original Reuters dataset, Modified Apte split (Pickled)
  • train_data_clean and test_data_clean - preprocessed and cleaned dataset (Pickled)
  • train_data_small and test_data_small - trimmed dataset prepared for experiments (Pickled)
  • precomp_kernels - directory with precomputed SSK gram matrices
  • approx - directory with precomputed approximated-SSK files
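
The pickled files listed above can be read back with Python's standard pickle module. The following is a minimal sketch, not code from this repository: the helper name load_pickled and the example path are hypothetical, and the structure of the unpickled object is not documented here, so inspect it after loading.

```python
import pickle
from pathlib import Path


def load_pickled(path):
    """Return the object stored in one pickled dataset file.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # Assumed location: the data directory described above.
    train = load_pickled("data/train_data_clean")
    print(type(train))
```

If the files were written by Python 2, loading them under Python 3 may need pickle.load(fh, encoding="latin1"). Only unpickle files from a source you trust, since unpickling can execute arbitrary code.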

Setup

Before using the SSK kernel, compile the Cython code with:

python setup.py build_ext --inplace
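
For orientation, the string subsequence kernel (SSK) from Lodhi et al. can be sketched in pure Python with the paper's dynamic-programming recursion. This is an illustrative reference version, not the compiled Cython code from this repository: the function names ssk and ssk_normalized are hypothetical, n is the subsequence length, and lam is the decay factor.

```python
import math


def ssk(s, t, n, lam):
    """Unnormalized SSK K_n(s, t) with decay factor lam (illustrative sketch)."""
    ls, lt = len(s), len(t)
    if min(ls, lt) < n:
        return 0.0
    # kp[i][a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 everywhere.
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # Sum over all pairs of matching final characters.
    k = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                k += lam * lam * kp[n - 1][a - 1][b - 1]
    return k


def ssk_normalized(s, t, n, lam):
    """Normalized kernel K(s, t) / sqrt(K(s, s) * K(t, t))."""
    denom = math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # "cat" and "car" share only the length-2 subsequence "ca".
    print(ssk("cat", "car", 2, 0.5))
    print(ssk_normalized("cat", "car", 2, 0.5))
```

The sketch runs in O(n·|s|·|t|) time, which is slow in pure Python for full documents. That cost is the usual motivation for a compiled implementation and for precomputing gram matrices.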
