Implementation of the paper "Text Classification using String Kernels" by Lodhi et al. The code is written mainly in Python, with some parts moved to Cython for performance gains. The final report can be found here.
This project was carried out as part of the DD2434 "Advanced Machine Learning" course at KTH Royal Institute of Technology.
Authors:
- F. Franzen (github: flammi)
- B. Godefroy (github: BGodefroyFR)
- W. Kryściński (github: muggin)
- V. Polianskii (github: vlpolyansky)
Files in the `data` directory:
- `train_data` and `test_data` - original Reuters dataset split (Modified Apte) (Pickled)
- `train_data_clean` and `test_data_clean` - preprocessed and cleaned dataset (Pickled)
- `train_data_small` and `test_data_small` - trimmed dataset prepared for experiments (Pickled)
- `precomp_kernels` - directory with precomputed SSK Gram matrices
- `approx` - directory with precomputed approximated-SSK files
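Since the dataset files listed above are pickled, they can presumably be read with the standard `pickle` module. The following is a minimal sketch, not code from this repository: the helper name `load_pickled` is hypothetical, and the example path assumes the files sit directly under `data/` with no file extension. The structure of the loaded object is not documented here, so inspect it before use.

```python
import pickle
from pathlib import Path


def load_pickled(path, encoding="ASCII"):
    """Load a single pickled object from `path`.

    `encoding` is passed through to pickle.load; if a file was written
    by Python 2, encoding="latin1" may be needed (an assumption, not
    something the README states).
    """
    with Path(path).open("rb") as fh:
        return pickle.load(fh, encoding=encoding)


# Hypothetical usage, assuming the file layout described above:
# train = load_pickled("data/train_data_clean")
# print(type(train))
```

Only unpickle files from a source you trust, because `pickle` can execute arbitrary code while loading.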
Before using the SSK kernel, compile the Cython code with:
python setup.py build_ext --inplace
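For checking the compiled kernel on short strings, a slow pure-Python version of the string subsequence kernel can be handy. The sketch below follows the dynamic-programming recursion from Lodhi et al.; it is not the Cython code from this repository, and the names `ssk_reference` and `ssk_normalized` are hypothetical. Here `n` is the subsequence length and `lam` is the decay factor.

```python
import math


def ssk_reference(s, t, n, lam):
    """Unnormalized SSK K_n(s, t) via the O(n * |s| * |t|) recursion."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ls, lt = len(s), len(t)
    # kp[i][a][b] holds K'_i(s[:a], t[:b]); K'_0 is 1 everywhere.
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        for a in range(i, ls + 1):
            kpp = 0.0  # running value of K''_i(s[:a], t[:b])
            for b in range(i, lt + 1):
                match = s[a - 1] == t[b - 1]
                kpp = lam * kpp + (lam * lam * kp[i - 1][a - 1][b - 1] if match else 0.0)
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp
    # K_n sums lam^2 * K'_{n-1} over every pair of matching end characters.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def ssk_normalized(s, t, n, lam):
    """SSK normalized so that a string has similarity 1 with itself."""
    k_ss = ssk_reference(s, s, n, lam)
    k_tt = ssk_reference(t, t, n, lam)
    if k_ss == 0.0 or k_tt == 0.0:
        return 0.0  # one string has no subsequence of length n
    return ssk_reference(s, t, n, lam) / math.sqrt(k_ss * k_tt)
```

With `n = 2`, the "cat"/"car" example from the paper gives `lam**4` for the unnormalized kernel and `1 / (2 + lam**2)` after normalization, which makes a convenient check against the compiled implementation.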