For my project in the "Introduction to Data Science" class, I had to replicate the results of the paper "Text Classification Using String Kernels" and then try to improve on them. The project was divided into three phases.
Phase 1: Data Analysis
In the first phase, I analyzed the data used in the original study: I preprocessed the text, explored its characteristics, and examined how the documents were distributed across classes. This gave me a solid foundation for the replication and improvement phases that followed.
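The kind of preprocessing and class-distribution check described above can be sketched as follows. This is a minimal, hypothetical illustration, not the project's actual pipeline: the cleaning rules (lowercasing, replacing non-alphanumeric ASCII characters with spaces, collapsing whitespace) and the function names `preprocess` and `class_distribution` are assumptions, and the toy documents and labels are made up.

```python
import re
from collections import Counter


def preprocess(text: str) -> str:
    """Hypothetical cleaning: lowercase, drop non-alphanumeric ASCII, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # punctuation and other symbols become spaces
    return re.sub(r"\s+", " ", text).strip()


def class_distribution(labels):
    """Return (label, share of documents) pairs, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


# Made-up toy data, purely for illustration (not the study's dataset).
docs = ["Stocks ROSE sharply!", "The match ended 2-1.", "Profits fell 10%."]
labels = ["finance", "sports", "finance"]

print([preprocess(d) for d in docs])  # ['stocks rose sharply', 'the match ended 2 1', 'profits fell 10']
print(class_distribution(labels))
```

A skewed class distribution is worth spotting early, since it affects which evaluation metrics (for example per-class precision and recall rather than plain accuracy) are informative.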
Phase 2: Replication of the results from the paper
Next, I focused on replicating the results presented in the paper. Following the methods it describes, I implemented the string kernel approach to text classification, tuned its parameters carefully, and made sure my experimental setup closely matched the one in the paper. My goal was to obtain results comparable to those reported by the authors, and thereby validate the effectiveness of their approach.
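To make the core technique concrete, here is a minimal sketch of the string subsequence kernel (SSK) introduced in the paper, computed with the standard dynamic-programming recursion over prefixes. It is written in plain Python for readability and is not the code used in the project. The names `ssk`, `ssk_normalized` and `gram_matrix` are hypothetical; `n` is the subsequence length and `lam` is the decay factor λ that penalizes gaps.

```python
import math


def ssk(s: str, t: str, n: int, lam: float) -> float:
    """Unnormalized string subsequence kernel K_n(s, t).

    Dynamic-programming recursion in O(n * len(s) * len(t)) time.
    n: subsequence length; lam: decay factor in (0, 1].
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    m, l = len(s), len(t)
    if min(m, l) < n:
        return 0.0
    # kp[a][b] = K'_i(s[:a], t[:b]); K'_0 is 1 for every pair of prefixes.
    kp = [[1.0] * (l + 1) for _ in range(m + 1)]
    for i in range(1, n):
        nxt = [[0.0] * (l + 1) for _ in range(m + 1)]  # K'_i, stays 0 on empty prefixes
        for a in range(1, m + 1):
            kpp = 0.0  # K''_i(s[:a], t[:b]), updated as b grows
            for b in range(1, l + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp[a - 1][b - 1]
                nxt[a][b] = lam * nxt[a - 1][b] + kpp
        kp = nxt
    # K_n(s[:a], t) = K_n(s[:a-1], t) + lam^2 * sum of K'_{n-1}(s[:a-1], t[:j-1])
    # over every position j where t[j-1] equals s[a-1].
    total = 0.0
    for a in range(1, m + 1):
        for j in range(1, l + 1):
            if t[j - 1] == s[a - 1]:
                total += lam * lam * kp[a - 1][j - 1]
    return total


def ssk_normalized(s: str, t: str, n: int, lam: float) -> float:
    """K(s, t) / sqrt(K(s, s) * K(t, t)), so every document has unit norm."""
    denom = math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0


def gram_matrix(docs_a, docs_b, n, lam):
    """Normalized kernel values between every pair of documents from two lists."""
    return [[ssk_normalized(x, y, n, lam) for y in docs_b] for x in docs_a]


# "cat" and "car" share only the length-2 subsequence "c-a" (weight lam**2 in each),
# so K_2 = lam**4.
print(ssk("cat", "car", 2, 0.5))  # 0.0625
```

A Gram matrix built this way from the training documents can be handed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Pure Python is slow for full-length documents, so a practical version would likely need vectorization or a compiled implementation.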
Phase 3: Trying to improve on the results from the paper
In the final phase, I tried to improve on the original results by experimenting with variations of the string kernel and with different parameter settings, and by adding further preprocessing steps.
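As an illustration of what a kernel variation can look like (not necessarily one tried in this project), the sketch below implements a normalized character n-gram, or "spectrum", kernel. It compares only contiguous substrings of length `n`, with no gaps and no decay factor, so it runs in roughly linear time per document pair. The names `ngram_counts` and `ngram_kernel` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count every contiguous character n-gram in text (empty if text is shorter than n)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_kernel(s: str, t: str, n: int) -> float:
    """Normalized n-gram kernel: cosine similarity of the two n-gram count vectors."""
    cs, ct = ngram_counts(s, n), ngram_counts(t, n)
    dot = sum(c * ct[g] for g, c in cs.items())  # missing n-grams count as 0
    norm = math.sqrt(sum(c * c for c in cs.values()) * sum(c * c for c in ct.values()))
    return dot / norm if norm > 0 else 0.0


print(ngram_kernel("cat", "car", 2))  # 0.5: "ca" is shared, "at" and "ar" are not
```

The trade-off against the subsequence kernel is that gapped matches (such as "c-t" in "cat") are ignored, which makes the kernel cheaper but less tolerant of insertions and spelling variation.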