This repository contains my solution for the Kaggle competition Automated Essay Scoring 2.0. The goal is to develop an automated system that evaluates essays based on their content and quality using advanced machine learning techniques.
Multiple approaches were considered, including:
- Fine-tuning DeBERTa, a transformer-based language model.
- Ensembling multiple DeBERTa models trained across different folds.
- Combining LightGBM & XGBoost with feature engineering, model optimization, and hyperparameter tuning.
The best Quadratic Weighted Kappa (QWK) score was achieved with the LightGBM + XGBoost ensemble, with more weight assigned to LightGBM's predictions. The details of each approach and its results are given in the Results & Performance section.
## Table of Contents
- Data Loading & Preprocessing
- Feature Engineering
- Feature Selection
- Model Building & Training
- Inference
- Results & Performance
- Conclusion
- Acknowledgements
## Data Loading & Preprocessing

This phase prepares the dataset for further analysis and model training.
- Loading Data: Essays are loaded with `pandas` and stored in a structured format.
- Text Cleaning: A `dataPreprocessing` function is applied to:
  - Convert text to lowercase.
  - Remove HTML tags, URLs, mentions (@user), and numeric values.
  - Replace consecutive spaces, commas, and periods with single instances.
  - Trim whitespace for a clean, structured output.
- Handling Missing Values: Any missing data is handled to maintain data integrity.
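The cleaning steps above can be sketched as a small regex pipeline. This is a hypothetical re-implementation of the repository's `dataPreprocessing` function, not its exact code:

```python
import re

def data_preprocessing(text: str) -> str:
    """Illustrative essay-cleaning pipeline (assumed, not the repo's exact code)."""
    text = text.lower()                        # lowercase everything
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs
    text = re.sub(r"@\w+", " ", text)          # strip @user mentions
    text = re.sub(r"\d+", " ", text)           # strip numeric values
    text = re.sub(r",{2,}", ",", text)         # collapse repeated commas
    text = re.sub(r"\.{2,}", ".", text)        # collapse repeated periods
    text = re.sub(r"\s+", " ", text)           # collapse whitespace
    return text.strip()                        # trim leading/trailing space
```

Order matters here: URLs and mentions must be removed before digit stripping, or fragments of them survive.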
## Feature Engineering

Feature engineering plays a crucial role in improving model performance. Multiple text-based features were extracted at different levels.
**Paragraph-level:**
- Number of paragraphs per essay.
- Average paragraph length.
- Coherence score between paragraphs.

**Sentence-level:**
- Number of sentences per essay.
- Average sentence length.
- Sentence complexity, derived from grammatical structure.

**Word-level:**
- Vocabulary richness.
- Word frequency distribution.
- Stop-word usage analysis.
- Sentiment polarity of the essay.

**Error-based:**
- Spelling errors detected using NLTK's WordNet lemmatizer and an English vocabulary set.
- Grammar mistakes identified using LanguageTool (via its Python wrapper).
- Counts of adjectives, adverbs, and grammatical errors from POS tagging.
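A few of the count-based features above can be computed with plain Python. This sketch is purely illustrative and omits the coherence, sentiment, and grammar features, which require NLTK and LanguageTool:

```python
import re

def basic_text_features(essay: str) -> dict:
    """Sketch of simple structural features (assumed names, not the repo's)."""
    paragraphs = [p for p in essay.split("\n") if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", essay)
    return {
        "paragraph_count": len(paragraphs),
        "avg_paragraph_len": sum(len(p.split()) for p in paragraphs) / max(len(paragraphs), 1),
        "sentence_count": len(sentences),
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "word_count": len(words),
        # crude proxy for vocabulary richness
        "unique_word_ratio": len({w.lower() for w in words}) / max(len(words), 1),
    }
```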
**Vectorization & model-based features:**
- TF-IDF Vectorizer: weights words by their frequency and importance across the corpus.
- Count Vectorizer: captures raw word frequencies in each essay.
- A fine-tuned DeBERTa transformer generates essay-score predictions, which are fed into LightGBM as additional features.
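Combining the two vectorizers can be sketched with scikit-learn as below; the `ngram_range`/`min_df` settings are illustrative, not the repository's tuned values:

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

essays = [
    "the quick brown fox jumps over the lazy dog",
    "a well structured essay develops one idea per paragraph",
    "the essay repeats the same idea in every paragraph",
]

# Illustrative hyperparameters, not the repo's exact configuration.
tfidf = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
count = CountVectorizer(ngram_range=(1, 1), min_df=1)

# Stack both sparse representations side by side; the handcrafted and
# DeBERTa features would be appended the same way before training.
X = hstack([tfidf.fit_transform(essays), count.fit_transform(essays)])
```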
## Feature Selection

To enhance model efficiency, only the most important features are selected:
- A 10-fold Stratified CV trains a LightGBM regressor with a custom QWK objective.
- Feature importance scores are accumulated across folds.
- The top 13,000 most important features are retained.
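The selection loop can be sketched as follows, using scikit-learn's `GradientBoostingRegressor` as a lightweight stand-in for LightGBM and plain `KFold` in place of the stratified split:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold

# Toy data standing in for the essay feature matrix.
X, y = make_regression(n_samples=200, n_features=50, random_state=0)

importance = np.zeros(X.shape[1])
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = GradientBoostingRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    importance += model.feature_importances_  # accumulate across folds

top_k = 10  # the repository keeps the top 13,000
selected = np.argsort(importance)[::-1][:top_k]
X_selected = X[:, selected]
```

Accumulating importances across folds rather than trusting a single fit makes the ranking less sensitive to any one train/validation split.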
## Model Building & Training

Two gradient-boosted models are ensembled: LightGBM and XGBoost.
- Stratified K-Fold (n_splits=20) ensures class balance across training & validation sets.
- **LightGBM Regressor:**
  - Initialized with tuned hyperparameters (learning rate, depth, regularization).
  - Trained with a quadratic weighted kappa (QWK) objective.
- **XGBoost Regressor:**
  - Uses early stopping and a QWK-based loss function.
  - Pre-tuned learning rate, depth, and colsample parameters.
- **Model Ensembling:**
  - Final prediction = 76% LightGBM + 24% XGBoost.
  - Predictions are shifted by a tuned constant `a` and clipped between 1 and 6.
- **Performance Metrics:**
  - Evaluated using the F1 score and Cohen's kappa.
  - Memory is freed between folds with explicit garbage collection.
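The ensembling and post-processing steps can be sketched as below; the default `a=0.0` is a placeholder, since the actual shift constant is tuned in the repository:

```python
import numpy as np

def blend_and_round(lgb_pred, xgb_pred, a=0.0):
    """Blend raw model outputs 76/24, shift by the tuned constant `a`
    (placeholder default here), then clip to the valid 1-6 score range
    and round to integer essay scores."""
    blended = 0.76 * np.asarray(lgb_pred) + 0.24 * np.asarray(xgb_pred) + a
    return np.clip(np.round(blended), 1, 6).astype(int)

scores = blend_and_round([2.4, 5.9, 6.7], [2.6, 5.5, 6.3])  # → [2, 6, 6]
```

Clipping after rounding guarantees that out-of-range regression outputs (like 6.7 above) still map to legal scores.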
## Inference

- Data Transformation: new essays pass through the same preprocessing and feature-engineering pipeline.
- Prediction: the trained LightGBM + XGBoost ensemble predicts essay scores.
- Post-Processing: scores are rounded and clipped to the valid 1–6 range.
- Output: final predictions are saved for submission.
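Submissions are scored with Quadratic Weighted Kappa, which scikit-learn exposes as `cohen_kappa_score` with quadratic weights. A toy check on made-up scores:

```python
from sklearn.metrics import cohen_kappa_score

# Toy true vs. predicted essay scores on the 1-6 scale (illustrative data).
y_true = [1, 2, 3, 4, 5, 6, 3, 4]
y_pred = [1, 2, 3, 3, 5, 6, 4, 4]

# Quadratic weighting penalizes a prediction two grades off four times
# as heavily as one grade off, which is why QWK-aware objectives help.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(round(qwk, 4))
```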
## Results & Performance

| # | Method | Leaderboard Score (QWK) | Validation Score (QWK) |
|---|---|---|---|
| 1 | DeBERTa only | 0.7507 | 0.77816 |
| 2 | DeBERTa only (5-fold CV) | 0.7900 | 0.8201 |
| 3 | LightGBM + XGBoost + Feature Engineering (spelling errors, word count, etc.) | 0.81434 | 0.82712 |
| 4 | LightGBM + XGBoost + Feature Engineering (DeBERTa predictions, spelling errors, word count, etc.) + Vectorization (TF-IDF) | 0.8169 | 0.8315 |
| 5 | LightGBM + XGBoost + Feature Engineering (DeBERTa predictions, spelling errors, word count, etc.) + Vectorization (TF-IDF) + StandardScaler | 0.8175 | 0.8318 |
| 6 | LightGBM + XGBoost + Feature Engineering (DeBERTa predictions, spelling errors, word count, etc.) + Vectorization (TF-IDF, Count) + StandardScaler | 0.8178 | 0.8320 |
| 7 | LightGBM + XGBoost + Feature Engineering (DeBERTa predictions, spelling errors, word count, grammar, adjectives, pronouns, etc.) + Vectorization (TF-IDF, Count) + StandardScaler | 0.8182 | 0.83269 |
| 8 | LightGBM (LR 0.1) + XGBoost (LR 0.05 ↓) + Feature Engineering (DeBERTa predictions, spelling errors, word count, grammar, adjectives, pronouns, etc.) + Vectorization (TF-IDF, Count) + StandardScaler | 0.8199 | 0.8324 |
| 9 | LightGBM (n-gram change) + XGBoost (n-gram change) + Feature Engineering (DeBERTa predictions, spelling errors, word count, grammar, adjectives, pronouns, etc.) + Vectorization (TF-IDF, Count) + StandardScaler | 0.8019 | 0.8124 |
| 10 | LightGBM + XGBoost + Feature Engineering (DeBERTa predictions, spelling errors, word count, grammar, adjectives, pronouns, etc.) + Vectorization (TF-IDF, Count) + StandardScaler + CV 10 ↓ | 0.8165 | 0.8122 |
| 11 | LightGBM (LR 0.1, max depth 10) + XGBoost (LR 0.05, max depth 10) + Feature Engineering (DeBERTa predictions, spelling errors, word count, grammar, adjectives, pronouns, etc.) + Vectorization (TF-IDF, Count) + StandardScaler + CV 20 ↑ | 0.8224 | 0.8275 |
| 12 | LightGBM (LR 0.1, max depth 8) + XGBoost (LR 0.05, max depth 8) + Feature Engineering (DeBERTa predictions, spelling errors, word count, grammar, adjectives, pronouns, etc.) + Vectorization (TF-IDF, Count) + StandardScaler + CV 20 | 0.8243 | 0.8299 |
## Conclusion

This project presents a comprehensive approach to automated essay scoring by combining:
- A state-of-the-art transformer (DeBERTa)
- Tree-based models (LightGBM & XGBoost)
- Advanced feature engineering
- Custom optimization strategies for the QWK metric

By leveraging multiple models, ensembling techniques, and rigorous evaluation, this approach achieves strong accuracy and robustness in essay scoring.
## Acknowledgements

Special thanks to the Learning Agency Lab for providing the dataset and hosting the competition, and to the open-source community for developing the tools that made this work possible.
🔗 Competition Link: Kaggle: Automated Essay Scoring 2.0