Fertilizer Prediction - Kaggle Playground Series S5E6

Welcome to my solution for the Kaggle Playground Series - Season 5, Episode 6.
In this competition, the goal was to predict the correct fertilizer name given a set of soil and crop characteristics.

I participated in this challenge to enhance my skills in feature engineering, model evaluation, and hyperparameter optimization.
My final submission achieved a MAP@3 score of 0.34636 on the private leaderboard.

🔍 Problem Overview

The dataset contains various soil properties and environmental conditions.
The task is a multi-class classification problem, where the model must predict the top 3 most likely fertilizer types in ranked order.

🧮 Evaluation Metric: MAP@3

The evaluation metric for this competition is Mean Average Precision at k = 3 (MAP@3).

This metric evaluates how well the model ranks the true class within its top 3 predictions: the earlier the correct class appears in the prediction list, the higher the score.
For example, predicting the correct fertilizer at rank 1 gives full credit (1.0), while rank 2 or rank 3 gives partial credit (1/2 and 1/3, respectively).
If the correct class does not appear in the top 3 predictions, the model receives zero credit for that instance.

It is especially useful in multi-class problems where ranking predictions is more important than selecting just one label.
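
To make the scoring concrete, here is a minimal sketch (not taken from the competition code) of how MAP@3 can be computed when each instance has exactly one correct label; the helper name map_at_3 is my own.

```python
import numpy as np

def map_at_3(y_true, y_pred_top3):
    """Mean Average Precision at k=3 for a single correct label per row.

    y_true      : array of shape (n,) with the true class index of each row.
    y_pred_top3 : array of shape (n, 3) with predicted class indices, best first.
    """
    scores = []
    for truth, preds in zip(y_true, y_pred_top3):
        score = 0.0
        for rank, pred in enumerate(preds[:3], start=1):
            if pred == truth:
                score = 1.0 / rank   # 1.0 at rank 1, 0.5 at rank 2, 1/3 at rank 3
                break                # only the first occurrence of the true label counts
        scores.append(score)
    return float(np.mean(scores))

# Example: the true class 2 appears at rank 2 in the prediction list -> 0.5
print(map_at_3(np.array([2]), np.array([[5, 2, 0]])))
```

Later sketches in this README reuse this helper when comparing models.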

Dataset

The dataset consists of various numerical and categorical features related to soil properties and agricultural context. Below is a brief description of each feature and what it represents:

  • Temperature: A numerical variable representing the ambient temperature in degrees Celsius.

  • Humidity: The relative humidity percentage in the environment.

  • Moisture: The soil moisture content as a numerical percentage, indicating how wet or dry the soil is.

  • Soil Type: A categorical attribute indicating the general type of soil (e.g., sandy, loamy, clayey).

  • Crop Type: A categorical variable that defines the type of crop grown in the field.

  • Nitrogen: A numerical value indicating the nitrogen content of the soil.

  • Phosphorus: The phosphorus content of the soil, measured as a numerical value.

  • Potassium: Indicates the amount of potassium present in the soil.

  • Fertilizer Name: The target variable. It specifies the recommended fertilizer for the given conditions and is a categorical label used during training.


⛓️ Project Structure

The project primarily consists of two Python files:

  • main.py: Contains data loading, exploratory data analysis (EDA), feature engineering, modeling, hyperparameter optimization, and prediction generation steps.
  • preprocessing.py: A separate module imported by main.py that contains the data preprocessing steps (feature engineering, encoding, scaling). Keeping these steps in one module ensures that the same transformations are applied to both the training and test datasets.
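
As an illustration of this structure, the snippet below shows a hypothetical call pattern from main.py; the preprocess function name and its signature are assumptions, not the module's actual API.

```python
import pandas as pd
from preprocessing import preprocess  # assumed entry point; the real module may expose different functions

train = pd.read_csv("dataset/train.csv")
test = pd.read_csv("dataset/test.csv")

# Running both frames through the same module keeps feature engineering,
# encoding, and scaling consistent between the training and test data.
train = preprocess(train)
test = preprocess(test)
```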

🛠️ Methods & Workflow

Throughout this project, I followed a structured machine learning pipeline:

1. Data Preprocessing

  • Handling missing values and type conversions
  • Label Encoding for categorical variables
  • Feature scaling (if required)
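
A minimal sketch of these steps, assuming scikit-learn and the column names from the dataset description above (the raw CSV headers may differ); the actual logic lives in preprocessing.py.

```python
from sklearn.preprocessing import LabelEncoder, StandardScaler

cat_cols = ["Soil Type", "Crop Type"]
num_cols = ["Temperature", "Humidity", "Moisture", "Nitrogen", "Phosphorus", "Potassium"]

# Label-encode the categorical features, fitting on train and reusing the mapping on test.
for col in cat_cols:
    le = LabelEncoder()
    train[col] = le.fit_transform(train[col])
    test[col] = le.transform(test[col])

# The target is label-encoded as well so the models can return per-class probabilities.
target_encoder = LabelEncoder()
y = target_encoder.fit_transform(train["Fertilizer Name"])

# Optional scaling; gradient-boosted trees do not strictly require it.
scaler = StandardScaler()
train[num_cols] = scaler.fit_transform(train[num_cols])
test[num_cols] = scaler.transform(test[num_cols])
```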

2. Feature Engineering

  • Created additional domain-specific features (e.g. NEW_...; see the illustrative sketch after this list)
  • Removed less informative features based on feature importance
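
The README does not list the exact NEW_* features, so the lines below are only illustrative examples of the kind of domain-specific features one could derive from the nutrient and weather columns:

```python
# Illustrative only; the actual NEW_* features are defined in preprocessing.py.
for df in (train, test):
    df["NEW_NPK_SUM"] = df["Nitrogen"] + df["Phosphorus"] + df["Potassium"]
    df["NEW_N_TO_K_RATIO"] = df["Nitrogen"] / (df["Potassium"] + 1e-6)
    df["NEW_TEMP_X_HUMIDITY"] = df["Temperature"] * df["Humidity"]
```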

3. Model Training & Evaluation

I trained and compared several ensemble-based models:

  • XGBoost
  • CatBoost
  • LightGBM

The best-performing model (XGBoost) was selected based on its MAP@3 score on the validation set.
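
A hedged sketch of this comparison, assuming X holds the preprocessed features, y the encoded target, and map_at_3 is the helper defined earlier; the hyperparameters shown are placeholders, not the tuned values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "XGBoost": XGBClassifier(n_estimators=500, learning_rate=0.05, eval_metric="mlogloss"),
    "LightGBM": LGBMClassifier(n_estimators=500, learning_rate=0.05),
    "CatBoost": CatBoostClassifier(iterations=500, learning_rate=0.05, verbose=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_val)                   # shape (n_val, n_classes)
    top3 = np.argsort(proba, axis=1)[:, ::-1][:, :3]     # indices of the 3 highest probabilities
    print(f"{name}: MAP@3 = {map_at_3(y_val, top3):.5f}")
```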

4. Hyperparameter Optimization

  • Used Optuna to tune hyperparameters efficiently
  • Applied early stopping to reduce overfitting
  • Focused on maximizing MAP@3 rather than traditional accuracy
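
A minimal Optuna sketch along these lines, reusing the validation split and map_at_3 helper from the previous sketches; the search space and trial count are illustrative, not the settings behind the leaderboard score.

```python
import numpy as np
import optuna
from xgboost import XGBClassifier

def objective(trial):
    params = {
        "n_estimators": 2000,
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "eval_metric": "mlogloss",
        "early_stopping_rounds": 50,   # stop adding trees once validation loss stops improving
    }                                  # constructor-level early stopping needs xgboost >= 1.6
    model = XGBClassifier(**params)
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
    proba = model.predict_proba(X_val)
    top3 = np.argsort(proba, axis=1)[:, ::-1][:, :3]
    return map_at_3(y_val, top3)       # Optuna maximizes validation MAP@3, not accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```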

5. Submission Strategy

  • Extracted top 3 predictions using predict_proba() + argsort()
  • Inverse transformed label encodings to match submission format
  • Saved the final model using joblib for reuse
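
A hedged sketch of these three steps, assuming best_model is the tuned XGBoost classifier, target_encoder is the LabelEncoder fitted on the target, X_test the preprocessed test features, and test_ids the id column from test.csv; check the competition's sample_submission.csv for the exact expected layout.

```python
import joblib
import numpy as np
import pandas as pd

proba = best_model.predict_proba(X_test)                 # shape (n_test, n_classes)
top3_idx = np.argsort(proba, axis=1)[:, ::-1][:, :3]     # three most likely classes per row, best first

# Map the encoded class indices back to fertilizer names and join them with spaces,
# the usual prediction format for MAP@K-scored Kaggle competitions.
top3_names = target_encoder.inverse_transform(top3_idx.ravel()).reshape(top3_idx.shape)
submission = pd.DataFrame({
    "id": test_ids,
    "Fertilizer Name": [" ".join(row) for row in top3_names],
})
submission.to_csv("submission.csv", index=False)

joblib.dump(best_model, "best_model.pkl")                # persist the trained model for reuse
```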

Setup and Running

To run this project on your local machine, follow these steps:

  1. Clone the Repository:
    git clone https://github.com/BahriDogru/Predicting_Optimal_Fertilizers.git
  2. After cloning, go to the project directory:
    cd Predicting_Optimal_Fertilizers
  3. Create and activate the Conda environment: All libraries required for the project are listed in the environment.yaml file, which you can use to create and activate the environment:
    conda env create -f environment.yaml
    conda activate predicting_fertilizers_env
  4. Download the Dataset: Download the train.csv and test.csv files from the Kaggle Playground Series - Season 5, Episode 6 competition page and place them inside a folder named dataset/ in the project's root directory.
    .
    ├── main.py
    ├── preprocessing.py
    ├── dataset/
    │   ├── train.csv
    │   └── test.csv
    └── README.md
    
  5. Run the Code:
    python main.py
    This command will train the model, make predictions, and generate the submission.csv file.

You can access the Kaggle Notebook I prepared for this competition via this link.