Welcome to my solution for the Kaggle Playground Series - Season 5, Episode 6.
In this competition, the goal was to predict the correct fertilizer name given a set of soil and crop characteristics.
I participated in this challenge to enhance my skills in feature engineering, model evaluation, and hyperparameter optimization.
My final submission achieved a MAP@3 score of 0.34636 on the private leaderboard.
The dataset contains various soil properties and environmental conditions.
The task is a multi-class classification problem, where the model must predict the top 3 most likely fertilizer types in ranked order.
The evaluation metric for this competition is Mean Average Precision at k = 3 (MAP@3).
This metric evaluates how well the model ranks the true class within its top 3 predictions.
The earlier the correct class appears in the prediction list, the higher the score.
For example, predicting the correct fertilizer at rank 1 gives full credit, while at rank 2 or 3 gives partial credit.
If the correct class is not in the top 3 predictions, the model gets zero credit for that instance.
It is especially useful in multi-class problems where ranking predictions is more important than selecting just one label.
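The scoring described above can be sketched as a small function (a minimal sketch; the fertilizer names below are made up for illustration and are not from the competition data):

```python
def map_at_3(y_true, y_pred_top3):
    """Mean Average Precision at 3: credit 1, 1/2, 1/3 for a hit at rank 1, 2, 3."""
    score = 0.0
    for true_label, preds in zip(y_true, y_pred_top3):
        for rank, pred in enumerate(preds[:3]):
            if pred == true_label:
                score += 1.0 / (rank + 1)
                break  # only the first occurrence of the true label counts
    return score / len(y_true)

# Correct at rank 1 -> 1.0; at rank 2 -> 0.5; not in top 3 -> 0.0
print(map_at_3(["Urea", "DAP", "NPK"],
               [["Urea", "14-35-14", "DAP"],
                ["14-35-14", "DAP", "Urea"],
                ["Urea", "DAP", "14-35-14"]]))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```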
The dataset consists of various numerical and categorical features related to soil properties and agricultural context. Below is a brief description of each feature and what it represents:
- Temperature: A numerical variable representing the ambient temperature in degrees Celsius.
- Humidity: The relative humidity percentage in the environment.
- Moisture: The soil moisture content as a numerical percentage. Helps assess how wet or dry the soil is.
- Soil Type: A categorical attribute indicating the general type of soil (e.g., sandy, loamy, clayey).
- Crop Type: A categorical variable that defines the type of crop grown in the field.
- Nitrogen: A numerical value indicating the nitrogen content of the soil.
- Phosphorus: The phosphorus content of the soil, measured as a numerical value.
- Potassium: Indicates the amount of potassium present in the soil.
- Fertilizer Name: The target variable. It specifies the recommended fertilizer for the given conditions and is a categorical label used during training.
The project primarily consists of two Python files:
- `main.py`: Contains data loading, exploratory data analysis (EDA), feature engineering, modeling, hyperparameter optimization, and prediction generation steps.
- `preprocessing.py`: A separate module called within `main.py`, containing the data preprocessing steps (feature engineering, encoding, scaling). This structure ensures that consistent transformations are applied to both the training and test datasets.
Throughout this notebook, I followed a structured machine learning pipeline:
- Handling missing values and type conversions
- Label Encoding for categorical variables
- Feature scaling (if required)
- Created additional domain-specific features (e.g., features prefixed with `NEW_`)
- Removed less informative features based on feature importance
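A minimal sketch of the encoding and feature-creation steps above. The column names follow the dataset description; the sample values and the `NEW_N_K_RATIO` feature are hypothetical illustrations, not the project's actual engineered features:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows with the dataset's categorical and numerical columns
df = pd.DataFrame({
    "Soil Type": ["Sandy", "Loamy", "Clayey"],
    "Crop Type": ["Wheat", "Maize", "Wheat"],
    "Nitrogen": [30, 12, 24],
    "Potassium": [10, 8, 15],
})

# Label Encoding for categorical variables; encoders are kept so the
# same mapping can be applied to the test set
encoders = {}
for col in ["Soil Type", "Crop Type"]:
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])
    encoders[col] = le

# Example domain-specific feature (hypothetical): nitrogen-to-potassium ratio
df["NEW_N_K_RATIO"] = df["Nitrogen"] / df["Potassium"]
print(df)
```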
I trained and compared several ensemble-based models:
- XGBoost
- CatBoost
- LightGBM
The best performing model (XGBoost) was selected based on MAP@3 score on the validation set.
- Used Optuna to tune hyperparameters efficiently
- Applied early stopping to reduce overfitting
- Focused on maximizing MAP@3 rather than traditional accuracy
- Extracted top 3 predictions using `predict_proba()` + `argsort()`
- Inverse transformed label encodings to match the submission format
- Saved the final model using `joblib` for reuse
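The top-3 extraction step can be sketched like this. The probabilities and class names below are placeholders; in the project the class names would come from inverse-transforming the label encoder:

```python
import numpy as np

# Hypothetical predicted probabilities for 2 samples over 4 fertilizer classes
proba = np.array([[0.05, 0.5, 0.3, 0.15],
                  [0.40, 0.1, 0.2, 0.30]])
classes = np.array(["10-26-26", "14-35-14", "DAP", "Urea"])  # placeholder names

# Sort each row ascending, reverse to descending, keep the 3 best indices
top3_idx = np.argsort(proba, axis=1)[:, ::-1][:, :3]
top3_labels = classes[top3_idx]

# Submission format: space-separated labels, best prediction first
submission_col = [" ".join(row) for row in top3_labels]
print(submission_col)
```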
To run this project on your local machine, follow these steps:
- Clone the Repository and go to the project directory:

  ```bash
  git clone https://github.com/BahriDogru/Predicting_Optimal_Fertilizers.git
  cd Predicting_Optimal_Fertilizers
  ```

- Create and activate the Conda environment:

  All libraries required for the project are listed in the `environment.yaml` file. You can use this file to automatically create and activate the Conda environment.

  ```bash
  conda env create -f environment.yaml
  conda activate predicting_fertilizers_env
  ```

- Download the Dataset:

  Download the `train.csv` and `test.csv` files from the Kaggle competition page (Kaggle Playground Series - Season 5, Episode 6) and place them inside a folder named `dataset/` in your project's root directory:

  ```
  .
  ├── main.py
  ├── preprocessing.py
  ├── dataset/
  │   ├── train.csv
  │   └── test.csv
  └── README.md
  ```

- Run the Code:

  ```bash
  python main.py
  ```

  This command will train the model, make predictions, and generate the `submission.csv` file.