Santander Bank needs to identify which customers will make a specific transaction in the future, irrespective of the amount of money transacted.
The data provided for this competition has the same structure as the real data they have available to solve this problem.
The data is anonymized: each row contains 200 numerical values, identified only by a number.
In the Jupyter notebook, we explore the data, prepare it for a LightGBM model, train and cross-validate the model, and predict the target value for the test set.
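The splitting step behind cross-validation can be sketched as follows. This is a minimal, dependency-free illustration and not the notebook's actual code: the function name `stratified_folds`, the choice of 5 folds, and the toy labels are all assumptions made for the example, and the notebook may well rely on a library splitter instead. Stratifying keeps the share of positive targets roughly equal in every fold, which matters when the target is imbalanced.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_splits=5, seed=0):
    """Yield (train_indices, valid_indices) pairs, one per fold.

    Hypothetical helper for illustration: indices of each class are
    shuffled and dealt round-robin into the folds, so every fold gets
    a near-equal share of each class.
    """
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Deal the shuffled indices of each class into the folds.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)

    # Each fold serves once as the validation set; the rest is training data.
    for k in range(n_splits):
        valid = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, valid


# Toy labels (assumed for the example): 10 positives among 100 rows.
toy_labels = [1] * 10 + [0] * 90
for train_idx, valid_idx in stratified_folds(toy_labels, n_splits=5):
    positives = sum(toy_labels[i] for i in valid_idx)
    print(len(train_idx), len(valid_idx), positives)
```

In a full pipeline, a model would be trained on each `train` index set and scored on the matching `valid` set, and the per-fold scores averaged.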
The dataset can be obtained directly from the Kaggle competition:
This submission scored 0.90035 (area under the receiver operating characteristic curve), which placed me 324th out of 8751 total submissions on the private leaderboard, in the top 4%.
The winning team received a score of 0.92573.
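For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive row receives a higher predicted score than a randomly chosen negative row. The sketch below computes it with the rank-sum formulation; the function name `roc_auc` and the toy inputs are assumptions for illustration, not the competition's scoring code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula.

    labels: iterable of 0/1 targets; scores: predicted scores, same length.
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)

    # Assign 1-based ranks, averaging over groups of tied scores.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Rank sum of the positives, shifted and normalized to [0, 1].
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

A score of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, so only the ordering of the predictions matters, not their absolute values.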