This project explores data from Prosper, America's first marketplace lending platform. Prosper has funded over 12 billion dollars in loans. On Prosper, borrowers list loan requests between 2,000 and 40,000 dollars, and individual investors can invest as little as 25 dollars in the listing of their choice.
The dataset was provided by Udacity as part of the Data Analyst Nanodegree Program certification in January 2021. The dataset can be found here, with feature documentation available here.
This exploration uses statistics and visualizations to build an understanding of the Prosper dataset, which consists of 81 variables and 113,937 observations. It includes univariate, bivariate, and multivariate visualizations of several variables, showing the reader how each variable is distributed and how the variables relate to one another.
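As a minimal sketch of a first look at the data, the snippet below reports the number of observations and variables and the columns with missing values. The helper name `summarize`, the file name `prosperLoanData.csv`, and the column names are assumptions made for illustration, and the small frame is synthetic stand-in data, not real Prosper records.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return the observation count, variable count, and columns with missing values."""
    n_rows, n_cols = df.shape
    missing = df.isna().sum()
    return {
        "observations": n_rows,
        "variables": n_cols,
        "columns_with_missing": sorted(missing[missing > 0].index),
    }


# Synthetic stand-in rows; the column names are assumed, not confirmed by this README.
demo = pd.DataFrame({
    "BorrowerAPR": [0.12, 0.25, np.nan, 0.18],
    "LoanOriginalAmount": [4000, 15000, 2500, 10000],
    "Term": [36, 60, 36, 12],
})
summary = summarize(demo)
print(summary)

# With the real file (file name assumed), the same call would be:
# loans = pd.read_csv("prosperLoanData.csv")
# summarize(loans)  # this README reports 113,937 observations and 81 variables
```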
Prosper provides a reliable platform for investors to lend money and for borrowers to borrow it. Loans provided through Prosper show extremely low historical rates for the borrower, with negative service fees for the majority of loans. More than 99 percent of loan listings are fully funded, and default rates are below 5 percent.
For the presentation, I focus on the influence of Original Loan Amount, Income Range, EmploymentStatus, Credit Score, and Loan Term on the Borrower APR. I start by introducing these variables and their distributions, move on to their pairwise relationships in bivariate plots, and then examine relationships among the variables of interest in multivariate plots. Each plot is followed by a detailed analysis of the findings and the next step(s).
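The pairwise and multivariate steps described above can also be checked numerically before plotting. The following is a rough sketch under stated assumptions: the column names (`BorrowerAPR`, `LoanOriginalAmount`, `Term`, `EmploymentStatus`) are assumed, and the rows are synthetic stand-ins, so the numbers say nothing about the real Prosper data.

```python
import pandas as pd

# Synthetic stand-in rows, not real Prosper records; column names are assumed.
loans = pd.DataFrame({
    "BorrowerAPR": [0.30, 0.25, 0.20, 0.15, 0.10, 0.12],
    "LoanOriginalAmount": [2000, 5000, 10000, 15000, 25000, 20000],
    "Term": [36, 36, 36, 60, 60, 60],
    "EmploymentStatus": ["Employed", "Self-employed", "Employed",
                         "Employed", "Employed", "Self-employed"],
})

# Bivariate: Pearson correlation between loan amount and APR.
corr = loans["LoanOriginalAmount"].corr(loans["BorrowerAPR"])

# Multivariate: mean APR for each combination of term and employment status.
mean_apr = loans.pivot_table(
    index="Term", columns="EmploymentStatus", values="BorrowerAPR", aggfunc="mean"
)

print(f"correlation: {corr:.2f}")
print(mean_apr)
```

A table like `mean_apr` holds the same information a heatmap would display, which makes it a convenient sanity check on the plotted values.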
The plots used for this exploration include histograms, heatmaps, violin plots, scatter plots, and several plot matrices.
- Python, Pandas, NumPy, Matplotlib, Seaborn
- Jupyter Notebook