Description: The data used here, for Sales Qualified Leads Generated for FY 23-24
Sales Owner: 29 Sales Team member responsible for conversion of lead to customer.
Entity Name: Names of the 846 lead organization.
Current Stage:
- 1. Customer - Identified **45** potential customers who may be interested in the product/service, but not yet engaged.
- 2. Proposal - Sending proposal to **64** potential customers.
- 3. Qualified Lead (QL) - **721** customers/leads qualified by Sales Team as a genuine.
- 4. Recently Qualified Lead (RQL) - **6** customers/leads recently qualified by Sales Team as a genuine.
- 5. Unqualified Lead (UQL) - **10** customers/leads not qualified by Sales Team as a genuine, may further require nurturing.
Status
- 1. Active - 36 customers/leads, 18 proposals, 248 QL, 6 RQL, 4 UQL
- 2. Free - 1 QL may have not been qualified further.
- 3. Lost - 9 customer, 38 proposals, 257 QL, 4 UQL have decided not to proceed with the purchase.
- 4. Nurture - 8 proposals, 215 QL, 2 UQL requires further nurturing to build their interest.
Lead Creation Date: Date on which first positive response from that entities is received.
QL Creation Date: Date on which Sales Team marked them as a genuine customer/lead through defined process.
Lost Date: Date on which the customer/lead did not convert.
Nurture Date: Date on which customers/leads further nutured to build their intereest.
Won Date: Date on which customer/lead convert into QL.
Channel: 272 Emails, 278 LinkedIn, 113 Website, 74 Bidding, 40 Apollo.io, 19 Tender, 13 Paid, 3 Referral, 1 Cold call, 1 Job Board, 1 Nurturing, 31 No Channel Data Available.
Lead Score: Score of lead is calculated through defined formula to understand business potential from the lead.Higher the lead score, better the lead.
Import Libraries
import pandas as pd import seaborn as sns import matplotlib.pyplot as plt
Loading Data
import pandas as pd import seaborn as sns import matplotlib.pyplot as plt
View Dataframe
df
Total Rows & Columns
df.shape
Data Information
df.info()
Data Quality Check
Duplicate Values
len(df[df.duplicated()])
Missing Values/Null Values
print(df.isnull().sum())
#Visualizing the missing values using heatmap sns.heatmap(df.isnull(), cbar=False)
Incomplete Records:
df = df.dropna(subset=['Prospect Creation Date']) print(df.isnull().sum())
Variable Information
#Columns df.columns
#Describe df.describe(datetime_is_numeric=True)
#Check unique values for each variable
for i in df.columns.tolist():
print("No. of unique values in ",i,"is",df[i].nunique(),".")
Analysis
Lead Source Analysis
df['Channel'].value_counts()
#Plot a bar chart
plt.figure(figsize=(10, 6))
df['Channel'].value_counts().plot(kind='bar', color='skyblue')
plt.title('Distribution of Channels')
plt.xlabel('Channels')
plt.ylabel('Frequency')
plt.xticks(rotation=45, ha='right') # Rotate x-axis labels for better visibility
plt.show()
Comparing each Channels with mean Lead Score
#Count of unique channels channel_counts = df['Channel'].value_counts().reset_index() channel_counts.columns = ['Channel', 'Count']#Median lead score for each channel channel_lead = df.groupby('Channel')['Lead Score'].median().reset_index() channel_lead.columns = ['Channel', 'Median Lead Score']
#Merge the two DataFrames on the 'Channel' column channel_l = pd.merge(channel_counts, channel_lead, on='Channel')
channel_l
#Plot a bar graph
plt.figure(figsize=(10, 6))
plt.bar(channel_lead['Channel'], channel_lead['Median Lead Score'], color='skyblue')
plt.xlabel('Channel')
plt.ylabel('Median Lead Score')
plt.title('Median Lead Score for Each Channel')
plt.xticks(rotation=45, ha='right') # Rotate x-axis labels for better visibility
plt.grid(axis='y')
#Count of unique channels channel_counts = df['Channel'].value_counts().reset_index() channel_counts.columns = ['Channel', 'Count']#Median lead score for each channel channel_lead = df.groupby('Channel')['Lead Score'].median().reset_index() channel_lead.columns = ['Channel', 'Median Lead Score'] channel_lead = channel_lead.sort_values(by='Median Lead Score')
#Merge the two DataFrames on the 'Channel' column result_df = pd.merge(channel_counts, channel_lead, on='Channel')
#Display the combined DataFrame print(result_df)
#Larger plot size
fig, ax1 = plt.subplots(figsize=(25, 15))
#Bar plot for counts
color = 'tab:blue'
ax1.set_xlabel('Channel')
ax1.set_ylabel('Count', color=color)
ax1.bar(result_df['Channel'], result_df['Count'], color=color)
ax1.tick_params(axis='y', labelcolor=color)
#Creating a second y-axis for the line plot
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Median Lead Score', color=color)
ax2.plot(result_df['Channel'], result_df['Median Lead Score'], color=color, marker='o')
ax2.tick_params(axis='y', labelcolor=color)
#Title
plt.title('Comparison of Counts and Median Lead Score for Each Channel')
#Rotating x-axis labels for better readability
plt.xticks(rotation=45, ha='right')
#Display the plot
plt.show()
Number of entities handled by each Sales Owner (Sales Team)
lead_s = pd.DataFrame(df.groupby('Sales Owner')['Entity Name'].nunique())
lead_s = lead_s.sort_values(by='Entity Name', ascending=False).reset_index()
lead_s
#Plot the data in a line chart
plt.figure(figsize=(10, 6))
plt.plot(lead_s['Sales Owner'], lead_s['Entity Name'], marker='o', linestyle='-', color='b')
plt.title('Number of Entities per Sales Owner')
plt.xlabel('Sales Owner')
plt.ylabel('Number of Entities')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
Current Stage Distribution
df['Current Stage'].value_counts()
#Plot a horizontal bar chart ax = df['Current Stage'].value_counts().plot(kind='barh', stacked=True, colormap='viridis', figsize=(10, 6))#Add labels and title ax.set_xlabel('Percentage') ax.set_ylabel('Current Stage') ax.set_title('Current Stage Distribution')
#Display the legend ax.legend(title='Current Stage', bbox_to_anchor=(1.05, 1), loc='upper left')
#Show the plot plt.show()
Conversion Rates at different stages
1.Stage = CUSTOMER
customer_data = df.loc[df['Current Stage'] == 'Customer', ['Current Stage', 'Won Date']]#Display the result print(customer_data.count())
customer_data['No Won Date'] = customer_data['Won Date'].isnull()
#Count the occurrences of non-null and null values in 'Won Date'
won_date_counts = customer_data['Won Date'].notnull().value_counts()
#Plot a pie chart
plt.figure(figsize=(8, 8))
plt.pie(won_date_counts, labels=['Customer', 'Lost'], autopct='%1.1f%%', colors=['lightblue', 'lightcoral'])
plt.title('Distribution of Won Date for Customer Stage=Customer')
plt.show()
2.Stage = Proposal
customer_data1 = df.loc[df['Current Stage'] == 'Proposal', ['Current Stage', 'Won Date']]#Display the result print(customer_data1.count())
#Count the occurrences of non-null and null values in 'Won Date'
won_date_count = customer_data1['Won Date'].notnull().value_counts()
#Plot a pie chart
plt.figure(figsize=(8, 8))
plt.pie(won_date_count, labels=['Lost', 'Customer'], autopct='%1.1f%%', colors=['lightblue','darkblue'])
plt.title('Distribution of Won Date for Customer Stage=Proposal')
plt.show()
3.Stage = QL
customer_data2 = df.loc[df['Current Stage'] == 'QL', ['Current Stage', 'Won Date']]#Display the result print(customer_data2.count())
4.Stage = RQL
customer_data3 = df.loc[df['Current Stage'] == 'RQL', ['Current Stage', 'Won Date']]#Display the result print(customer_data3.count())
5.Stage = UQL
customer_data4 = df.loc[df['Current Stage'] == 'UQL', ['Current Stage', 'Won Date']]#Display the result print(customer_data4.count())
Lead Response Time
We are finding the difference between QL Creation Date and Prospect Creation Date.
#Select only the relevant columns df['Date Difference'] = df['QL Creation Date'] - df['Prospect Creation Date'] df_sorted = df.sort_values(by='Date Difference') #Select only the relevant columns result_df = df_sorted[['Date Difference','Won Date']] #result_df #Select only the relevant columns result_df = df_sorted[['Date Difference','Won Date']] result_df.count()
Assuming here that,
- QL Creation Date on/after Prospect Creation Date, then its NEW entity into the database.
- QL Creation Date before Prospect Creation Date, then its OLD entity from the database.
#Filter based on Date Difference < 0 and count not null Won Date negative_difference = df[df['Date Difference'] < pd.Timedelta(0)] count_not_null_won_date_negative = negative_difference['Won Date'].count()#Filter based on Date Difference = 0 and count not null Won Date equal_difference = df[df['Date Difference'] == pd.Timedelta(0)] count_not_null_won_date_equal = equal_difference['Won Date'].count()
#Filter based on Date Difference > 0 and count not null Won Date non_negative_difference = df[df['Date Difference'] > pd.Timedelta(0)] count_not_null_won_date_non_negative = non_negative_difference['Won Date'].count()
#Display the results print(f"{count_not_null_won_date_negative} number of leads won for OLD Entities/Prospects (where Date Difference < 0)") print(f"{count_not_null_won_date_equal} number of leads Won for New Entities/Prospects contacted on same day (where Date Difference = 0)") print(f"{count_not_null_won_date_non_negative} number of leads Won for New Entities/Prospects (where Date Difference > 0)")
#Circular area plot
categories = ['OLD Entities/Prospects', 'New Entities/Prospects contacted on same day', 'New Entities/Prospects']
counts = [count_not_null_won_date_negative, count_not_null_won_date_equal, count_not_null_won_date_non_negative]
fig, ax = plt.subplots(subplot_kw=dict(aspect="equal"))
wedges, texts, autotexts = ax.pie(counts, autopct='%1.1f%%', textprops=dict(color="w"))
ax.legend(wedges, categories, title="Categories", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=8, weight="bold")
ax.set_title("Count of Leads Won for Prospects/Entites categories")
#Display the plot
plt.show()
For past Nurture entites, number of leads won
#Count total Won Date when Nurture Date is not null count_won_date_not_null_nurture = df[df['Nuture Date'].notnull()]['Won Date'].count()#Display the result print("Total Leads Won Date when Nurture Date is not null:", count_won_date_not_null_nurture)
Actionable Insights found
- Top 3 channels from where the 77.21% of the total entities are coming are LinkedIn, Email, Website.
- Top 3 channels for generating Quality Leads are Referral, Cold call, Job Board.
- 73.3% of leads converted from the entities having stage = Customers.
- 1.6% of leads convereted from entities having stage = Proposals.
- There are negligible changes of converting leads when entities having stage = QL, RQL, UQL.
- 48.5% leads converting from new entities/prospects.
- 33.33% leads converting from entities/prospects which are contacted on same day through various channels.
- 18.20% leads converting from old entities/prospects.
- 12.12% leads converted from nurturing entities.
- Restructing Lead Score formula because, leads converted when lead scroe<1 is 32, else its 1 only.
- 17 Sales owner had got most entities/propects but rest have few below 10 in numbers.