Sales-Lead-Generation-Analysis

Description: The data used here, for Sales Qualified Leads Generated for FY 23-24

Sales Owner: 29 Sales Team member responsible for conversion of lead to customer.

Entity Name: Names of the 846 lead organization.

Current Stage:

1. Customer - Identified **45** potential customers who may be interested in the product/service, but not yet engaged.
2. Proposal - Sending proposal to **64** potential customers.
3. Qualified Lead (QL) - **721** customers/leads qualified by Sales Team as a genuine.
4. Recently Qualified Lead (RQL) - **6** customers/leads recently qualified by Sales Team as a genuine.
5. Unqualified Lead (UQL) - **10** customers/leads not qualified by Sales Team as a genuine, may further require nurturing.

Status

1. Active - 36 customers/leads, 18 proposals, 248 QL, 6 RQL, 4 UQL
2. Free - 1 QL may have not been qualified further.
3. Lost - 9 customer, 38 proposals, 257 QL, 4 UQL have decided not to proceed with the purchase.
4. Nurture - 8 proposals, 215 QL, 2 UQL requires further nurturing to build their interest.

Prospect Creation Date:b> Date of identification of the entities through different channels.

Lead Creation Date: Date on which first positive response from that entities is received.

QL Creation Date: Date on which Sales Team marked them as a genuine customer/lead through defined process.

Lost Date: Date on which the customer/lead did not convert.

Nurture Date: Date on which customers/leads further nutured to build their intereest.

Won Date: Date on which customer/lead convert into QL.

Channel: 272 Emails, 278 LinkedIn, 113 Website, 74 Bidding, 40 Apollo.io, 19 Tender, 13 Paid, 3 Referral, 1 Cold call, 1 Job Board, 1 Nurturing, 31 No Channel Data Available.

Lead Score: Score of lead is calculated through defined formula to understand business potential from the lead.Higher the lead score, better the lead.

Import Libraries

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Loading Data

  import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

View Dataframe

df

Total Rows & Columns

df.shape

Data Information

  df.info()

Data Quality Check

Duplicate Values

  len(df[df.duplicated()])

Missing Values/Null Values

  print(df.isnull().sum())

  #Visualizing the missing values using heatmap
sns.heatmap(df.isnull(), cbar=False)

Incomplete Records:

df = df.dropna(subset=['Prospect Creation Date'])
print(df.isnull().sum())

Variable Information

  #Columns
df.columns

  #Describe
df.describe(datetime_is_numeric=True)

  #Check unique values for each variable
for i in df.columns.tolist():
  print("No. of unique values in ",i,"is",df[i].nunique(),".")

Analysis

Lead Source Analysis

  df['Channel'].value_counts()

  #Plot a bar chart
plt.figure(figsize=(10, 6))
df['Channel'].value_counts().plot(kind='bar', color='skyblue')
plt.title('Distribution of Channels')
plt.xlabel('Channels')
plt.ylabel('Frequency')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better visibility
plt.show()

Comparing each Channels with mean Lead Score

  #Count of unique channels
channel_counts = df['Channel'].value_counts().reset_index()
channel_counts.columns = ['Channel', 'Count']
#Median lead score for each channel
channel_lead = df.groupby('Channel')['Lead Score'].median().reset_index()
channel_lead.columns = ['Channel', 'Median Lead Score']
#Merge the two DataFrames on the 'Channel' column
channel_l = pd.merge(channel_counts, channel_lead, on='Channel')
channel_l

  #Plot a bar graph
plt.figure(figsize=(10, 6))
plt.bar(channel_lead['Channel'], channel_lead['Median Lead Score'], color='skyblue')
plt.xlabel('Channel')
plt.ylabel('Median Lead Score')
plt.title('Median Lead Score for Each Channel')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better visibility
plt.grid(axis='y')

  #Count of unique channels
channel_counts = df['Channel'].value_counts().reset_index()
channel_counts.columns = ['Channel', 'Count']
#Median lead score for each channel
channel_lead = df.groupby('Channel')['Lead Score'].median().reset_index()
channel_lead.columns = ['Channel', 'Median Lead Score']
channel_lead = channel_lead.sort_values(by='Median Lead Score')
#Merge the two DataFrames on the 'Channel' column
result_df = pd.merge(channel_counts, channel_lead, on='Channel')
#Display the combined DataFrame
print(result_df)

  #Larger plot size
fig, ax1 = plt.subplots(figsize=(25, 15))

  #Bar plot for counts
color = 'tab:blue'
ax1.set_xlabel('Channel')
ax1.set_ylabel('Count', color=color)
ax1.bar(result_df['Channel'], result_df['Count'], color=color)
ax1.tick_params(axis='y', labelcolor=color)

  #Creating a second y-axis for the line plot
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Median Lead Score', color=color)
ax2.plot(result_df['Channel'], result_df['Median Lead Score'], color=color, marker='o')
ax2.tick_params(axis='y', labelcolor=color)

  #Title
plt.title('Comparison of Counts and Median Lead Score for Each Channel')

  #Rotating x-axis labels for better readability
plt.xticks(rotation=45, ha='right')

  #Display the plot
plt.show()

Number of entities handled by each Sales Owner (Sales Team)

  lead_s = pd.DataFrame(df.groupby('Sales Owner')['Entity Name'].nunique())
lead_s = lead_s.sort_values(by='Entity Name', ascending=False).reset_index()
lead_s

  #Plot the data in a line chart
plt.figure(figsize=(10, 6))
plt.plot(lead_s['Sales Owner'], lead_s['Entity Name'], marker='o', linestyle='-', color='b')
plt.title('Number of Entities per Sales Owner')
plt.xlabel('Sales Owner')
plt.ylabel('Number of Entities')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

Current Stage Distribution

  df['Current Stage'].value_counts()

  #Plot a horizontal bar chart
ax = df['Current Stage'].value_counts().plot(kind='barh', stacked=True, colormap='viridis', figsize=(10, 6))
#Add labels and title
ax.set_xlabel('Percentage')
ax.set_ylabel('Current Stage')
ax.set_title('Current Stage Distribution')
#Display the legend
ax.legend(title='Current Stage', bbox_to_anchor=(1.05, 1), loc='upper left')
#Show the plot
plt.show()

Conversion Rates at different stages

1.Stage = CUSTOMER

  customer_data = df.loc[df['Current Stage'] == 'Customer', ['Current Stage', 'Won Date']]
#Display the result
print(customer_data.count())

  customer_data['No Won Date'] = customer_data['Won Date'].isnull()
  #Count the occurrences of non-null and null values in 'Won Date'
won_date_counts = customer_data['Won Date'].notnull().value_counts()

  #Plot a pie chart
plt.figure(figsize=(8, 8))
plt.pie(won_date_counts, labels=['Customer', 'Lost'], autopct='%1.1f%%', colors=['lightblue', 'lightcoral'])
plt.title('Distribution of Won Date for Customer Stage=Customer')
plt.show()

2.Stage = Proposal

  customer_data1 = df.loc[df['Current Stage'] == 'Proposal', ['Current Stage', 'Won Date']]
#Display the result
print(customer_data1.count())

  #Count the occurrences of non-null and null values in 'Won Date'
won_date_count = customer_data1['Won Date'].notnull().value_counts()

  #Plot a pie chart
plt.figure(figsize=(8, 8))
plt.pie(won_date_count, labels=['Lost', 'Customer'], autopct='%1.1f%%', colors=['lightblue','darkblue'])
plt.title('Distribution of Won Date for Customer Stage=Proposal')
plt.show()

3.Stage = QL

  customer_data2 = df.loc[df['Current Stage'] == 'QL', ['Current Stage', 'Won Date']]
#Display the result
print(customer_data2.count())

4.Stage = RQL

  customer_data3 = df.loc[df['Current Stage'] == 'RQL', ['Current Stage', 'Won Date']]
#Display the result
print(customer_data3.count())

5.Stage = UQL

  customer_data4 = df.loc[df['Current Stage'] == 'UQL', ['Current Stage', 'Won Date']]
#Display the result
print(customer_data4.count())

Lead Response Time

We are finding the difference between QL Creation Date and Prospect Creation Date.

  #Select only the relevant columns
df['Date Difference'] = df['QL Creation  Date'] - df['Prospect Creation Date']
df_sorted = df.sort_values(by='Date Difference')
  #Select only the relevant columns
result_df = df_sorted[['Date Difference','Won Date']]
#result_df
  #Select only the relevant columns
result_df = df_sorted[['Date Difference','Won Date']]
result_df.count()

Assuming here that,

QL Creation Date on/after Prospect Creation Date, then its NEW entity into the database.
QL Creation Date before Prospect Creation Date, then its OLD entity from the database.

  #Filter based on Date Difference < 0 and count not null Won Date
negative_difference = df[df['Date Difference'] < pd.Timedelta(0)]
count_not_null_won_date_negative = negative_difference['Won Date'].count()
#Filter based on Date Difference = 0 and count not null Won Date
equal_difference = df[df['Date Difference'] == pd.Timedelta(0)]
count_not_null_won_date_equal = equal_difference['Won Date'].count()
#Filter based on Date Difference > 0 and count not null Won Date
non_negative_difference = df[df['Date Difference'] > pd.Timedelta(0)]
count_not_null_won_date_non_negative = non_negative_difference['Won Date'].count()
#Display the results
print(f"{count_not_null_won_date_negative} number of leads won for OLD Entities/Prospects (where Date Difference < 0)")
print(f"{count_not_null_won_date_equal} number of leads Won for New Entities/Prospects contacted on same day (where Date Difference = 0)")
print(f"{count_not_null_won_date_non_negative} number of leads Won for New Entities/Prospects (where Date Difference > 0)")

  #Circular area plot
categories = ['OLD Entities/Prospects', 'New Entities/Prospects contacted on same day', 'New Entities/Prospects']
counts = [count_not_null_won_date_negative, count_not_null_won_date_equal, count_not_null_won_date_non_negative]

fig, ax = plt.subplots(subplot_kw=dict(aspect="equal"))

wedges, texts, autotexts = ax.pie(counts, autopct='%1.1f%%', textprops=dict(color="w"))

ax.legend(wedges, categories, title="Categories", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=8, weight="bold")

ax.set_title("Count of Leads Won for Prospects/Entites categories")

#Display the plot
plt.show()

For past Nurture entites, number of leads won

  #Count total Won Date when Nurture Date is not null
count_won_date_not_null_nurture = df[df['Nuture Date'].notnull()]['Won Date'].count()
#Display the result
print("Total Leads Won Date when Nurture Date is not null:", count_won_date_not_null_nurture)

Actionable Insights found

Top 3 channels from where the 77.21% of the total entities are coming are LinkedIn, Email, Website.
Top 3 channels for generating Quality Leads are Referral, Cold call, Job Board.
73.3% of leads converted from the entities having stage = Customers.
1.6% of leads convereted from entities having stage = Proposals.
There are negligible changes of converting leads when entities having stage = QL, RQL, UQL.
48.5% leads converting from new entities/prospects.
33.33% leads converting from entities/prospects which are contacted on same day through various channels.
18.20% leads converting from old entities/prospects.
12.12% leads converted from nurturing entities.
Restructing Lead Score formula because, leads converted when lead scroe<1 is 32, else its 1 only.
17 Sales owner had got most entities/propects but rest have few below 10 in numbers.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
Project_Anurag_ISINE.ipynb		Project_Anurag_ISINE.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Sales-Lead-Generation-Analysis

About

Uh oh!

Releases

Packages

Languages

pynip/Sales-Lead-Generation-Analysis

Folders and files

Latest commit

History

Repository files navigation

Sales-Lead-Generation-Analysis

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages