Skip to content

pynip/Sales-Lead-Generation-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 

Repository files navigation

Sales-Lead-Generation-Analysis

Description: The data used here, for Sales Qualified Leads Generated for FY 23-24

Sales Owner: 29 Sales Team member responsible for conversion of lead to customer.

Entity Name: Names of the 846 lead organization.

Current Stage:

  • 1. Customer - Identified **45** potential customers who may be interested in the product/service, but not yet engaged.
  • 2. Proposal - Sending proposal to **64** potential customers.
  • 3. Qualified Lead (QL) - **721** customers/leads qualified by Sales Team as a genuine.
  • 4. Recently Qualified Lead (RQL) - **6** customers/leads recently qualified by Sales Team as a genuine.
  • 5. Unqualified Lead (UQL) - **10** customers/leads not qualified by Sales Team as a genuine, may further require nurturing.

Status

  • 1. Active - 36 customers/leads, 18 proposals, 248 QL, 6 RQL, 4 UQL
  • 2. Free - 1 QL may have not been qualified further.
  • 3. Lost - 9 customer, 38 proposals, 257 QL, 4 UQL have decided not to proceed with the purchase.
  • 4. Nurture - 8 proposals, 215 QL, 2 UQL requires further nurturing to build their interest.
Prospect Creation Date:b> Date of identification of the entities through different channels.

Lead Creation Date: Date on which first positive response from that entities is received.

QL Creation Date: Date on which Sales Team marked them as a genuine customer/lead through defined process.

Lost Date: Date on which the customer/lead did not convert.

Nurture Date: Date on which customers/leads further nutured to build their intereest.

Won Date: Date on which customer/lead convert into QL.

Channel: 272 Emails, 278 LinkedIn, 113 Website, 74 Bidding, 40 Apollo.io, 19 Tender, 13 Paid, 3 Referral, 1 Cold call, 1 Job Board, 1 Nurturing, 31 No Channel Data Available.

Lead Score: Score of lead is calculated through defined formula to understand business potential from the lead.Higher the lead score, better the lead.

Import Libraries

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 

Loading Data

  import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

View Dataframe

df

Total Rows & Columns

df.shape

Data Information

  df.info()
image

Data Quality Check

Duplicate Values

  len(df[df.duplicated()])
image

Missing Values/Null Values

  print(df.isnull().sum())
image
  #Visualizing the missing values using heatmap
sns.heatmap(df.isnull(), cbar=False)
image

Incomplete Records:

df = df.dropna(subset=['Prospect Creation Date'])
print(df.isnull().sum())
image

Variable Information

  #Columns
df.columns
image
  #Describe
df.describe(datetime_is_numeric=True)
image
  #Check unique values for each variable
for i in df.columns.tolist():
  print("No. of unique values in ",i,"is",df[i].nunique(),".")
image

Analysis

Lead Source Analysis

  df['Channel'].value_counts()
image
  #Plot a bar chart
plt.figure(figsize=(10, 6))
df['Channel'].value_counts().plot(kind='bar', color='skyblue')
plt.title('Distribution of Channels')
plt.xlabel('Channels')
plt.ylabel('Frequency')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better visibility
plt.show()
image

Comparing each Channels with mean Lead Score

  #Count of unique channels
channel_counts = df['Channel'].value_counts().reset_index()
channel_counts.columns = ['Channel', 'Count']

#Median lead score for each channel channel_lead = df.groupby('Channel')['Lead Score'].median().reset_index() channel_lead.columns = ['Channel', 'Median Lead Score']

#Merge the two DataFrames on the 'Channel' column channel_l = pd.merge(channel_counts, channel_lead, on='Channel')

channel_l

image

  #Plot a bar graph
plt.figure(figsize=(10, 6))
plt.bar(channel_lead['Channel'], channel_lead['Median Lead Score'], color='skyblue')
plt.xlabel('Channel')
plt.ylabel('Median Lead Score')
plt.title('Median Lead Score for Each Channel')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better visibility
plt.grid(axis='y')
image
  #Count of unique channels
channel_counts = df['Channel'].value_counts().reset_index()
channel_counts.columns = ['Channel', 'Count']

#Median lead score for each channel channel_lead = df.groupby('Channel')['Lead Score'].median().reset_index() channel_lead.columns = ['Channel', 'Median Lead Score'] channel_lead = channel_lead.sort_values(by='Median Lead Score')

#Merge the two DataFrames on the 'Channel' column result_df = pd.merge(channel_counts, channel_lead, on='Channel')

#Display the combined DataFrame print(result_df)

image

  #Larger plot size
fig, ax1 = plt.subplots(figsize=(25, 15))

  #Bar plot for counts
color = 'tab:blue'
ax1.set_xlabel('Channel')
ax1.set_ylabel('Count', color=color)
ax1.bar(result_df['Channel'], result_df['Count'], color=color)
ax1.tick_params(axis='y', labelcolor=color)

  #Creating a second y-axis for the line plot
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Median Lead Score', color=color)
ax2.plot(result_df['Channel'], result_df['Median Lead Score'], color=color, marker='o')
ax2.tick_params(axis='y', labelcolor=color)

  #Title
plt.title('Comparison of Counts and Median Lead Score for Each Channel')

  #Rotating x-axis labels for better readability
plt.xticks(rotation=45, ha='right')

  #Display the plot
plt.show()
image

Number of entities handled by each Sales Owner (Sales Team)

  lead_s = pd.DataFrame(df.groupby('Sales Owner')['Entity Name'].nunique())
lead_s = lead_s.sort_values(by='Entity Name', ascending=False).reset_index()
lead_s
image
  #Plot the data in a line chart
plt.figure(figsize=(10, 6))
plt.plot(lead_s['Sales Owner'], lead_s['Entity Name'], marker='o', linestyle='-', color='b')
plt.title('Number of Entities per Sales Owner')
plt.xlabel('Sales Owner')
plt.ylabel('Number of Entities')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
image

Current Stage Distribution

  df['Current Stage'].value_counts()
image
  #Plot a horizontal bar chart
ax = df['Current Stage'].value_counts().plot(kind='barh', stacked=True, colormap='viridis', figsize=(10, 6))

#Add labels and title ax.set_xlabel('Percentage') ax.set_ylabel('Current Stage') ax.set_title('Current Stage Distribution')

#Display the legend ax.legend(title='Current Stage', bbox_to_anchor=(1.05, 1), loc='upper left')

#Show the plot plt.show()

image

Conversion Rates at different stages

1.Stage = CUSTOMER

  customer_data = df.loc[df['Current Stage'] == 'Customer', ['Current Stage', 'Won Date']]

#Display the result print(customer_data.count())

image

  customer_data['No Won Date'] = customer_data['Won Date'].isnull()
  #Count the occurrences of non-null and null values in 'Won Date'
won_date_counts = customer_data['Won Date'].notnull().value_counts()

  #Plot a pie chart
plt.figure(figsize=(8, 8))
plt.pie(won_date_counts, labels=['Customer', 'Lost'], autopct='%1.1f%%', colors=['lightblue', 'lightcoral'])
plt.title('Distribution of Won Date for Customer Stage=Customer')
plt.show()
image

2.Stage = Proposal

  customer_data1 = df.loc[df['Current Stage'] == 'Proposal', ['Current Stage', 'Won Date']]

#Display the result print(customer_data1.count())

image

  #Count the occurrences of non-null and null values in 'Won Date'
won_date_count = customer_data1['Won Date'].notnull().value_counts()

  #Plot a pie chart
plt.figure(figsize=(8, 8))
plt.pie(won_date_count, labels=['Lost', 'Customer'], autopct='%1.1f%%', colors=['lightblue','darkblue'])
plt.title('Distribution of Won Date for Customer Stage=Proposal')
plt.show()
image

3.Stage = QL

  customer_data2 = df.loc[df['Current Stage'] == 'QL', ['Current Stage', 'Won Date']]

#Display the result print(customer_data2.count())

image

4.Stage = RQL

  customer_data3 = df.loc[df['Current Stage'] == 'RQL', ['Current Stage', 'Won Date']]

#Display the result print(customer_data3.count())

image

5.Stage = UQL

  customer_data4 = df.loc[df['Current Stage'] == 'UQL', ['Current Stage', 'Won Date']]

#Display the result print(customer_data4.count())

image

Lead Response Time

We are finding the difference between QL Creation Date and Prospect Creation Date.

  #Select only the relevant columns
df['Date Difference'] = df['QL Creation  Date'] - df['Prospect Creation Date']
df_sorted = df.sort_values(by='Date Difference')
  #Select only the relevant columns
result_df = df_sorted[['Date Difference','Won Date']]
#result_df
  #Select only the relevant columns
result_df = df_sorted[['Date Difference','Won Date']]
result_df.count()
image

Assuming here that,

  • QL Creation Date on/after Prospect Creation Date, then its NEW entity into the database.
  • QL Creation Date before Prospect Creation Date, then its OLD entity from the database.

  #Filter based on Date Difference < 0 and count not null Won Date
negative_difference = df[df['Date Difference'] < pd.Timedelta(0)]
count_not_null_won_date_negative = negative_difference['Won Date'].count()

#Filter based on Date Difference = 0 and count not null Won Date equal_difference = df[df['Date Difference'] == pd.Timedelta(0)] count_not_null_won_date_equal = equal_difference['Won Date'].count()

#Filter based on Date Difference > 0 and count not null Won Date non_negative_difference = df[df['Date Difference'] > pd.Timedelta(0)] count_not_null_won_date_non_negative = non_negative_difference['Won Date'].count()

#Display the results print(f"{count_not_null_won_date_negative} number of leads won for OLD Entities/Prospects (where Date Difference < 0)") print(f"{count_not_null_won_date_equal} number of leads Won for New Entities/Prospects contacted on same day (where Date Difference = 0)") print(f"{count_not_null_won_date_non_negative} number of leads Won for New Entities/Prospects (where Date Difference > 0)")

image

  #Circular area plot
categories = ['OLD Entities/Prospects', 'New Entities/Prospects contacted on same day', 'New Entities/Prospects']
counts = [count_not_null_won_date_negative, count_not_null_won_date_equal, count_not_null_won_date_non_negative]

fig, ax = plt.subplots(subplot_kw=dict(aspect="equal"))

wedges, texts, autotexts = ax.pie(counts, autopct='%1.1f%%', textprops=dict(color="w"))

ax.legend(wedges, categories, title="Categories", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=8, weight="bold")

ax.set_title("Count of Leads Won for Prospects/Entites categories")

#Display the plot
plt.show()
image

For past Nurture entites, number of leads won

  #Count total Won Date when Nurture Date is not null
count_won_date_not_null_nurture = df[df['Nuture Date'].notnull()]['Won Date'].count()

#Display the result print("Total Leads Won Date when Nurture Date is not null:", count_won_date_not_null_nurture)

image

Actionable Insights found

  • Top 3 channels from where the 77.21% of the total entities are coming are LinkedIn, Email, Website.
  • Top 3 channels for generating Quality Leads are Referral, Cold call, Job Board.
  • 73.3% of leads converted from the entities having stage = Customers.
  • 1.6% of leads convereted from entities having stage = Proposals.
  • There are negligible changes of converting leads when entities having stage = QL, RQL, UQL.
  • 48.5% leads converting from new entities/prospects.
  • 33.33% leads converting from entities/prospects which are contacted on same day through various channels.
  • 18.20% leads converting from old entities/prospects.
  • 12.12% leads converted from nurturing entities.
  • Restructing Lead Score formula because, leads converted when lead scroe<1 is 32, else its 1 only.
  • 17 Sales owner had got most entities/propects but rest have few below 10 in numbers.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published