Obligatory: this was created before the public release of ChatGPT. Hence the use of GPT-2.
Executive Summary
Challenge
The Data
Models and Conclusion
Datasets, libraries, and additional tools used
I have spent the majority of my adult life working in sales in some capacity.
I have worked in retail sales, corporate sales, and, most recently, in online sales as a copywriter and digital marketer.
While each aspect of sales has its positives and negatives, a common feature amongst all of them is burnout.
Granted, attrition is a fact of life in all industries, but it's especially worrying in sales. The Harvard Business Review estimates that the annual turnover of U.S. salespeople is as high as 27%, twice the rate in the overall labor force, with the average tenure being less than two years.
Many would like you to believe that this is a recent phenomenon. The truth is it's not. No, really, it isn't.
You can find a whole host of ideas and suggestions online as to how companies can tackle this challenge, but, seemingly without fail, every single one of them misses what in my experience has been the most obvious and best solution to motivating salespeople and preventing burnout: giving them better leads.
Sales, with its constant rejection, is a profession that is bruising on the ego.
As rejections pile up, salespeople become less passionate about their jobs.
If you give them higher quality leads, they will experience fewer rejections.
What's more, better leads mean more money for sales reps (which creates even happier salespeople) and help a company's bottom line.
Okay, but how do you give them better leads?
You do that through lead nurturing, the process of engaging prospective customers by providing them with appropriate content and information at each stage of the sales funnel with the end goal of earning their business.
And this is where Conversational AI, aka chatbots, comes into play.
Currently there is a range of lead nurturing strategies companies employ:
- Email marketing/nurturing
- Retargeting
- Personalization
- Face-to-face interaction
- Content marketing
And while these strategies have borne some fruit, the fact of the matter is there is still a lot of potential business left on the table.
Unfortunately, most chatbots that companies currently deploy adhere to rigid if-this-then-that rules, so they haven't been greeted with open arms by the masses.
(Back to table of contents)
Can I use my knowledge of data science to create a chatbot capable of open and dynamic conversation?
(Back to table of contents)
Initially, I had hoped to use the Cornell Movie-Dialogs Corpus as my dataset, but after disappointing initial results I pivoted to a topical chat dataset from Amazon that consists of 8,000+ conversations and 184,000+ messages.
It was created by Arnav Sharma and is a more streamlined version of the original Amazon Alexa dataset.
The dataset contains a conversation id, a message (which is either a reply or the start of a conversation), and the sentiment of each message.
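As a rough illustration of that layout, the sketch below groups messages by conversation and pairs each message with the reply that follows it. The column names (`conversation_id`, `message`, `sentiment`), the sample rows, and both helper functions are assumptions made for this example; they are not taken from the project's notebooks.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-in for the real file. Column names and rows are assumptions.
SAMPLE_CSV = """conversation_id,message,sentiment
1,Do you like football?,Curious to dive deeper
1,I do! Who is your team?,Happy
1,I follow the local club.,Neutral
2,Have you read any good books lately?,Curious to dive deeper
2,I just finished a great mystery novel.,Happy
"""


def group_by_conversation(rows):
    """Map each conversation id to its messages, preserving file order."""
    conversations = defaultdict(list)
    for row in rows:
        conversations[row["conversation_id"]].append(row["message"])
    return dict(conversations)


def to_pairs(messages):
    """Pair every message with the reply that directly follows it."""
    return list(zip(messages, messages[1:]))


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
conversations = group_by_conversation(rows)
pairs = [pair for msgs in conversations.values() for pair in to_pairs(msgs)]
```

Grouping by conversation id first keeps the last message of one conversation from being paired with the opening message of the next.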
The dataset spans eight broad topics and is made up of conversations between two people who don’t have defined roles (e.g., interviewer and interviewee). It was created with the goal of aiding in the effort to build a socialbot that can have deep, engaging open-domain conversations with humans.
The eight broad topics are as follows:
- fashion
- politics
- books
- sports
- general entertainment
- music
- science and technology
- movies
(Back to table of contents)
Since my goal was open and dynamic conversation, I focused on using attention-based models for the creation of my chatbot.
Due to time constraints, I limited the dataset I used to the first 50,000 entries and dedicated the majority of my time to fine-tuning GPT-2 models.
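One common way to prepare dialogue for GPT-2 fine-tuning is to join a few turns of context and the reply with the model's end-of-text token. The sketch below shows that idea along with the 50,000-entry cap. The formatting scheme, the `context_size` parameter, and the helper names are assumptions for illustration; the write-up does not describe the exact preprocessing that was used.

```python
from itertools import islice

EOS = "<|endoftext|>"  # GPT-2's end-of-text token
MAX_ENTRIES = 50_000   # the cap mentioned in the text


def take_first(messages, limit=MAX_ENTRIES):
    """Keep only the first `limit` entries of any iterable."""
    return list(islice(messages, limit))


def build_examples(turns, context_size=3):
    """Build one training string per reply.

    Each string holds up to `context_size` previous turns followed by the
    reply, with every turn terminated by the EOS token.
    """
    examples = []
    for i in range(1, len(turns)):
        window = turns[max(0, i - context_size): i + 1]
        examples.append(EOS.join(window) + EOS)
    return examples


# Hypothetical three-turn conversation.
turns = ["Hi there!", "Hello, how are you?", "Great, thanks."]
examples = build_examples(take_first(turns), context_size=2)
```

The resulting strings would then be tokenized and passed to whichever training loop is in use; that part is left out here because it depends on the library version and hardware.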
My best small GPT-2 model had a final perplexity of 1.1191, while my medium model had a final perplexity of 1.0628.
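For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean per-token cross-entropy loss. The sketch below assumes that standard definition with natural-log losses; how the scores above were actually computed is not stated here, and the function names are hypothetical.

```python
import math


def perplexity(token_losses):
    """Exponentiate the mean per-token cross-entropy (natural log)."""
    if not token_losses:
        raise ValueError("need at least one loss value")
    return math.exp(sum(token_losses) / len(token_losses))


def mean_loss_for(ppl):
    """Invert the definition: the mean loss that yields a given perplexity."""
    return math.log(ppl)


# Mean losses implied by the reported scores, under the assumption above.
small_loss = mean_loss_for(1.1191)
medium_loss = mean_loss_for(1.0628)
```

Under this definition, a perplexity of exactly 1.0 would mean the model assigns probability 1 to every observed token, so lower values indicate a better fit to the evaluation text.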
Though I was very happy with the scores my models achieved, limiting my dataset to 50,000 entries led to suboptimal performance on certain subjects, simply because the models weren't trained on them. Time constraints made it difficult to train them more extensively, but I very much look forward to doing more in-depth training in the future.
Nonetheless, what I have seen so far excites me for the potential Conversational AI has for lead nurturing, online shopping experiences, interactive brand messaging, and, not to be forgotten, the sanity of salespeople everywhere.
If you are interested in interacting with my chatbots, you can do so here:
Datasets:
Libraries: gensim, glob, logging, matplotlib, nltk, numpy, os, pandas, pickle, pyLDAvis, pytorch, random, re, seaborn, shutil, sklearn, spacy, sys, tensorflow, textblob, time, tqdm.notebook, transformers, typing, and wordcloud.
Additional tools: Google Colab and Hugging Face.