Sales Scientist: analyzing a potential customer base

Sep 29, 2021

Sales Scientist is a project I’ve been working on; keep reading to learn more!

Over the past three to four months I’ve been working on a lead scoring tool. Given the limited dataset, however, it is more of a customer vs. non-customer classifier. The tool is aimed at helping the sales team at Hyros, an ad tracking company that is looking to expand its outbound marketing and sales. Hyros has built a great product that solves its customers’ problems, and its goal now is to increase market share and keep scaling. A lead scoring tool could help here by letting the sales team evaluate leads much faster and operate more efficiently, ultimately speeding up the sales cycle. This is by far the most effort I’ve put into understanding the business context of a project.

I collected data from four different sources. First, I conducted a survey to get information on each business’s domain, business model, revenue, and ad spend. Once I had that data, I connected to the following sources:

Twitter to run sentiment analysis on each business.
Similar Web to collect the web traffic analytics of each business (bounce rate and traffic breakdown).
Fullcontact to find the website’s rank; rank is based on the number of visits the page gets.
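A minimal sketch of how the survey responses and the three enrichment sources could be combined into one table, assuming each source has already been exported to a pandas DataFrame keyed by a `domain` column. The column names and the `build_dataset` helper are hypothetical, not the project's actual code:

```python
from functools import reduce

import pandas as pd


def build_dataset(survey: pd.DataFrame, *sources: pd.DataFrame, key: str = "domain") -> pd.DataFrame:
    """Left-join each enrichment table onto the survey responses by `key`.

    A left join keeps every surveyed business, even when a source has no
    data for it (those cells become NaN).
    """
    return reduce(lambda left, right: left.merge(right, on=key, how="left"), sources, survey)


if __name__ == "__main__":
    # Made-up rows purely to show the shape of the result.
    survey = pd.DataFrame({"domain": ["a.com", "b.com"], "monthly_revenue": [100_000, 250_000]})
    twitter = pd.DataFrame({"domain": ["a.com"], "sentiment": [0.2]})
    similarweb = pd.DataFrame({"domain": ["a.com", "b.com"], "bounce_rate": [0.45, 0.60]})
    print(build_dataset(survey, twitter, similarweb))
```

Gaps (say, a business Twitter has nothing on) show up as NaN and need to be handled before any modeling.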

Overall, there were two kinds of features in the dataset. First, there were qualification features (is the business even qualified for the product?): ad spend and business model. Hyros is only meant for advertisers spending upwards of $20k a month on ads, and it is built for Info (online courses and coaching) and Ecommerce brands. That said, many of Hyros’s customers are actually small marketing agencies; they run ads for bigger Info and Ecommerce brands and use the software on behalf of their clients. For this reason, Hyros recently started an affiliate program where agencies sell the product to their clients. Great move :)
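The qualification rules above can be sketched as a simple filter. The $20k threshold and the target business models come from the text; the field names, the lowercase labels, and the choice to treat agencies as qualified (because they run ads for Info and Ecommerce clients) are assumptions for this sketch:

```python
from dataclasses import dataclass

# From the text: Hyros targets advertisers spending upwards of $20k a month,
# in the Info or Ecommerce space.
MIN_MONTHLY_AD_SPEND = 20_000
TARGET_MODELS = {"info", "ecommerce"}


@dataclass
class Lead:
    domain: str
    business_model: str  # hypothetical labels, e.g. "info", "ecommerce", "agency"
    monthly_ad_spend: float


def is_qualified(lead: Lead) -> bool:
    """True if the lead passes both qualification features."""
    model = lead.business_model.lower()
    # Assumption: agencies count as qualified since they use Hyros for Info/Ecommerce clients.
    model_ok = model in TARGET_MODELS or model == "agency"
    return model_ok and lead.monthly_ad_spend >= MIN_MONTHLY_AD_SPEND


if __name__ == "__main__":
    leads = [
        Lead("a.com", "ecommerce", 35_000),
        Lead("b.com", "saas", 50_000),
        Lead("c.com", "info", 8_000),
        Lead("d.com", "agency", 25_000),
    ]
    print([lead.domain for lead in leads if is_qualified(lead)])
```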

There were also quality features, which were basically everything else. Some of the most important were CAC (customer acquisition cost), ROAS (return on ad spend), LPC (landing page complexity), and Twitter sentiment. To clarify, LPC is a score based on the number of words, links, and trigger words (words that prompt a call to action) on a landing page. A high LPC indicates that the landing page is bogging down the end user, making them feel overwhelmed and unlikely to convert. Revenue wasn’t a very big factor because of the small agencies mentioned earlier. A massive limitation I must point out is that I was only able to collect 59 data points, and I struggled to create anything very valuable for that reason.
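Here's a rough sketch of an LPC-style score. The text only says LPC is based on word, link, and trigger-word counts, so the trigger-word list, the weights, and the regex-based HTML handling below are all assumptions for illustration:

```python
import re

# Hypothetical call-to-action trigger words; the actual list isn't given.
TRIGGER_WORDS = {"buy", "now", "free", "limited", "sign", "join", "download", "offer"}

# Hypothetical weights for each count.
WEIGHTS = {"words": 1.0, "links": 5.0, "triggers": 3.0}


def landing_page_complexity(html: str) -> float:
    """Weighted sum of word, link, and trigger-word counts; higher means more cluttered."""
    n_links = len(re.findall(r"<a\s[^>]*href", html, flags=re.IGNORECASE))
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping
    words = re.findall(r"[a-z']+", text.lower())
    n_triggers = sum(1 for w in words if w in TRIGGER_WORDS)
    return (WEIGHTS["words"] * len(words)
            + WEIGHTS["links"] * n_links
            + WEIGHTS["triggers"] * n_triggers)


if __name__ == "__main__":
    page = '<p>Buy now and get a free guide</p><a href="/x">Join</a>'
    print(landing_page_complexity(page))
```

A sturdier version would use a real HTML parser and tune the weights against conversion data rather than picking them by hand.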

Again, with only 59 data points, these findings are tentative rather than conclusive. I also had to remove Clickfunnels, a nine-figure software company, because it was a total outlier with by far the highest ROAS. Based on my analysis, the data suggested the following:

Having over 700 words on the landing page was associated with lower ROAS. Maybe we can provide value upfront by helping leads clean up their landing pages?
Businesses with negative public sentiment on Twitter had lower ROAS. Maybe we should avoid leads with negative sentiment.
Having a website ranked among the 4,000 most visited was associated with much higher revenue.

Figure: scatterplot of Twitter sentiment vs. return on ad spend for all the businesses

For more on the analysis, have a look at this video!
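Findings like the 700-word one come down to splitting the businesses at a threshold and comparing average ROAS between the two groups. A minimal pandas sketch with hypothetical column names; reporting group sizes next to the means matters when there are only 59 rows:

```python
import pandas as pd


def compare_by_threshold(df: pd.DataFrame, feature: str, threshold: float,
                         target: str = "roas") -> pd.DataFrame:
    """Mean and count of `target` for rows above vs. at-or-below `threshold` on `feature`."""
    group = df[feature].gt(threshold).map(
        {True: f"{feature} > {threshold}", False: f"{feature} <= {threshold}"}
    )
    return df.groupby(group)[target].agg(["mean", "count"])


if __name__ == "__main__":
    # Made-up numbers, not the project's data.
    toy = pd.DataFrame({
        "landing_page_words": [400, 650, 720, 900, 1200],
        "roas": [3.0, 2.5, 1.8, 1.5, 1.2],
    })
    print(compare_by_threshold(toy, "landing_page_words", 700))
```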

The dashboard of customers vs. non-customers

With so little data, I was reluctant to try machine learning! My first ‘model’ was a simple heuristic based on Bayesian inference. The user could input a revenue range, business model, and domain (e.g., health ecommerce brands doing $225k to $275k a month). The program would find all the businesses that fit the criteria and output the percentage of them that were customers. This model only reached 77% accuracy, and it was limited to predicting the probability of the lead closing, when in reality we care about the quality of the lead as well.
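The first heuristic amounts to an empirical conditional frequency: among the businesses matching the filters, what fraction are customers, i.e. an estimate of P(customer | revenue range, business model, domain). A minimal pandas sketch, assuming hypothetical columns `monthly_revenue`, `business_model`, `domain`, and a 0/1 `is_customer` label:

```python
from typing import Optional

import pandas as pd


def customer_rate(df: pd.DataFrame, revenue_min: float, revenue_max: float,
                  business_model: str, domain: str) -> Optional[float]:
    """Fraction of matching businesses that are customers.

    Empirical estimate of P(customer | revenue range, business model, domain).
    Returns None when no business matches the filters.
    """
    mask = (
        df["monthly_revenue"].between(revenue_min, revenue_max)  # inclusive on both ends
        & (df["business_model"] == business_model)
        & (df["domain"] == domain)
    )
    matches = df.loc[mask, "is_customer"]
    if matches.empty:
        return None
    return float(matches.mean())


if __name__ == "__main__":
    # Hypothetical rows; the query mirrors the example in the text.
    leads = pd.DataFrame({
        "domain": ["health", "health", "health", "fitness"],
        "business_model": ["ecommerce", "ecommerce", "info", "ecommerce"],
        "monthly_revenue": [230_000, 260_000, 250_000, 240_000],
        "is_customer": [1, 0, 0, 1],
    })
    print(customer_rate(leads, 225_000, 275_000, "ecommerce", "health"))
```

With 59 rows, narrow filters often match only a handful of businesses (or none), which makes the estimate noisy.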

Ultimately, I got the best results by combining machine learning with a simple heuristic. I trained a Decision Tree on the data and used its output as a probability score. Next, I combined CAC, ad spend, and Twitter sentiment into an opportunity score. Finally, I combined the opportunity and probability scores into an overall score. I chose these features because they are key indicators of how much Hyros can help the business.

When CAC and ad spend are high, it indicates that Hyros can save the business a lot on ad spend: spend more, save more! We also take Twitter sentiment into account because it is our closest indicator of how the public perceives the lead’s product or service; if the perception is negative, even the best advertising strategy is unlikely to be effective.
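One way the probability, opportunity, and overall scores could fit together is sketched below with scikit-learn and NumPy. The tree depth, the min-max scaling, the assumption that sentiment lies in [-1, 1], and the equal weighting are all assumptions for this sketch; the exact combination isn't specified above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def probability_scores(X_train, y_train, X_leads, max_depth=3):
    """Fit a shallow decision tree and return P(customer) for each lead.

    max_depth=3 is an assumption, meant to limit overfitting on a tiny dataset.
    Labels are assumed to be 1 = customer, 0 = non-customer.
    """
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    customer_col = list(tree.classes_).index(1)  # predict_proba column for label 1
    return tree.predict_proba(X_leads)[:, customer_col]


def min_max(x):
    """Rescale values to [0, 1]; a constant input maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def opportunity_scores(cac, ad_spend, sentiment):
    """Average of scaled CAC, scaled ad spend, and sentiment mapped from [-1, 1] to [0, 1].

    Higher CAC and ad spend mean more room to save; negative sentiment pulls the score down.
    """
    sentiment01 = (np.asarray(sentiment, dtype=float) + 1) / 2
    return (min_max(cac) + min_max(ad_spend) + sentiment01) / 3


def overall_scores(probability, opportunity, weight=0.5):
    """Weighted blend of close probability and opportunity (equal weights by default)."""
    probability = np.asarray(probability, dtype=float)
    opportunity = np.asarray(opportunity, dtype=float)
    return weight * probability + (1 - weight) * opportunity
```

Because `min_max` scales relative to whatever batch of leads is passed in, opportunity scores are only comparable within one batch; scaling against the training data's ranges would avoid that.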

For more on the model and an in-depth explanation, check out this video!

I decided to take this further by building an app and deploying it with Heroku. It works, but it can currently only take in CSV files via a raw GitHub URL. The problem is that sales reps don’t use GitHub, so they will need a simpler way of working with their data. Feel free to check out the demo!
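Since `pandas.read_csv` accepts a URL, a local path, or a file-like object, one possible fix is a loader that doesn't care where the CSV comes from, so the same function could sit behind a file-upload widget instead of a raw GitHub URL. A sketch with a hypothetical required-column schema:

```python
import io
from typing import IO, Union

import pandas as pd

# Hypothetical schema for an uploaded lead list.
REQUIRED_COLUMNS = {"domain", "business_model", "monthly_ad_spend"}


def load_leads(source: Union[str, IO]) -> pd.DataFrame:
    """Load leads from a URL (e.g. a raw GitHub URL), a local path, or an uploaded file object."""
    df = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    return df


if __name__ == "__main__":
    # An in-memory file stands in for an upload here.
    sample = io.StringIO("domain,business_model,monthly_ad_spend\nexample.com,ecommerce,30000\n")
    print(load_leads(sample))
```

Validating the columns up front gives a sales rep a readable error instead of a stack trace deeper in the scoring code.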

Thank you so much for reading! If you’d like a more in-depth explanation of any aspect of the project, feel free to shoot me a message on Facebook or LinkedIn. I know I didn’t go into the technical details much, so if you have any questions about those, let me know too.

Until next time, Spencer

Written by Spencer Holley