These are just observations I’ve extracted from 8 conversations with various Data Professionals, including 4 Data Scientists, 1 Data Analyst, 2 Research Scientists, and 1 Machine Learning Engineer.

Photo by Faizur Rehman
  1. ML/AI is mostly used at very large companies and AI startups, but it is rarely used or beneficial at small companies unless the AI is part of the product or service. This doesn’t necessarily mean that ML has no place at midsize companies; however, machine learning works better with very large datasets that many non-tech companies don’t have access to.
  2. Data Engineer is a more realistic goal than Data Scientist for a lot of people…

So you want to be a Data Scientist!

So you want to be a Data Scientist! Have you been told it’s the career of the future? Are you an artificial intelligence (AI) enthusiast looking to make your passion your profession? Are you coming from the business or engineering side and looking for a hybrid role such as Data Scientist? Are you interested in Data Science but don’t know where to start? If so, then keep reading!

Photo by author

When it comes to Data Science there are many misconceptions out there. On one side, you have the media hyping up Data Science and showcasing…

Machine Learning has been the craze for the past few years, but many companies are starting to question its value as they aren’t getting the ROI they were expecting. In this blog I cover three situations where Machine Learning and Deep Learning might actually be useful in your organization. Please do keep in mind that I’m a bit skeptical of Machine Learning (ML) at the moment and may be somewhat biased against it. Let’s dive in!

photo by Kevin Ku
  1. Your organization is very large

If your organization has thousands of employees then it is a good indicator that you have enough internal facing…

Data Scientist is apparently the sexiest job title of the 21st century, according to the Harvard Business Review anyway, but what even is a Data Scientist? Many people think Data Scientists work on AI and build models that can kick our butts in Go, but only a small minority of Data Scientists actually work on those types of problems. …

Photo by Sangga Rima Roman Selia
  1. EDA and Statistics are taught for a reason

When I first started the Bootcamp I was surprised to see two whole months of Data Visualization & Exploration (EDA) and Statistics, because I had the misconception that Data Science is all about Machine Learning. Although this phase was more fun than I expected, I never really understood why it was so important. As soon as we shifted into Machine Learning I got lazy and returned to my newbie strategy of throwing data at a model. I was able to get away with this without suffering too badly, mostly because my main…


Yelp is a massive directory for businesses worldwide. You can find whatever you need in any location: a plumber to fix your pipes, a realtor to sell your house, or some sushi to satisfy your sushi craving! Before the days of the internet you had to flip through your phone book to find what you were looking for. We data-savvy people can use Yelp for a number of things: we can extract data about businesses and combine it with data from elsewhere (e.g., Twitter or Google Analytics), or perform sentiment analysis on the reviews, to name a few. While we can…

A User’s Guide

photo by Zdeněk Macháček on Unsplash

About 2 months ago I released a Python package called Potosnail. It started out as a collection of helper functions I built for my Wikipedia Capstone project. However, I’ve since added higher-level functions that can automate the bulk of the modeling process. There is no substitute for domain knowledge and intuition in Machine Learning, but Potosnail can give this process a massive boost. What Potosnail does is take emphasis off of the modeling step in the Data Science pipeline, allowing data scientists to shift their focus more towards the data. …

For my capstone project with the Flatiron Data Science Bootcamp I decided to build a web app that can predict the probability of an article being generated by an AI, namely GPT2.

Link to project/demo:

The results of the best model

I scraped 1,000 articles from Wikipedia and retrained GPT2 on them. Then I generated 180 fake articles and put them in a dataset with 180 real articles randomly selected from the 1,000. I then built a deep neural network, a bidirectional GRU, to distinguish the real articles from the fakes. After an 8-hour grid search with 32 different parameter combinations I found a model…
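As a rough illustration of the setup described above, here is a minimal sketch of how a balanced real-versus-fake dataset and a 32-combination search space might be assembled. The placeholder strings, the function name `build_balanced_dataset`, and every hyperparameter name and value in `param_grid` are assumptions made for illustration; the actual project may have used different ones.

```python
import itertools
import random


def build_balanced_dataset(real_articles, fake_articles, seed=0):
    """Sample as many real articles as there are fakes, label both, and shuffle."""
    rng = random.Random(seed)
    sampled_real = rng.sample(real_articles, k=len(fake_articles))
    # Label 0 = real article, label 1 = AI-generated article.
    rows = [(text, 0) for text in sampled_real]
    rows += [(text, 1) for text in fake_articles]
    rng.shuffle(rows)
    return rows


# Placeholder strings stand in for the scraped and generated articles.
real_articles = [f"real article {i}" for i in range(1000)]
fake_articles = [f"generated article {i}" for i in range(180)]
dataset = build_balanced_dataset(real_articles, fake_articles)

# Hypothetical search space: five two-way choices give 2**5 = 32 combinations.
param_grid = {
    "gru_units": [32, 64],
    "dropout": [0.2, 0.5],
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [16, 32],
    "embedding_dim": [50, 100],
}
combinations = [
    dict(zip(param_grid, values))
    for values in itertools.product(*param_grid.values())
]

print(len(dataset), "labeled articles;", len(combinations), "parameter combinations")
```

Each dictionary in `combinations` could then be passed to whatever function builds and trains the classifier, keeping the combination with the best validation score.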

Python is a very widely used programming language, especially in the Machine Learning community. It is very straightforward, which makes it beginner friendly and very practical. Python programs also take fewer lines of code to write, making it a less labor-intensive option for companies. With all of these great things about Python, can we jump to the conclusion that it’s the only language we should depend on? Absolutely not! Python may be fast and easy to develop in, but the language can run about 100 times slower than C++ or Java. For many projects this doesn’t matter…
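To get a feel for the interpreter overhead behind that claim, here is a small sketch that times a pure-Python loop against the built-in `sum`, which is implemented in C in the standard CPython interpreter. It is not a benchmark against C++ or Java, and the function names and the value of `n` are arbitrary choices for illustration; the measured ratio will vary by machine and Python version.

```python
import time


def python_loop_sum(n):
    """Add up 0..n-1 one interpreted bytecode step at a time."""
    total = 0
    for i in range(n):
        total += i
    return total


def builtin_sum(n):
    """Add up 0..n-1 using the built-in sum, which loops in C under CPython."""
    return sum(range(n))


def time_call(fn, n):
    """Return the function's result and the wall-clock seconds it took."""
    start = time.perf_counter()
    result = fn(n)
    return result, time.perf_counter() - start


n = 1_000_000
loop_result, loop_seconds = time_call(python_loop_sum, n)
builtin_result, builtin_seconds = time_call(builtin_sum, n)

print(f"pure-Python loop: {loop_seconds:.4f} s")
print(f"built-in sum:     {builtin_seconds:.4f} s")
```

Both functions return the same total; only the time spent inside the interpreter differs.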

In this article I will take you through the end-to-end process I used to build a web app that classifies an article as AI generated or written by a real person. Click here to see my repo.


Wikipedia is an amazing place full of free information maintained by a community of volunteer editors. It has made giving and receiving knowledge very easy. Unfortunately, with this great availability of knowledge comes the potential spread of misinformation. With the rise of complex transformer models such as GPT2, AI can generate persuasive content that is practically identical to human…

Spencer Holley

This is where I share my journey as a Machine Learning Practitioner and a Programmer
