Why Python is slow

Spencer Holley
Feb 20, 2021

Python is a very widely used programming language, especially in the Machine Learning community. Its syntax is straightforward, which makes it beginner friendly and very practical. Python programs also tend to take fewer lines of code to write, making Python a less labor-intensive option for companies. With all of these great things about Python, can we jump to the conclusion that it’s the only language we should depend on? Absolutely not! Python may be fast and easy to develop in, but for CPU-heavy work, pure Python code can run tens to hundreds of times slower than equivalent C++ or Java code. For many projects this doesn’t matter, and the faster workflow more than makes up for the longer runtimes. However, in larger projects, especially in industry settings, it does matter.

Until recently I was merely aware that Python wasn’t the fastest language at runtime, but it wasn’t until my Data Science Mock Interview that I actually did a deep dive on this topic. My interviewer asked me to return all the duplicates in a list of numbers. I wrote a list comprehension that kept only the numbers appearing in the list more than once, and then called np.unique() so that each remaining value showed up only once. This was my solution…

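import numpy as np

def GetDuplicates(lst):
    # Keep values that occur more than once, then let np.unique drop the repeats (returns a sorted array)
    duplicates = [num for num in lst if lst.count(num) > 1]
    return np.unique(duplicates)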
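
A side note on this solution: lst.count(num) rescans the whole list for every element, so the comprehension performs roughly n² comparisons before Python’s interpreter overhead even comes into play. As a minimal sketch (get_duplicates_counter is a hypothetical name, not part of the interview), the standard library’s collections.Counter tallies every value in a single pass:

```python
from collections import Counter


def get_duplicates_counter(lst):
    """Return each value that appears more than once, in order of first appearance."""
    counts = Counter(lst)  # one pass over lst builds a value -> count mapping
    return [value for value, count in counts.items() if count > 1]


print(get_duplicates_counter([3, 1, 3, 2, 1, 3]))  # [3, 1]
```

Unlike np.unique, which returns a sorted NumPy array, this returns a plain list in first-seen order.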

Then my interviewer asked if I could solve it with pandas, a Python library for working with tables of data, instead. I wasn’t able to figure it out. The next morning I did some research and found out that pandas, like many other Python libraries, does its heavy lifting in compiled code (largely C and Cython, built on top of NumPy) for faster execution. This means these libraries give programmers the best of both worlds: Python’s fast development time combined with the fast runtime of a compiled language! Here’s the pandas solution to the same problem…

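import pandas as pd

def GetDuplicatesPD(lst):
    df = pd.DataFrame({'nums': lst})
    # duplicated() flags every repeat after a value's first occurrence
    repeats = df.loc[df['nums'].duplicated(), 'nums']
    # unique() keeps values that occur 3+ times from being listed more than once
    return list(repeats.unique())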
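
The same idea applies to NumPy, which pandas is built on. In the sketch below (get_duplicates_np is a hypothetical name), np.unique with return_counts=True does the sorting and counting in compiled code, so the only Python-level work left is building one boolean mask:

```python
import numpy as np


def get_duplicates_np(lst):
    """Return the sorted distinct values that occur more than once."""
    # np.unique sorts the data and counts each distinct value in compiled code
    values, counts = np.unique(np.asarray(lst), return_counts=True)
    # The boolean mask keeps only values seen at least twice
    return values[counts > 1]


print(get_duplicates_np([3, 1, 3, 2, 1, 3]))  # [1 3]
```

Like the original GetDuplicates, this returns a sorted NumPy array, but it never loops over the list in Python.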

This is all great, but why the heck is Python so slow? Well, I’m going to give you three major reasons.

Python is dynamically typed and interpreted. CPython, the standard Python implementation, compiles your source code to bytecode and then executes that bytecode one instruction at a time in an interpreter loop. Because the type of a variable isn’t known until the code is running, every operation, even something as simple as a + b, has to inspect the types of its operands at runtime and look up the right behavior. Every value, even a small integer, is also a full object stored on the heap rather than a raw number. All of this bookkeeping slows things down a lot. By contrast, C++ and Java are statically typed: the programmer declares the types up front, and the compiler checks them before the program ever runs. These languages require more verbose code and are more difficult to learn, but because the types are known ahead of time, the compiler can produce machine code specialized for them (in Java’s case, the JVM’s just-in-time compiler does this while the program runs). In summary, Python figures out types for you at runtime at the expense of speed, while C++ and Java make you do that work up front in exchange for a faster runtime.

Figure: C, C++, and Java are statically typed, while Python is dynamically typed
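
You can see both halves of this with the standard library’s dis module. In the sketch below (add is a hypothetical example function), CPython has already compiled add to bytecode, yet that bytecode contains a single generic add instruction; whether it performs integer addition, string concatenation, or list concatenation is decided each time it executes, based on the runtime types of a and b. Exact opcode names vary between Python versions (BINARY_ADD before 3.11, BINARY_OP from 3.11 on).

```python
import dis


def add(a, b):
    return a + b


# Print the bytecode CPython compiled for add (output differs by Python version)
dis.dis(add)

# The same bytecode dispatches to a different operation depending on operand types
print(add(2, 3))          # 5
print(add("py", "thon"))  # python
print(add([1], [2, 3]))   # [1, 2, 3]

# Opcode names of the compiled instructions
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

In C++ or Java, a compiler that knows a and b are both int can emit a single machine-level add instead of this runtime type check.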

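In addition, CPython has a Global Interpreter Lock (GIL). For context, a thread is an independent sequence of instructions within a program. All the threads in a process share the same memory, and on a multi-core computer the operating system can run several of them at the same moment. The GIL is a lock that allows only one thread to execute Python bytecode at a time, which keeps CPython’s internal memory management (such as its reference counting) safe. The catch is that a multithreaded, CPU-bound Python program effectively runs one thread at a time, so adding threads doesn’t speed it up the way it would in C++ or Java, where threads really do run in parallel. Threads can still help with I/O-bound work such as network requests, because a thread releases the GIL while it waits.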
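
A quick way to see the GIL at work is to time a CPU-bound pure-Python loop run several times in a row versus in several threads at once. This is a rough sketch (count_down, run_sequential, and run_threaded are hypothetical names). On a standard CPython build, which has the GIL, you should expect the threaded version to take about as long as the sequential one, or slightly longer, rather than finishing several times faster; exact timings depend on your machine and Python version.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; the running thread holds the GIL while it executes."""
    while n > 0:
        n -= 1
    return n


def run_sequential(n, workers):
    """Run count_down `workers` times, one after another, and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, workers):
    """Run count_down in `workers` threads at once and return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return time.perf_counter() - start


if __name__ == "__main__":
    N, WORKERS = 2_000_000, 4
    print(f"sequential: {run_sequential(N, WORKERS):.2f}s")
    print(f"threaded:   {run_threaded(N, WORKERS):.2f}s")
```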

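Lastly, a plain Python program runs on a single core. A core is an independent processing unit on the CPU; modern machines have several, each able to execute its own stream of instructions at the same time as the others. A Python script runs as a single process, and because of the GIL only one of its threads can execute Python code at any moment, so raw Python keeps one core busy while the rest sit idle. CPU-heavy code therefore runs much slower than in languages that spread work across all cores. The workaround is to use multiple processes, each with its own interpreter and its own GIL. The standard library’s multiprocessing module does this, and some libraries do it for you: PySpark hands work to Spark, which runs on the JVM and distributes it across many processes and machines, and scikit-learn’s GridSearchCV fits candidate models in parallel worker processes (via the joblib library) when you set n_jobs (ever set n_jobs in a grid search?).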

Figure: an example of what multiprocessing looks like
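
The standard library’s multiprocessing module is the usual pure-Python way around the single-core limit. This sketch (sum_of_squares and run_parallel are hypothetical names) spreads CPU-bound tasks across a pool of worker processes; because each worker has its own interpreter and its own GIL, the tasks can run on separate cores simultaneously.

```python
from multiprocessing import Pool


def sum_of_squares(n):
    """CPU-bound pure-Python work for one task."""
    return sum(i * i for i in range(n))


def run_parallel(sizes, processes=None):
    # Each worker is a separate Python process with its own interpreter and GIL,
    # so tasks can run on different cores at the same time.
    # processes=None lets Pool pick a worker count based on the number of CPUs.
    with Pool(processes=processes) as pool:
        return pool.map(sum_of_squares, sizes)


# The guard is required: on platforms that start workers with "spawn"
# (Windows, macOS), each worker re-imports this module.
if __name__ == "__main__":
    results = run_parallel([1_000_000] * 8)
    print(results[:2])
```

The trade-off is that arguments and results have to be pickled and sent between processes, so this only pays off when each task does a meaningful amount of work.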

Actionable advice…

With this new information, you may feel overwhelmed. Maybe you are still getting comfortable with Python and are wondering if you should also start learning C++ or Java. You have nothing to worry about! If Python is your first language and you’re still getting familiar with it, learning a second one right away won’t help much. Instead, I suggest being aware of Python’s limitations while leveraging all of the amazing tools and libraries that the language and its community have to offer to write better code!

Wrapping it all up…

Python is slow because it is dynamically typed and interpreted, because the GIL lets only one thread run Python code at a time, and because a plain Python program runs on a single core. Although you can do great things with Python as your only programming language, you should be aware of its limitations.
