So you want to be a Data Scientist

Jun 7, 2021


So you want to be a Data Scientist! Have you been told it’s the career of the future? Are you an artificial intelligence (AI) enthusiast looking to make your passion your profession? Are you coming from the business or engineering side and looking for a hybrid role such as Data Scientist? Are you interested in Data Science but don’t know where to start? If so, keep reading!

When it comes to Data Science, there are many misconceptions out there. On one side, the media hypes up Data Science and showcases the most advanced problems, which only AI researchers work on. On the other side, Data Science itself is a vague umbrella term. Saying “I want to be a Data Scientist” is like saying “I want to be an entrepreneur”: you’ve still got a lot of figuring out to do, but I hope this blog will help. The Data Science and AI hype creates false hopes among aspiring Data Scientists, leading to a massive influx of people who expect six-figure salaries just because they can build TensorFlow models!

First, I am going to lay out some general misconceptions that some people, myself included, have about Data Science, and mention some bad reasons to get into this field.

Common beginner misconceptions

1. Data Science is all about building models.

Reality: Building models is only a small part of the job; you will spend far more time collecting and cleaning data. According to a Kaggle survey, building models accounts for only 20.6% of the job.

2. There is massive demand for Data Science.

Reality: Yes, tech jobs are increasing, but most organizations have about ten Software Engineers for every Data Scientist. You might also be hired because of hype rather than necessity, since companies sometimes think they need a Data Scientist when they don’t.

3. The main objective of Data Science is to optimize metrics like accuracy, precision, and F1 score.

Reality: The objective is to understand the needs of your organization and use your skills to deliver value, often generating monetary results.

4. You don’t need people skills.

Reality: Most Data Scientists must present their work articulately and coherently to people who don’t understand Data Science.

Bad reasons for getting into Data Science

1. You want to make a lot of money.

Alternate path: Find a stronger why.

2. You want to focus on the technical side, such as coding and engineering.

Alternate path: Seek an engineering role: Software Engineering, Data Engineering, Machine Learning Engineering, or even Web Development.

3. You want to work on state-of-the-art AI models and love the math behind them.

Alternate path: Machine Learning Research

What kind of Data Scientist should you be?

If I haven’t scared you away yet, congratulations! If this is the path for you, you still need to gain clarity on what kind of role to pursue. I’ve broken down five types of Data Scientists as well as two closely related, underrated roles. Keep in mind that there is some gray area between these roles, and many jobs don’t correspond to them perfectly.

1. Data Analyst

Data Analysts must be able to communicate with executives, managers, other departments, and external stakeholders, all of whom are unlikely to have a technical background. Data Analysts must therefore be able to work out what these stakeholders want to know about their data, speaking in business terms rather than technical terms. They can then go into the database created by the Data Engineering team, query the data needed to answer those questions, and present their findings in a way their audience understands. These Data Scientists rarely, if ever, use Machine Learning models and are more focused on hypothesis testing and statistics.

Data Analysts often work for very large organizations. Among these, you have fast movers, such as big tech companies like Facebook, Google, and Netflix. Then you have slow movers, like healthcare, finance, and entertainment companies, including Sony. The key difference is that fast movers use Data Analysts for low-hanging fruit: relatively small, low-risk projects that executives have high confidence in, such as analyzing user engagement or running A/B tests on different front-end designs for the platform. At slow movers, Data Analysts are more integral to the data culture. They work on similar projects, but these companies move more slowly and are not ready to invest in higher-level projects like the chatbots and recommenders that big tech is rolling out. Therefore, their work is less often viewed as ‘low-hanging fruit’.
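Since A/B tests come up here, below is a minimal sketch of how an analyst might check whether a new front-end design changed a conversion rate. It uses a two-proportion z-test with only Python's standard library. The function name and the visitor and conversion counts are hypothetical. A real analysis would also fix the sample size in advance and check the test's assumptions.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple:
    """Two-sided z-test for H0: variants A and B convert at the same rate.

    Returns (z statistic, p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, the best estimate of the common rate if H0 is true
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical experiment: 10,000 visitors saw each design
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
if p < 0.05:
    print("Design B's conversion rate differs significantly at the 5% level")
```

The pooled rate is used for the standard error because the test assumes, under the null hypothesis, that both designs share one underlying conversion rate.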

Skills: Query Language (e.g. SQL), BI Tools/Data Visualization (e.g. Tableau), Programming Language (e.g. Python), Statistics, A/B Testing, presentation & communication, Business Acumen. You will also need to know Data Structures and Algorithms very well if you plan on working at a FAANG-M company (Facebook, Apple, Amazon, Netflix, Google, Microsoft).

Note: all of these jobs require good communication skills in addition to whatever I have mentioned.

Is it for you?

You have a nontechnical background. Maybe Economics, Finance, or Marketing?
You’re not crazy about Machine Learning.
You dream of working at a FAANG company.
You love making colorful graphs, charts, and other visualizations.

Learning resources: Alex Freberg’s YouTube channel, Khan Academy’s intro to SQL, Tableau Fundamentals, Udacity’s intro to Python, People Skills for Analytical Thinkers

2. Traditional Data Scientist a/k/a Full-Stack Data Scientist

This is what I think of when I think of a Data Scientist! Full-Stack Data Scientists do much of the work of Data Analysts, but they work for small- to mid-sized companies, most often software companies with 100 to a few thousand employees. They differ from Data Analysts in that they often use Machine Learning to make predictions, classifications, and recommendations or to find anomalies. They also tend to work in a more siloed environment or on a small team. Many companies lack the internal data their projects need and have to collect data from external data vendors, although Data Engineers will likely handle this. Classic Data Scientists typically have years of experience. When it comes to education, almost all have a Master’s degree and many have a Ph.D. (in STEM fields), although some bootcamps can serve as an effective alternative to graduate school. This contrasts with the Data Analyst, who can typically get hired with a somewhat relevant bachelor’s degree coupled with self-education in Data Analytics. Another common path to this role is to work as a Software Engineer or a business consultant of some sort and study up on Data Science in your free time.

In sum, these Data Scientists will query data and perform analysis just like the Data Analyst. The difference is that they build models to make predictions, classifications, and recommendations, or to flag anomalies. I will also point out that Machine Learning isn’t always the answer — there will be times when simple statistical models or even heuristics are the best solution.

Sample Business Case: Tala is a fintech company that helps people in developing countries who have no financial history get approved for loans, usually for business or education. Its Data Engineers seek out alternative data, such as customers’ internet use and social media data. The Data Scientist then queries the data, performs analysis, builds models, might even do some NLP, and ideally finds a way to predict whether a customer could get a loan without using financial data.

Skills: Query Language (e.g. SQL), BI Tools/Data Visualization (e.g. Tableau), Programming Language (e.g. Python), Statistics, Business Acumen, Machine Learning and a framework (e.g. scikit-learn), Deep Learning and a framework (e.g. TensorFlow), Machine Learning Intuition, Cloud Deployment (e.g. AWS), presentation & communication, stakeholder engagement

Is it for you?

You have accidentally come across Data Science techniques in your work or study and really enjoy it.
You have a scientific background.
You’re experienced in a highly technical role, love machine learning, and want to be more on the business side.
You’re experienced in a business role, love machine learning, and want to be more on the technical side.

Learning resources: Krish Naik’s YouTube channel, The High ROI Data Scientist, Udacity’s intro to Python, People Skills for Analytical Thinkers

3. Scrappy Data Scientist

This is quite similar to the Classic Data Scientist. The difference is that Scrappy Data Scientists take on the entire data pipeline, which includes more Data and Software Engineering tasks. They often work for very small AI startups; in fact, many are co-founders. They carry the weight of the company on their shoulders because the models they build are often the core of the company’s product or service, rather than, say, just a way of increasing a webpage’s click-through rate by 5 percent. These companies tend to have a very scrappy culture: move fast, fail fast, learn fast. Because of this culture and their very low budgets, these companies rarely, if ever, require formal education. Being a Scrappy Data Scientist comes at the expense of low pay and long hours, but with the benefit of nearly 100% control over how you do your job. I will note that you should stick to mainstream tools, packages, and languages. If your company does well it will make hires, and onboarding should be as efficient as possible. I would advise coding in Python and avoiding little-known packages in your tech stack.

Interestingly, companies and hiring managers have been known to look for a perfect Data Scientist who doesn’t exist, posting jobs with unrealistic requirements. The Scrappy Data Scientist comes closest to meeting all those requirements, although their lack of formal education and specialization would make them unattractive candidates.

Skills: They should emphasize Machine Learning and Data Analysis, but ultimately they should be prepared to learn just enough about anything to make it happen! A willingness to do engineering work, presentation skills, stakeholder engagement (particularly with external stakeholders), and entrepreneurial drive are also key.

Is it for you?

You feel bored at a comfortable 9–5 job.
You love startups. Move fast and break things!
You’re okay working 80 hours a week and living off ramen. Hopefully you can find a healthier but equally cheap alternative :)

Learning resources: Siraj Raval’s AI startups playlist, Garry Tan, Krish Naik’s YouTube channel, The High ROI Data Scientist, Udacity’s intro to Python

4. Research Scientist

This face swap was made by a GAN, most likely developed by Research Scientists.

Research Scientists most often work in R&D departments at FAANG-M companies. Many work on the recommendation systems and text-to-speech applications that we know and love: think Siri, Alexa, that up-next title on Netflix that you can’t turn down, and those goofy Snapchat filters. Others work with Engineers to build APIs like Hugging Face that allow the general public (Data Scientists, ML Engineers, and AI enthusiasts) to access these models. Research Scientists build custom models from scratch, whereas other Data Scientists tend to use prebuilt models from libraries. The vast majority of Research Scientists have PhDs, typically in Computer Science, but some with Master’s degrees have been hired if they have enough experience in scientific research.

Sample Business Case: Netflix wants to build a new movie recommendation system aimed at exposing users to new content.

Skills: Statistics, Algebra, Calculus, Machine Learning, and Deep Learning, all known very well; Programming (e.g. Python or C++); presentation skills; and stakeholder engagement

Is it for you?

You are very academic and want to go the PhD route.
You’re passionate about scientific research.
You want to work on the types of problems above, including recommendation engines and other models that will be used by the masses.
You’re extremely competitive. Let’s face it, these roles are HARD to get.

Learning resources: People Skills for Analytical Thinkers, Arxiv Insights

5. AI Researcher

AI Researchers, or Machine Learning Researchers, most often work in R&D departments at FAANG-M companies as well as at research-focused companies like OpenAI or DeepMind. They push the boundaries of what’s possible, developing new models and concepts like Reinforcement Learning, GPT-2, and GANs, and they publish scientific papers. These positions almost always require PhDs. If you’re reading this blog, odds are that AI Researcher is not the role for you.

Sample Business Case: This blog says it better than I can :)

Skills: Statistics, Algebra, Calculus, Machine Learning, and Deep Learning, all known very well; Programming (e.g. Python or C++)

Is it for you?

You are very academic and want to go the PhD route.
You’re passionate about scientific research.
You want to do stuff that’s never been done before.

Learning resource: Pieter Abbeel

6. Data Engineer

Data Engineers work behind the scenes: they are the ones who make Data Analysis and Machine Learning possible at many organizations! They are responsible for setting up the databases and pipelines that produce the large, clean datasets Data Science thrives on. While the role lacks the glamour of Data Science, it is in great demand. In fact, many companies blindly hire Data Scientists when a Data Engineer would serve them better! Check out this blog. I must also note that Data Engineers, especially at large companies, often go through migrations. When a company changes its tech stack, you become responsible for rebuilding its data structures and pipelines. Although most Data Engineers have bachelor’s degrees, 25 percent have only a high school diploma or associate’s degree.

Sample Business Case: at a large company, migrate from Oracle to BigQuery; at a small company, create the first data warehouse for a real estate firm.
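To make the pipeline idea concrete, here is a toy extract-transform-load (ETL) sketch in Python for the small-company case. It reads a hypothetical raw listings export, cleans it, and loads it into a warehouse table. SQLite (from the standard library) stands in for a real warehouse such as BigQuery. The data, table, and column names are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a real estate firm's listing system (messy on purpose)
RAW_CSV = """listing_id,city,price
1, Austin ,450000
2,Denver,
3,austin,389000
"""


def extract(raw: str) -> list:
    """Extract: parse the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list) -> list:
    """Transform: drop rows with no price, normalize city names, cast types."""
    clean = []
    for row in rows:
        if not row["price"].strip():
            continue  # a listing without a price is unusable downstream
        clean.append((int(row["listing_id"]), row["city"].strip().title(), int(row["price"])))
    return clean


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings "
        "(listing_id INTEGER PRIMARY KEY, city TEXT, price INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO listings VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
# Analysts can now query clean data instead of the raw export
print(conn.execute("SELECT city, AVG(price) FROM listings GROUP BY city").fetchall())
```

The ELT variant loads the raw rows first and does the cleaning inside the warehouse with SQL. It is listed alongside ETL in the skills for this role.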

Skills: Query Language, Programming Language, ETL Pipelines, ELT Pipelines, Data Warehousing, DevOps, Cloud Computing, Data Visualization, and Big Data (e.g. PySpark)

Is it for you?

You haven’t gone to college and aren’t interested in doing so.
You want to help initialize data initiatives and help organizations become data driven.
You want to be the magic behind the scenes.
You want an extremely practical role in the data space.

Learning resources: The Seattle Data Guy, Udacity’s intro to Python, Khan Academy’s intro to SQL

7. Machine Learning Engineer

Machine Learning Engineers likewise work behind the scenes to make Data Science possible. They typically work on MLOps (Machine Learning Operations) at large companies with over 10,000 employees. Smaller companies are more likely to have their Data Scientists handle MLOps, or to use models purely for internal purposes, and therefore don’t need Machine Learning Engineers.

Their role is focused on taking the models that the Data Scientists build and integrating them into the company’s software. They bring automation to the table and build pipelines that allow models to be trained and make predictions at scale, often by using code operationalization tools and cloud computing. You can think of it as turning a scientific problem into an engineering problem. Although many job listings prefer candidates to have a Master’s, many have been able to break into these roles with just a bachelor’s, especially if they have work experience in other engineering roles.

Sample Business Case: You are working at a fintech company, and the Data Scientist has built a model that predicts which customers should get their loans approved. It is now your responsibility to deploy the model in the cloud and work with the software team to bring it to production by integrating it into the app. There will likely be several iterations, because model performance often declines in production. Data Scientists will work to improve the model and decrease its latency, which will also improve scalability. For these reasons, you should use object-oriented programming (OOP) to build reusable retraining and redeployment pipelines.
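Here is a minimal sketch of what an object-oriented retraining and redeployment pipeline might look like. It assumes recent labeled production data is available. Everything in it is hypothetical. `train_threshold_model` is a toy stand-in for a real model, and incrementing `version` stands in for an actual deployment step such as pushing to a cloud endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Model = Callable[[float], int]


def train_threshold_model(xs: Sequence[float], ys: Sequence[int]) -> Model:
    """Toy stand-in for real training: choose the cutoff with the best training accuracy."""
    best_cut, best_acc = xs[0], -1.0
    for cut in sorted(set(xs)):
        acc = sum((x >= cut) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return lambda x: int(x >= best_cut)


@dataclass
class RetrainingPipeline:
    train_fn: Callable[[Sequence[float], Sequence[int]], Model]
    min_accuracy: float = 0.8
    version: int = 0
    model: Optional[Model] = None

    def accuracy(self, xs: Sequence[float], ys: Sequence[int]) -> float:
        return sum(self.model(x) == y for x, y in zip(xs, ys)) / len(xs)

    def deploy(self, xs: Sequence[float], ys: Sequence[int]) -> None:
        """Train a fresh model and 'release' it (a version bump stands in for real deployment)."""
        self.model = self.train_fn(xs, ys)
        self.version += 1

    def check_and_retrain(self, xs: Sequence[float], ys: Sequence[int]) -> bool:
        """Retrain on recent production data if accuracy has dropped below the threshold."""
        if self.model is None or self.accuracy(xs, ys) < self.min_accuracy:
            self.deploy(xs, ys)
            return True
        return False


pipeline = RetrainingPipeline(train_fn=train_threshold_model)
pipeline.deploy([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
# Later, production data drifts: only higher values should now be approved
drifted_x, drifted_y = [1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 0, 0, 1, 1]
print(pipeline.check_and_retrain(drifted_x, drifted_y), pipeline.version)
```

Training, evaluation, and deployment are wrapped in one class. The same retraining logic can then be scheduled, or triggered whenever monitoring shows performance has slipped, instead of being rerun by hand.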

Skills: Programming Language, Machine Learning and a framework (e.g. scikit-learn), Deep Learning and a framework (e.g. TensorFlow), Machine Learning Intuition, Cloud Deployment (e.g. AWS), Big Data (e.g. PySpark), and Containerization (e.g. Docker)

Is it for you?

You’re a Software Engineer, and you want to get more into Machine Learning.
You love Machine Learning but would rather focus on bringing it to production than getting into the nitty gritty.
You want to be more on the technical side and want something with fewer meetings and presentations.

Learning Resources: Udacity’s intro to Python, Siraj Raval’s intro to Deep Learning, Daniel Bourke, Code Operationalizing & AWS, Andre Violante

A word about Freelancing and Independent Consulting

I’m sure some of you are interested in being self-employed. Within the data industry, the Data Engineer would have the best chance at being an independent consultant. That is because many companies aren’t ready for Data Scientists but want to get ready for the future and have their data stored in a clean and orderly manner so they can be ready for Machine Learning when the time comes. However, there are many independent consultants that offer full-stack Data Science solutions as well. Research Scientists and Machine Learning Engineers are less likely to go the consulting route as they’re mostly needed by very large companies that prefer in-house solutions.

Here are a few Data Science consultancies I know of:

If you’re young and lack work experience, pro bono consulting for startups is a good way to gain experience. You can also get to see what it’s like to be a scrappy data scientist without the pressure of actually being one. You will not get paid but you will get a leg up in your job hunt :)

How Do I learn?

I know I gave you some learning resources that correspond to the different roles, but now I’ll provide a general framework for developing your skill set. As you might have noticed, all of these Data Science journeys begin with Python, and most begin with SQL too. It’s no secret that an in-depth understanding of both languages is key, unless you plan on being a researcher, in which case Python alone may suffice. Once you get comfortable with the code, you can start learning the skills higher up the ‘pyramid’, such as Data Viz (a/k/a Data Storytelling), Data Collection, and Statistics. Then, and only then, should you move on to Machine Learning and Cloud Computing, if your target role involves them. If you’re interested in Data Engineering, I would suggest this learning path. In addition, once you get past the coding-basics hurdle, it will be time to develop your non-technical skills, which means connecting with other people in the space. As for time allocation, I would advise something like 70% on technical skills, 10% on statistics, and 20% on networking.

Another point I will stress is that you need to build up these skills through actual projects rather than online courses. If you’ve seen The Matrix, you can think of online courses as the blue pill and custom projects as the red pill! I feel that courses are useful only for learning coding basics. Finding project ideas is not easy and can feel daunting. However, if you talk to professionals in the space, you can get a good idea of which projects may be worth doing. To throw an example out there, maybe you’re talking to someone at a company that has seen an unexpected dip in revenue. You can then do a deep dive into the company and create a POC (proof of concept) for an analysis centered on this problem.

What kind of education or credentials should I pursue?

As I cover the educational paths to these different roles, I’ll describe the most direct route from high school and then cover alternate paths. In addition, this blog published by Indeed offers an in-depth report on this topic.

Data Analyst

Major in Statistics or Economics (or Business Analytics, if your school offers it), develop your Data Analytics skill set, and seek out summer internships. If you majored in something else, I would advise a Data Analytics bootcamp augmented with some personal projects. If you already have one of these majors, I would advise you to teach yourself, much like the college-student example. The reason is that a bootcamp can stand in for a relevant degree if yours is not relevant. If your degree is already relevant, a bootcamp may be a waste of money. For more on the self-taught vs. bootcamp debate, check out this video.

Classic Data Scientist

Major in Computer Science, Statistics, or Economics (or Business Analytics, if your school offers it), develop your Data Science skill set, and seek out summer internships. If you majored in something else, I would advise a Data Science bootcamp or Master’s program augmented with some personal projects. If you already have one of these majors, I would advise you to teach yourself, much like the college-student example. If you are a Software Developer or Engineer, I would suggest the self-teaching route with an emphasis on developing business acumen and learning consulting best practices to build up your business side. If you are a business consultant or a product manager of some sort, I would advise the self-teaching route with an emphasis on technical skills, such as best practices for coding and engineering. As a side note, if either of these describes you, you’re at a major advantage!

Scrappy Data Scientist

Find a very small startup with a mission that you are passionate about (Facebook groups and AngelList are great places to look). If you can’t find one, you can start your own! If you are serious about this path, I wouldn’t recommend getting a degree. Instead, I’d advise teaching yourself along the way. The reason is that these types of companies have very low hiring budgets and therefore rarely require credentials. Especially considering that you won’t make much money, the degree is very unlikely to pay off. Skipping a degree is risky, but if you give it your best shot, you will gain the work experience needed to get over the experience hump, should you later want something more stable. If you are in school or already have a degree, I would advise leveraging your school network wherever possible and finding a startup that’s at least somewhat relevant to your degree.

Research Scientist

Get a bachelor’s degree in a STEM or quantitative field and then get a Master’s in Data Science. Seek out summer internships, and make sure they involve scientific research. Scientific research is a must for these roles, which is why most of these candidates have PhDs. If you majored in something completely different, I would still go for the Master’s. Just keep in mind that you will likely have to work a little harder to keep up with the curriculum.

AI Researcher

Hands down, I would advise going the PhD route and studying Artificial Intelligence. I would also advise getting a bachelor’s in Computer Science first if you don’t already have one. The road to AI Researcher is very long, and you need a very good understanding of Computer Science and abstract theoretical concepts. In addition, calculus is a must. It is the basis of backpropagation, the chain-rule procedure that computes the gradients most neural networks use to optimize their weights. As a researcher, you will implement this from scratch and likely modify it in ways that have never been done before. I will also point out that it is in your best interest to go to a top school: Stanford, Berkeley, MIT, etc. (see this list of top schools to study AI).
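To illustrate why calculus matters here, the sketch below applies backpropagation to the smallest possible network: one sigmoid neuron with a squared-error loss. It computes the gradients with the chain rule, checks one of them against a finite-difference estimate, and takes one gradient-descent step. The weights, input, target, and learning rate are arbitrary values chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss(w: float, b: float, x: float, y: float) -> float:
    """Squared error of a one-neuron network: L = (sigmoid(w*x + b) - y)^2."""
    return (sigmoid(w * x + b) - y) ** 2


def backprop(w: float, b: float, x: float, y: float) -> tuple:
    """Return (dL/dw, dL/db) by applying the chain rule backward from the loss."""
    z = w * x + b            # forward pass
    a = sigmoid(z)
    dL_da = 2 * (a - y)      # backward pass, one local derivative at a time
    da_dz = a * (1 - a)      # derivative of the sigmoid
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz  # since dz/dw = x and dz/db = 1


w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backprop(w, b, x, y)

# Sanity check: a central finite difference should closely match the analytic gradient
eps = 1e-6
numeric_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
print(grad_w, numeric_w)

# One gradient-descent step moves the weights against the gradient
lr = 0.5
new_w, new_b = w - lr * grad_w, b - lr * grad_b
print(loss(w, b, x, y), loss(new_w, new_b, x, y))
```

Deep networks repeat this same pattern across many layers. Each layer multiplies the incoming gradient by its own local derivative.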

Data Engineer

Data Engineer is another one of these careers that you can get without a degree. Although a degree might make your job hunt easier, you don’t necessarily need one. I would advise teaching yourself the skills to become a Data Engineer. There may be Data Engineering bootcamps that are suitable, but I would advise against Data Science bootcamps for this role. They are somewhat susceptible to the Data Science hype and won’t cover the Data Engineering side very well.

Machine Learning Engineer

The most practical route to becoming a Machine Learning Engineer is to become a Software Engineer first! Once you are working as a Software Engineer, you will learn best practices for engineering in the real world and gain the experience needed to get over the experience hurdle. While you are working, you should develop the Machine Learning Engineer skill set and eventually make the transition to MLE!

Concerns for Data Science

As a side note, the rest of this blog is written with Data Science in mind, not AI Research, Data Engineering, or Machine Learning Engineering, as I know less about those fields.

Although I may present these concerns in a negative light, they aren’t necessarily bad. They are simply things that anyone getting into Data Science should know.

1. Erroneous job listings

According to Harvard Business Review, Data Scientist is the sexiest job of the 21st century! In the Data Science community we often laugh at this claim, but some recruiters and hiring managers seem to take it seriously. There are countless job listings with Data Science in the title that aren’t actually Data Science jobs. I have seen tons of job postings that say ‘Data Scientist’ and ‘AI’ at the top, but when you actually read them, you see they’re really looking for a Data Analyst or something else! In many cases, ‘Data Scientist’ is mere glorification.

When reading job listings, read closely: attention to detail is key! Steer clear of jobs that list a lot of vague requirements or use terms incorrectly. Either is a sign that the business doesn’t really know what Data Science is and has no knowledge of its best practices. Check out this video if you want to learn more. As a general rule of thumb, if you feel like the writer of the description doesn’t know what they’re talking about, you won’t want to work there.

Data Science, like many other careers, has the job-and-experience dilemma, where even entry-level jobs often require 2+ years of experience. This creates the whole “I need experience to get a job, but a job to get experience” situation. I’m sure you’ve seen the memes! The advice here is to apply anyway, as the arbitrary years-of-experience requirement is more of a nice-to-have. That being said, this is why it is important to work on projects. If you have worked on projects that solve real business problems, you will have something to say for yourself when the interviewer asks about your experience. This is also why networking is important, as it can get you into the “VIP lane” in the hiring process at many companies!

2. Expectation & Reality disconnect

With Data Science being all the rage, many companies have gotten on the Data Scientist hiring train. However, many of these companies were hiring blindly, and in fact many weren’t even ready for Machine Learning! This led many Data Scientists with backgrounds in Machine Learning to work lower down the data pyramid, doing the work of Data Engineers and Data Analysts. This disconnect leaves many Data Scientists unhappy at work and costs companies money, as Data Scientists are often more expensive to hire than the professionals who are actually needed!

Another issue is that many companies are looking for “the unicorn data scientist.” They are often actually looking for someone who can do the work of a Data Engineer, Data Analyst, Data Scientist, Machine Learning Engineer, and maybe a Product Manager too. While the Scrappy Data Scientist has an aptitude for all these things, they lack the needed expertise and the preferred graduate degree. These companies want a highly educated person who is a generalist and a specialist at the same time and will soak up all of an organization’s problems like a sponge. This person does not exist! Such companies should accept that they need to hire people to fill five separate roles.

These issues often cause Data Scientists to work on tasks below their pay grade, leading to an unfulfilling work environment. I was at a virtual Data Science happy hour where a Senior Data Scientist was talking about how his job had him doing things his high-school self could’ve done, which prompted many others to share similar experiences. Simple statistical tests, tedious data wrangling, and many other tasks were mentioned.

3. Machine Learning 2.0

Implementing Machine Learning models has never been easier: all you have to do is import a library like TensorFlow, PyTorch, or scikit-learn and build your model in a few lines of Python. This goes against the science in Data Science, as traditional Data Science is about using the scientific design process to build custom models for the given problem. However, we are approaching an age in which these prepackaged models perform better. Although slightly less tailored, they can be built much faster, which saves companies money. As Machine Learning becomes more established, we are seeing a transition in which models once custom-built by Data Science teams are instead built by Software Engineers in a few lines of code. As the use of these frameworks, and of high-level APIs like Hugging Face, increases, we will begin to see only a handful of researchers developing advanced models that get used by the masses.

Such transitions are nothing new! Take modern medicine, say penicillin. Scientists discovered penicillin, but once it was shown to be safe for treating humans, the challenge became producing it at scale. The spotlight then moved from the scientists who discovered it to the chemical engineers who mass-produced it. The point here is that this scientific problem is evolving into an engineering problem. With that being said, if you want to get into Data Science, you should at least be aware of this possibility.

4. Politics

With Machine Learning being all the rage, you may think that a Machine Learning model with good evaluation metrics automatically creates massive value. You may think to yourself: this model has great recall and can do a great job at preventing churn! You may expect the execs to be eager to implement your solution. After all, that’s why they hired you, right? However, many Data Scientists struggle to actually get decision-makers to buy into their projects! Here are a few reasons why this might be the case.

The cost of deployment and integration outweighs the benefit of the model
Lack of communication and presentation skills
Using overly technical language
An inability to tie the project’s value to a monetary value
The problem addressed is not a pain point or in line with top priorities

To successfully get buy-in from decision-makers, a/k/a stakeholders, you need to speak in terms they are comfortable with. While the exact details will vary depending on whom you work for, some general advice is to study up on the industry your company is in, lay out potential objections and ways to address them, tie monetary value to your projects, and steer clear of technical terms they don’t understand. For example, say “our new churn model can increase revenue by up to $42,000 per month”* instead of “our churn model’s recall was able to go from 69% to 89%”.

*actually calculate this, don’t just drop random numbers :)
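As an example of actually calculating such a figure, here is a rough back-of-the-envelope sketch in Python. It rests on hypothetical assumptions: every flagged churner receives a retention offer, a fixed share of them stay, and offer costs and false positives can be ignored. A real estimate would use your company's own churn volume, retention rate, and revenue per customer.

```python
def extra_monthly_revenue(churners_per_month: int, old_recall: float, new_recall: float,
                          save_rate: float, monthly_revenue_per_customer: float) -> float:
    """Extra revenue retained per month because the better model flags more would-be churners.

    Simplifying (hypothetical) assumptions: every flagged churner gets a retention offer,
    a fixed fraction `save_rate` of them stay, and the cost of the offers is ignored.
    """
    extra_churners_caught = churners_per_month * (new_recall - old_recall)
    customers_saved = extra_churners_caught * save_rate
    return customers_saved * monthly_revenue_per_customer


# Hypothetical inputs; only the recall figures come from the example above
estimate = extra_monthly_revenue(churners_per_month=2_000, old_recall=0.69, new_recall=0.89,
                                 save_rate=0.25, monthly_revenue_per_customer=80.0)
print(f"~${estimate:,.0f} per month")
```

Each factor in the formula is something a stakeholder can question and you can defend. That is the point of tying recall to a monetary value.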

A word about bootcamps

As you might have noticed earlier, I recommended the bootcamp route for many people looking to become Data Analysts or Data Scientists (Classic Data Scientists). As a bootcamp grad, I have some insight into what to watch for, both in the bootcamp and in yourself.

Finding the right bootcamp

As a graduate of Flatiron School’s bootcamp, I am biased in its favor when it comes to recommending bootcamps. Three things I really liked were that they taught us Data Storytelling and Statistics first, that they had us build our own repository of projects, and that they embraced fast learning. One thing they could have improved was explaining why the statistics they taught us mattered. The lack of “why” made it easy to bury the statistics under Machine Learning and to neglect them in later projects. In retrospect, I blame myself for this far more than Flatiron School, but I still think it’s worth pointing out.

Regardless of which bootcamp you choose, it must include the following: the mathematics behind Machine Learning, data collection and sourcing, and one-on-one career support. Before signing up, evaluate job-placement statistics, reviews from graduates, and the curriculum. If the bootcamp’s curriculum is heavy on theory and neglects the business side, run! If it teaches Machine Learning at the beginning of the curriculum, run! And if it is 100% asynchronous and doesn’t require communication to complete, run!

These are just bare-bones requirements. Vin Vashishta, a highly experienced Machine Learning Strategist, says that a bootcamp should include cross-team collaboration (Software Engineering, stakeholders, etc.) and putting Machine Learning into production in a real-world environment. I have yet to see a bootcamp that lives up to his standards, but I consider his video on evaluating a Data Science bootcamp a must-watch for anyone in this position.

How to make the most of the bootcamp

Here are three tips for anyone going into a Data Science bootcamp: the first two are things I wish I had done, and the last is something I’m very glad I did.

The first tip is to apply your Data Science skill set to a particular domain. Don’t think “I am looking to become a Data Scientist,” but instead think “I am looking to apply Data Science to X industry.” Many bootcamps do a great job of teaching the underlying concepts but leave you hanging when it comes to business acumen, leaving you to develop it on your own. For this reason, I advise focusing on a single niche. At least for landing your first role, it’s easier to gain a deep understanding of how one domain operates from a business perspective than to develop general business acumen. You should also make sure the bootcamp is in line with your goal: if you want to be a Data Analyst, do a Data Analytics bootcamp, because the time and money spent studying Machine Learning will not pay off in that job. Many people in my cohort had past careers in fields like marketing, finance, or hospitality to tie their new skills to, whereas I didn’t. As a result, their projects were carried out with more passion, revealed more business acumen, and made for a more coherent portfolio.

Second, start applying for jobs as soon as you get comfortable programming! Once you can code, start chucking out those job applications. If you have a dream company, I would advise holding off on applying there until you feel competent. The idea here is to fail your way to success: you can develop your interviewing skills alongside your Data skills and set yourself up for success down the road. For more on this, check out this blog post. It’s also worth noting that only 2–3% of job applications get interviews (assuming you’re qualified).
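Taking that 2–3% figure at face value, a quick back-of-the-envelope calculation shows why volume matters for applications. The sketch below assumes, unrealistically, that every application is an independent trial with the same interview probability; real outcomes depend heavily on fit and on the quality of each application, so treat it as a rough intuition rather than a forecast. The function name `applications_needed` is my own, for illustration.

```python
import math


def applications_needed(interview_rate: float, target_probability: float) -> int:
    """Smallest number of applications n such that the chance of at least one
    interview, 1 - (1 - interview_rate) ** n, reaches target_probability.

    Assumes each application independently succeeds with interview_rate.
    """
    if not (0 < interview_rate < 1 and 0 < target_probability < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    # Solve (1 - p) ** n <= 1 - target for n, then round up to a whole application.
    return math.ceil(math.log(1 - target_probability) / math.log(1 - interview_rate))


for rate in (0.02, 0.03):
    n = applications_needed(rate, 0.90)
    print(
        f"{rate:.0%} interview rate: about {1 / rate:.0f} applications per interview "
        f"on average, {n} applications for a 90% chance of at least one"
    )
```

Under those assumptions, you would need roughly 76 to 114 applications for a 90% chance of landing at least one interview, which is one more reason to start applying early.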

Lastly, you need to explain your projects to other people, especially people from non-technical backgrounds. Have an online outlet for sharing and explaining your projects, such as a Medium blog or a YouTube channel (it’s okay to get somewhat technical there), and also talk about your work with non-technical people you know. I personally like to write on Medium and talk about my projects with my dad, who knows very little about tech. I also went to a networking event a few months ago and talked to a Business Consultant who didn’t know what Data Science was; merely defining the term was a learning experience of its own!

Networking

There are about a million and one ways to do “networking,” and almost everyone I talk to considers it an important part of finding a job. The problem is that networking can easily become a massive waste of time. Because there are so many approaches, I will cover a few simple tips that can help you take action.

Networking Tips

1. Know why you want to connect with each person you reach out to.

At first glance this looks obvious. You may be thinking “I’m looking to expand my network to improve my odds of getting hired,” but truth be told, that isn’t enough. You do not want to be the person who goes on LinkedIn, types in “Data Scientist,” and messages as many people as their account will allow (I’ve definitely been this guy). You need to pinpoint a specific reason for each person you reach out to: maybe they wrote an amazing blog, maybe they work at a company you’re already interested in, or maybe they just have a really interesting post on LinkedIn. Whatever it is, make sure that reason is what brings you to their LinkedIn page, rather than a LinkedIn search.

2. Don’t play the numbers game

The numbers game is for job applications, not networking! Keep networking natural and personal. If you turn it into a robotic process of reaching out in bulk, you will just be spinning your wheels. Sending out 100 connection requests is something a five-year-old could do! That time would be better spent working on projects. It’s quality over quantity here!

3. Know a thing or two before you hit send

This ties in with #1. By creating a strong reason to connect with someone, you will actually know something about them before you reach out. As a rule of thumb, know where they went to school, what role they hold now and have held in the past, and why you’re reaching out (see #1). Then use this knowledge to develop a compelling discussion topic, and make that topic what you ask to talk about. Here is a LinkedIn message I sent to a Senior Data Scientist that led to a great conversation:

“Hey [name], I would love to hear more about your Machine Learning journey and am especially interested in hearing about what led you to write your blog on the merge of Data Science and Software Engineering. I’m writing an informative blog on different roles across the Data Industry and I value your input. Could we have a 15 minute chat about why you expect this merge? I’m open all through next week from 3:30 to 5:30 your time, though I could do after 8pm if it’s better for you.

Best, Spencer”

In short, he wrote a piece on the merging of Data Science and Software Engineering, and I asked to talk with him about it in more depth.

4. Don’t use #opentowork

This might be more of a pet peeve of mine, but I don’t like the #opentowork banner, as it makes you look like a low-quality and desperate candidate. While it may not be the end of the world for your networking, it will definitely hurt you when you look for work, since it won’t look good to employers and hiring managers.

5. Give value or seek to create it

Try to find a way to offer value to the people you reach out to. You can offer to do work for free, introduce them to people you think they would like to be connected with, send them resources they might find helpful, or invite them to work on an open-source project you’re already working on. Alternatively, you can propose using the chat in a way that provides value to other people. I am writing this very blog because I offered to include the knowledge I gain from chatting with other Data Scientists. Just look at this part of my message: “I’m writing an informative blog on different roles across the Data Industry and I value your input.” Bits and pieces of his input are scattered throughout this very blog! Similarly, Harpreet Sahota gets well-respected people in the Data Community to chat with him and share advice because he records the conversations and turns them into podcasts that many people can hear, creating value for far more people than just himself.

Unstructured Learning

The learning section earlier focused on Structured Learning. This type of learning covers disciplines and hard skills, which are developed mainly through projects. Courses and tutorials count, too, but for the most part they are inferior to projects. However, unstructured learning, such as reading blogs or listening to podcasts, is also a crucial part of the learning journey! Doing these things, especially in your free time, lets you immerse yourself in your discipline of choice. Blogs are great to read when you’re having coffee, winding down, on the train, or, heck, even on the toilet! Podcasts are great for mundane tasks that don’t require tons of focus: driving, walking, cooking, cleaning. I also like to listen to podcasts when I’m doing more ‘boring’ tasks like Data Wrangling or front-end Web Design. Speaking from personal experience, back when I was studying Machine Learning while working my old job, it was only when I started reading Towards Data Science blogs on my lunch breaks that I began to develop an understanding of the concepts I was learning.

Some unstructured learning resources that I highly recommend

Artists of Data Science
Towards Data Science
The AI Podcast (more on the entertainment side)
Madhav Thaker
Better Programming

Wrapping it up

If you’ve made it this far, congratulations, you are a warrior! I know that was a ton of information, quite possibly too much. If you got value out of this, my advice going forward is to save this link somewhere so you can revisit these concepts as needed. I would also appreciate a few claps if you found this useful in any way.

Until next time, Spencer

Written by Spencer Holley