15 Observations from talking to 9 Data Professionals

These are observations I’ve drawn from 8 conversations with various Data Professionals, including 4 Data Scientists, 1 Data Analyst, 2 Research Scientists, and 1 Machine Learning Engineer.

Photo by Faizur Rehman
  1. ML/AI is mostly used at very large companies and AI startups, and is rarely used or beneficial at small companies unless the AI is part of the product or service. This doesn’t necessarily mean that ML has no place at midsize companies. However, machine learning works better with very large datasets, which many non-tech companies don’t have access to.
  2. Data Engineer is a more realistic goal than Data Scientist for a lot of people, especially if you don’t go to college. While this is highly debatable, many companies need Data Engineering but aren’t mature enough for Data Science yet. Data Engineering is also much lighter on advanced math. For these reasons, Data Engineering is in greater demand and tends to be a less academic role.
  3. You don’t need to dive deep into a particular domain unless you’re passionate about one.
  4. You can be an ML Researcher without a PhD! Many have had success landing a role with just a master’s. The way around it is to gain experience in ML research and scientific research on your own, using best practices of course. The AI Pianist is a great example of an independent research project!
  5. Business Intelligence tends to handle the internal data work at smaller companies. As companies grow, those tasks begin to fall more into the hands of Data Science. BI is focused on analytics projects that can bring insights to the strategy teams. When companies hit the ~1,000 employee mark, give or take depending on industry, they begin to hand ML projects to Data Science.
  6. The real challenge in Data Science is using the technology to create value, not learning it. Machine Learning, or the technical side of Data Science for that matter, is not that hard anymore. There is no shortage of people who can build dashboards and models. What companies need are Data Scientists who can learn a domain, identify solutions to business problems, and find the data that supports those solutions.
  7. Everyone in this space faces imposter syndrome. They go through a phase when they feel unqualified, doubt their abilities, and feel like a fraud. The key to overcoming it is to share! Share your knowledge in the form of blogs, posts, even videos. Share your projects, soak in the positive feedback, and take action on the areas for improvement. Share your doubts. When you do, you will be reminded that this isn’t a ‘you’ problem.
  8. There’s often a 1:10 ratio of Data Scientists to Software Engineers in organizations. Data Science is only a small portion of the operations that take place at companies, even if their models and analyses are external facing (meaning that they are part of the project).
  9. When you’re the first Data Scientist, you often have to identify problems and solutions on your own. It’s also vital that you avoid the curse of overcomplexity when crafting these solutions. You also need to use a stack that will be understood by those who come next, which means no odd packages.
  10. ML libraries like Keras and PyTorch are driving the merge of Data Scientists and Software Engineers. They are turning scientific problems into engineering problems. Data Scientists used to build highly customized models specific to their company, domain, and use case. While this is still common, we are seeing a shift toward slightly less customized, off-the-shelf models that are highly replicable. These models are often worth the tradeoff because they are less labor intensive and can be developed much faster!
  11. As soon as you understand the basics of programming, you should start building and seek to code your solutions independently. Programming is about developing a framework for thinking and problem solving, which is not something you develop by memorizing syntax and copy-pasting. That being said, Stack Overflow searches have their place, and learning the syntax for every package you use is not practical. However, you should be able to write conditionals, functions, and classes on your own.
  12. Being a programmer is about thinking creatively to solve problems, not memorizing code. This goes right in line with #11.
  13. #11 and #12 hold especially true for ‘research scientists’. This is because they spend significantly more of their time coding from scratch and rely on packages much less.
  14. You don’t need a degree to be a Data Scientist. Almost all Data Scientists have degrees, and many companies and job listings require one, but at the end of the day it is not needed. When you look at a successful Data Scientist, you notice their positive attributes: business sense, problem solving, keeping up with the latest in their domain, and great communication skills. These are all things that must be developed on your own regardless of your education. In addition, tech moves so fast that the 4+ years of higher education are growing obsolete and being replaced with learning spurts throughout one’s whole career.
  15. The cost of deployment must be considered when building a model. Sure, machine learning can deliver massive value to businesses, but the model must be deployed in order to be of any use. Some models are highly effective and have metrics that tie well to the business, but they have high latency and high computational costs, so they don’t scale. Data Scientists and engineers often work closely to identify ways to build more scalable and computationally cheaper models without sacrificing performance.
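
To make observation 15 a little more concrete, here is a minimal Python sketch of how you might put rough numbers on a model’s latency and compute cost before deploying it. None of this came from the conversations: the function names (`measure_latency`, `summarize`, `monthly_compute_cost`), the `predict` callable, and the pricing figures are all hypothetical, and the cost estimate assumes requests are served one at a time on a single machine billed by the hour.

```python
import statistics
import time


def measure_latency(predict, inputs, warmup=3):
    """Time each call to `predict` and return the latencies in seconds.

    `predict` is a stand-in for whatever function serves your model.
    """
    # Warm-up calls are not timed, so one-off setup costs don't skew results.
    for x in inputs[:warmup]:
        predict(x)

    latencies = []
    for x in inputs:
        start = time.perf_counter()
        predict(x)
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return the mean, an approximate 95th percentile, and the max latency."""
    ordered = sorted(latencies)
    # Nearest-rank style index, clamped so it is always valid.
    p95_index = min(len(ordered) - 1, max(0, round(0.95 * len(ordered)) - 1))
    return {
        "mean_s": statistics.mean(ordered),
        "p95_s": ordered[p95_index],
        "max_s": ordered[-1],
    }


def monthly_compute_cost(mean_latency_s, requests_per_month, dollars_per_compute_hour):
    """Rough cost estimate, assuming one request at a time on hourly-billed compute."""
    compute_hours = mean_latency_s * requests_per_month / 3600
    return compute_hours * dollars_per_compute_hour


if __name__ == "__main__":
    # A toy "model" so the sketch runs on its own; swap in a real predict function.
    def toy_predict(x):
        return sum(i * x for i in range(10_000))

    stats = summarize(measure_latency(toy_predict, list(range(50))))
    # The request volume and hourly price below are made-up illustration values.
    cost = monthly_compute_cost(stats["mean_s"], 1_000_000, 2.0)
    print(stats, cost)
```

Running the same measurement on a heavier and a lighter version of a model gives you a simple way to compare what each would cost to serve, alongside their accuracy metrics.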