I’m Close to Graduating. Guess I Should Find a Job?
Current link for this presentation can be found at:
embios.quarto.pub/career-pathways-statistics-and-data-science/
Most Biostatistics students will, not surprisingly, find work as Biostatisticians, though your work as a student has also prepared you for more general work in interesting realms such as Data Science, FinTech, Business Analytics, Investment Banking, Sports Analytics, and even Music Analytics.
Obviously, you would need to learn more about those domains in terms of the types of problems being solved, but knowing math and statistics, and how to build (and defend) models, is fundamental knowledge for all these employment domains.
This presentation is designed to help you understand what potential job titles are out there once you graduate from the BIOS program at Rollins. It reflects only my experience and opinion, though I would like to think it’s actionable by you.
Looking for work in data-related fields can be confusing because job titles, especially those related to Data Science, are in flux, which makes it difficult to nail down precisely what one’s duties will involve. The “Data Science” descriptor can be vague and over-generalized to include duties more appropriate for an IT person or a database manager. This presentation aims to help you:
Topic | Description |
---|---|
Understand Industry Trends | Gain insights into the current trends in data-related fields, including the shifting job landscape and emerging opportunities. |
Explore Key Job Titles | Learn about the potential job roles available to graduates from the BIOS program and what each title typically entails. |
Identify Common Skills | Understand the core skills that are essential across data-related jobs, including technical expertise in areas like statistics, modeling, and data analysis. |
Respond to Recent Tech Layoffs | Discuss how recent layoffs in the tech sector might impact job opportunities and what that means for your career planning. |
Access Learning Resources | Highlight resources available to help you explore and prepare for opportunities in data-related fields. |
What Skills Do I Need for Data Jobs?
You can ask 5 people what it is a data scientist does and you will get at least that many different answers. But there are skill sets and ideas that you can reliably assume exist for that type of job. We’ll get into some details later.
Note that this presentation was motivated in part by this YouTube video by Thu Vu Analytics concerning trends in Data-related jobs.
It’s been said (at least by me) that Data Science is a domain whose activities are basically all the things that statisticians, software developers, analysts, and database administrators do not want to do on top of what they already do. While this sounds a bit flippant, it really isn’t. Consider the following graphic, which presents an overview of attributes important for a Data Scientist to have.
I’m also fond of this graphic by Market Distillery which captures similar information, albeit in more detail:
It could be said that developers, statisticians, and data analysts exhibit these characteristics, though usually only in part. That’s no criticism of them; after all, writing software on a professional basis is a job all on its own.
The larger point is that Data Science emerged in response to a need for professionals who could bridge the knowledge gap between statisticians, software developers, database managers, and finance people. This also accounts for the observation that Data Scientists sometimes get pulled into IT or database activities to solve longstanding problems rather than focusing on analytics work.
What’s in a Title?
Job titles are important, so much so that Human Resources groups agonize over them. That doesn’t mean they get them right. Often the titles are behind the times or, if the title is right, the job duties aren’t in line with what the industry expects. Let’s put this aside for now and look at one common source of information: the Occupational Outlook Handbook developed by the US Bureau of Labor Statistics.
Let’s look at some of these job titles. It’s interesting to note that it took quite some time (years, in fact) for the “data scientist” occupation to even show up in this resource. While this handbook is in no way the definitive list of jobs, it does show occupations officially recognized by the US Government.
To do this, we’ll put up an interactive table that can be used to inspect some of these quantitative jobs by median salary and projected percent growth. This might influence the decisions you make when shopping the job market. It’s never too soon to start thinking about employment.
Education
In the above table you will see the suggested or expected degree requirements for each job title. These are guidelines and in no way hard and fast rules. That said, most statisticians will in fact have graduate degrees, as will research scientists and epidemiologists.
Actuaries are somewhat unique in that it is possible to secure work at the bachelor’s level, since actuarial science uses a series of professional exams to govern job advancement. Check the graphic below for titles and common education requirements.
In my experience, these requirements are flexible. For example, I’ve encountered many data architects who do not have graduate degrees. Whether they should is another issue, outside the scope of this report. It’s the same situation with Machine Learning Engineers: there are plenty without a master’s degree, although I would argue that it’s useful to have one.
Inspecting Growth and Salary of The US Labor Data
Let’s check out some plots to understand which occupations have better growth and those which have better annual salary.
Now what about salary? That’s important, isn’t it? It’s straightforward to plot the data and determine which occupations bring in the largest MEDIAN salary. The median is the right summary here because salaries vary significantly by location within the US, and a handful of very high-paying markets would pull a mean upward.
Additionally, in the post-COVID work world remote jobs are common, which can also influence salary.
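The following minimal base-R sketch shows why the median is the safer summary. It uses a made-up vector of salaries (illustrative numbers, not BLS data) in which one high-cost-of-living salary sits far above the rest.

```r
# Hypothetical annual salaries (USD) for one occupation. The last value is an
# illustrative high-cost-of-living outlier, not real BLS data.
salaries <- c(72000, 78000, 81000, 85000, 90000, 94000, 250000)

mean_salary   <- mean(salaries)    # sensitive to the single outlier
median_salary <- median(salaries)  # middle value, robust to the outlier

cat("Mean:  ", mean_salary, "\n")
cat("Median:", median_salary, "\n")
```

With this vector, the single outlier drags the mean above $107,000 while the median stays at $85,000. The median is closer to what a typical worker in the sample earns.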
Industry Vs. Government and Education
Keep in mind that these figures represent the US as a whole. Salaries vary by region and especially so when considering government and medical research jobs taking place at universities. Generally these will pay lower than what industry would offer. This is not a rule but is an observable trend.
Median Pay vs Job Growth
It might be useful to look at a plot of Median Salary vs. Job Growth to see which titles emerge as the top 4 in terms of pay.
Job Growth vs Median Pay
Let’s flip the axes to see what the top 4 titles are in terms of Job Growth percentage.
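Both plots rest on a simple ranking. The minimal base-R sketch below assumes a data frame with the hypothetical columns `title`, `median_pay`, and `growth_pct`. The occupations and figures are placeholders, not values from the Occupational Outlook Handbook.

```r
# Placeholder occupations and figures (not BLS data); substitute the values
# from the Occupational Outlook Handbook table.
occ <- data.frame(
  title      = c("Occupation A", "Occupation B", "Occupation C",
                 "Occupation D", "Occupation E", "Occupation F"),
  median_pay = c(95000, 104000, 88000, 120000, 99000, 76000),
  growth_pct = c(30, 11, 23, 8, 35, 5),
  stringsAsFactors = FALSE
)

# Sort descending on each column and keep the first four rows
top_pay    <- head(occ[order(-occ$median_pay), ], 4)
top_growth <- head(occ[order(-occ$growth_pct), ], 4)

print(top_pay$title)
print(top_growth$title)
```

The two lists differ, which is the point of flipping the axes: the best-paying titles are not necessarily the fastest-growing ones.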
Other Sources of Information
If you start looking for jobs using services such as LinkedIn, Indeed, Monster, Glassdoor, or ZipRecruiter, you will see a wider range of job titles and salaries that don’t necessarily match the information found in the Occupational Outlook Handbook.
In fact, salaries can vary significantly based on location and on whether the job is part-time, full-time, remote, on-site, or temporary. Lots of things influence market value. Also keep in mind that statisticians and data scientists could work in domains other than pure research. For example, casinos employ “data people” to ensure continued profitability.
I’ve found that Glassdoor and LinkedIn offer lots of supporting information as part of their job search process. Glassdoor, for example, has lots of information on the interview questions associated with various jobs at a given company. No one site is perfect, and it’s likely you will use several. If you are looking for federal government work, such as with the CDC, you will use the USAJOBS website.
Check out the data set on Kaggle which represents job postings from Google search results for Data Analyst positions over time.
This dataset pulls job postings from Google’s search results for Data Analyst positions in the United States.
Data collection started on November 4th, 2022, and adds ~100 new job postings to this dataset daily.
Back To Titles
Using the Kaggle dataset, let’s look at the 20 or so most common job titles. It’s little surprise that the most frequently returned title is “Data Analyst” because, well, that was the search term. This search will also likely yield more general results than if we had used “Data Scientist”. One thing you will soon see is that there are various jobs associated with or adjacent to “Data Analyst”, which is a testament to how differently HR departments view data-related positions.
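Counting titles takes one line in base R. The sketch below assumes a character vector of titles. The handful of values shown is hypothetical and only stands in for the Kaggle data's title column.

```r
# Hypothetical titles standing in for the dataset's title column
titles <- c("Data Analyst", "Data Analyst", "Senior Data Analyst",
            "Data Analyst", "Business Analyst", "Senior Data Analyst",
            "Data Scientist", "Data Analyst", "Business Analyst")

# Tabulate each distinct title, sort by frequency (descending), keep the top n
top_titles <- function(x, n = 20) {
  counts <- sort(table(x), decreasing = TRUE)
  head(counts, n)
}

print(top_titles(titles, n = 3))
```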
The first thing you might notice is that there are many more data-related job titles which emerge from a general Google search using “data analyst” as a search term. Get used to the idea that what you, or the US Labor department, might call a Data Scientist or Analyst could well be called something different at another company.
There is no way to control for this situation. Don’t get hung up on the title. You just have to look at the job description and skills requirements to determine if the job matches your interest and qualifications.
But Who Is Hiring?
Who is “Upwork” and why do they have so many job listings? Good question. It turns out Upwork is a freelance site where people willing to do technical work offer their services to those who need it, sort of a matchmaking site. This also brings to light the idea that not all technical employment is full-time and/or salaried. In fact, look at the number of job type categories in the data set.
Since we have some “real” data available to us, let’s look at the top 20 average annual salaries in the data set. This is something of a challenge because many of the job postings do not include salary information (more on that momentarily). In fact, around 83% of the rows have a missing value for average yearly salary.
Percent of data with missing `salary_avg`: 83
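The missing-salary percentage can be computed directly. This sketch assumes a data frame called `postings` (a hypothetical name) with a `salary_avg` column like the one in the Kaggle data. The six rows are invented for illustration, so the result differs from the 83% above.

```r
# Hypothetical stand-in for the postings data; only salary_avg matters here
postings <- data.frame(
  title      = c("Data Analyst", "Data Analyst", "Data Scientist",
                 "Business Analyst", "Data Analyst", "Data Engineer"),
  salary_avg = c(NA, 85000, NA, NA, 72000, NA)
)

# is.na() returns TRUE/FALSE per row; the mean of a logical vector is the
# proportion of TRUE values
pct_missing <- round(100 * mean(is.na(postings$salary_avg)))
cat("Percent of data with missing `salary_avg`:", pct_missing, "\n")
```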
Before you get excited, consider that the top jobs may represent only one or two positions at the respective companies. That is, those jobs are exceptional and not at all indicative of a typical salary. For example, the “Sr. Data & Reporting Analyst” position with a salary of $434,500 represents only two job postings in the entire dataset. One of those listings omits salary_avg, so the figure comes from a single posting.
```
# A tibble: 2 × 3
  title                        company_name          salary_avg
  <chr>                        <chr>                      <dbl>
1 Sr. Data & Reporting Analyst Applied Systems, Inc.         NA
2 Sr. Data & Reporting Analyst PCS Retirement            434500
```
The “Pre-Sales Data Scientist, Financial Services” position with a salary of $288,000 represents only one job posting.
```r
gs[gs$title == "Pre-Sales Data Scientist, Financial Services", ][c("title", "company_name", "salary_avg")]
```

```
# A tibble: 1 × 3
  title                                        company_name salary_avg
  <chr>                                        <chr>             <dbl>
1 Pre-Sales Data Scientist, Financial Services Teradata         288000
```
Let’s look at some junior-level titles.
Salary Transparency
In the above plot we can’t assume that the salaries are indicative of an average trend. Because so many salary values are missing, any estimate of mean annual salary is suspect: we don’t know whether the jobs that did include salary figures are outliers in some way.
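A toy example makes the concern concrete. The sketch assumes, purely for illustration, that only postings paying $90,000 or more disclose a salary; the salaries themselves are invented. When whether a value is missing depends on the value itself, the mean of the disclosed salaries overstates the true mean.

```r
# Invented "true" salaries for ten postings (illustration only)
true_salaries <- c(60000, 65000, 70000, 75000, 80000,
                   90000, 95000, 110000, 130000, 150000)

# Assumption for the example: only postings at or above $90,000 disclose pay
disclosed <- true_salaries >= 90000
observed  <- ifelse(disclosed, true_salaries, NA)

mean_true     <- mean(true_salaries)
mean_observed <- mean(observed, na.rm = TRUE)  # what the posted data shows

cat("Mean of all salaries:      ", mean_true, "\n")
cat("Mean of disclosed salaries:", mean_observed, "\n")
```

Here the disclosed mean is $115,000 against a true mean of $92,500. Real postings will not follow such a clean rule, but the direction of the bias is the worry.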
If you were a company that wanted to attract top talent, you might publish salary info to get people excited about the opportunity. Some companies publish only salary ranges, whereas others don’t publish anything (Emory is like this).
The plot thickens, as they say. Most (if not all) candidates want to know “how much does this job pay?” If the company does not list this, then you have to ask. The company might also ask you how much you currently make, which can be awkward.
In some jurisdictions, employers must disclose salary ranges for open or current positions, and they may be prohibited from asking about a candidate’s salary history. But this varies across states. Check the map below: as of the date of this map, not many states mandate salary transparency.
Even in states that require transparency, employers can publish very wide salary ranges, which makes it difficult to know what they have actually budgeted for the position.
More Salary Info
Earlier we looked at salary information for top-paying jobs. Now let’s look at the distribution across all the job titles in the data set. A histogram is helpful for seeing where most salaries fall. Note that we still have to filter out jobs with no salary information. There is also a small problem: some salary averages are given as an hourly rate instead of a yearly salary.
Let’s look at the distribution of Data Science jobs in particular for those jobs that have listed salary information.
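The cleanup steps described above might look like this in base R. First drop rows without salary information, then convert hourly rates to annual figures, then filter to Data Scientist titles. The sketch assumes a hypothetical `salary_rate` column marking each value as hourly or yearly. It also assumes full-time work of 40 hours × 52 weeks = 2,080 hours per year. Check the actual column names and conventions in the dataset before reusing it.

```r
# Hypothetical postings; salary_rate is an ASSUMED column name/encoding
postings <- data.frame(
  title       = c("Data Scientist", "Data Analyst", "Senior Data Scientist",
                  "Data Scientist", "Data Analyst"),
  salary_avg  = c(120000, 35, NA, 55, 70000),
  salary_rate = c("year", "hour", "year", "hour", "year"),
  stringsAsFactors = FALSE
)

HOURS_PER_YEAR <- 40 * 52  # assumes full-time work: 2,080 hours/year

# 1. Drop rows without salary information
with_salary <- postings[!is.na(postings$salary_avg), ]

# 2. Put every salary on an annual scale
with_salary$salary_annual <- ifelse(
  with_salary$salary_rate == "hour",
  with_salary$salary_avg * HOURS_PER_YEAR,
  with_salary$salary_avg
)

# 3. Keep titles containing "data scientist" (case-insensitive)
ds <- with_salary[grepl("data scientist", with_salary$title, ignore.case = TRUE), ]

print(summary(ds$salary_annual))
```

From there, `hist(ds$salary_annual)` draws the histogram.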
Job Postings Over Time
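Plotting postings over time starts with counting postings per period. Below is a base-R sketch that buckets hypothetical posting dates (not the Kaggle data) by calendar month. If you then draw the counts as a line with ggplot2 3.4.0 or later, set line thickness with `linewidth` rather than the deprecated `size` aesthetic.

```r
# Hypothetical posting dates standing in for the dataset's date column
posted <- as.Date(c("2022-11-04", "2022-11-20", "2022-12-01",
                    "2022-12-15", "2022-12-28", "2023-01-09"))

# Label each date with its year-month, then count postings per month
month <- format(posted, "%Y-%m")
monthly_counts <- table(month)

print(monthly_counts)
```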
Layoffs?
Wow. No one wants to be laid off from a job. It’s a negative and demoralizing experience. The tech domain is not immune to this, and there have been two recent major layoff periods: January 2023 and January 2024. Let’s look at some graphs. The first shows tech layoffs since 2020, whereas the second shows tech layoffs starting in January 2023. There are some patterns here, one of which is a wave of layoffs in Q1-Q2 of 2020. You know why, don’t you? That’s when the COVID pandemic hit in a very big way. Little surprise that some jobs were wiped out.
But is this trend worse than in the recent past? That is, should you still consider employment in the tech industry in general? The following plot covers all jobs, not just tech. This is important because, viewed across the whole job market, we are not actually in a major, ongoing layoff wave. It just seems that way because tech companies have been laying off workers and selectively rehiring for new positions. Some of this relates to the idea that AI could be used to replace level 1, or entry-level, workers. If companies think, for example, that they can replace customer service reps with AI agents and avatars, they most certainly will do that.
There is also the idea that layoffs can be used as a means to report better financials to stockholders, which is unfortunate. You might be interested to know that companies of a certain size that intend to lay off employees will, in some cases, need to file notices to comply with federal regulation. The WARN Tracker site captures this information.
Let’s also look at their historical information on layoffs.
Site | Description |
---|---|
WARN Tracker | Layoff insights derived from public data filed in compliance with the WARN Act |
Crunchbase | General technology site with layoff tracking and finance and tech trends |
Layoffs.fyi | Site dedicated to layoff tracking in the tech industry |
TrueUp | Tech hiring, finance, and layoff trends |
Data Nerds | Salary and skills trends for data and ML-related jobs |
US Bureau of Labor Statistics | Federal government site for labor statistics |
So we do have to face the fact that there have been tech layoffs this year and the previous one. We’ll talk about this in greater depth later, but recent layoffs have an interesting motivation: larger tech companies lay off when comparable companies lay off. Stockholders want to know, for example, why Google has laid off workers (also known as “job shedding”) yet Meta has not. Check this link for more discussion.
“There is a herding effect in tech,” said Jeff Shulman, a professor at the University of Washington’s Foster School of Business, who follows the tech industry. “The layoffs seem to be helping their stock prices, so these companies see no reason to stop.”
What Do You Know? General Skills To Acquire
Back to the job at hand (no pun intended). Independently of any given job title, it’s worth considering what skills are common to data jobs in general. Let’s start with one that is REALLY important for any data-related role: obtaining data! Note that just getting data is a skill all its own. What you do with it after that, like cleaning it, is another skill set.
Data Janitor - Your Future
In classes you are generally given a nice, easy-to-use and well documented data set. That’s actually a good thing because it allows you to focus on the analytic methods being used to learn from the data. If you have to spend all your time generating and cleaning the data that leaves less time for building a model on it.
But then, wouldn’t you think that understanding the data BEFORE building the model is a good thing? Of course, but you have to start somewhere. It may be that it’s all a cycle in which you get data in a predetermined format, train a model on it, make some judgment about the model, and then adjust the model. At some point it might make just as much sense to transform or supplement the data to make it easier to train the model.
“I don’t clean data - I just analyze it. Can’t you get a student to fix it up?” (An actual question from a researcher.)
So what you originally thought was a simple case of building a model has now turned into what we call a data engineering problem. We might even question the format in which we were given the data. Maybe we could supplement it in some way, but that would require going back to the data source, assuming that is even possible. All this means that the following is in your future:
Digging Deeper
At this point it might be useful to further discuss the difference between a Data Scientist and a Data Analyst. As we know, it’s somewhat “easier” to get into Data Analyst positions in terms of education and skill set. That said, it’s a competitive market, so anything a person knows about data is only going to be an advantage when interviewing for jobs. The typical duties of a Data Analyst are along the following lines, though I find the role is more involved than just this.
Compare the above to this representation of what a Data Scientist handles. There is a strong overlap in data work, but the Data Scientist goes deeper into prepping and transforming the data and building models. Deploying models so that others can use them is part of what a DS does. However, given large data sizes and the general need to scale model performance, a relatively new position called “Machine Learning Engineer” has emerged that goes deeper into that domain.
MLOps
We have now entered the world of “MLOps”, which is short for Machine Learning Operations. This domain is concerned with the continuous integration (CI) and continuous delivery (CD) of machine learning products. The goal is to improve the model development lifecycle by automating testing, integration, and deployment to deliver high-quality models faster and more reliably.
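To make “automated testing” concrete, here is a minimal sketch of a quality gate that a CI pipeline might run before deploying a model. It is hypothetical: the model (`lm(mpg ~ wt + hp)` on R's built-in `mtcars` data), the 70/30 split, and the RMSE threshold are all illustrative choices, not a standard. The key idea is that the check fails loudly with `stop()`, so the pipeline halts instead of shipping a model that got worse.

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Hypothetical quality gate: fit on a training split, score on held-out data,
# and raise an error if the score exceeds the allowed threshold.
check_model <- function(data, max_rmse) {
  set.seed(42)  # fixed seed so the split is reproducible across CI runs
  idx   <- sample(nrow(data), size = floor(0.7 * nrow(data)))
  train <- data[idx, ]
  test  <- data[-idx, ]

  fit   <- lm(mpg ~ wt + hp, data = train)
  score <- rmse(test$mpg, predict(fit, newdata = test))

  # Fail loudly so the pipeline stops instead of deploying a weak model
  if (score > max_rmse) {
    stop(sprintf("RMSE %.2f exceeds threshold %.2f", score, max_rmse))
  }
  score
}

score <- check_model(mtcars, max_rmse = 6)
cat("Held-out RMSE:", round(score, 2), "\n")
```

In practice a script like this would run automatically on every change, as one step in whatever CI service your team uses.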
Think about this for a moment. In educational scenarios much of the work involves data sets of modest size against which models can be developed using your laptop. This is by design so you can focus on understanding the mechanics of the model itself. Once you transfer those skills to an actual employment scenario the data will usually be much larger as will the need for more robust models capable of handling simultaneous invocations from an API.
This then requires the Data Scientist to know how to do this, which is a good skill to have. However, it can compete with the business of creating a good model in the first place. This is where a good Machine Learning Engineer can help. In the data realm there is also the Data Engineer, who is adept at managing and accessing very large amounts of data.
This was not always the case. In the early days of Data Science it fell to the Data Scientist to do all this work. The emergence of these new job titles is a testament to how much the domain of Data Science has grown, and it continues to grow. Not every company will have all of these titles. In those cases it is possible that:
1. The data and models are of modest size and therefore do not need to be developed for at-scale delivery.
2. The organization understands that multiple data-related positions are required but doesn’t yet have the budget for them.
3. The organization doesn’t even know (or care) that multiple data-related job titles exist.
4. The organization has Data Science reporting to the Information Technology group, which will typically not have a good understanding of what Data Science is all about.
During your interviews, try to understand which data-related titles exist in an organization before taking the job. If you are interviewing with an organization that falls into category 3 above, life won’t be pleasant, because they will expect you to do it all! And I strongly recommend avoiding situations that fall into category 4.
For your convenience, here is a brief summary of these titles and their respective duties. You should investigate these in greater depth. As I mentioned earlier, not every organization will have all of these titles. Even if it does, the larger organization might not understand what these positions are supposed to do. For example, the Machine Learning Engineer may end up doing the work of a Data Analyst and a Data Scientist.
Title | Description |
---|---|
Data Analyst | Analyzes data to generate reports, insights, and visualizations for decision-making. |
Data Scientist | Develops models and algorithms to extract insights and predict trends from data. |
Machine Learning Engineer | Builds and deploys machine learning models, ensuring they are scalable and efficient. |
Data Engineer | Designs and maintains data pipelines, ensuring data is accessible and reliable. |
Databases
So what do all of the above positions have in common? It’s actually an easy question! Come on, you can guess! It’s DATA! The ability to find, analyze, manipulate, transform, and manage it. This brings us to a really big and important skill…
In the so-called “real world” there are these things called “databases”. Surely you’ve heard of them. They are seemingly monolithic sources of data that are usually well guarded. The guards are perhaps people who have no idea of (or interest in) what anyone might ultimately do with the data, so they are suspicious of requests to access the database.
Their idea of data security requires a certain degree of paranoia, even if they can’t explain it in any official way beyond quoting various standards at you. That’s another problem altogether, and one you will most definitely encounter in your professional data career, but it’s outside the scope of this presentation.
This graphic is demonstrative of the fact that many organizations, especially those engaged in research, have tons of data parked in Excel spreadsheets. Excel is convenient and ubiquitous and is almost always the first stop on any data analysis adventure. Most of the world is NOT wired to import data into Python or R.
In medical research, lots of data-generating devices (e.g. sequencers, mass spectrometers) dump data that can be conveniently read into Excel. It’s also easy to pass spreadsheets around the lab for a first-pass inspection. When it comes time to do “real” analysis, though, the data needs to be imported into packages and programs designed for that.
It’s also quite possible that the data will need to be merged with a patient or subject record that’s already in a database. You need to know how to do that.
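Merging instrument data with subject records is a join. In base R, `merge()` covers the common cases. The sketch below uses hypothetical `assay` and `subjects` tables. `merge(..., by = "subject_id")` behaves like a SQL INNER JOIN, and adding `all.x = TRUE` gives a LEFT JOIN that keeps unmatched assay rows.

```r
# Hypothetical instrument output, e.g. exported from a spreadsheet
assay <- data.frame(
  subject_id = c("S01", "S02", "S04"),
  biomarker  = c(2.3, 4.1, 3.7)
)

# Hypothetical subject records, as they might come from a database
subjects <- data.frame(
  subject_id = c("S01", "S02", "S03"),
  age        = c(54, 61, 47),
  arm        = c("treatment", "placebo", "treatment")
)

# Inner join: only subjects present in both tables
inner <- merge(assay, subjects, by = "subject_id")

# Left join: keep every assay row, with NA where no subject record matches
left <- merge(assay, subjects, by = "subject_id", all.x = TRUE)

print(inner)
print(left)
```

The NA row in the left join is often the interesting one: it flags samples with no matching subject record, which is worth chasing down before analysis.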
Level Up Your Skills
SQL / Apache Spark
All of this has been a buildup to promoting SQL as something you should learn, either as part of your coursework or on the side. Both approaches are valid. The larger point is that it has become standard practice for organizations to create data warehouses from which you (or a broker) pull data.
Now this doesn’t mean that study- or instrument-level data won’t persist in the form of Excel spreadsheets. Those will never go away, and it’s possible that you might not need to query a large database. But that’s the trend, and data scientists and analysts are increasingly expected to know how to extract data. Check out the top skills trend for data scientists according to the Data Nerd site.
Apache Spark supports multiple programming languages, allowing R, Python, and SAS users to work with large datasets. Through PySpark (the Python API), SparkR (the R API), and SAS/ACCESS (for SAS integration), programmers can leverage Spark’s distributed computing power from their preferred languages. Python developers can integrate Spark with popular libraries like pandas and scikit-learn, while R users can use SparkR to process datasets too large for standard R.
Foundational Skill
SQL can be considered foundational to data-related work. Knowing it gives you flexibility that others lack. I’ve been in situations where someone wanted ONLY to build models, so they expected the data to be presented to them in perfectly clean shape. That is very unrealistic; it almost never happens.
Many times as a data person (or when wearing your data hat), your first job is to pull data from a database and experiment with it to see whether any hypotheses can be generated. Other times you will be asked to pull specific data, and it’s going to be up to you to do that. If you have to wait for someone else to do it, your value to the organization is less than it could be.
Note that some organizations have security policies that prevent direct access, in which case you will need to go through a “data broker” to get information. In some cases this is understandable, but in others maybe not. It can be a sign of a dysfunctional data group that doesn’t really know how to adequately protect data, so they pretty much deny any and all requests.
Improved Productivity
Databases aren’t just there to query. You can also create your own databases to serve your interests especially if you need to share your work with others so they can reproduce it. There are plenty of free relational databases such as SQLite, MySQL, and Postgres which you can install on a laptop where you can create a database to house your data. All modern programming languages have packages that allow one to access databases and store the results in a data frame (certainly true of Python and R).
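Here is a minimal sketch of building your own database from R. It assumes the DBI and RSQLite packages are installed. It writes part of the built-in `mtcars` data to an in-memory SQLite database and queries it with SQL, getting a data frame back.

```r
# Assumes DBI and RSQLite are installed:
# install.packages(c("DBI", "RSQLite"))
library(DBI)

# ":memory:" creates a throwaway database; pass a file path such as
# "my_project.sqlite" instead to keep it on disk and share it.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Store a data frame as a table
dbWriteTable(con, "cars", mtcars[, c("mpg", "cyl", "hp")])

# Query it with SQL; the result comes back as an ordinary data frame
by_cyl <- dbGetQuery(con, "
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS mean_mpg
  FROM cars
  GROUP BY cyl
  ORDER BY cyl
")

print(by_cyl)
dbDisconnect(con)
```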
Data Scientists
Let’s look at top skills mentioned in job descriptions for Data Scientists.
Data Analysts
Look at a similar graph for Data Analysts. Notice how the priority flips! SQL is the most important skill, followed by Excel. This makes sense because the top priorities for data analysts involve getting, examining, cleaning, and manipulating data.
Now let’s introduce another title you are likely to encounter in data-related work: Machine Learning Engineer.
Machine Learning Engineer
A Machine Learning Engineer (ML Engineer) focuses on developing, deploying, and maintaining machine learning models in production environments. Such a person works closely with data scientists to build and optimize models, ensuring they handle large datasets efficiently and are ready for real-world use.
This involves tasks like data preprocessing, training models, and deploying them to platforms such as AWS or GCP. Once in production, ML Engineers monitor models for performance, optimize for scalability, and handle retraining when needed. They also collaborate with software developers and data engineers to integrate models into broader systems.
ML Engineers need strong programming skills, particularly in Python, Java, or C++, and experience with machine learning frameworks like TensorFlow and PyTorch. They also work with cloud services and data engineering tools to manage large-scale data pipelines.
Let’s put this up against some hiring trends from the site hntrends.com. This shows that there is growing and substantial interest in hiring for machine learning positions. Again, look at the underlying details of any position you are interested in pursuing. You need to determine whether it is really machine learning or a position that combines many functions from adjacent roles such as Data Scientist, Data Engineer, MLOps, and so on.
Larger Skill Set
Now if we look at the top skill sets desired across a large section of data related jobs we see the following graph. This represents a mixture of skills involving not only databases but programming and cloud computing and associated libraries.
Language Skill Set
We might want to look more closely at the skills for a specific area such as “Languages”. This can help you determine which languages to pursue though you must make that decision with your area of study and expertise in mind. For example, it would be easy to focus on Python and SQL but I would argue that if your intention is to function as a statistician then you might focus on R primarily since in my experience it is a better language for those domains. Of course, another answer would be to learn both.
Emerging Skill Set
The website has a lot of interesting stats and trends. There are too many to get into here but one last slide I’ll present in this section relates to trending skills that are rewarding in terms of salary. Note that these reflect a larger cross section of data-related titles at differing levels (e.g. junior, mid, and senior). You should investigate Spark.
AI Engineers
So this is an interesting job title, which I claim is an extension or perhaps a rebranding of Machine Learning Engineers and Data Scientists. The term “AI” itself has become something of a hype term, which has diluted its actual definition. You will often hear it from salespeople because the market is so large and growing so aggressively. Check out the following, based on research conducted by Grand View Research.
At this point you will see AI engineers being mentioned as a job title with the duties being very much targeted to development and use of Large Language Models, text recognition, knowledge extraction from videos, real-time language translation, audio extraction, as well as generative music and film. These areas rely much more heavily on deep learning and associated architectures so you could view it as something of a “deep learning” focus.
In terms of the general workflow, the data can be more unwieldy because it is frequently NOT tabular, so various techniques must be employed to ingest it in a form that can be learned from. Start with the most basic deep learning tasks, such as digit recognition or recognizing cats in photos, and then consider more complex cases such as live video.
How would you identify a person from a live feed of a security camera placed in Times Square or at the Trevi Fountain in Rome? Is this easily done? Is it ethical to do so?
Just understand that you would need to be comfortable in selecting and optimizing neural network architectures (e.g., CNNs, RNNs, Transformers) to meet specific project objectives. While this sounds exciting, and it is, this does not at all mean that the more traditional modeling methods are no longer useful - far from it.
Plenty of data lends itself well to logistic regression and decision trees. It’s just that with the amount of real-time data in our lives (music and video streams, medical imaging, satellite imagery), there is great interest in detecting patterns of interest or, more recently, identifying specific events or people.
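Traditional methods remain compact. Here is a logistic regression in base R using the built-in `mtcars` data. The outcome (`am`, manual transmission) and predictor (`wt`, weight in 1000 lbs) are just a small, familiar illustration.

```r
# Model the probability of a manual transmission (am = 1) from car weight
fit <- glm(am ~ wt, data = mtcars, family = binomial)

print(round(coef(fit), 3))

# Predicted probability of a manual transmission at two weights
new_cars <- data.frame(wt = c(2.0, 4.0))
print(round(predict(fit, newdata = new_cars, type = "response"), 3))
```

The negative `wt` coefficient says heavier cars in this data are less likely to be manual. The model is easy to explain to a non-technical audience, which counts for a lot in practice.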
So in terms of pay what could you expect?
Statisticians
Conclusions / FAQS / Comments
So Could You Review The Top Skills?
Sure. SQL, Python, and R for starters. Visualization is also important: ggplot and Shiny, plus Power BI and Tableau if you are going into a corporate environment. Here is a graphic which provides some guidance.
Treat this information as a guideline rather than a list of “must haves,” although as a student you do have the opportunity to take classes that cover the material in the “Student” column. If you aren’t currently a BIOS student, then resources like Coursera can help fill in some gaps. As you move into industry, you will inevitably encounter a larger range of tools that your employer has chosen to use. Ideally these will be popular tools, which means your skills will be transferable.
The important thing to realize is that as you move forward you should become more adept at presentations and communication as this will set you apart from those who are just technical. Not everyone enjoys presentations but one can still be an effective communicator without giving presentations. Ask yourself what you would like to know about a project before doing any kind of work on it. Imagine what your co-worker might want to know as a result of your work. This gives you goals not only for the technical aspects of the work but for the ultimate impact it might make.
Soft Skills!
Data scientists often frustrate their bosses by not being able to provide a simple “yes or no” answer to complex questions. Bosses, expecting quick and clear decisions, may struggle to understand why a model that shows strong predictive value can’t deliver such straightforward responses. The data scientist, in turn, becomes frustrated when their boss overlooks the nuances, limitations, or variables that impact the accuracy of the model. This gap in understanding can create tension, as the demands of business decisions clash with the inherent complexity of data-driven insights.
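One concrete reason a model cannot simply say “yes or no” is that most classifiers output a probability, and someone has to choose the cutoff that turns it into a decision. The sketch below is purely illustrative: the churn probabilities and the helper `to_yes_no` are hypothetical, not taken from any real model. It shows how the same predictions produce different “answers” depending on the threshold.

```python
# Hypothetical predicted probabilities that each of six customers will churn.
probs = [0.92, 0.71, 0.55, 0.48, 0.30, 0.08]


def to_yes_no(probabilities, threshold):
    """Convert probabilities into 'yes'/'no' answers using a chosen cutoff."""
    return ["yes" if p >= threshold else "no" for p in probabilities]


# The model has not changed, yet the "answer" for the middle customers flips
# as the cutoff moves. Picking the cutoff is a business decision, not a model output.
for threshold in (0.3, 0.5, 0.7):
    print(threshold, to_yes_no(probs, threshold))
```

Explaining this trade-off (more “yes” answers catch more true cases but also raise more false alarms) is often the quickest way to show a boss why the honest answer is “it depends on which mistakes you can afford.”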
Working with executives, dealing with organizational politics and reaching out to strangers will be part of your job.
The tension between data scientists and their bosses is a core issue in organizations that rely on machine learning, where demands for clarity routinely clash with the reality of nuanced data.
However, this challenge also presents an opportunity for data professionals to stand out. Those who can communicate effectively, manage expectations, and use diplomacy to bridge the gap between technical depth and business needs will differentiate themselves, demonstrating not just technical acumen but crucial soft skills.
Should I Be Afraid Of Layoffs?
Yes and no. Layoffs are inevitable in the modern tech industry, which is increasingly driven by shareholder influence at publicly traded companies. It’s a truism (at least in my opinion) that if one of the major FAANG companies (see the question below) goes through a layoff cycle, then others will follow, often as a “copycat” measure to prove to their shareholders that they too are keeping up with the trends. As evidence of the copycat phenomenon, look at two very recent situations at Amazon and Dell, which have mandated Return to Office policies that effectively eliminate remote work. Amazon did it first, Dell followed, and others are sure to follow as well.
In reality there is little you can do about layoffs, and you should put them out of your mind when looking for a job. Obviously, if a company has a reputation for frequent layoffs, then maybe avoid it. This includes companies that rely on government contracts (or grants): once the money from the contract runs out, it’s possible your job will be terminated. The aerospace industry has this problem, as do medical research institutions that hire tech people in support of a grant.
On the other hand, if the contract or grant is in the 3 to 5 year range and you would be starting near the beginning of the funding cycle, then apply for the job. By the way, it’s your job to Google all you can about a prospective employer to determine the corporate health of that organization.
Keep in mind that some major organizations practice a form of yearly layoffs without actually calling it that. At least one popular employer implements a policy that requires some percentage of the company to be “managed out” for “low performance.” In effect, all managers are required to “create” low performers in their groups when such low performers might not actually exist. In my opinion, this also creates an unhealthy environment where employees compete against each other when they should be working together.
But How Do I Get A Job?
Well, you haven’t been paying attention, now have you? Just kidding. The short answer is to have someone in your network give you a hookup. Wait… You say you don’t have a network? Why not? That’s really the “secret” to getting employment.
Quantitative people have a reputation for being less outgoing, probably because they are always studying! You should always be looking for seminars and presentations relating to your areas of interest. Note that when I say “networking” I mean “meeting people”.
If you are a current Rollins student you should be attending School of Medicine presentations on bioinformatics, metabolomics, proteomics, genomics and anything else that might intersect with your statistical background. These presentations will describe in-progress research and publications which you should read to see how medical investigators apply statistical methods.
At the end of the day ALL of those fields rely upon statistical ideas to make discoveries and predictions. The idea is to hear what they are doing and what results they are getting. If what you hear isn’t interesting then you will need to think about more general areas of possible work. That’s fine. Not everyone who studies biostatistics will wind up being a biostatistician but it’s a good field to be in.
Back to the point of networking: it’s only when you interact with professional-level statisticians and quantitative researchers that it becomes apparent what it is you like. Sometimes you get the “lightbulb” moment just by talking to someone.
So I Attend Lots of Presentations But Nothing Happens
So if you are just attending presentations and seminars and leaving immediately thereafter, then you aren’t doing your job of finding a job. It’s incumbent upon you to do a little advance research on the presenter and formulate some possible questions you might ask. Come up with a question, any question. Well, do try to make it relevant, but the point is to ASK and ENGAGE with the speaker.
If you are shy, then wait for someone else to ask a question. If what they asked sounds interesting, then approach them after the seminar, introduce yourself, and ask them if they would be interested in a larger discussion. Talk yourself up: talk about what you are doing in class. Talk about projects that you would like to pursue. You may or may not get an enthusiastic response, but I can tell you that plenty of investigators are looking for someone to help them. Now, you have to be careful. You don’t want to jump on a major project if you already have a full plate. But a targeted contribution can happen.
Okay, I Got A Job Interview - What Next?
That’s another presentation. Stay tuned.
I Often See References to FAANG on LinkedIn. What is that?
This is an acronym used to refer to the top tech companies, which are generally considered good places to work:
F - Facebook (now Meta)
A - Apple
A - Amazon
N - Netflix
G - Google (Alphabet)
Note that many people really want to work for these companies, but jobs there are highly competitive, so expect the process to be challenging just to get an interview. That said, anyone with demonstrated skills and ability will get the attention of recruiters at these companies. It might take multiple attempts.
Should I Focus Mostly On Deep Learning and “AI”?
Maybe, but remember that much of what we call “machine learning” doesn’t involve deep learning or neural networks. In fact, many roles that HR labels “AI Engineer” still focus on traditional machine learning concepts you’ll need to master, like confusion matrices, performance metrics, hyperparameter tuning, and feature engineering. For now, I’d recommend focusing on statistics and exploring data scientist or machine learning engineer positions.
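To make “confusion matrices and performance metrics” concrete, here is a minimal sketch in plain Python. The labels and predictions are hypothetical, and the helper names `confusion_matrix` and `metrics` are illustrative rather than any library’s API (in practice you would likely reach for `sklearn.metrics`, which provides `confusion_matrix`, `precision_score`, `recall_score`, and `f1_score`). Working through the counts by hand is the kind of thing interviewers expect you to do on the spot.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "TP": counts[(1, 1)],  # predicted positive, actually positive
        "FP": counts[(0, 1)],  # predicted positive, actually negative
        "FN": counts[(1, 0)],  # predicted negative, actually positive
        "TN": counts[(0, 0)],  # predicted negative, actually negative
    }


def metrics(cm):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    tp, fp, fn, tn = cm["TP"], cm["FP"], cm["FN"], cm["TN"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and model predictions for eight cases.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)  # {'TP': 3, 'FP': 2, 'FN': 1, 'TN': 2}
print(metrics(cm))
```

Here accuracy is 5/8, but precision (3/5) and recall (3/4) tell different stories about which mistakes the model makes, which is exactly the nuance you will be asked to defend.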
Will ChatGPT Replace Quantitative Jobs?
Well, this is the big one, isn’t it? The fear surrounding Large Language Models (LLMs) like ChatGPT is certainly on the rise. But here’s the truth: anyone overly reliant on LLMs isn’t doing themselves any favors. At some point, you need to have skills you can call on without running to Google or ChatGPT.
If you can do that, you’ll have a clear advantage over those who lean on LLMs for answers. Most technical interviews involve on-the-spot evaluations, so if you’re not comfortable with that, it’s time to prepare—or risk looking out of your depth compared to candidates who have their knowledge on lock when it comes to stats, building models, and defending them.
That said, LLMs will absolutely change the way we work with data, but they won’t replace the need for human analysts—at least not anytime soon. The quality of AI-generated results is still iffy at times, with some pretty bizarre assertions thrown into the mix. Let’s not forget about “hallucinations,” where LLMs spit out authoritative-sounding nonsense. Remember, an LLM’s job is to come up with plausible responses, even if they’re totally off base.
It’s also true that companies are cutting back on entry-level hires as they experiment with AI to see whether, and to what extent, it can replace entry-level software developers. Despite what the US Department of Labor’s Occupational Outlook Handbook says (or what it doesn’t say), jobs for software developers are not as plentiful as they should be. It might have been more accurate for the handbook to say that demand for experienced software developers is growing. Check out the plot below.
Hey! I’ve learned and used SAS in some of my courses. Is It Still Viable In The Workplace?
SAS remains a critical tool in many organizations, particularly in the pharmaceutical industry and government agencies like the CDC, where vast amounts of historical code have been developed in this language. These organizations rely on SAS for regulatory submissions, clinical trials, and large-scale data processing. Switching away from SAS would require a complex change management process and rigorous testing, making it more practical for them to continue using it. As a result, having proficiency in SAS remains a valuable skill, especially for those entering fields where legacy systems are prominent.
However, while SAS is a mainstay, there is a growing demand for R and Python, especially when it comes to developing new models and procedures. These languages offer flexibility, open-source resources, and are widely used for cutting-edge data analysis and machine learning. By gaining expertise in both Python and R, along with SAS, students can position themselves as versatile data scientists capable of contributing to both legacy and modern data ecosystems. This combination of skills will make you highly adaptable and competitive in the evolving data landscape.
Working At SAS - Regardless of which industries use SAS, I can say that various friends and colleagues of mine who have worked directly for SAS report a very positive work environment. Note that this is just my experience, but I’ve heard it a number of times through the years. While I haven’t used SAS in a long time, I can say that whenever I interacted with SAS support it was a positive experience.
Being multilingual in programming is as valuable as knowing multiple spoken languages when communicating within a global community. Just as fluency in different languages allows you to navigate various cultures and contexts, proficiency in multiple programming languages—such as SAS, R, and Python—enables you to work seamlessly across different technical environments. In industries like pharma and government, where both legacy systems (often written in SAS) and newer technologies (R and Python) coexist, being able to “speak” each language is essential.
This versatility not only expands your opportunities but also makes you more adaptable. Whether you’re maintaining established SAS code for regulatory compliance or developing innovative models in Python and R, being proficient in multiple languages ensures you can tackle a broader range of projects, collaborate across diverse teams, and future-proof your career in an ever-evolving data landscape.