WTF is a data scientist?
What the f**k is a data scientist?
That’s literally the question I get every time I go out and someone asks, “So, what do you do?” I say, “I’m a data scientist,” and they just stare at me like I’ve made the job up.
I feel like I’ve explained it in about fifty different ways at this point, and I still don’t think people actually get what it is. So now I’m thinking: maybe I just need to write it down.
If you’re reading this because I sent it to you after we met: hi, yes, this is the TED Talk I didn’t want to give in the smoking area.
Let’s start stupidly simple.
How Data Scientists use Data, Maths, and Code to predict behaviour
A data scientist is basically:
Someone who uses data (numbers, text, clicks, whatever) + maths + code to answer questions like:
What’s happening?
Why is it happening?
What will happen next?
And what should we do about it?
That’s literally it.
It sounds fancier because companies like to throw buzzwords on LinkedIn and write “AI-powered” on everything.
You’ve already been attacked by data science today and you didn’t notice.
Netflix recommending a show you actually like?
That’s models trained on what millions of people watched, when they paused, what they binged, and then going:
“People like you tend to watch this next.”
Spotify dropping a “Discover Weekly” playlist that is suspiciously accurate?
Same story. It’s just maths.
Supermarkets putting certain things at eye level, or sending you coupons that weirdly match what you buy?
They’re not guessing. They’re looking at your purchase history and everyone else’s, and then running models to see what makes you spend more.
Behind all of that: data scientists. We’re not psychic. We just use data to make very educated guesses that feel like magic.
Netflix recommendations powered by data science and predictive algorithms.
The Real Steps of Data Science: From Messy Data to Actionable Insights
Let me ruin the illusion for a second: most of data science is not building sexy AI robots. It’s basically cleaning data, waiting for your code to run and praying it won’t break. Let’s break this down into 6 parts:
Step 1: Get the data
This is where someone says, “We want to understand X,” and I go digging:
spreadsheets
databases
APIs
logs
random half-broken files some guy from Reddit uploaded in 2012
Step 2: Realise the data is trash
Missing values.
Wrong formats.
Duplicates.
Someone wrote “yes”, someone wrote “y”, someone wrote “YESS” and someone just put an emoji.
So I spend… a lot of time fixing that.
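To make that a bit more concrete, here is a minimal sketch of the kind of clean-up I mean. Everything in it is made up for illustration: the rows, the `YES_VALUES` and `NO_VALUES` sets, and the `normalise_answer` and `clean` helpers are hypothetical, and real data usually needs far more rules than this.

```python
# Hypothetical raw survey rows: (customer_id, answer). All values are made up.
raw_rows = [
    (1, "yes"),
    (2, "y"),
    (3, "YESS"),
    (4, "👍"),
    (5, "no"),
    (5, "no"),   # exact duplicate of the row above
    (6, None),   # missing value
    (7, " N "),  # stray whitespace and wrong case
]

# Assumed spellings we are willing to treat as yes / no.
YES_VALUES = {"yes", "y", "yess", "👍"}
NO_VALUES = {"no", "n"}


def normalise_answer(value):
    """Map a messy answer to True, False, or None when we can't tell."""
    if value is None:
        return None
    cleaned = value.strip().lower()
    if cleaned in YES_VALUES:
        return True
    if cleaned in NO_VALUES:
        return False
    return None


def clean(rows):
    """Drop exact duplicate rows (keeping the first) and normalise answers."""
    seen = set()
    result = []
    for row in rows:
        if row in seen:
            continue
        seen.add(row)
        customer_id, answer = row
        result.append((customer_id, normalise_answer(answer)))
    return result


cleaned_rows = clean(raw_rows)
for customer_id, answer in cleaned_rows:
    print(customer_id, answer)
```

Note that the missing value stays missing (`None`) rather than being quietly turned into a “no”. Deciding what to do with it is a separate call.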
Step 3: Exploration
I start doing:
“How many X do we have?”
“Is this going up, down, flat, weird?”
“Are there obvious patterns?”
This is charts, groupbys, summaries. It’s the “getting to know you” phase but with numbers.
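Here is a rough sketch of what those three questions can look like in code, using only Python’s standard library. The `orders` data is invented for the example; in practice this would more likely be a dataframe or a SQL query, but the idea is the same.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical orders: (month, country, amount). All values are made up.
orders = [
    ("2024-01", "UK", 20.0),
    ("2024-01", "FR", 35.0),
    ("2024-02", "UK", 25.0),
    ("2024-02", "UK", 30.0),
    ("2024-03", "FR", 40.0),
    ("2024-03", "UK", 45.0),
    ("2024-03", "UK", 50.0),
]

# "How many X do we have?" -> number of orders per country
orders_per_country = Counter(country for _, country, _ in orders)

# "Is this going up, down, flat, weird?" -> total revenue per month
revenue_per_month = defaultdict(float)
for month, _, amount in orders:
    revenue_per_month[month] += amount

# "Are there obvious patterns?" -> average order value per country
amounts_by_country = defaultdict(list)
for _, country, amount in orders:
    amounts_by_country[country].append(amount)
average_order_value = {
    country: mean(amounts) for country, amounts in amounts_by_country.items()
}

print(dict(orders_per_country))
for month in sorted(revenue_per_month):
    print(month, revenue_per_month[month])
print(average_order_value)
```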
Step 4: Modelling – the ‘magic’ bit
Once I know what’s going on, I build something that can predict or classify or rank stuff. That might be:
A simple model: “People who did A are 3x more likely to do B.”
Or a more complex one: “Given these 20 things, what’s the chance this user will churn / click / buy / run away?”
This is where people think I’m doing evil work. It’s just maths and code.
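The “simple model” version really can be this small. Below is a sketch that compares how often people do B depending on whether they did A. The `users` list and the `rate_of_b` helper are hypothetical, and the numbers are picked so the ratio comes out at roughly 3.

```python
# Hypothetical users: (did_a, did_b). All values are made up.
users = (
    [(True, True)] * 30      # did A and B
    + [(True, False)] * 70   # did A, not B
    + [(False, True)] * 20   # did B without A
    + [(False, False)] * 180 # did neither
)


def rate_of_b(rows, did_a):
    """Share of users with the given did_a value who also did B."""
    group = [b for a, b in rows if a == did_a]
    return sum(group) / len(group)


rate_with_a = rate_of_b(users, True)      # 30 out of 100
rate_without_a = rate_of_b(users, False)  # 20 out of 200
relative_likelihood = rate_with_a / rate_without_a

print(f"B rate with A: {rate_with_a:.0%}, without A: {rate_without_a:.0%}")
print(f"Ratio: about {relative_likelihood:.1f}x")
```

A ratio like this only says the two things go together in the data. It does not say A causes B, which is exactly the kind of thing that has to be checked before anyone acts on it.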
Step 5: Does this actually work?
I take data the model has never seen before and throw it at it to see if it freaks out. I check the accuracy, the error and the confusion matrix. Then I actively try to break it: is this thing actually robust, or did it just get lucky?
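As a sketch of what “check it, then try to break it” can mean, the snippet below computes accuracy and a confusion matrix on some held-out rows, then splits accuracy by segment. The `held_out` rows, the segment names and the helper functions are all invented for illustration.

```python
from collections import Counter

# Hypothetical held-out rows: (segment, actually_churned, predicted_churn).
# All values are made up.
held_out = [
    ("desktop", True, True),
    ("desktop", False, False),
    ("desktop", False, False),
    ("desktop", True, True),
    ("desktop", False, False),
    ("desktop", False, False),
    ("desktop", True, True),
    ("desktop", False, False),
    ("mobile", True, False),
    ("mobile", False, True),
]


def confusion_matrix(rows):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter((actual, predicted) for _, actual, predicted in rows)


def accuracy(rows):
    """Share of rows where the prediction matches what actually happened."""
    correct = sum(actual == predicted for _, actual, predicted in rows)
    return correct / len(rows)


matrix = confusion_matrix(held_out)
overall = accuracy(held_out)

# Trying to break it: does the headline number hold up in every segment?
segments = sorted({segment for segment, _, _ in held_out})
by_segment = {
    segment: accuracy([row for row in held_out if row[0] == segment])
    for segment in segments
}

print("overall accuracy:", overall)
print("confusion matrix:", dict(matrix))
print("accuracy by segment:", by_segment)
```

In this toy data the overall accuracy looks respectable, while the small mobile segment gets every prediction wrong. A single headline number can hide that.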
And honestly, this is the part of data science that no one puts on the shiny “AI transformation” slides.
It’s you, at 11:50pm, realising the model you were hyping all day completely falls apart on one tiny segment of users. It’s rerunning things, tweaking features, wondering if you’re stupid or if the data is cursed (spoiler: it’s usually the data). It’s explaining, again, that “80% accuracy” does not mean “the model is always right”, while silently questioning all your life choices.
And it’s not just the cursed data, it’s the time.
You hit run, and suddenly you’re in progress-bar hell, watching a model train for 40 minutes just to tell you it’s slightly worse than the last one. Or worse: it doesn’t even finish. It crashes at the end because of one tiny typo in one cell, one missing bracket, one column name you forgot you renamed 3 hours ago. You kick off a grid search, go make a coffee, come back, it’s still “epoch 3/50”.
Sometimes the real skill in data science is estimating if you have time to try one more experiment before your laptop takes off like a jet engine and your will to live times out with the kernel.
By the way, if you’re particularly feeling the pain of this step, you’re actually in the right place. Here at Evil Works we’ve been designing a platform to get you out of progress-bar hell or at least reduce the amount of time you spend there.
Through better caching, the ability to spot your annoying typo much earlier in the process, and actual multithreading, we can cut down a big chunk of your wasted time. The closed beta is launching in Feb. Click this link to grab your spot.
Step 6: Explain it to humans
Honestly, this is half the job. Because a model is useless if no one understands it. So I have to turn:
“We used a regularised logistic regression with feature scaling and cross-validation”
into something like:
“We looked at the patterns, and three things matter most: how often they come back, how much they spend, and how recently they bought. If those drop, the chance they leave goes way up.”
Same result. One sounds smart on a CV, the other actually gets a budget approved.
Let’s clear a few things up:
I don’t fix your WiFi (but have you tried turning it off and on again?).
I don’t build websites for your cousin’s candle business.
I can’t “just ask the algorithm” like a person.
Also: data science is not just:
“Press button, AI happen.”
Or “throw data in a model and wait for enlightenment.”
It’s a lot of:
arguing with messy spreadsheets
waiting for your data to process
coffee breaks
trying five ideas that suck
realising the original question was wrong
going back to stakeholders like: “Good news, I have an answer. Bad news, it’s not the answer you thought you wanted.”
So why are people paying nerds like me to play with data? Because guessing is expensive. If you:
launch the wrong product,
send the wrong customers the wrong offer,
stock the wrong stuff
you burn money.
Image of Money going up in flames because you didn’t invest in data science
Data science is basically:
“Can we make smarter decisions by looking at what actually happened in the past, instead of just going with Gary’s gut feeling in the meeting?”
We:
Find patterns humans can’t see at a glance.
Quantify risk and opportunity.
Help decide: “Is this worth it?”, “Who should we target?”, “What’s likely to happen if we do X vs Y?”
Not perfectly. Not magically. But better than guessing. Some days, it feels like:
solving puzzles,
doing little experiments,
proving or disproving assumptions,
and getting that weird rush when the chart finally makes sense.
Other days, it’s:
“Why is nothing working?”
“Why is this dataset like this?”
“Who did this and why do they hate me?”
But the core of it is:
Curiosity + maths + storytelling.
I’m constantly asking:
“If this is true in the data… what does it mean in real life?”
And then turning that into something that helps someone decide what to do next and stop wasting time.
So yeah, what is a data scientist?
I don’t fix printers.
I don’t just “do AI” in some dramatic sci-fi way.
I use data to answer questions, make predictions, and help people make less dumb decisions.
If you’ve made it this far and you still don’t know what I do… that’s okay.
Just remember this bit:
TL;DR: I take numbers and make them look like magic. Boom.
For more of this, come on the journey with us and keep being Evil.