Your jupyter notebook IS NOT production - Part 2: Testing
Leigh Collier

A lot of data science education teaches you how to BE a data scientist, but it rarely teaches you how to WORK as one. In this video in the series, we’ll dive into how to test your code as a data scientist, including how to work with unit tests, regression tests, randomness and non-deterministic functionality, with a few honourable mentions thrown in. Want to see more on this topic? Let me know in the comments. I’m weirdly passionate about unit testing!
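As a rough illustration of the randomness point (a minimal sketch, not code from the video): one common way to unit test non-deterministic code is to pass the random number generator in as an argument, so a test can seed it. The function `bootstrap_mean` and both test names are hypothetical, made up for this example.

```python
import random


def bootstrap_mean(values, n_resamples, rng):
    """Average of the means of n_resamples bootstrap resamples of values.

    rng is injected (hypothetical design choice) so tests can control the seed.
    """
    means = []
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original sample size.
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


def test_same_seed_gives_same_result():
    # Two generators with the same seed must produce identical output.
    data = [1.0, 2.0, 3.0, 4.0]
    first = bootstrap_mean(data, 200, random.Random(42))
    second = bootstrap_mean(data, 200, random.Random(42))
    assert first == second


def test_result_stays_within_data_range():
    # A property that holds for any seed: a mean of resampled values
    # can never fall outside the range of the original data.
    data = [1.0, 2.0, 3.0, 4.0]
    result = bootstrap_mean(data, 200, random.Random(0))
    assert min(data) <= result <= max(data)
```

The first test pins the seed to get reproducibility; the second checks a property that should hold whatever the seed is, which tends to be more robust than asserting one exact number.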

Yes, You Can Do Spatial Data Science
Leigh Collier

Your boss asks you to find the best location for a new store. “You’re a data scientist — this should be easy.”

You say: “Battersea.” Then comes the real question: “Where in Battersea?”

A neighbourhood isn’t enough: they need specific streets, footfall, and customer mix.

That’s where geospatial data science comes in.

Hypothesis testing for data scientists
Leigh Collier

A +40% revenue lift looks great… until your boss asks: “Why?”

You changed multiple things, but don’t know what worked.

Even single changes (like button colour) can look like wins just due to noise.

That’s why hypothesis testing exists, to separate real impact from randomness.
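To make "separating real impact from randomness" concrete, here is a minimal sketch of one possible approach, a one-sided permutation test. It is an assumption for illustration, not necessarily the method the post uses. The function name `permutation_p_value` and the revenue figures are invented.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(control, variant, n_permutations=10_000, seed=0):
    """One-sided permutation test for mean(variant) > mean(control).

    Returns the share of random relabellings whose difference in means is
    at least as large as the observed one (with a +1 correction so the
    estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = mean(variant) - mean(control)
    pooled = list(control) + list(variant)
    n_control = len(control)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the change did nothing, any split is as
        # plausible as the one we actually saw.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if diff >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_permutations + 1)


# Illustrative, made-up daily revenue figures.
control = [100, 98, 103, 97, 101, 99, 102, 100]
variant = [104, 99, 106, 101, 103, 100, 107, 102]
p_value = permutation_p_value(control, variant)
print(f"p-value: {p_value:.4f}")
```

A small p-value says a lift this large would rarely appear from shuffling labels alone. It does not say which of several simultaneous changes caused the lift, which is why changing one thing at a time still matters.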

Your Jupyter Notebook CAN’T be Production - Part 1
Leigh Collier

Your Jupyter notebook can’t be production, and the biggest reason is that when things go wrong, it’s an absolute nightmare to debug. So let’s solve that problem from the start. In this series we talk about how to turn a Jupyter notebook into something that can actually be put into production, starting with the first step: how you write it.

Don't Let Text Data Blow Up Your Model. TF-IDF and Truncated SVD Tutorial
Leigh Collier

Not many people realise this, but an LLM can’t be your therapist. A therapist understands meaning. An LLM only sees tokens. To it, “I feel lonely” isn’t emotion, it’s text to convert into numbers. Get that conversion wrong and you get nonsense. Get it right and everything works.

So how do we turn words into data? … TF-IDF and Truncated SVD.

Stop Writing Bad Data Science Code
Leigh Collier

It’s 4pm on a Friday and your model has “stopped working” in production.

Quarter-end reporting is due next week, so you drop everything to debug.

You rerun cells, check the data, double-check the maths: nothing looks wrong. By 6pm, you’re restarting the whole notebook.

Then you find it: a column name change in one Jupyter cell that never propagated.

A simple mistake, but it broke everything.

Instead of relying on luck and “if onlys”, there’s a better way: defensive data science.

How does a computer see hair?
Leigh Collier

Have you ever wondered how a computer can tell what colour someone’s hair is? Today we’re going to learn exactly how that works.

Detecting someone’s hair colour from an image sounds simple. 

Just find the colour of the hair and you’re done. 

But when I tried to automate this, the results were completely wrong. 

Sometimes the algorithm detected a yellow wall as blonde hair. Other times it picked up a brown shirt instead of the hair. 

The problem wasn’t the colour detection.

The real problem was that the computer didn’t actually know where the hair was in the image. And solving that problem turns out to be much harder than it sounds.


Feature Engineering: A beginner’s guide
Leigh Collier

Turn messy real-world data into powerful machine learning features. This post shows step-by-step feature engineering: cleaning raw CSVs, crafting meaningful variables, and fixing weak inputs so your data science models actually perform.

How do you differentiate code?
Leigh Collier

This post shows how to answer “what if” questions from stakeholders using model sensitivity, scenario analysis, and data science, going beyond theory to make machine learning results clear and business-friendly.

Domain Knowledge: The Machine Learning Unlock
Leigh Collier

Discover why predicting love with data science is a nightmare. This blog dives into a failed Valentine’s linear regression experiment, exposing bias, messy variables, and why domain knowledge matters more than ever in real-world machine learning and AI.

Dijkstra’s Algorithm Tutorial with The Simpsons
Leigh Collier

Ever wondered how route planning algorithms power Google Maps, LinkedIn suggestions, and Amazon drone delivery? This post explains graph algorithms and optimization through Bart’s trick-or-treating route in Springfield, making complex data science easy to understand.

What is incremental Computing? The Data Science Game Changer
Leigh Collier

Incremental computing is transforming data science. Instead of rerunning massive models from scratch, this breakthrough lets data scientists update results instantly, saving time, cutting costs, and reducing waste. Discover how smarter, sustainable computation is reshaping analytics.

How to Actually Use ChatGPT
Leigh Collier

Get a crash course in Large Language Models (LLMs) from a data scientist’s perspective. This blog breaks down how LLMs like ChatGPT actually work, cutting through jargon to explain AI, machine learning, and natural language processing in simple, practical terms.

WTF is a data scientist?
Leigh Collier

What do data scientists really do? This no-fluff guide explores the real world of data science, AI, and analytics, from cleaning messy data to building models and explaining insights. Discover how coding, math, and storytelling power AI, Netflix, Spotify, and everyday decisions.

The 12 Days of Data Science
Leigh Collier

Discover our “12 Days of Data Science” series! Using the Twelve Days of Christmas theme, we explain core data science concepts like machine learning, network analysis, fraud detection, and forecasting in a fun, accessible way.

Can Data Science Create the Next Christmas Hit?
Leigh Collier

Can data science and machine learning craft the perfect Christmas song? Using Spotify data, music analytics, TF-IDF, and Elastic Net regression, we reveal the secrets behind hit festive tracks, exploring lyrics, sentiment, BPM, and danceability to create a data-driven Christmas classic.

Is Die Hard A Christmas Movie?
Leigh Collier

People have argued about this forever: is it a Christmas movie or not?

Instead of debating it, I built a model.

I trained a simple classifier using movie metadata and soundtrack data to turn the “Die Hard argument” into machine learning.

The goal wasn’t fancy AI, just this: can we turn the usual talking points into numbers, train a model, and see what it says about Die Hard?

Is Black Friday back with a new Switch 2?
Leigh Collier

Black Friday is back. But are we still that into it, or is that just leftover lockdown energy? Remember 2020, when Switches disappeared in minutes and we were flexing our Animal Crossing islands (and yes, the fruit absolutely mattered). With a new Switch on the horizon, is Black Friday still the best time to buy, or just a myth we love to repeat? I’m pulling some Google Trends data to see if 2025 can beat peak lockdown.
