Google Trends Is Misleading You
Google Trends. What a gift to society this is. If not for Google Trends, how would we ever have known that more Disney movies released in the 2000s led to fewer divorces in the UK? Or that drinking Coca-Cola is an unknown remedy for cat scratches?
Wait, am I getting confused by correlation vs causation again?
If you prefer watching over reading, you can do so right here:
So why am I looking at motivation anyway? When I was at university, I distinctly remember the day I sequestered myself away in the silent section of the study room. I was so desperate for my undiagnosed ADHD brain to concentrate that I denied myself all possible forms of distraction so that I could revise for my differential equations exam.
But the problem with undiagnosed ADHD is that when the executive function ain’t executive functioning, it’s really hard to get started and really hard to keep going when you do.
So it was on this glorious summer day, stuck inside, that I was desperate. And so I did what any early 2010s student would do when they find themselves completely bereft of the willpower to revise: I googled the word “motivation”.
It was there that I read a quote that has stayed with me for over a decade: “Pain is temporary, quitting lasts forever.” It was inspiring. It was deep. It gave my 20-year-old self the drive I needed to push through my exams. The discovery that Lance Armstrong was allegedly doping at the time has since dulled my appreciation for this quote, but still, it was enough for me to pick up my pen and start writing proofs.
My activity that day brought an interesting behavioural question to my mind though, because logically, the people googling the word “motivation” are unlikely to be the ones that have enough of it - they’re actually getting on with the work - but the ones that are finding things a slog. Googling “motivation” is likely the siren song of the demotivated.
So, being a nerd, I was curious whether I could model it. I went straight to Google Trends, happily entered the word “motivation” and grabbed myself a cup of hot chocolate to get stuck in.
Anyone else go for hot chocolate as their drink of choice?
Now, I don’t know if you’ve used Google Trends before but if you haven’t, I’m going to talk you through it so that we can get to the meat of the problem.
So I’m going to search the word “motivation”. It defaults to the UK, because that’s where I’m from, and to the past day, and we get a lovely graph showing how often people searched the word “motivation” in the last 24 hours.
24 Hours of Motivation in the UK
I love this because you can see really clearly that people are mostly searching for motivation during the working day, no one is searching it when most of the country is asleep and there’s definitely a couple of kids needing some encouragement for their homework. I don’t have an explanation for the late night searches but I would kind of guess these are people not ready to go back to work tomorrow.
Now this is lovely, but while eight-minute increments over 24 hours do give us a nice 180 data points to use, most of them are actually zero. I also don’t know whether the past 24 hours have been highly demotivating compared to the rest of the year or whether today represents the year’s highest GDP contribution, so I’m going to increase the window a little bit.
The moment we go to a week, the first thing you notice is that the data is a lot less granular. We have a week of data but now it’s only hourly and I still have the same core problem of not knowing how representative this week is.
I can keep zooming out. 30 days, 90 days. At each point we lose granularity and don’t have anywhere near as many data points as we did for 24 hours. If I’m going to build an actual model, this isn’t going to cut it. I need to go big.
And it’s when I select five years that we encounter the problem that motivated this entire video (excuse the pun, that was unintentional): I can’t get daily data. And also, why is today not at 100 anymore?
Five years of UK motivation searches
Herein lies the real problem with Google Trends data. Google doesn’t actually publish figures on its search volume. That information prints dollars for them and there’s no way they would open that up for other people to monetise. But what they do give us is a way to see a time series, to understand changes in people’s searches for a particular term, and the way they do that is by giving us a normalised set of data.
This means that whatever time period or single search term I use, the data point with the highest number of searches is immediately set to 100. All the other points are scaled down accordingly. If the 1st of April had half the searches of the maximum, then the 1st of April is going to have a Google Trends score of 50.
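To make that concrete, here’s a minimal sketch of the scaling in Python. The function name and the raw counts are invented for illustration (the real volumes are never published), and this is only a toy version of the idea described above, not Google’s actual implementation.

```python
def normalise_to_100(counts):
    """Scale counts so the largest becomes 100, rounded to whole numbers."""
    peak = max(counts)
    if peak == 0:
        # Nothing was searched at all, so every point stays at zero.
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]


# Made-up daily search counts, purely for illustration.
raw = [400, 200, 100, 300]

print(normalise_to_100(raw))      # [100, 50, 25, 75]

# Drop the peak day from the window and the very same days get new scores.
print(normalise_to_100(raw[1:]))  # [67, 33, 100]
```

The second call is the important one: the underlying counts haven’t changed, but because the window no longer contains the 400-search day, every score is different.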
And this is a problem. Because I bet as I was lamenting the fact that I couldn’t get daily data for five years, at least one of you was thinking “just take the maximum time period you can get daily data for and move that window” and yes, that was my first idea too. Then I discovered the normalisation.
So let’s look at an example here just to illustrate the point. Let’s take the months of May and June 2025. Each is 30 or 31 days long, so we get daily data (we actually lose it beyond 90 days). If I look at May, you can see we’re scaled so we hit 100 on the 13th, and in June we hit it on the 10th. So does that mean motivation was searched proportionally as many times on the 10th of June as it was on the 13th of May?
Google Trends data for May
Google Trends data for June
If I zoom out now so that I have May and June on the same graph, you can immediately see that that’s not the case. When both months are included, searches for motivation had a Google Trends score of 81 on the 10th of June, meaning that, as a proportion of searches in the UK, it was 81% of the level on the 13th of May. If we hadn’t zoomed out, we wouldn’t have known that.
May and June on the same graph
Now, all is not lost. We did learn something useful from this experiment: we can see the relative difference between two data points if they’re both included in the same graph. So if we load May and June separately, knowing that the 10th of June is 81% of the 13th of May means we can scale June down accordingly and the data will be comparable.
So that’s what I decided I’d do. I’d fetch my Google Trends data in batches with a one-day overlap between consecutive batches, so 1st of Jan to 31st of March, then 31st of March to 31st of July. Then I could use March 31st, which appears in both data sets, to scale the second set to be comparable to the first.
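Here’s a rough sketch of that single-day overlap idea. The function name and the scores are hypothetical, and each batch is assumed to be a plain dictionary mapping a date string to its Google Trends score.

```python
def chain_on_shared_day(first, second, shared_day):
    """Rescale `second` onto the scale of `first` using one day they share."""
    if second[shared_day] == 0:
        raise ValueError("cannot scale from a shared day that scored zero")
    # How much bigger (or smaller) the shared day looks in the first batch.
    factor = first[shared_day] / second[shared_day]
    chained = dict(first)
    for day, score in second.items():
        if day != shared_day:
            chained[day] = score * factor
    return chained


# Invented scores: 31st of March is 50 in the first batch and 100 in the second.
first = {"2025-03-30": 80, "2025-03-31": 50}
second = {"2025-03-31": 100, "2025-04-01": 60}

chained = chain_on_shared_day(first, second, "2025-03-31")
print(chained["2025-04-01"])  # 30.0
```

Everything in the second batch gets multiplied by the same factor, so the shape of the second batch is preserved while its level is brought into line with the first.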
But while this is close to something we can use, there’s one more problem I need to make you aware of.
So when it comes to Google Trends data, Google isn’t actually tracking every single search. That would be a computational nightmare. Instead, Google makes use of sampling techniques to build a representation of search volumes.
This means that while the sample is likely very well-built, it is Google after all, each day will have some natural random variation. If by chance March 31st was a day where Google’s sample happened to be unusually high or low compared to the real world, our overlap method would introduce an error into our entire data set.
On top of this, we also have to consider rounding. Google Trends rounds everything to the nearest whole number. There’s no 50.5, it’s 50 or it’s 51. Now this seems like a small detail but it can actually become a big problem. Let me show you why.
On the 4th of October 2021, there was a massive spike in searches for Facebook. This massive spike gets scaled to 100 and as a result everything else in that period is much closer to zero. When you’re rounding to the nearest whole number that tiny error of 0.5 suddenly becomes a huge proportional error when your number is only 1 or 2. This means that our solution has to be robust enough to handle noise, not just scaling.
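A quick bit of arithmetic shows how lopsided this gets. The helper below is a hypothetical illustration: it assumes a displayed score could be hiding a true value up to 0.5 away, and expresses that as a fraction of the displayed score.

```python
def worst_case_relative_error(score):
    """Largest rounding error, as a fraction of a non-zero displayed score."""
    # A displayed whole number could be hiding a true value up to 0.5 away.
    return 0.5 / score


for score in (1, 2, 50, 100):
    print(score, f"{worst_case_relative_error(score):.1%}")
# 1 50.0%
# 2 25.0%
# 50 1.0%
# 100 0.5%
```

So a day that shows as 1 next to a giant spike could be out by as much as half its value, and any scaling factor built from that day inherits the error.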
So how do we solve this? Well we know that on average the samples will be representative, so let’s just take a bigger sample. If we use a larger window to get our overlap, the random variation and rounding errors have less of an impact.
So here’s the final plan. I know I can get daily data for up to 90 days. I’m going to load a rolling window of 90-day periods but I’ll make sure each window overlaps by a full month with the next. That way, our overlap isn’t just one potentially noisy day but a stable month-long anchor that we can use to scale our data more accurately.
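As a sketch of what that plan could look like in code: the first function generates the overlapping date windows, and the second chains already-downloaded windows together. All names are hypothetical, “a full month” is assumed to mean 30 days, and the scaling factor is assumed to be the ratio of the overlap totals (which is the same as the ratio of the overlap averages). Fetching the data itself is left out.

```python
from datetime import date, timedelta


def rolling_windows(start, end, length_days=90, overlap_days=30):
    """Yield (first_day, last_day) pairs that each share overlap_days with the next."""
    step = timedelta(days=length_days - overlap_days)
    first = start
    while True:
        last = min(first + timedelta(days=length_days - 1), end)
        yield first, last
        if last >= end:
            break
        first += step


def chain_windows(windows):
    """Chain overlapping {day: score} dicts onto the scale of the first window."""
    chained = dict(windows[0])
    for window in windows[1:]:
        shared = [day for day in window if day in chained]
        old_total = sum(chained[day] for day in shared)
        new_total = sum(window[day] for day in shared)
        if not shared or new_total == 0:
            raise ValueError("windows need a non-zero overlap to be chained")
        # Ratio of totals over the shared days equals the ratio of their means.
        factor = old_total / new_total
        for day, score in window.items():
            if day not in chained:
                chained[day] = score * factor
    return chained


for first, last in rolling_windows(date(2025, 1, 1), date(2025, 6, 30)):
    print(first, last)
# 2025-01-01 2025-03-31
# 2025-03-02 2025-05-30
# 2025-05-01 2025-06-30
```

Using the whole overlap rather than a single day means one unlucky sample or one badly rounded value can’t throw off the factor on its own.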
So it sounds like we’ve got a plan. I’ve got some concerns, mainly that by having lots of batches there’s going to be compounding errors and it could result in big numbers absolutely blowing up. But in order to see how this shakes out with real data we have to go and do it. So here’s one I made earlier.
After writing up everything we’ve discussed in code form, and after having some fun getting temporarily banned from Google Trends for pulling too much data, I’ve put together some graphs. My immediate reaction when I saw this was: “Oh no, it blew up”.
Those are some big spikes some days…
The graph below shows my chained-together five years of search volumes for Facebook. You’ll see a pretty steady downward trend but two spikes stand out. The first of these was the massive spike on 4th October 2021 that we mentioned earlier.
These spikes are scary
My first thought was to verify the spikes. I, unironically, googled it and found out about widespread Meta outages that day. I pulled data for Instagram and WhatsApp over the same period and saw similar spikes. So I knew the spike was real but I still had a question: Was it too big?
When I put my time series side-by-side with Google Trends’ own graph, my heart sank. My spikes were huge in comparison. I started thinking about how to handle this. Should I cap the maximum spike value? That felt arbitrary and would lose information about the relative sizes of spikes. Should I apply an arbitrary scaling factor? Again, it felt like a guess.
Five years of Facebook searches on Google Trends.
That was until I had a bolt of inspiration. Remember, Google Trends is giving us weekly data for this period, that’s the whole reason we’re doing this. What if I averaged my data for that week to see how it compared to Google’s weekly value?
This is where I breathed a huge sigh of relief. That week was the biggest spike on Google Trends, so it was set to 100. When I averaged my data for the same week, I got 102.8. Incredibly close to Google Trends. We also finish in about the same place. This means the compounding errors from my scaling method haven’t blown up my data. I have something that looks and behaves just like the Google Trends data!
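The check itself is simple enough to sketch. The function name and the numbers below are made up, and it assumes the daily series starts on the first day of one of Google’s weeks, so that each block of seven days lines up with one weekly data point.

```python
def weekly_means(daily, week_length=7):
    """Average consecutive full blocks of `week_length` daily values."""
    means = []
    for i in range(0, len(daily) - week_length + 1, week_length):
        chunk = daily[i:i + week_length]
        means.append(sum(chunk) / len(chunk))
    return means


# Two invented weeks: a quiet one, then one containing a single huge spike.
daily = [10, 10, 10, 10, 10, 10, 10,
         10, 150, 20, 10, 10, 5, 5]

print(weekly_means(daily))  # [10.0, 30.0]
```

A single daily spike of 150 only drags the weekly average up to 30, which is why a chained daily series can look far spikier than the weekly graph while still agreeing with it once averaged.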
So now we have a robust methodology for creating a clean, comparable daily time series for any search term. But a time series for “Facebook” isn’t what we’re interested in, is it? It’s “motivation”. And that brings us to the next big problem we have to solve before we can get to our final model.
Now the reason I was looking at Facebook isn’t because I’m in my boomer phase. It’s actually fundamental to answering this motivation question. My ambition was bigger than just looking at the UK. I wanted to get a sense of how motivated people are around the globe and compare countries to each other. I wanted to see if the UK are comparatively keen beans or slumming it. And this wasn’t as easy as I thought it would be.
Because while Google Trends allows you to compare multiple search terms it doesn’t allow direct comparison of multiple countries. So I can grab a dataset of motivation for each country using the method we’ve discussed today, but how do I make them comparable? Facebook is part of the solution.
But this solution is one for a later blog post, one in which we’re going to build a “basket of goods” to compare countries and see exactly how Facebook fits into all of this.
So today we started with the question of whether we can model national motivation and in trying to do so immediately hit a wall. Because Google Trends daily data is misleading. Not due to an error, but by its very design. We’ve found a way to tackle that now, but in the life of a data scientist, there are always more problems lurking around the corner.
Come on the journey with us by joining the Evil Lair.