Does ChatGPT make you stupider?

Imagine a future where machines do everything for us. Where humans just… atrophy. Where we become so reliant on automation we literally can’t walk anymore. Where we can clap our hands and a robot will appear to satisfy our every whim, whether that’s something to drink or shade from the sun. Wait… isn’t that just WALL-E?

If you prefer watching over reading, you can do so right here: [embedded video]

Alright, so it’s fair to say that the fact we haven’t invented hover-chairs yet means we’re a little way off Disney’s prediction, but the future could, in some ways, be much closer than you think. It’s not curious, heartwarming little robots like WALL-E, and it’s not Terminators either. It’s something far more subtle, something that many of us interact with every single day. It’s AI. Specifically, ChatGPT.

And it’s not just tasks we’re outsourcing either. We might be outsourcing our intelligence.

Is this the dawn of a new era of human ingenuity, or are we on the fast track to intellectual laziness? The latest research is sounding some serious alarms.

[Image: WALL-E in some water, looking sad]

I wouldn’t mind a little WALL-E helping me out though because this guy is adorable.

The 30th of November 2022. It’s not a date burned into my memory, but it might be a significant one when it comes to human history. This was the day the conversation about AI shifted from the abstract to something that is actually, really here.

I was not an early adopter, but like many of us, curiosity eventually got the better of me and I had to try the tool myself. I initially scoffed at its hallucinations and its inability to do basic maths, until I suddenly had a moment of: “Huh. This is actually useful.”

Of course, the utility of ChatGPT depends on what you use it for, and this point is becoming a growing source of debate. While certain businesses rub their hands gleefully at the thought of replacing workers with AI and ushering in the dystopia, the world of medical science is getting access to diagnostic tools that could quite literally save lives.

The involvement of AI is highly controversial in many domains, but perhaps the industry it has disrupted the most is education.


So, giving away my age here: I am a student of the internet age. I come from a techy family, and I’m not joking when I say that I knew what a computer virus was before I knew humans could get them too.

I was a pretty techy teen. I’d spent hours getting Steam games to run on my Linux laptop and been called a nerd by my IT teacher for writing HTML. So by the time I got to university, the following had been drilled into me pretty clearly:

  1. Check your sources are from reputable websites.

  2. Wikipedia is not a reputable source.

  3. Don’t talk to strangers online. 

So when it came to university, I made use of the tools available to me. I was studying maths so, unsurprisingly, I was constantly using Wikipedia to look up and understand concepts, using Google Scholar to find papers to reference, and desperately hoping that Wolfram Alpha would be able to solve the problem my friends and I were stuck on. Which, given the problem usually involved proving something, meant we had to reverse-engineer our way there anyway.

Fast forward more than a decade and the university landscape looks very different. Instead of staring at the Wolfram Alpha solution wondering how it got there, ChatGPT can tell you, giving you the exact steps it took. It can write things up neatly in LaTeX so you can copy and paste the result straight into your academic paper. It’s been a while since I last had to write a proof, but I bet it could take a stab at the fundamental ones, at least, if I asked it.

[Image: Graduates standing in their graduation robes and gowns]

Back in my day… I’ve earned the right to be old and grumpy now ;)

In the end, the majority of my degree was based on exams: hours spent in silence in a hall, hand cramping from writing. No involvement of AI there. But for subjects in which exams aren’t so heavily favoured? I mean, why would you write your own essay if ChatGPT can do it for you?

This creates a huge problem for universities and professors. Plagiarism has been a hot topic in the industry for decades, with specialist tools to scan papers and highlight anything that is a little too similar. Universities have disciplinary procedures specifically to handle this, and some students unfortunately find out the hard way that cheaters don’t prosper.

But if your essay is completely new original content then it won’t get flagged. Even if the original content wasn’t by the person whose name is in the top corner. 

So what’s to stop someone handing their essay question to ChatGPT and copying and pasting the result? Of course there’s the moral argument: cheating is bad. But that’s never stopped people from trying in the past. There’s also the issue that ChatGPT hallucinates, making up things that are inaccurate, so there is a risk. But unless the professor can somehow evidence that you used ChatGPT, at worst you’re going to be knocked down a few marks for getting something wrong.

There’s the argument that it devalues degrees in the eyes of an employer, because if everyone is getting a good mark by creating things with ChatGPT, then a good degree is no longer a marker of ability.

But that’s so abstracted from you as an individual. You can’t control other people devaluing your degree, so if you can’t beat them, you might as well join them, right?

But here’s why you might want to think twice about that. 

[Image: A statue of a brain, made from balls and sticks to look like neurons]

I needed to break up the page and didn’t have a creative idea so… here’s a brain.

So, given ChatGPT has existed for less than three years at this point, it’s taken a little while to really assess the impact it’s having. But enough time has passed since its genesis that we’re starting to see some research come out in this space, and the research is very interesting.

One of the suggested use cases for AI in education is personalised learning: that AI will be able to break down concepts in a way that corresponds to a person’s unique learning style, meet them at their level of understanding, and explain things just so. If you need the ELI5 (explain like I’m five) version, it will give it to you. But if you have an advanced degree in the topic and are looking to learn about cutting-edge research, it’ll give you all the technical details instead.

It will also open up the ability to ask questions, so you can drill down into the parts you need to clarify and confirm your understanding. A truly personalised learning experience.

Now, this obviously isn’t the reality we live in today. Ask ChatGPT a simple question and you get a twelve-paragraph essay in return. That’s definitely not something broken down for a five-year-old.

That said, this is something better prompting can improve, so let’s assume we’ve got some good prompting going on as well.

Cognitive Load Theory suggests that human working memory has a limited capacity, and learning activities take up a large amount of our cognitive resources. So, in theory, if we use ChatGPT to remove some of that cognitive load (i.e. it tells us the information rather than us needing to go looking for it or work it out ourselves), then that frees up some of our cognitive processes to focus on learning the information we need to. But does that theory hold up?
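To make that argument concrete, here’s a minimal sketch in Python, assuming the textbook three-part split of cognitive load into intrinsic, extraneous and germane components. The capacity and load numbers are invented purely for illustration; they don’t come from any of the studies discussed here.

```python
# Toy model of the Cognitive Load Theory "budget" argument.
# Assumption: total load splits into intrinsic (task complexity),
# extraneous (searching, dead ends, tool friction) and germane
# (effort spent actually building understanding). All numbers
# are illustrative, not measurements.

WORKING_MEMORY_CAPACITY = 10  # arbitrary units

def germane_budget(intrinsic: float, extraneous: float) -> float:
    """Capacity left over for learning once the task's demands are met."""
    return max(0.0, WORKING_MEMORY_CAPACITY - intrinsic - extraneous)

# Working it out yourself: lots of searching and wrong turns.
solo = germane_budget(intrinsic=5, extraneous=4)       # -> 1

# ChatGPT hands you the answer: extraneous load drops sharply.
offloaded = germane_budget(intrinsic=5, extraneous=1)  # -> 4

print(f"solo: {solo}, offloaded: {offloaded}")
```

On paper, offloading frees up capacity for learning. The catch, as the studies below suggest, is that nothing forces you to actually spend that freed-up capacity on learning.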

Earlier I suggested that ChatGPT can write you essays, and I’m sure the more skeptical of you reading this will have thought: “Yeah, but are those essays any good?” So, right here, I have a paper where they looked into exactly that.

In this study, the students were split into four groups: one with human support, one with LLM support, one with non-AI writing analytics tools, and one that had to go at it alone. The researchers measured a few things: metacognitive load, intrinsic motivation, learning, and the actual quality of the essays themselves.

[Image: An ink pen writing on lined paper]

I was a maths student, as if I ever wrote essays like this.

What they found was that intrinsic motivation wasn’t statistically different between the groups, and actually neither was learning, but metacognitive load was much lower for the ChatGPT group and their essay scores were significantly improved. So that’s great, right?

Using ChatGPT didn’t kill off their motivation, nor did it make them worse at learning, but they got higher scores and better essays out of it. Case closed, let’s all use ChatGPT to write our essays. But actually, let’s go a little deeper.

It’s funny, because what wasn’t immediately apparent from this paper is that the experiment was to use ChatGPT to do research, not to write the final text, and the experiment was apparently set up to block ChatGPT from doing the writing. The authors flat out say, though, that they suspected students were working out how to bypass this, and in a lab setting they literally watched some participants copy and paste text from ChatGPT that was directly optimised for the scoring rubric, which I assume participants had been given.

That led them to two conclusions: that this outperformance might have been the result of “AI-empowered learning skills” that optimise short-term performance at the expense of long-term knowledge acquisition, and that ChatGPT performed so well because the criteria and scoring feedback were so well defined. I’m not sure whether that’s just, as the kids call it, “copium”. Let’s have a look at something else.


One valid argument that I’m sure will come up is that LLMs are always improving. It takes time for a study to come out, and by that point the version of ChatGPT used will be out of date. For that reason, I’m glad to have come across a super recent study that uses GPT-4o, the latest publicly available version at the time of writing.

Now, I’m not going to go into the full detail of the study (I’ll link it down below), but this time they were actually scanning the brains of the essay writers in the experiment, so they had brain activation data to look into as well. What they found was that there were significant differences in neural connectivity patterns between those using ChatGPT, those using Google, and those using just their own brains to write essays. In fact, the search engine group showed 34-48% less brain connectivity than the no-tools group, while the LLM group showed even less, at 55%.

One of the things they tested was both the ownership participants felt they had over their work and their ability to recall what they’d written. For ownership, some of the participants in the LLM group completely denied writing their essay, which is pretty funny; I’d like to see the correlation between that and their views on LLMs going into the experiment. Others claimed partial ownership, from half to 90% of it, whereas ownership was higher in the search engine group, and of course the brain-only group took full ownership.

The recall was perhaps the most interesting, though. In the first of three essay-writing sessions, 83% of LLM participants had difficulty quoting their essays, and none of them provided correct quotes. In the third session, a third still couldn’t quote correctly. The other two groups were near perfect at quoting by session two, with the brain-only group the most accurate.

I’ll note as well that in this study there were human assessors reviewing the output. Without knowing that some essays were written by LLMs, they claimed to be able to tell that a number of essays had a very similar writing style, and those essays happened to be the LLM group’s. So maybe your teachers are more clued in than you think.

[Image: A female teacher with ginger hair standing in front of a whiteboard]

Your teacher’s face when she overhears you saying teachers can’t tell whether you’re using AI.

Anyway, you might be thinking: okay, well, if it’s something I care about learning and remembering then I’ll do it myself that time, and just use the LLM for things I don’t care about as much. But the study has a further cautionary tale for you.

They brought a group of participants back for a fourth session, in which the LLM group became a brain-only group and the brain-only group could use the LLMs. This time they were asked to write about a topic they had covered before. And while the former LLM group, now brain-only, did have better neural connectivity in this essay than the original brain-only group had in their first, it was nowhere near that group’s brain activity in the second and third sessions. The authors suggest this showed an initial boost but no deeper neural integration.

The brain-to-LLM group, in contrast, showed even better brain connectivity when using an LLM on a familiar topic. And while 78% of the LLM-to-brain group couldn’t quote their essay at all and only 11% could quote correctly, these stats were reversed in the brain-to-LLM group.

So putting this simply, using ChatGPT too early in the learning process might result in poor recall and worse learning outcomes. 

In fact, I’m going to share an exact quote from the paper:

While these tools offer unprecedented opportunities for enhancing learning and information access, their potential impact on cognitive development, critical thinking, and intellectual independence demands a very careful consideration and continued research.
— Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task, Kosmyna et al.

Alright, so we’ve looked at ChatGPT writing essays, but you might be thinking: “You’re a channel about data science. Why are you making an essay about writing essays?” See what I did there? Okay, I hear you, let’s generalise this to the scientific world.

This study got participants to make a scientific recommendation about nanoparticles in suncream, based on their research. Cognitive load was lower for the LLM group, and in contrast to the essay studies there wasn’t a difference in the diversity of recommendations or justifications between the Google and LLM groups, perhaps because essays are more open-ended than scientific opinions. But unfortunately, the LLM users did show lower-quality reasoning and arguments in their responses.

Getting even closer to data science now: a school in Taiwan split students learning C++ into two groups, one that kept learning with their teacher and problem sets, and one that could use ChatGPT for help. The authors of the paper fully expected the ChatGPT group to come out better, by the way, but I think you and I can guess what happened.

The ability to get into a flow state (thought to enhance learning), self-efficacy, and performance were all lower in the group using ChatGPT.

[Image: Some Python code on a screen]

The question of “Is AI good at programming?” is an ongoing source of debate, depending on who you ask…


So how do we rationalise this amid all the excitement about incorporating ChatGPT into education? Because it wasn’t actually the inaccuracies and hallucinations that were the issue: the recommendations were sound, and the essays were of good quality (alright, there were inaccuracies in the programming paper, but that wasn’t the only issue there). So what was the reason?

Actually, the theories of learning are counterintuitive. Learning something isn’t about breaking it down into the most easily digestible format so we can take it in without struggling too much. It turns out that we don’t actually encode information that way. Instead, we need to struggle.

It turns out that in order to learn we need desirable difficulty. That’s not to say we should always be working at maximum struggle; if something is too far out of reach because we don’t have the background knowledge or skills, then that is undesirable difficulty and means we won’t learn. But desirable difficulties are where the learning happens.

So what are desirable difficulties? They include varying the conditions of learning, interleaving instruction on separate topics, spacing practice out over time, and using tests rather than presentations.

To explain this, I’m going to illustrate varying the conditions of learning with an example at the piano, one I’m ashamed to say I only learned recently. Say I’ve got a hard section I need to practise (you can see it in the video). The way I used to approach this was to play that section over and over again until I finally found it sinking in. But it turns out that’s not the best way to do it.

[Image: Leigh sat next to a grand piano]

Yep, as well as a data scientist I’m a legit pianist.

Instead, I need to vary what I’m doing. Instead of playing it straight, I can play it with a swung rhythm. Instead of playing it legato and joined up, I can play it staccato. It seems unintuitive, but since I switched to this practice style my performance and speed of learning have dramatically improved.

But rather than just anecdotes, I wanted to point you to a study in which kids practised throwing beanbags at a target on the floor that they couldn’t see at the time of the throw. Half of the group practised with the target at a fixed distance, whereas the other half varied the distance of their targets. The ones that practised at variable distances were far more accurate when it came to the test.

Similarly, I could use interleaved practice by working on another part I need to improve and then coming back to it. I can space out my practice by doing a little bit every day instead of trying to secure it all today. And finally, I can test myself by “performing” it to an audience, even if that audience is just a camera.
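Since this is a data science blog, here’s what those last three difficulties (interleaving, spacing and testing) might look like as a toy scheduling sketch in Python. The topics, session count and structure are made-up illustrative choices, not taken from any of the studies above.

```python
import itertools

# Hypothetical topics a data science learner is juggling.
topics = ["SQL", "statistics", "pandas"]
sessions = 9

# Blocked practice: grind one topic to completion, then move on.
blocked = [topics[i * len(topics) // sessions] for i in range(sessions)]

# Interleaved practice: rotate topics so each return to a topic is
# spaced out, forcing you to retrieve material that has started to fade.
interleaved = list(itertools.islice(itertools.cycle(topics), sessions))

print("blocked:    ", blocked)
print("interleaved:", interleaved)
# blocked:     ['SQL', 'SQL', 'SQL', 'statistics', 'statistics', ...]
# interleaved: ['SQL', 'statistics', 'pandas', 'SQL', 'statistics', ...]

# End each session with a self-test (retrieval) rather than a re-read:
# the retrieval attempt is the desirable difficulty doing the work.
```

The interleaved schedule feels worse while you’re in it, which, as we’re about to see, is exactly the point.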

So, relating this back to the studies from earlier: the students were not learning because there was no desirable difficulty. They asked ChatGPT for the answers and they received them. None of the desirable difficulties were present.

What’s key to note about learning with desirable difficulty is that it often doesn’t feel like the best way of learning. When I practise the same thing over and over again, it feels more secure at the end of the session, whereas my interleaved or varied-condition sessions don’t. But we’re often poor judges of our own learning, and just because it feels better doesn’t mean it is.

But actually, when I look at these studies, the participants using ChatGPT didn’t feel like they were learning better either. In the essay-writing study, one participant concluded after three sessions that ChatGPT wasn’t worth it for the assessment. Another mentioned that it was “fine for structure… [yet] not worth using for generating ideas”.

In the programming study, a student was even harsher: “I transitioned from being able to learn to being unable to learn.” Ouch.

Now, there are a few gaps in this analysis. Firstly, we can’t say how much of this is down to inexperience with ChatGPT; we don’t know how knowledgeable the participants were about prompt engineering. We also don’t know whether future versions of LLMs will be able to overcome these challenges. So, with this understanding of learning, where does this leave us?


One thing that’s important to note about all these studies is that the primary goal of using ChatGPT was to complete a task: to write an essay, or make recommendations, or solve a programming problem. It was not to enhance learning.

When it comes to school and university work, the point of an assignment isn’t just to get the project done; it’s to do the learning. It’s a varied condition of learning. It’s probably set spaced out from the time the information was first presented, interleaved with other topics, and it can also act as a test of your knowledge. By actually doing the work yourself, you’re hitting all of these categories of learning.

So what does this say? That we should all give up on ChatGPT? That our brains are doomed going forward and the younger generation will never grow up to be employable? Well, not really. 

Despite ChatGPT not being good for learning in these studies, it did have utility. It was useful for structuring and putting together ideas that had already been generated. It did help build recollection in the group that used it to revisit a topic they had already studied. Using ChatGPT in the right way is key.

I didn’t use ChatGPT to do the research for this blog post, and the more papers I read, the more glad I was that I didn’t. But, emboldened by my research, I did decide to use it to help me build a structure. The post has evolved since then, and these are my own words, but it gave me the blueprint I needed to get started.

And of course, I do need to mention that all of these studies were done using ChatGPT. But based on the outcomes we’ve discussed, there’s no reason to believe the results wouldn’t generalise to other LLMs.

So where does this leave you, the reader, looking to build your data science skills? Well, I’ve just given you the blueprint for how to learn better. But more importantly, when it comes to life and careers in the real world, companies don’t want employees who can just churn out the same stuff that ChatGPT can. The top salaries go to the people with the best individual insights. And that requires learning.

And that’s our mission at Evil Works. To enable data scientists to learn with the right desirable difficulties. Come on the journey with us by joining the Evil Lair.

