
REVISIONISM AND MODEL COLLAPSE

I don’t want to be an alarmist, but when I read this particular X post from Elon Musk I felt an apocalyptic dread for the future of mankind. I don’t think what Elon is describing is actually possible, at least not the way he suggests. But it does suggest two things:

  • Elon is into revisionism in a big way and that should worry us all.

  • Elon has realised that there is too much garbage in the foundational data.

I’d like to address both these items in this article.


ELON THE REVISIONIST

Why does it matter if Elon wants to correct the corpus of human knowledge? Isn’t that a good thing? Well, yes, if it were well intentioned.

Yuval Noah Harari said in an interview: “Stories are the greatest human invention. People need stories in order to cooperate. But there’s also something else very important: they can change the way they cooperate by changing the stories they believe.”

If Elon gets the opportunity to rewrite the stories we tell each other the way he sees fit, believe me, the story we tell ourselves will be very different. What makes Elon the right person for the job apart from the fact that he has way more money (and as a result power) than any single person should be allowed to have?


IN THE QUAGMIRE OF AI SLOP

What could possibly have created all this garbage in the foundational data? Estimates of how much of the content on the internet is now AI generated vary, but some suggest it could be more than 50%, and the share is ever increasing. And why is that bad? Studies show that AI models become increasingly unreliable the more AI-generated content is used to train them. This is called model collapse, and it works like this: when a model is trained on the output of earlier models, the rare, low-probability parts of the original data distribution are the first to disappear, and each generation inherits and amplifies the errors of the one before it, until the output degrades into repetitive nonsense.
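The mechanism can be sketched with a toy simulation (a hypothetical illustration, not the setup used in the cited studies): fit a simple Gaussian “model” to some data, generate synthetic data from it, then train the next model on that synthetic data, and repeat. The spread of the data shrinks generation by generation — the rare, extreme values vanish first.

```python
import random
import statistics

random.seed(42)

def fit_and_resample(samples, n):
    """'Train' a toy model: fit a Gaussian (mean, std) to the data,
    then generate n new synthetic samples from the fitted model."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return [random.gauss(mu, sigma) for _ in range(n)]

n = 100
data = [random.gauss(0, 1) for _ in range(n)]  # the original "human" data
initial_std = statistics.pstdev(data)

# Each generation of the model is trained only on the previous one's output.
for generation in range(1000):
    data = fit_and_resample(data, n)

final_std = statistics.pstdev(data)
print(f"std of the original data:        {initial_std:.3f}")
print(f"std after 1000 model generations: {final_std:.3f}")
# The distribution narrows towards a single point: diversity is lost,
# and the same values are repeated and reinforced more and more.
```

Real LLM training is vastly more complex, but the underlying feedback loop is the same: a model trained on its own kind of output sees an ever narrower slice of the world.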

So, whilst we may think the use of ChatGPT and other LLMs gives us all of human knowledge at our fingertips to mould as we please, we are at the same time homogenising the data. We are not expanding the information; we are only repeating the same information, reinforcing it more and more.

A similar mechanism shows up when you try to generate an image of someone writing with their left hand. There just aren’t enough such images in the training set, so you pretty much invariably get an image of someone writing with their right. And the more images that are generated of people writing with their right hand, the more of them there are to be consumed into the next training dataset.

As individuals we might feel like we’ve opened Pandora’s box. But it seems humankind’s Pandora’s box may end up shrinking as a result.


References:

AI produces gibberish when trained on too much AI-generated data - https://www.nature.com/articles/d41586-024-02355-z

The Curse of Recursion: Training on Generated Data Makes Models Forget

ChatGPT Has Already Polluted the Internet So Badly That It's Hobbling Future AI Development - https://futurism.com/chatgpt-polluted-ruined-ai-development

The launch of ChatGPT polluted the world forever, like the first atomic weapons tests - https://www.theregister.com/2025/06/15/ai_model_collapse_pollution/



© 2024 by Mikael Svanström