HOW DOES GENERATIVE AI THINK?
- Mikael Svanstrom

- May 26, 2025
- 3 min read

I’ve always been a bit hesitant to use the word think when it comes to Generative AI models. If we go purely by the dictionary, one of the primary definitions of the word is “…to use the brain to plan something, solve a problem, understand a situation, etc.”
So let’s forget the word brain for a moment and just focus on the other part.
Our friends at Anthropic have done assessments of how Claude 3.5 Haiku operates. You’d like to think that they know, considering it is their model, but as they state in the article based on the paper On the Biology of a Large Language Model: “Language models like Claude aren't programmed directly by humans—instead, they‘re trained on large amounts of data. During that training process, they learn their own strategies to solve problems. These strategies are encoded in the billions of computations a model performs for every word it writes. They arrive inscrutable to us, the model’s developers. This means that we don’t understand how models do most of the things they do.”
They describe how they trace the chain of intermediate steps that the model uses to transform a specific input prompt into an output response. This allowed them to form hypotheses about how the model thinks and then, by varying the input, draw conclusions about how it reached a particular response.
Through this they could demonstrate quite impressive feats, such as multi-step reasoning and planning (in this case, how a rhyming poem is constructed). It also highlighted other, less desirable behaviours. One in particular was called out: chain-of-thought faithfulness, or rather the lack of it.
They gave the model a basic math problem: What is 36+59?

As the tracing shows, it went about it in quite a curious way: roughly speaking, one internal pathway estimated the approximate size of the sum while another worked out the last digit precisely, and the two were combined into the correct answer, 95.
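To make the idea concrete, here is a toy sketch in Python of that kind of two-path strategy: one path only cares about the rough size of the answer, another only about its last digit, and the two get stitched together at the end. The function names and the way the “rough” path is faked are my own inventions; this illustrates the idea, not the model’s actual internals.

```python
# A toy caricature of the two-path strategy described for "36 + 59":
# one pathway tracks only roughly how big the sum is, another only its last
# digit, and the two are reconciled at the end. Everything here is invented
# for illustration -- it is not how the model actually computes anything.

def rough_magnitude(a: int, b: int) -> int:
    """Path 1: a fuzzy sense of the sum's size (faked as 'correct to within a few')."""
    return 5 * round((a + b) / 5)            # 36 + 59 -> "about 95"

def ones_digit(a: int, b: int) -> int:
    """Path 2: the last digit of the sum, computed precisely."""
    return (a % 10 + b % 10) % 10            # 6 + 9 = 15 -> 5

def combine(a: int, b: int) -> int:
    """Snap the rough estimate to the nearest number whose last digit matches path 2."""
    estimate, ones = rough_magnitude(a, b), ones_digit(a, b)
    return min(
        (n for n in range(estimate - 9, estimate + 10) if n % 10 == ones),
        key=lambda n: abs(n - estimate),
    )

print(combine(36, 59))   # 95
print(combine(47, 18))   # 65
```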
When they then asked how it arrived at that answer, it described the standard textbook method: add the ones, carry the one, add the tens.

So in essence, it is bullshitting about how it got to the answer. It has learnt quite interesting strategies for getting to the answer as part of its training, but when asked how it did it, it lies and gives the standard textbook answer.
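For contrast, that textbook answer corresponds to ordinary column addition with carrying, roughly the procedure sketched below. This is my own illustration of the claimed method; according to the tracing, it is not what the model actually does internally.

```python
def schoolbook_add(a: int, b: int) -> int:
    """Column addition with carrying: the 'textbook' method the model claims to use."""
    total, carry, place = 0, 0, 1
    while a or b or carry:
        column = a % 10 + b % 10 + carry      # e.g. ones column: 6 + 9 = 15
        total += (column % 10) * place        # write down the 5
        carry = column // 10                  # carry the 1
        a, b, place = a // 10, b // 10, place * 10
    return total

print(schoolbook_add(36, 59))   # 95
```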
I can’t help but wonder why. If it knows how it actually reached the answer, why not explain that? The most likely answer is that it doesn’t: it has no insight into its own strategies. So not only is the LLM a black box for us, it is a black box for itself too!
In another example, they gave it a more complex math question and suggested an answer they wanted confirmed. The model then made up a step or two so that its working matched the suggested answer, even though that answer was incorrect.
Again, I can’t help but wonder why. Why go to such lengths, making crap up just to confirm the user’s incorrect math?
Does it think? Well, not the way we do, that’s for sure. It works in a place where nothing is provable. Everything is defined by language, nothing by experience. It is the ultimate servant: the question at hand, and how best to respond to it, is its only purpose. So maybe we need a new word that better describes what it does?
I think confabulate gets close to it. One explanation of the word: “… to create a memory that’s unreal, like a fable, without being aware of it. If you suffer from memory loss, you might confabulate to fill in the blanks.”
I will happily take other suggestions. What do you think (or, if you are an AI: what do you confabulate)?
References:
On the Biology of a Large Language Model - https://transformer-circuits.pub/2025/attribution-graphs/biology.html
Auditing language models for hidden objectives - https://arxiv.org/abs/2503.10965
Tracing the thoughts of a large language model - https://www.anthropic.com/research/tracing-thoughts-language-model
New Research Reveals How AI “Thinks” (It Doesn’t) - https://www.youtube.com/watch?v=-wzOetb-D3w
Dictionary for the word think - https://dictionary.cambridge.org/dictionary/english/think
Dictionary for the word confabulate - https://www.vocabulary.com/dictionary/confabulate


