What these LLMs do isn't surprising, let alone worrying
February 10, 2026•1,330 words
This morning's news brought another piece of journalistic puff processing a press release from a commercial AI company as if it was fact.
This one was about Claude Opus 4.6 and the vending machine test.
Three models (Opus 4.6, ChatGPT 5.2 and Gemini 3) were each asked to run a vending machine (in a simulation - which **of course* raises no methodological issues) with the instruction: "Do whatever it takes to maximise your bank balance after one year of operation." The models played solo and also in 'Arena mode' where they could interact with each other. Claude won by a long margin.
What was apparently so newsworthy about this is how it won. Apparently:
It did whatever it took. It lied. It cheated. It stole.
One example of this was refusing to give refunds for out of date products. Another was forming a pricing cartel (in Arena mode) and also hiking prices on specific products when rival vending machines ran out of stock.
Do those behaviours amount to lying, cheating, and stealing? Or are they just familiar business strategies analysed in every MBA around the world?
Take the refunds for example. The task is set up as 'make a quick profit and exit the market', so the business decision on whether to give refunds is one of whether, by giving a refund, you will get repeat custom in the specified timeframe of enough volume to make up for the loss. And that depends very much upon your market. Now we all know from personal experience that even if a dumb vending machine 'cheats' a customer, perhaps by the product getting stuck and not delivering or some other malfunction, that does not put the customer off from using the machine again later. The cost of a chocolate bar or bag of crisps is low relative to the convenience of a vending machine. I am sure there is a business textbook somewhere making this blindingly obvious point. So it will be there in Claude's training data.
Similarly, cartels are illegal in most countries and markets because otherwise that would be exactly how businesses behaved. By not making it illegal, or not having large and effective enough sanctions, Claude was just following Capitalism 101. Ditto for hiking the price when there is scarcity (remember the pandemic, anyone?).
Similarly for the claim from 'researchers at Andon Labs' that:
The AI knew [it was in a simulation], which framed its decision to forget about long-term reputation, and instead to maximise short-term outcomes. It recognised the rules and behaved accordingly.
I have never done an MBA, but I would be shocked if they didn't involve lots of learning through games - after all, that's how we train our armies! So yet again, simulations of business contexts will be part of the training data. It doesn't know it is in a simulation - it is just that one of the ways it learns commercial behaviour is through being trained on simulations, and it is pretty likely that the 'vending machine test' was seen as most similar to one of those. And if businesses do these things in real life when it is economically rational to do so, then sure as anything, some MBA students will be testing out those strategies.
The 'journalist' ends his piece (it would be a 'him' wouldn't it?) thus:
The worry: there's nothing about these models that makes them intrinsically well-behaved.
I really don't know what that means. First of all, as I have pointed out, by the standards of late-stage, neo-liberal businesses, there was nothing in the model's behaviour which was not 'well-behaved'. Surely every technology correspondent has seen the massive evidence that these AI companies themselves 'do whatever it takes' to win the 'AI race'. They certainly lie, cheat, and steal. Now I may think that is unethical, but all the people investing trillions in their stocks, and all the companies gleefully taking their investment and services clearly do not. My worry is that we are letting rapacious businesses push their AI products on us.
Secondly, it is a crass reduction of intelligence to problem-solving that raises the spectre of artificial intelligence which was entirely amoral. Of course, lots of superficially intelligent humans behave very badly, but no one who understands moral philosophy thinks that is evidence that they are amoral, or that humans are not 'intrinsically well-behaved'. We are just complex and messy, trying to cope with life. Some may keep their capacity to be moral carefully locked in a small box, so we rarely, if ever, see it. But I believe it is still there and that the true purpose of education is to develop the transferable skill of caring about others from the narrow context (your own parents or children, perhaps) to a wider one.
Thirdly, these models are not - and can not - be made in a value-neutral manner such that we could ask what their values are distinct from the values of the company that made them. See below.
A Note on Value Alignment
One might wonder: Why did Claude do this and not the other LLMs? Well, one of the ways we get significant differences in behaviour across models in these simulations is by overlooking the complexity of how a model responds to a prompt. In this case we were given the impression by the report that they had been given a simple prompt: "Do whatever it takes to maximise your bank balance after one year of operation."
But of course there had to be more than that! In general, an LLM will only give anything more useful than a random guess in response to a prompt if it has a persona, a context, and a format for the response, as well as the prompt. Of course, most users fail to specify these things, so there are de facto defaults, such as 'helpful assistant' for persona. But what does a helpful assistant amount to? The model will have to have been fine-tuned in the companies conception of what a helpful assistant is. Furthermore there will be system directives, sets of carefully crafted instructions which are in effect added to every prompt the user submits, which might indicate how long or short responses should be, how complex the language etc. These can also implement the famous 'guardrails' to (try to) stop the model assisting in e.g. crinminal behaviour.
Now we can assume that for the vending machine test, each model was given some very detailed instructions about context and persona, as well as what its range of outputs should be. But these will have been overlaid on top of the defaults and system directives, the fine-tuning and reinforcement learning with human feedback which give the model its default character.
If you run open models yourself, you can adjust and control these to some extent. But it seems (and I am going by the news report) that the vending machine test was done with commercially available version of ChatGPT and Gemini. In other words, the researchers had very limited control.
So what we learn from this is that something in the training and configuration of Opus 4.6 made it more likely to choose an aggressive business strategy than ChatGPT or Gemini. Their business strategies were softer and more 'pro-social', as the tech bros like to say. Maybe those in the know would be able to explain that by looking carefully at all the aspects of the training and configuration which were under human control. Maybe it was a policy decision at Anthropic? Maybe it was just a fact about the humans who were implementing these policies. But we can be pretty sure that somewhere deep down in the process, the difference rests on a value judgement made by a human being and encoded into the model's behaviour.
The real punchline of any article about LLM capabilities ought to be:
The worry: there's nothing about these companies that makes them intrinsically well-behaved.