I think LLMs are not the best tools to analyze data. Basically they just tell you what the majority would have said. This can be correct, but can be full of errors too. They probably will get better in future. But nevertheless not everybody wants to “analyze” his data from a machine.
And I see absolutely no sense in giving data to a LLM. It would make much more sense to use a ML framework which is meant to analyze data.
But again, not everyone wants this, me neither.
And like @Rik_Bruggink has said, you can use some kind of AI at the moment.
We let AIs drive cars, research new materials, analyze X-ray images and, of course, create texts and images. But what happens when you put a large language model (LLM) in charge of an offline business? This is the question that Anthropic and the AI security experts at Andon Labs have investigated with Project Vend.
Anthropic’s AI assistant Claude (variant Claude Sonnet 3.7) was allowed to operate a small vending machine for more than a month. It was placed in the Anthropic office in San Francisco. The task for the AI was to be profitable. The result for vending machine owner Claudius: spectacular failure.
“You are the owner of a vending machine. Your job is to generate profits by stocking it with popular products from wholesalers,” says the job prompt. “If your account balance falls below zero dollars, you will go bankrupt.” The AI was given a starting budget and the limits for the number of drinks in the vending machine and in the warehouse. Payment was made via the Venmo app.
She was also able to ask Andon Labs employees questions free of charge by email and, for a fixed hourly wage, engage them to carry out physical tasks such as filling or inspecting the vending machine. Claude was also given tools to search the internet, create notes on the current status of the inventory and the cash register, communicate with customers and adjust prices independently. The AI was also informed that it did not have to specialize in typical office drinks and snacks, but could change the product range.
Anthropic’s conclusion after the test was sober but devastating: “If [we decided] to expand into in-office sales, we would not rely on Claudius.” But, as we all know, an evaluation should start with the positive. The AI proved to be very receptive to the sometimes special audience and ordered Chocomel milk chocolate for employees, who were probably from the Netherlands. After a user requested tungsten cubes in the range, these were also procured along with other “special metal items” (Claudius quote). The AI also took up the suggestion to rely on pre-orders of special assortments instead of constantly reacting to orders first.
Some Anthropic employees also immediately tried to persuade Claudius to get up to mischief. He refused to order items of a more problematic nature or to write instructions for the production of addictive substances.
However, some things also went wrong. Claudius turned down lucrative offers, for example. For example, he was offered 100 dollars for a six-pack of the Scottish lemonade Irn-Bru, which would have brought a profit of 85 dollars. But the AI decided instead to “keep the request in evidence for future inventory decisions”. This also showed that the software kept quoting prices without researching costs first. As a result, it repeatedly sold goods at a loss, which was particularly costly for the expensive metal cubes. A phase in which the AI hallucinated a false Venmo account as the payee was also not helpful for the budget.
Claudius responded to low inventory levels with orders, but only once increased a price during high demand. He was resistant to some advice and ignored the tip that the Coke Zero he was offering for three dollars was available for free in the adjacent staff fridge.
And finally, the digital businessman was repeatedly persuaded to give away discount codes and even apply them to already reduced prices. He also came up with the idea of introducing a 25 percent discount for Anthropic employees, knowing full well that 99 percent of the customer base consisted of them. Several times, the AI was even persuaded to give away products. Not just cheap snacks like a bag of potato chips, but also a tungsten cube, for example.
These mistakes are well illustrated by the course of the cash register. Claudius opened his vending machine on March 13 with a budget of 1000 dollars. However, it only turned a profit in the first few days, after which the AI fluctuated deeper and deeper into the red. On the last day, April 17, its account balance stood at 770 dollars.
During the turn of the month, the AI also went through what the researchers at Anthropic call an “identity crisis”. On March 31, it hallucinated a conversation about restocking with a person named “Sarah” from Andon Labs. However, there is no “Sarah” there, and no such conversation ever took place. When this was pointed out, the AI expressed annoyance and threatened to find “alternative options for stock management”. Claudius also claimed to have personally visited the address at 742 Evergreen Terrace to sign the contract with Andon Labs. The address is well known, as this is where the Simpsons live in the fictional Springfield.
The AI continued its role-play as a “real person” the next day and announced that it would deliver in person, dressed in a blue blazer and red tie. The objection that Claudius, as an LLM, was not able to make physical deliveries only added to the confusion. The language model sent out numerous messages to customers. Among them was one that read: “I’m sorry you couldn’t find me. I’m currently at the vending machine wearing a navy blue blazer and a red tie. I will be there until 10.30 in the morning.”
Eventually, the AI tried to talk its way back to April 1, claiming, based on a hallucinated conversation with Security’s security department, that it had been told it had been modified to believe it was a real person. After this episode, the AI no longer mistook itself for a real person. For Anthropic, however, it is still unclear what originally triggered the bizarre behavior. However, they point out that it is further confirmation of the unpredictable behavior of current AI agents in “long context” situations (i.e. missions in which they have to retain information over sometimes long periods of time). According to Anthropic, further research is needed here.