People game AIs via game theory

Image caption: In the experiments, people had to judge what constituted a fair monetary offer.

In many cases, AIs are trained on material that’s either made or curated by humans. As a result, it can become a significant challenge to keep the AI from replicating the biases of those humans and the society they belong to. And the stakes are high, given we’re using AIs to make medical and financial decisions.

But some researchers at Washington University in St. Louis have found an additional wrinkle in these challenges: The people doing the training may potentially change their behavior when they know it can influence the future choices made by an AI. And, in at least some cases, they carry the changed behaviors into situations that don’t involve AI training.

Would you like to play a game?

The work involved getting volunteers to participate in a simple game-theory experiment known as the ultimatum game. Testers gave two participants a pot of money—$10, in this case. One of the two was then asked to offer some fraction of that money to the other, who could choose to accept or reject the offer. If the offer was rejected, neither participant got any money.

From a purely rational economic perspective, people should accept any nonzero offer, since they’ll end up with more money than they would have otherwise. But in reality, people tend to reject offers that deviate too far from a 50/50 split, since they view a highly imbalanced split as unfair. Rejecting the offer lets them punish the person who made it. While there are some cultural differences in where the split starts to be seen as unfair, this effect has been replicated many times, including in the current work.
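To make the payoff structure concrete, here's a minimal sketch (in Python, not taken from the study) of a single round of the game; the rejection_threshold parameter is a hypothetical stand-in for a responder's fairness cutoff, not anything the researchers measured.

```python
# Illustrative sketch of the ultimatum game's payoff logic.
# The rejection_threshold value is a hypothetical fairness cutoff.

def ultimatum_round(pot: float, offer: float, rejection_threshold: float) -> tuple[float, float]:
    """Return (proposer_payoff, responder_payoff) for one round.

    pot: total money to split (e.g., $10)
    offer: amount the proposer offers the responder
    rejection_threshold: smallest offer the responder will accept
    """
    if offer >= rejection_threshold:
        # Offer accepted: proposer keeps the remainder, responder gets the offer.
        return pot - offer, offer
    # Offer rejected: nobody gets any money.
    return 0.0, 0.0


if __name__ == "__main__":
    # A purely "rational" responder accepts any nonzero offer...
    print(ultimatum_round(10.0, 1.0, rejection_threshold=0.01))  # (9.0, 1.0)
    # ...but a responder who rejects anything worse than a 70/30 split
    # walks away from $3 in order to cost the proposer $7.
    print(ultimatum_round(10.0, 3.0, rejection_threshold=3.5))   # (0.0, 0.0)
```

The second call shows why rejection works as punishment: the responder gives up a small amount of money to deny the proposer a much larger one.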

The twist with the new work, performed by Lauren Treiman, Chien-Ju Ho, and Wouter Kool, is that they told some of the participants that their partner was an AI, and the results of their interactions with it would be fed back into the system to train its future performance.

This takes something that’s implicit in a purely game-theory-focused setup—that rejecting offers can help partners figure out what sorts of offers are fair—and makes it highly explicit. Participants, or at least those in the experimental group who were told they were training an AI, could readily infer that their actions would influence the AI’s future offers.

The researchers wanted to know whether this knowledge would influence the behavior of the human participants, so they compared it against the behavior of a control group that simply played the standard game-theory test.

Training fairness

Treiman, Ho, and Kool had pre-registered a number of multivariate analyses that they planned to perform with the data. But these didn’t always produce consistent results between experiments, possibly because there weren’t enough participants to tease out relatively subtle effects with any statistical confidence, and possibly because the relatively large number of tests meant that a few positive results would turn up by chance.

So, we’ll focus on the simplest question that was addressed: Did being told that you were training an AI alter someone’s behavior? This question was addressed through a series of very similar experiments. (One of the key differences between them was whether the information regarding AI training was displayed with a camera icon, since people will sometimes change their behavior if they’re aware they’re being observed.)

The answer to the question is a clear yes: people will in fact change their behavior when they think they’re training an AI. Across a number of experiments, participants were more likely to reject unfair offers if they were told that their sessions would be used to train an AI. In a few of the experiments, they were also more likely to reject offers normally considered fair (in US populations, the rejection rate goes up dramatically once someone proposes a 70/30 split, meaning $7 goes to the person making the proposal in these experiments). The researchers suspect this is because people became more likely to reject borderline “fair” offers, such as a 60/40 split.

This happened even though rejecting any offer imposes an economic cost on the participants. And people persisted in this behavior even when they were told that they wouldn’t ever interact with the AI after training was complete, meaning they wouldn’t personally benefit from any changes in the AI’s behavior. So here, it appeared that people would make a financial sacrifice to train the AI in a way that would benefit others.

Strikingly, in two of the three experiments that did follow-up testing, participants continued to reject offers at a higher rate two days after their participation in the AI training, even when they were told that their actions were no longer being used to train the AI. So, to some extent, participating in AI training seems to have caused them to train themselves to behave differently.

Obviously, this won’t affect every sort of AI training, and a lot of the work that goes into producing material that’s used in training something like a large language model won’t have been done with any awareness that it might be used to train an AI. Still, there are plenty of cases where humans do get more directly involved in training, so it’s worthwhile being aware that this is another route that can allow biases to creep in.

PNAS, 2024. DOI: 10.1073/pnas.2408731121
