Anthony Bourdain’s Parts Unknown. In each episode, the chef visits remote villages around the world, documenting the lives, foods, and cultures of regional tribes with an open heart and mind.
The show provides a glimpse into humanity’s astonishing diversity. Social scientists have a similar goal—understanding the behavior of different people, groups, and cultures—but use a variety of methods in controlled situations. For both, the stars of these pursuits are the subjects: humans.
But what if you replaced humans with AI chatbots?
The idea sounds preposterous. Yet thanks to the advent of ChatGPT and other large language models (LLMs), social scientists are flirting with using these tools to rapidly construct diverse groups of “simulated humans” and run experiments probing their behavior and values as a proxy for their biological counterparts.
If you’re imagining digitally recreated human minds, that’s not it. The idea is to tap into ChatGPT’s expertise at mimicking human responses. Because the models scrape enormous amounts of online data—blogs, YouTube comments, fan fiction, books—they readily capture relationships between words in multiple languages. These sophisticated algorithms can also decode nuanced aspects of language, such as irony, sarcasm, metaphors, and emotional tones, a critical aspect of human communication in every culture. These strengths set LLMs up to mimic multiple synthetic personalities with a wide range of beliefs.
Another bonus? Compared to human participants, ChatGPT and other LLMs don’t get tired, allowing scientists to collect data and test theories about human behavior with unprecedented speed.
The idea, though controversial, already has support. A recent article reviewing the nascent field found that in certain carefully designed scenarios, ChatGPT’s responses correlated with those of roughly 95 percent of human participants.
AI “could change the game for social science research,” said Dr. Igor Grossman at the University of Waterloo, who with colleagues recently penned a look-ahead article in Science. The key for using Homo silicus in research? Careful bias management and data fidelity, said the team.
Probing the Human Societal Mind
What exactly is social science?
Put simply, it’s the study of how humans—either as individuals or as a group—behave under different circumstances, how they interact with each other, and how they develop as a culture. It’s an umbrella discipline with multiple branches: economics, political science, anthropology, and psychology.
The discipline tackles a wide range of topics prominent in the current zeitgeist. What’s the impact of social media on mental health? What are current public attitudes towards climate change as severe weather episodes increase? How do different cultures value methods of communication—and what triggers misunderstandings?
A social science study starts with a question and a hypothesis. One of my favorites: do cultures tolerate body odor differently? (No kidding, the topic has been studied quite a bit, and yes, there is a difference!)
Scientists then use a variety of methods, like questionnaires, behavioral tests, observation, and modeling, to test their ideas. Surveys are an especially popular tool, because the questions can be stringently designed and vetted, and, when distributed online, they can easily reach a wide range of people. Scientists then analyze written responses and draw insights into human behavior. In other words, a participant’s use of language is critical for these studies.
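To make that analysis step concrete, here is a minimal sketch, in Python, of the kind of item-by-item comparison a researcher might run between human and simulated survey answers. The Likert coding, the numbers, and the use of a simple Pearson correlation are my own illustrative assumptions, not details from any study described in this article.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical 1-5 Likert-scale answers to the same ten survey items,
# averaged per item. All numbers are invented for illustration only.
human_means = np.array([4, 2, 5, 3, 4, 1, 5, 2, 3, 4])
model_means = np.array([4, 3, 5, 3, 4, 2, 5, 2, 3, 5])

# A common first-pass check: do the two response profiles track each other?
r, p = pearsonr(human_means, model_means)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```

A high correlation on a toy comparison like this proves nothing by itself, of course; real studies control for question wording, sampling, and many other confounds.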
So how does ChatGPT fit in?
The ‘Homo Silicus’
To Grossman, the LLMs behind chatbots such as ChatGPT or Google’s Bard represent an unprecedented opportunity to redesign social science experiments.
Because they are trained on massive datasets, LLMs “can represent a vast array of human experiences and perspectives,” said the authors. Because the models “roam” freely without borders across the internet—like people who often travel internationally—they may adopt and display a wider range of responses compared to recruited human subjects.
ChatGPT also isn’t swayed by other members of a study, nor does it get tired, potentially allowing it to generate less biased responses. These traits may be especially useful in “high-risk projects”—for example, mimicking, through social media posts, the responses of people living in countries at war or under difficult regimes. In turn, the responses could inform real-world interventions.
Similarly, LLMs trained on cultural hot topics such as gender identity or misinformation could reproduce different theoretical or ideological schools of thought to inform policies. Rather than painstakingly polling hundreds of thousands of human participants, the AI can rapidly generate responses based on online discourse.
Potential real-life uses aside, LLMs could also act as digital subjects that interact with human participants in social science experiments, somewhat like nonplayer characters (NPCs) in video games. For example, an LLM could adopt different “personalities” and, via text, pose the same question to human volunteers across the globe. Because algorithms don’t sleep, it could run 24/7. The resulting data may then help scientists explore how diverse cultures evaluate similar information and how opinions—and misinformation—spread.
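As a rough sketch of how such a digital subject might be wired up, the snippet below uses OpenAI’s Python client (v1+) to pose the same question under different persona prompts. The personas, the question, and the model name are placeholders of mine; nothing here reproduces the setup of the studies discussed in this article.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical personas; a real study would design these far more carefully.
personas = [
    "You are a 35-year-old farmer from a small village in rural Vietnam.",
    "You are a 22-year-old engineering student living in Berlin.",
    "You are a retired schoolteacher in São Paulo.",
]

question = "How much do you trust information you read on social media, and why?"

# Ask every persona the same question and collect the answers.
for persona in personas:
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": persona},
            {"role": "user", "content": question},
        ],
    )
    print(persona)
    print(response.choices[0].message.content)
    print()
```

Wrapped in a scheduler, a loop like this could indeed run around the clock, which is exactly the 24/7 data collection the authors envision.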
Baby Steps
The idea of using chatbots in lieu of humans in studies isn’t yet mainstream.
But there’s early proof that it could work. A preprint study released this month from Georgia Tech, Microsoft Research, and Olin College found that an LLM replicated human responses in numerous classical psychology experiments, including the infamous Milgram shock experiments.
Yet a critical question remains: how well can these models really capture a human’s response?
There are several stumbling blocks.
First is the quality of the algorithm and the training data. Most online content is dominated by just a handful of languages. An LLM trained on these data could easily mimic the sentiment, perspective, or even moral judgment of people who use those languages—in turn inheriting bias from the training data.
“This reproduction of bias is a major concern because it could amplify the very disparities social scientists strive to uncover in their research,” said Grossman.
Some scientists also worry that LLMs are only regurgitating what they’re told. That’s the antithesis of a social science study, in which the whole point is to capture humanity in all its diverse, complex beauty. On the other hand, ChatGPT and similar models are known to “hallucinate,” making up information that sounds plausible but is false.
For now, “large language models rely on ‘shadows’ of human experiences,” said Grossman. And because these AI systems are largely black boxes, it’s difficult to understand how or why they generate certain responses, which is a bit worrying when they’re used as human stand-ins in behavioral experiments.
Despite the limitations, “LLMs allow social scientists to break away from traditional research methods and approach their work in innovative ways,” the authors said. As a first step, Homo silicus could help brainstorm and rapidly test hypotheses, with the promising ones further validated in human populations.
But for the social sciences to truly welcome AI, there will need to be transparency, fairness, and equal access to these powerful systems. LLMs are difficult and expensive to train, and recent models are increasingly locked behind hefty paywalls.
“We must ensure that social science LLMs, like all scientific models, are open source, meaning that their algorithms and, ideally, their data are available for everyone to examine, test, and modify,” said study author Dr. Dawn Parker at the University of Waterloo. “Only by maintaining transparency and replicability can we ensure that AI-assisted social science research truly contributes to our understanding of the human experience.”
Source:
Fan, S. (2023, July 25). ChatGPT is replacing humans in studies on human behavior—and it works surprisingly well. Singularity Hub. https://singularityhub.com/2023/07/25/chatgpt-is-replacing-humans-in-studies-on-human-behavior-and-its-working-surprisingly-well/