Making Automation Work for Social Scientists
Many scientists of all stripes are optimistic about new research opportunities afforded by “AI.” This term refers to a diverse set of automated and semiautomated technologies including (but not limited to) predictive machine learning systems, automated transcription, generative models that produce images and text, and chatbots built on large language models (LLMs) that simulate conversation. Within social science, researchers envision using these technologies to accelerate and expand their abilities to gather, analyze, and simulate human data.1 Some have even proposed automating the entire research pipeline, with LLMs replacing human subjects and scientists.2
I’ve watched these developments with a mixture of excitement and alarm. My lab at Princeton University has used predictive machine learning to analyze public discourse on social media, and recent advances in LLMs enable us to pursue research questions that were out of reach just a few years ago.3 At the same time, I worry that widespread adoption of automation across the research pipeline is outpacing thoughtful consideration of its many risks and harms, including harms to us as knowledge workers.4 To be clear: my worries don’t reflect a nostalgia for some idyllic analog past. Rather, I’m concerned with what scientific futures we foreclose if we cede our epistemic agency to automated systems.
I’ve begun to find a way through this minefield in conversation with my brilliant and generous collaborator Lisa Messeri, an anthropologist of science and technology who investigates how expert communities create new fields of inquiry and innovation.5 Lisa introduced me to scholarship illuminating how political, economic, and cultural forces shape our collective imagination about what technology is and can be. Here, I offer some insights that have emerged from our collaboration and shape the way I approach research in my lab, with the aim of preserving our epistemic agency under intense pressure to give it away.6
First, industry marketing can mislead us into thinking that new technologies are more capable than they actually are. Technology companies are selling AI with overhyped claims of superhuman or even magical abilities.7 Some scientists imagine LLMs as oracles, capable of extracting “objective” truths from the published literature or large datasets.8 Lisa and I call this imaginary an illusion of objectivity: in reality, LLMs reflect the viewpoints of their engineers and those represented in their training data.9 Rather than removing bias from the scientific process, this technology entrenches dominant viewpoints while simultaneously obscuring them from view.10 Just as social scientists have begun to make progress in diversifying the questions we ask and the people we study, wholesale adoption of automation threatens to reverse that progress.
When automated tools are controlled by private industry, scientists cannot access information crucial for vetting the capabilities of these tools. Accounting for biases in models is challenging because of their opacity. This is especially the case for proprietary LLMs like the GPT product line, because OpenAI has not disclosed the contents of its models’ training data. Moreover, the outputs of proprietary LLMs vary unpredictably over time, as engineers tweak model parameters behind the scenes to satisfy corporate goals.11 We cannot build a robust and reproducible social science using models whose construction is opaque, whose outputs are unstable, and whose control rests in the hands of CEOs with very different goals than ours.12
Social scientists are increasingly seeking financial support from the technology industry, especially as public funding becomes scarce. This has at least two consequences for our science. First, we are incentivized to make our research questions fit the capabilities of automated tools. But not all questions are amenable to computational analysis.13 We suffer from an illusion of exploratory breadth when we mistake the subset of questions automated tools can address for the broader set of questions we can ask about the social world.14 Second, industry funding encourages scientists to focus on questions that are friendly to technology companies.15 These complementary forces can lead to the development of scientific monocultures, where a narrow set of methods and questions dominates knowledge production, making science less innovative and robust.16
The varied risks of automation for social science all flow from the outsourcing of scientific judgment to automated systems and the companies that control them. Recognizing this suggests concrete steps we can take to reclaim our epistemic agency in the short term and preserve it in the long term. Instead of being seduced by fantasies of “superintelligent” general purpose systems that can replace the work of scientists, we can build automated tools for specific scientific tasks. Rather than relying on opaque proprietary models, we can insist on using open-source models with transparent documentation.17 To prevent the formation of scientific monocultures, we can recognize that qualitative and ethnographic work offers distinctive ways of understanding those aspects of social life that resist quantification, and we can advocate for continued investment in these diverse methodologies.18 We should also require our colleagues to be more transparent about conflicts of interest that arise from industry funding, and we should encourage industry-independent research.19
These things called “AI” will surely change in the years to come, but this does not mean current insights have an expiration date. Our colleagues in the humanities and humanistic social sciences have long recognized that technological advances rarely create entirely new dilemmas, instead reproducing old dilemmas in new forms. The communal values of science have always been in tension with private commercial interests.20 Now, by aiming to commodify knowledge production itself, the technology industry makes this tension harder to ignore. Our urgent task at present is to question the uncritical embrace of artificial intelligence and articulate a future social science in which automation works for us.
Author’s Note
Thanks to Emily Bender, Lisa Messeri, and Alondra Nelson for helpful comments.