
AI and Mental Health Care: Issues, Challenges, and Opportunities

QUESTION 9: What are the most significant scholarly questions that will need to be answered as AI’s role in mental health care evolves?


Background

As stated earlier, potential interventions to be used in biomedical and behavioral care settings are usually subjected to rigorous research and scholarly analysis before they are allowed to be deployed. However, in the case of AI and mental health care, that has not happened; many AI applications have been deployed before that kind of rigorous research has been done. That is unfortunate and may have led to some significant problems, including ineffective and even potentially dangerous applications being freely available. For those reasons, a full-scale research program is badly needed, even if some believe it may soon be too late. Some problems will be correctable. A greater number of randomized clinical trials can provide the evidence base from which to build and evolve AI into safe and effective tools. Population health studies can help ensure appropriate guidelines for AI in mental health care and equitable access to these tools. The findings from such research enable more strategic funding allocation, helping policymakers and private funders invest in novel care pathways and tools with the greatest potential for clinical value and equitable impact. Evidence standards and benchmark requirements can create productive pressure among developers, rewarding systems that demonstrate real-world utility rather than marketability alone. Without such feedback loops, hype may outpace performance, and useful innovation may be crowded out.

Responses


Marian Croak
 

Recently, researchers conducted the first clinical trial of a purpose-built therapy bot.194 They used custom-built datasets containing evidence-based practices, with a human in the loop to monitor the bot’s responses. Although the study was small (210 participants), the therapy bot showed promising results, especially for alleviating depression. Given the widespread use of generative AI for therapeutic purposes, the fact that this is the first and only clinical trial highlights the need for more rigorous scientific research.

Areas that need deeper exploration include:

  • examining both short- and long-term outcomes for stand-alone therapeutic bots and ones with a human in the loop compared to one-on-one human therapy;

  • creating best practices for the use of datasets that mitigate hallucinations and algorithmic bias and that contain representative data from a wide range of demographic groups;

  • designing large, transparent, and explainable datasets that capture different modalities and psychological conditions;

  • investigating the ability of AI systems to learn emotional intelligence and empathy;

  • identifying the risks and benefits of anthropomorphic attachment to AI therapeutic bots;

  • improving the reliability and accuracy of diagnostic determination;

  • creating filters and evaluation tools for minimizing harmful responses/advice;

  • measuring the effectiveness of using AI assistants to help reduce the time therapists spend on administrative tasks; and

  • examining workforce disruptions produced by AI mental health tools.

To truly understand the effectiveness of using AI in the end-to-end therapeutic process, we must engage the expertise of social scientists, ethicists, security and privacy specialists, legal professionals, neuroscientists, demographically diverse clients, mental health professionals (psychologists, psychiatrists, and social workers), user experience researchers and designers, computer scientists specializing in natural language processing, AI, and machine learning, as well as others. This interdisciplinary collaboration will help to ensure that proposed solutions are technically sound, ethical, safe, easy to use, and truly beneficial for clients and practitioners.

As these AI tools are deployed, they will need continuous monitoring and auditing to ensure they are performing as intended. AI models, especially generative ones, dynamically change as a result of statistical factors and their adaptation to real-world input. Depending on the application of AI, different metrics will need to be established that target benchmarks set across different parameters. Subsequently, evaluation tests or audits need to be periodically or continuously conducted to ensure the tool is within range of its target benchmarks. If measurements dip below the benchmark, either automated or manual adjustments to the tool are needed. Relevant metrics include reliability, availability, accuracy, appropriateness of the response, user sentiment and engagement, fairness, and measures of privacy and security.
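To make the kind of audit loop described above concrete, the following sketch (in Python) checks a set of observed metrics against target benchmarks and flags any that have fallen out of range so that automated or manual adjustments can be triggered. The metric names, benchmark values, and the audit function itself are hypothetical placeholders chosen for illustration; they are not drawn from any particular deployed system or standard.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    """Target value and the floor below which intervention is needed."""
    target: float
    floor: float


# Hypothetical benchmarks; real values would be set per application and parameter.
# Metrics where lower is better (e.g., a fairness gap) would invert the comparison.
BENCHMARKS = {
    "availability": Benchmark(target=0.999, floor=0.995),
    "response_appropriateness": Benchmark(target=0.95, floor=0.90),
}


def audit(observed: dict) -> list:
    """Return descriptions of metrics that have dipped below their benchmarks."""
    flagged = []
    for name, bench in BENCHMARKS.items():
        value = observed.get(name)
        if value is None:
            flagged.append(f"{name}: no measurement recorded this audit period")
        elif value < bench.floor:
            flagged.append(f"{name}: {value:.3f} is below the floor of {bench.floor:.3f}")
    return flagged


# One periodic audit over the latest evaluation window.
latest_window = {"availability": 0.997, "response_appropriateness": 0.88}
for issue in audit(latest_window):
    print("NEEDS ADJUSTMENT:", issue)  # hand off to automated or manual remediation
```

In practice, such checks would run on a schedule against logged interactions, and the list of monitored metrics would extend to the fairness, privacy, security, and user-sentiment measures named above.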

As tech companies race to deploy new advances in AI, they are clearly outpacing the slower progress of governance and policy frameworks created by the public sector, including international and domestic government entities. Research suggests that policymakers should adopt more-innovative approaches to creating governance frameworks and regulatory guidelines.195 The National Institute of Standards and Technology’s approach to setting guidelines for responsible AI governance is one such example, given its extensive reliance on industry collaboration and its use of versioning to enable fast changes to policies as technology advances.

 


Alison Darcy
 

Over the past few decades, numerous technologies have generated enthusiasm as potential solutions to long-standing challenges in mental health care. The Internet was meant to resolve access; smartphones promised always-on support and Big Data insights; gamification aimed to improve adherence. But AI is arguably the advance that most fully unlocks these benefits—by offering a human-centered interface: conversation.

AI doesn’t just scale interventions; it could reshape them. For the first time, we can reach people in the flow of daily life, through an interface they can engage with even when motivation and mood are low. This marks an exciting moment for intervention science but also raises important questions.

We may be tempted to tightly control or replicate human-led models. Yet doing so may limit discovery. Rather than copying human beings, we should ask: What are AIs uniquely good at? Human beings are essential for connection, intuition, and relational depth. AIs excel in consistency, availability, and data processing. We do not need to force them into a mold that doesn’t suit their strengths.

I believe the question of whether an “AI can make a good therapist” is unhelpfully blunt and tends to get bogged down in an impractical narrative of AIs replacing human beings. Early studies challenge the belief that AIs can’t be empathic or helpful. For instance, people often disclose more to AIs than to human beings.196 Some people report feeling more empathized with by AI than by physicians in online contexts.197 And AI (natural language processing) algorithms may be better suited to prediction and detection of psychosis than human beings.198

Going forward, we need to adopt first-principles thinking to assess what works—not based on fidelity to human therapy but on what produces benefit. Most critically, the only human who must remain “in the loop” is the user themself. Research on Edward Deci and Richard Ryan’s self-determination theory tells us that autonomy is a key driver of positive outcomes.199

We must, of course, evaluate efficacy and safety rigorously—but also stay open to unexpected benefits, not just unintended harms.

That being said, what are the major areas of scholarly focus?

Therapeutic mechanism

All major technology shifts generate new behaviors. Just as smartphones sparked the cultural norm of photographing meals, AI may give rise to new forms of therapeutic engagement; that is, enable the creation of novel therapeutic mechanisms. It could also amplify known mechanisms. For example, CBT as a therapeutic approach invites language to be formally examined as a proxy for thinking, and thoughts are systematically gathered and considered as a window into distorted beliefs. CBT also leans heavily on data, relying on real-time gathering of symptoms like moods and the context in which they arise, for assessment and evaluation. Not surprisingly, all purpose-built chatbots to date—for example, Woebot, Wysa, and Therabot—have been built upon a CBT framework wherein the role of the therapist is characterized by “collaborative empiricism.” How an AI should show up in a different type of therapy wherein the role of the therapist is characterized differently is not yet well understood. For example, can AI be deployed in a family systems therapeutic approach? If so, how might that be operationalized? Research should also explore whether AIs can facilitate therapeutic methods that are based on creative self-expression, since this has been identified as an active ingredient of effective intervention for young people and appears to be a particular strength of emerging AI models.200 Determining this seems even more relevant given the recent paper in The Lancet Psychiatry that examined the outcomes from NHS Talking Therapies for anxiety and depression in over three hundred thousand patients. The results suggest that young adults’ outcomes were poorer than those of working adults. Young adults were less likely to meet measures for reliable improvement and were more likely to meet criteria for reliable deterioration. These data point to youth mental health needs that require adaptations, which AI-supported mental health care is well positioned to meet.201

Efficacy and safety of AI-based interventions

The literature base around DMHIs, particularly digital CBT, is already mature. Increasingly, RCTs have tested AI-based chatbots—both traditional rule-based and GenAI-driven—utilizing a variety of comparison conditions. Two studies have compared chatbot-based interventions to more traditional human-delivered care, with both studies showing similar findings (noninferiority for depression and slightly superior findings in the case of anxiety).202 Future research on efficacy and safety should explore the longer-term effects of chatbot-based interventions; for example, whether therapeutic effects diminish over time and what usage patterns—different as they are structurally from classic treatment—might be deployed to avoid this. However, the most urgent question is how to deploy these interventions within the health care system.

AI-enabled therapeutic systems and structures

AI has the potential to enable entirely new care structures—stepped care, precision care, continuous engagement models, and so on. If AI is to have a specific role in public health, how do we consider the role of frontier models in supporting positive mental health and even offering early intervention? What we don’t know about health care models is how the interplay between data ownership and crisis intervention would work (or whether it should even be part of AI-enabled health care). We don’t yet fully understand or have an agreed-upon definition of the legal and ethical frameworks around data ownership, liability, and trust, and how we might avoid system-level harms like deskilling of clinicians.

Future studies could ask:

  • How can AI help us triage more effectively?

  • Can AI personalize care pathways based on lived data?

  • How does AI shift the role of the clinician over time?

This is not a call for unchecked optimism but a call to think more creatively and rigorously about how we evaluate and frame the role of AI in mental health, including by drawing from the considerable multidisciplinary expertise that may contribute to this endeavor. The opportunity is not to replace human beings but to better support them and to serve the many people the system currently leaves out.

 


Arthur Kleinman
 

AI must be assessed in the everyday conversations between clinicians, patients, and families. This is where its contribution to improving communication and relationships needs to be demonstrated. Interdisciplinary collaboration is essential to this kind of work. Pairing AI experts with clinicians and social scientists is the way to organize appropriate use and evaluation of all technological interventions in health care.203

The continuous evaluation and refinement of AI tools is not only a matter of safeguards but of building AI systems that iteratively examine the evidence about interventions in order to constantly provide feedback for the improvement of practices. Here AI can build on engineering systems approaches that utilize these kinds of feedback loops. Input from software engineers working with clinicians would be helpful in developing best practices.

The empirical reality is that the United States does not have a single mental health care system but rather a chaotic and perhaps ununifiable collection of private and public systems. For severe chronic mental illness, principally schizophrenia, the public mental health system is so broken that the criminal justice system is now regarded by experts as the functioning mental health care system for most of these patients. How AI-driven interventions will figure here will have to be determined by research, and this is one of the areas in which research examining practices that augment professional caregivers should be very useful. Looking at the history of mental health practices, the private sector is likely to produce both useful examples and many examples of inappropriate use and abuse. Regulations are where we will have to work out the many questions about how the public and private sectors will relate to responsible AI applications. Given the absence of a unified system of mental health care, private and public sectors will most likely remain isolated from each other and fail to collaborate effectively. I seriously doubt the utility of attempting with AI what has not happened with any other mental health intervention. Then again, perhaps this is an area where AI can help, albeit in a different way: by generating the knowledge to systematize and possibly even integrate mental health care practices and practitioners who, for far too long, have failed to collaborate.204

 


Robert Levenson
 

What research priorities should guide the development of AI mental health tools?

Much as with the development of a new drug, the holy trinity of research progresses from establishing safety, to evaluating efficacy and effectiveness, and finally to understanding mechanisms of action. Formal safety trials should be initiated as soon as possible for some of the more common kinds of AIMHIs, many of which already exist in the wild, challenging efforts to obtain more than anecdotal safety data about them. Such trials should be designed to generate safety data that can be compared with more conventional human therapist approaches (and perhaps combined human-bot approaches as well) when dealing with similar problems and populations. Important issues to be tracked in these trials include clients’/patients’ thoughts and acts related to harm to self and others as well as measures of mental health symptoms and well-being. In the efficacy stage, RCTs that compare AIMHIs with one another and with more conventional active treatments will be important. Psychotherapy research has often revealed that everything works better than nothing and that differences among therapies are nonexistent or small (the so-called dodo bird verdict).205 But we cannot assume that this will be the case for AIMHIs without well-designed research. Finally, existing therapy research has consistently revealed that the most potent mechanisms of action for a wide range of psychotherapies are the common, nonspecific ones (e.g., expectations, attention, placebo, alliance, time). AIMHIs bring characteristics and abilities to the table that are unique (e.g., lack of distraction and fatigue, near-instantaneous access to huge bodies of knowledge, full recall of the details of past therapy sessions and client histories) and thus may have different mechanisms of action than their human counterparts.

 


Daniel Barron
 

Continuous, interdisciplinary research is essential to ensure that AI in mental health care evolves responsibly, effectively, and equitably. The most pressing scholarly questions must center not on AI in the abstract but on the specific clinical and administrative tasks it is meant to perform—and whether it does those jobs better than existing methods. (See Table 1 for an example of task breakdown.)

First, we need rigorous, task-level validation: In which clinical jobs can AI actually improve accuracy, safety, cost, or access? Research should focus on job-specific performance rather than generalized hype. Pablo Cruz-Gonzalez and colleagues demonstrate AI’s promise in certain diagnostic and intervention tasks but emphasize the need for more diverse datasets and model transparency, especially when algorithms are deployed across different patient populations.206

Second, we need more-robust frameworks for real-world evaluation. Validating AI tools in the lab is not enough. We must also study how they behave in complex, variable, and often unpredictable clinical environments. Hassan Auf and colleagues underscore the lack of empirical research on human-AI interaction in real-world settings, particularly when AI is used for decision support.207 Participatory design—engaging clinicians, patients, ethicists, and human-computer interaction specialists—is not optional; it is essential to building trust and ensuring tools are fit for purpose.

Third, we need longitudinal, postdeployment surveillance. Once an AI tool is “in the wild,” continuously assessing its performance, drift, and unintended consequences is critical. The FAITA-Mental Health framework offers one approach, linking performance back to the task-level job the AI is meant to do.208 Gauthier Chassang and colleagues call for provider-led postmarket surveillance models that pair clinical insight with user experience, a vision that could mirror pharmacovigilance for digital therapeutics.209
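As a concrete, if simplified, illustration of what continuous postdeployment assessment can involve, the sketch below compares a recent window of a quality score against the level recorded during initial validation and flags drift when the drop exceeds a chosen tolerance. The score, the tolerance, and the detect_drift helper are hypothetical and are not taken from the FAITA-Mental Health framework or any other cited source; real surveillance would track many metrics, including unintended consequences.

```python
from statistics import mean


def detect_drift(baseline_scores, recent_scores, max_drop=0.05):
    """Flag drift when the recent mean of a quality score falls more than
    max_drop below the mean observed during pre-deployment validation."""
    return mean(baseline_scores) - mean(recent_scores) > max_drop


# Hypothetical weekly appropriateness ratings on a 0-1 scale.
baseline = [0.93, 0.94, 0.92, 0.95]    # collected during pre-deployment validation
this_month = [0.88, 0.86, 0.87, 0.85]  # collected from the live, post-deployment tool

if detect_drift(baseline, this_month):
    print("Performance drift detected; escalate for clinical and technical review.")
```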

Finally, we must investigate how scholarly insight can shape standards and funding. Public-private partnerships could establish evaluation protocols that tie reimbursement or regulatory approval to task-level evidence. This would incentivize developers to prioritize clinically meaningful tools over flashy demos. It might also—perhaps unintuitively—help clinicians better understand the operational logic of their own systems by forcing a clear delineation of who does what and why.

In sum, the most urgent scholarly agenda for AI in mental health is not technological; it is functional. The field must interrogate which jobs are actually performed today, which jobs an AI should do, how well it does them, and what it costs when it fails.

 


Hank Greely
 

Research and development are essential to our understanding not just of the safety and effectiveness of these methods but even of what methods are being used and how. Some publicly accessible, and preferably published, peer-reviewed research is necessary before new approaches are tried. However, given the speed of the introduction and modification of these approaches, I fear that getting sufficient evidence of safety and efficacy before clinical use will be impossible. That makes it all the more important that these methods include rigorous monitoring and assessment procedures, ideally independent of corporate sponsors.

Endnotes

  • 194

    Heinz et al., “Evaluating Therabot.”

  • 195

    Jordan Nelson, Anderson Wills, and Jane Owen, “,” January 2025.

  • 196

    Gale M. Lucas, Jonathan Gratch, Aisha King, and Louis-Philippe Morency, “,” Computers in Human Behavior 37 (2014): 94–100.

  • 197

    John W. Ayers, Adam Poliak, Mark Dredze, et al., “,” JAMA Internal Medicine 183 (6) (2023): 589–596.

  • 198

    Cheryl M. Corcoran, Vijay A. Mittal, Carrie E. Bearden, et al., “,” Schizophrenia Research 226 (2020): 158–166.

  • 199

    Edward L. Deci and Richard M. Ryan, “,” Canadian Psychology/Psychologie canadienne 49 (3) (2008): 182–185.

  • 200

    Matthew P. Somerville, Helen MacIntyre, Amy Harrison, and Iris B. Mauss, (Wellcome, 2022).

  • 201

    Rob Saunders, Jae Won Suh, Joshua E. J. Buckman, et al., “,” The Lancet Psychiatry 12 (9) (2025): 650–659.

  • 202

    Heinz, Mackin, Trudeau, et al., “Randomized Trial of a Generative AI Chatbot for Mental Health Treatment”; and Chen Chen, Kok Tai Lam, Ka Man Yip, et al., “,” JMIR Human Factors 12 (2025): e65785.

  • 203

    Arthur Kleinman, Hongtu Chen, Sue E. Levkoff, et al., “,” Frontiers in Public Health 9 (2021): 729149.

  • 204

    Ibid.; Laura Sampson, Laura D. Kubzansky, and Karestan C. Koenen, “,” Dædalus 152 (4) (2023): 24–44; Jonathan M. Metzl, “,” Dædalus 152 (4) (2023): 92–110; Gary Belkin, “,” Dædalus 152 (4) (2023): 111–129; Joseph P. Gone, “,” Dædalus 152 (4) (2023): 130–150; and Isaac R. Galatzer-Levy, Gabriel J. Aranovich, and Thomas R. Insel, “,” Dædalus 152 (4) (2023): 228–244.

  • 205

    Lester Luborsky, Robert Rosenthal, Louis Diguer, et al., “,” Clinical Psychology: Science and Practice 9 (1) (2002): 2–12.

  • 206

    Pablo Cruz-Gonzalez, Aaron Wan-Jia He, Elly PoPo Lam, et al., “,” Psychological Medicine 55 (2025): e18.

  • 207

    Hassan Auf, Petra Svedberg, Jens Nygren, Monika Nair, and Lina E. Lundgren, “,” Journal of Medical Internet Research 27 (2025): e63548.

  • 208

    Golden and Aboujaoude, “The Framework for AI Tool Assessment in Mental Health.”

  • 209

    Gauthier Chassang, Jérôme Béranger, and Emmanuelle Rial-Sebbag, “,” International Journal of Environmental Research and Public Health 22 (4) (2025): 568.