Date: 2023-05-15

For some patients in the University of Wisconsin Health system, the next question they ask their doctor might be answered by artificial intelligence – as part of a pilot project to see if AI chatbots can save providers time.

“Since COVID, electronic messages from patients have increased 57 percent,” said Frank Liao, senior director for digital and emerging technologies at UW Health. “We’re using AI to help providers be more efficient in responding, to help save focus and attention for other clinical duties.”

That’s just one of thousands of possible ways large AI language models like OpenAI’s ChatGPT or Google’s Med-PaLM could transform medicine. AI chatbots trained on publicly available internet data have shown they can pass medical licensing exams and provide cogent answers to complex diagnostic questions. One recent study even found that ChatGPT was better than doctors at responding to patient questions posted online.

But there are downsides, too:

AI models are often trained on inadvertently biased data, which can worsen inequities in health outcomes.

Some experts worry that these AI models, also called generative AI, are evolving too fast for regulators to keep up.

One major unanswered question is whether all uses of generative AI models qualify as medical devices subject to Food and Drug Administration approval.

“It’s probably the most impactful technology in healthcare since the smartphone and the internet,” said Aaron Neinstein, vice president of digital health at the University of California San Francisco. “These models are too powerful, and moving too quickly for our human comprehension to try to imagine the future and plan for it. The only way we’re going to figure this out is to try to put some constraints and boundaries around the systems, start using them and see what works.”

How AI could help

Large language models’ most immediate impact might be cutting down paperwork, which takes up 15.5 hours of the average doctor’s week.

“There’s so much administrative overhead in healthcare,” said Neinstein, including electronic health records, insurance documentation, and electronically sending prescriptions to specific pharmacies. AI models are already able to help draft paperwork, referrals and reports, he said.

Such time-saving tools are already being tested at some hospitals, including UW Health, where ten providers are currently testing out the AI tool. So far, some providers have said that about half of message drafts don’t require much editing, said Liao. “Even those that require more editing still save providers time,” he said. “It’s giving their brains a jumpstart.” Liao’s team will monitor how well the software does, and whether it actually saves time before giving more providers access.

AI could also help address another time sink: taking notes. Several companies have already developed systems that listen to a conversation between a doctor and patient and write up a summary, which the doctor would review before putting into the record. (Many doctors now use human scribes for this purpose, but AI may ultimately prove cheaper and more efficient.)

“When physicians are able to spend more time with the patient at the bedside, patient outcomes improve,” said Shinjini Kundu, a physician at Johns Hopkins University.

Potential for bias – and mistrust

Existing models are still prone to “hallucinations,” where the AI comes up with a nonsensical, irrelevant or factually incorrect response to a query. “The field has to advance quite a bit in verifying that when an output is generated, whether it’s trustworthy or whether the model is uncertain,” said Michael Moor, a computer scientist and trained physician at Stanford. “Until then, we need to keep our hands on the wheel.”

And even when an AI kicks out a response that comports with the data it’s been fed, that data itself can be biased in ways that translate to unequal outcomes. Marginalized groups, including Indigenous people and Black communities, are often underrepresented in medical data, resulting in products that don’t work as well for certain groups. These problems could get worse with generative AI models, said Moor. “Very large language models can currently exhibit more social biases than their smaller counterparts.”

Even if AI models address these challenges, they still may be a hard sell for patients. Fewer than half of respondents to a Pew Research poll published in February expect AI to improve patient outcomes, and 60 percent said they’d feel uncomfortable with a provider relying on AI for their care. And 75 percent of U.S. adults expressed concern that health care providers would adopt AI too fast, underlining the importance of proper regulation.

Are regulators ready?

One major unanswered question is whether generative AI models qualify as medical devices subject to Food and Drug Administration approval.

If an AI algorithm is intended for use in the diagnosis, treatment or prevention of disease, the FDA must approve it before it can be sold and used. If it’s used for administrative purposes, the algorithm doesn’t need FDA approval. For now, AI chatbots like the one UW Health is testing seem to fall outside of the FDA’s purview.

“Chatbots increasingly straddle a very challenging line between devices that present information, like a search engine, and devices that take that step into impacting treatment plans,” said Danny Tobey, a lawyer and medical doctor at DLA Piper. “That’s a line that we’re going to have to figure out.”

Where any given generative AI model falls along this continuum depends on its intended use, said Bradley Merrill Thompson, an attorney at Epstein Becker Green who specializes in FDA enforcement of AI. The fact that a single model can perform tasks all along that continuum “really challenges the regulatory framework,” he said.

That’s largely because the more a model can do, the more difficult it becomes to demonstrate its accuracy and potential costs. Until now, most AI tools have been for very specific functions. “That’s not a legal requirement that it be narrower,” said Thompson. “It’s a practical challenge of designing studies to cover and quantify each of the different functions or outcomes.” Doing those kinds of evaluations “could take a lifetime,” he said.

The FDA is starting to change guidance in response to these challenges, but even commissioner Robert Califf acknowledges the agency needs to do more. “I think we’re behind, and it’s going to be really hard to catch up,” he said in a speech to the National Health Council on May 8.

In April, the agency released plans on how to deal with the fact that AI models change over time as they’re fed new data. “There’s no question that technology is changing faster than legislation, or even regulatory interpretation, can keep up,” said Tobey. “The old idea of analyzing a single question, asking if [the tool] is meeting this or that mark, becomes not just obsolete but a complete impossibility.”

Without a clear framework for federal regulation, hospitals could be left to decide whether an AI-powered tool is safe and effective – to protect patient safety and to shield their facility from legal liability. Doctors can still veto recommendations from an AI tool, although there is a danger that some may assume AI decisions are more objective or infallible than those made by humans – even when they’re not.

Others think small tweaks may not be enough. “Regulators need to set the agenda, not just make safeguards,” said John Ayers, an epidemiologist at the University of California San Diego and co-author of the study showing ChatGPT outperformed physicians in responding to patient questions. To ensure such tools actually help patients, Ayers hopes regulators will take a more active role, including requiring that such tools have demonstrated benefits for patients.

In the meantime, Congress is trying to address potential loopholes. Last month, Senate Majority Leader Chuck Schumer announced that he is leading an effort to develop legislation to regulate artificial intelligence, including its medical uses.

“These models still have severe limitations when it comes to the health domain,” Moor said. But “the things these models are doing now were considered absolutely distant future things two or three years ago. It’s very hard to say where we will be in two or three years.”