A medical AI system from Google has matched experienced primary care physicians in handling complex clinical presentations, according to new research published in the respected scientific journal Nature. But behind the impressive numbers lie significant caveats that cast the results in a different light.

What is AMIE?

AMIE – short for Articulate Medical Intelligence Explorer – is a conversational AI system developed by Google, built on the company's Gemini platform. The system is designed to simulate medical consultations through text-based dialogue, ask follow-up questions, and formulate treatment recommendations.

According to the study, which was covered on the Google AI Blog, AMIE matched primary care physicians across five medical specialties in simulated consultations. The result is described as a breakthrough for conversational AI in medicine.


Simulated patients – not real ones

The most important weakness of the study is that it did not involve a single real human patient. According to an independent research review of the study, the trials were conducted with either AI-based patient agents or actors following detailed scripts.

Experts note that text-based consultations fail to capture what actually happens during a medical examination: body language, non-verbal cues, physical examination, and the social and cultural context surrounding a patient. All of these are central to sound diagnostics.

The doctors were at a disadvantage: they worked under guidelines they may not use day-to-day, and faced scenarios constructed with "clean, correct answers."

It is also highlighted that the physicians who participated – recruited from Canada and India – were evaluated against British clinical guidelines they do not necessarily apply in their daily practice. This, combined with the fact that they communicated only via text, may have given the AI a structural advantage that does not reflect real clinical conditions.

Errors and lack of transparency

Despite strong results in simulated scenarios, potential reasoning errors were found in AMIE's treatment recommendations. Experts warn that AI systems can present incorrect recommendations with a high degree of confidence – without the user necessarily noticing.

Another significant point: AMIE is not open source. This means independent researchers cannot replicate or verify the system's internal logic, which is problematic in a sector where transparency is essential.

It is also worth noting that an earlier study involving real patients showed that human physicians produced more practical and cost-effective treatment plans than AMIE. That study was not addressed in the new Nature article.

A study with real patients yielded a different result – human doctors won.

Experts agree: not ready for the clinic

The broad professional consensus is clear: AMIE is not ready to be used independently on real patients. Prospective studies in genuine clinical settings are needed before anything definitive can be said about safety, efficacy, and clinical utility.

Experts envision AI in medicine functioning as a "trusted clinical assistant" – a tool that supports healthcare professionals with analysis and information processing, while the final decision remains with humans. Clinical judgment, patient communication, and the management of uncertainty are tasks that cannot be automated away.

AMIE undoubtedly represents a technological advance in medical AI. But the road from controlled simulations to safe clinical practice is longer than the study's headlines might suggest.