Veterinary AI radiology tools scrutinized in new study

All six platforms evaluated were judged unfit for clinical use overall; makers disagree

Published: May 04, 2026

Journal of the American Veterinary Medical Association photo
This radiograph of a dog's abdomen clearly shows a river stone the dog had swallowed, visible as a white oval-shaped body in its intestinal region. When the case was sent to six AI platforms for analysis, only three diagnosed the obstruction correctly, researchers found.

When Dr. Stephen Joslyn and colleagues tested the accuracy of artificial intelligence programs that help veterinarians read radiographs, they discovered that the increasingly popular tools made mistakes. One case was particularly illustrative.

A dog had swallowed a river stone, the object lodging in its small intestine. Joslyn and his team at Murdoch University veterinary school in Australia sent two radiographs of the dog's abdomen to six different AI platforms for analysis; four were based in the United States, one in France and one in South Korea.

Despite the rock appearing clearly in both X-rays as an oval object in the dog's intestinal region, two of the platforms classified the case as "normal." Their AIs did not see the rock, or if they did, they didn't find it abnormal. A third platform classified the case as "abnormal" but didn't detect the intestinal obstruction. Instead, it diagnosed five conditions that the patient didn't actually have, such as an apparent enlargement of the dog's spleen and a foreign body in another organ: the stomach, which was some distance from the rock.

The other three platforms did correctly detect the rock in the intestine in that specific case. But based on an analysis of 53 cases in total, the collective performance of the six AI platforms was "low to moderate," the researchers found. "Even the best-performing algorithm had notable limitations, and none appeared suitable for clinical use in their current form," their paper concludes.

Joslyn confirmed in an interview that this apparent lack of suitability applies not only to using the tools autonomously. He doesn't think veterinarians should be using them at all yet, even as an assist — especially practitioners with limited radiography experience.

"It's ticking a dopamine box for them," Joslyn said. "They don't know how bad it is, and they feel like it's helping. It's the blind leading the blind, unfortunately, at this stage."

The research, published in the Journal of the American Veterinary Medical Association, reinforces yearslong concerns held by some practitioners that readings from AI platforms should be viewed with caution, despite their manufacturers touting high levels of accuracy that have been corroborated by several other research papers.

When contacted by the VIN News Service, representatives of companies behind the tools all acknowledged their products have limitations but rejected the idea that they're unfit for clinical use, stressing that they should function as an aid to veterinarians alongside evidence found by other diagnostics and the patient's clinical signs and medical history.

Use of the technology has increased rapidly since the first products started becoming available commercially in the veterinary realm about eight years ago, launched by the U.S. companies SignalPET and Vetology.

Mars, the world's biggest owner of veterinary practices — including under the brands VCA, Banfield and BluePearl in the U.S., Anicura in continental Europe and Linnaeus in the United Kingdom — launched its own radiology AI product, RapidRead, in 2024 through its Antech veterinary diagnostics division. England's IVC Evidensia, the world's second biggest owner of veterinary practices, started rolling out SignalPET's platform in 2022 to its thousands of practices in Europe and Canada. The other three platforms tested in the Murdoch University study are products developed by the U.S.'s Radimal, France's FAS and South Korea's SK Telecom.

Veterinarians have expressed mixed opinions of AI radiology tools on message boards of the Veterinary Information Network, an online community for the profession and parent of the VIN News Service. Some praise the platforms, saying that in their experience, the tools identified conditions that they alone wouldn't have noticed. Others found they delivered ambiguous readings or identified conditions that didn't exist.

The Murdoch University researchers selected their 53 cases from general practice hospitals in Australia, and all were verified by confirmatory diagnostics such as surgery and histopathology. The cases were submitted to each of the AI platforms between September and December 2024. The average accuracy rate of the platforms ranged between 70% and 90%. However, the "balanced accuracy," a refined measure that averages the proportion of diseased cases correctly identified (known as sensitivity) and the proportion of disease-free cases correctly identified (known as specificity), ranged between 60% and 69%.
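To see how plain accuracy and balanced accuracy can diverge, consider a minimal sketch in Python. The counts below are invented for illustration (a 53-case set with 45 abnormal and 8 normal cases) and are not taken from the Murdoch paper; the function names are likewise just illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly abnormal cases that were flagged as abnormal."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly normal cases that were called normal."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Mean of sensitivity and specificity, so each class counts equally."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


# Hypothetical results for one platform (NOT data from the study):
# 45 abnormal cases (40 caught, 5 missed), 8 normal cases (2 cleared, 6 flagged).
counts = dict(tp=40, fn=5, tn=2, fp=6)
acc = accuracy(**counts)
bal = balanced_accuracy(**counts)
print(f"accuracy: {acc:.1%}")
print(f"balanced accuracy: {bal:.1%}")
```

With these made-up counts, accuracy comes out near 79% while balanced accuracy is closer to 57%, because the tool does poorly on the small group of normal cases and plain accuracy largely hides that.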

The researchers also applied a performance metric known as the Matthews correlation coefficient (MCC), which evaluates the performance of a binary classification model when there are unbalanced datasets — such as more abnormal than normal cases. MCC readings for the 53 cases varied widely, the paper states, with the poorest-performing platforms showing results indicative of performance below chance — in other words, worse than a coin toss.
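The standard MCC formula can be sketched in a few lines of Python. The two sets of counts below are hypothetical and chosen only to show the behavior described above; they are not the platforms' actual results.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Ranges from -1 (always wrong) through 0 (no better than chance)
    to +1 (perfect agreement with the ground truth).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the table is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical 53-case results (45 abnormal, 8 normal), NOT study data.
platform_a = mcc(tp=40, tn=2, fp=6, fn=5)
platform_b = mcc(tp=38, tn=1, fp=7, fn=7)
print(f"platform A MCC: {platform_a:.3f}")
print(f"platform B MCC: {platform_b:.3f}")
```

In this illustration, platform B classifies 39 of 53 cases correctly, roughly 74% accuracy, yet its MCC is slightly negative, which is the "worse than a coin toss" territory.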

"What we tried to show with the other metrics is that even if something is shown to be accurate, it can be very, very misleading," Joslyn told VIN News. "The classic example to demonstrate that is if you just say 'no' to everything, if your testing dataset doesn't have that condition, you can be 100% accurate."
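Joslyn's "say no to everything" example is easy to reproduce. The sketch below, with invented datasets, scores a stand-in "classifier" that calls every case negative; the helper names are assumptions made for the illustration.

```python
from typing import List, Optional


def accuracy(truth: List[bool], predicted: List[bool]) -> float:
    """Share of cases where the prediction matches the ground truth."""
    correct = sum(t == p for t, p in zip(truth, predicted))
    return correct / len(truth)


def sensitivity(truth: List[bool], predicted: List[bool]) -> Optional[float]:
    """Share of truly positive cases that were detected (None if there are none)."""
    calls_on_positives = [p for t, p in zip(truth, predicted) if t]
    if not calls_on_positives:
        return None
    return sum(calls_on_positives) / len(calls_on_positives)


def always_no(n: int) -> List[bool]:
    """A 'classifier' that answers 'no' for every case."""
    return [False] * n


# Hypothetical test sets: True means the condition is present.
no_disease = [False] * 50                    # condition absent from the test set
some_disease = [True] * 5 + [False] * 45     # condition present in 5 of 50 cases

acc_none = accuracy(no_disease, always_no(50))
acc_some = accuracy(some_disease, always_no(50))
sens_some = sensitivity(some_disease, always_no(50))
print(acc_none, acc_some, sens_some)
```

On the first dataset the do-nothing classifier is 100% accurate; on the second it is still 90% accurate while detecting none of the five affected patients.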

Other studies more favorable to AI

The study's sobering findings contrast with previous research papers, including one published last year in Frontiers in Veterinary Science that found that when presented with 50 selected cases, SignalPET's product performed just as well as — and sometimes better than — 11 human radiology specialists who participated in the study. The authors noted that the AI performed better in confirming normal cases than in detecting abnormal findings. It also performed better when assessing cases that the authors deemed to have "low ambiguity," as opposed to tougher, "high ambiguity," cases.

Joslyn and some of his Murdoch University colleagues submitted a commentary to the same journal last year criticizing that study's methodology. They noted that the AI's interpretations weren't validated by "gold standard" means, such as actual surgery or clinical pathology findings, but by a consensus of radiologists — and by the AI itself. "The ground truth should be independent from the variable being evaluated, so including the AI's own output in establishing the correct answer is a form of circular logic — the tool being evaluated helps decide whether its prediction is considered correct," the commentary states. Among other concerns, the piece also notes that some of the 11 radiologists assessed more cases than others, introducing potential bias, particularly if lower-performing radiologists reviewed more cases.

Separately, a series of three papers about the Vetology AI tool, published in 2022 and 2023, suggest promise for using the tool for triage and screening, the Murdoch team said in their own paper. The Murdoch researchers go on to say that the Vetology evaluations do not allow adequate assessment of how the platform performs across a broad range of cases "due to use of high-quality radiographs from a single teaching institution with clearly visible pathology."

More recently, a paper published in February in the journal Veterinary Radiology and Ultrasound found that Antech's product was as good as human radiologists at identifying heart failure in dogs and cats. It, too, was based on radiographs taken at a teaching hospital, Joslyn noted.

He maintains that the performance of all six platforms is limited by the vast variations in the quality of radiographs they're asked to analyze. AI tools applied in human medicine, he noted, while not perfect, have performed better than those in veterinary medicine.

"In human healthcare, we have trained radiographers who tell the patient to stand still, take a deep breath and don't move. The quality of radiographs is high wherever you go," he said. "Whereas in the vet world, we have GP vets that take most of the X-rays, and they're doing it while they're trying to do a dental exam, and they're understaffed and they haven't had their lunch."

Moreover, variability among veterinary patients is broader than among humans, he said, pointing out, for example, that dogs alone come in a huge range of shapes and sizes. "These tools are not generalizing to real-world cases," he said. "The AI companies say they performed well, based on the training dataset that they've used, but when the AI is asked to evaluate cases from the mom and pop clinic down the road, it doesn't perform well."

Companies respond

Dr. James Barr, Antech's chief medical officer, welcomed the publication of the Murdoch team's research, maintaining that it raises valuable questions about the veracity of AI tools. Veterinarians, he added, should use the company's RapidRead product to assist their clinical judgment, not replace it.

"When you think about the way that clinical medicine works, you don't have the benefit of a crystal ball to see the future and then retrospectively work back," he said in an interview. "You deal with what you're dealing with now, and we treat the AI very much in the same way that we do with any other clinical perspective."

Barr pushed back on any suggestion that RapidRead is of little help to practitioners. The product, he said, has been trained with more than 16 million X-rays taken from real-world scenarios and vetted by a team of board-certified radiologists. "They work hand in hand with the developers, and they're very dogged about the quality of the product before it's released," he said.

RapidRead continues to improve over time, Barr said, noting the research for the JAVMA paper was conducted around 18 months ago.

"AI years are almost like dog years," agreed Radimal founder and CEO Alan Weissman. "Everything's grown up so much since then."

Weissman, too, welcomed the Murdoch team's paper for its inquisitiveness, and he acknowledged the AI tools have room for improvement.

"We really emphasize this as a triage tool, an education tool, a way to gather insights quickly while the patient is on their way to seeing the specialist," he said. "And it's also about helping with the compliance of the owner. If you give them more detailed information about what's going on with their dog, they're more likely to proceed with treatments or other diagnostics."

Vetology CEO and founder Dr. Seth Wallack said the company is still analyzing the study. As an initial observation, he noted that 53 cases is a relatively small sample. "It takes only a few results to really affect the sensitivity and specificity," Wallack said. "AI training and testing requires tons of examples, hundreds to thousands of cases."
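The sample-size point can be illustrated with a standard statistical tool, the Wilson score interval, which gives a plausible range for a proportion such as specificity. This is a sketch under invented numbers (8 versus 800 normal cases); it is not an analysis performed in the paper or by Vetology.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Hypothetical: 75% specificity measured on 8 normal cases vs. 800 normal cases.
small_lo, small_hi = wilson_interval(6, 8)
large_lo, large_hi = wilson_interval(600, 800)

# With only 8 normal cases, one changed result moves specificity by 12.5 points.
one_case_shift = 6 / 8 - 5 / 8

print(f"n=8:   {small_lo:.2f} to {small_hi:.2f}")
print(f"n=800: {large_lo:.2f} to {large_hi:.2f}")
print(f"shift from one case at n=8: {one_case_shift:.3f}")
```

The same measured 75% is compatible with a very wide range of true values when it rests on eight cases, and with a narrow one when it rests on 800.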

Wallack maintains that Vetology offers the most transparency of any platform by openly publishing its performance data, which is based on more than 300,000 X-rays reviewed by radiologists. Joslyn welcomed the data release as "a great step in the right direction."

More studies on the way

In their paper, the Murdoch University researchers acknowledge their small sample size is a limitation. Joslyn, a veterinary radiologist himself and founder of a tech company that links pet health records to their microchips, said finding radiographs that were backed by confirmatory diagnostics and associated case notes was challenging.

"Over time, we want to have a better pipeline to access these cases to then have far more robust metrics," he said.

The team is now assessing repeatability: If the same radiograph is submitted to a platform twice, will it give the same result? They're also studying how presenting poorly taken radiographs to the AI affects its performance. "What we're hoping to do with all this is to set ourselves up as an external validating lab, where we not only look at AI radiology systems and benchmark them, but open it up to other diagnostic tests using AI, like for cytology."

