Rare use of 'exact binomial confidence interval' helps keep UA accreditation on course
Art by Tamara Rees: A binomial confidence interval classically is used to determine a range of probable outcomes in a scenario involving two possibilities, such as a coin toss ending in heads or tails.
The 79.4% national veterinary licensing exam pass rate achieved by the second-ever class of the University of Arizona College of Veterinary Medicine falls just short of the minimum 80% specified in accreditation standards for veterinary schools in the United States.
But the subpar test performance isn't the setback it might seem, according to the program dean. That's because the standards allow for an alternative method for computing a pass rate that falls below the minimum.
Specifically, the dean, Dr. Julie Funk, told the VIN News Service last fall that the 79.4% "score meets the accreditation criterion because the exact binomial confidence interval includes 85%."
The accreditor, the American Veterinary Medical Association Council on Education, would not confirm that it applied the statistical calculation to UA because it does not answer questions about the accreditation status of individual programs.
Assuming Funk is correct, UA's case appears to be the first time the calculation has been applied at a U.S. school in a way that could impact its accreditation.
UA's three-year veterinary program, which opened in 2020, became eligible for full accreditation following the graduation of its first class in 2023. However, only 72% of that class passed the North American Veterinary Licensing Examination, notably short of the 80% threshold. To stay on track to full accreditation, new schools have two years to achieve an 80% pass rate, which is one of multiple COE standards.
Moreover, the accreditor identified other deficiencies at UA, namely in clinical resources, student support, faculty, curriculum and research. The program must address them all by June 2025 to achieve full accreditation.
While a school may operate without accreditation, approval by an accreditor authorized by the U.S. Department of Education is needed for its students to access federal financial aid, including loans.
Detailed information on the school's progress addressing deficiencies is not made public, but its licensing exam pass rate is. The explanation that applying the "exact binomial confidence interval" effectively enables the school to meet the exam standard spurred VIN News to explore the meaning and purpose of the calculation and its use in accreditation.
Of coin tosses and pass-fail tests
One of the 11 standards of accreditation laid out by the COE requires an outcomes assessment and includes the expectation that "80% or more of each college's graduating senior students sitting for the NAVLE will have passed at the time of graduation."
A footnote (see sidebar) to this criterion states that schools falling below an 80% pass rate will be subject to a secondary analysis called the "95% exact binomial confidence interval." It entails taking the number of test takers and the number who passed to compute a range of probable pass rates. The COE discards the low end of the range and uses the figure at the high end of the range as an alternative to the raw pass rate. (More on the calculation below.)
If a school's high-range number — called the "upper limit on the confidence interval" — is below 85% for two consecutive years, the school will be placed on probationary accreditation, according to the standard. If the subpar exam outcome continues for four consecutive years, the program is to be placed on terminal accreditation. In other words, it loses accreditation.
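For readers who want to see what that secondary analysis involves, here is a minimal sketch in Python, assuming the Clopper-Pearson ("exact" binomial) formula that the term in the standard usually refers to and the freely available SciPy library; the function names are illustrative, and the COE has not said what software or rounding conventions it uses.

```python
from scipy.stats import beta

def exact_binomial_ci(passed: int, took: int, confidence: float = 0.95):
    """Clopper-Pearson ("exact" binomial) confidence interval for a pass rate.

    Illustrative helper; the COE's standard does not specify the software it uses.
    """
    alpha = 1.0 - confidence
    lower = beta.ppf(alpha / 2, passed, took - passed + 1) if passed > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, passed + 1, took - passed) if passed < took else 1.0
    return lower, upper

def meets_secondary_analysis(passed: int, took: int) -> bool:
    """The footnote's test as described above: the interval's upper limit must reach 85%."""
    _, upper = exact_binomial_ci(passed, took)
    return upper >= 0.85
```

The point to notice is that the footnote compares 85% with the interval's upper limit, not with the raw pass rate itself.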
So what exactly is the exact binomial confidence interval? It is, in the simplest terms, a calculation producing a range of values based on data derived from a random process.
A classic example used to explain its utility is a coin toss. On each toss, there is a 50% chance of getting tails, but that doesn't mean 100 tosses will produce 50 tails. It's possible to get 47 tails or 53 tails. "There is randomness in that number," explained Guenther Walther, a Stanford University statistics professor. "The confidence interval gets around this by giving a range of plausible values."
In the case of a coin toss, the binomial confidence interval yields a range of possible rates for the coin landing heads or tails, such that there's a 95% chance that the range contains the true rate.
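What that "95% chance" means can be made concrete with a simulation. The sketch below, again a rough illustration assuming Python with NumPy and SciPy, repeatedly tosses a fair coin 100 times, builds the exact interval from each batch of tosses and counts how often the interval captures the true 50% rate; the number of repetitions and the random seed are arbitrary choices.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)   # illustrative seed
n_tosses, true_rate, trials = 100, 0.5, 10_000
covered = 0

for _ in range(trials):
    tails = int(rng.binomial(n_tosses, true_rate))
    # 95% Clopper-Pearson ("exact") interval for the tails rate in this batch
    lo = beta.ppf(0.025, tails, n_tosses - tails + 1) if tails > 0 else 0.0
    hi = beta.ppf(0.975, tails + 1, n_tosses - tails) if tails < n_tosses else 1.0
    covered += (lo <= true_rate <= hi)

# Typically a bit above 95%: the exact interval is deliberately conservative.
print(f"Share of intervals containing the true 50% rate: {covered / trials:.1%}")
```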
However, Philip Stark, a statistics professor at the University of California, Berkeley, is quick to point out: "Exam taking is not like coin tossing."
Stark is adamant that the confidence interval doesn't map well to the case of students taking a pass-fail exam.
"There is no way to make a confidence interval for anything," Stark said. "There's just, 'This many students took [the exam], this many passed.' Done, right? There's nothing random."
Walther agreed but allowed that the confidence interval could be useful in analyzing pass rates at programs in which a small proportion of students take the exam. However, he said, that application has limits.
When and why did the COE start using the confidence interval?
[Sidebar: Confidence interval footnote]
The confidence interval for the licensing exam standard was developed by the COE with the assistance of a veterinary statistician and epidemiologist and adopted in 2014, according to a statement provided to VIN News by the AVMA's media relations department.
The calculation was introduced when the COE was "considering how to work with international colleges that have small numbers of students sit for the NAVLE," the statement reads. "The use of a confidence interval allows the COE to determine with 95% confidence that the observed passing percentage is compatible with a passing rate of 80% or more in an entire class."
The statement continues: "The statistical approach makes it possible to aggregate small sample sizes over time to provide additional evidence that the pass rate for graduates of a college would have been 80% or better."
The COE did not elaborate on why it adopted the new analysis in 2014 in particular, given that, by its own account, it had been accrediting schools outside the U.S. and Canada for 40 years prior.
Three years before the COE adopted the pass-rate calculation, in 2011, it accredited programs at three foreign schools: Ross University in St. Kitts, St. George's University in Grenada and Universidad Nacional Autónoma de México (UNAM) in Mexico. The majority of students at Ross and St. George's are from the U.S. or Canada. They take the NAVLE and return to their home countries to practice. That's not the case with UNAM.
Only a handful of UNAM students, if any, take the NAVLE each year, as it is not required to practice in Mexico, and most graduates do not leave the country to practice.
The COE told VIN News that the accreditation of schools in Mexico and the West Indies was not the reason for adopting the confidence interval.
In cases where no program graduates take the NAVLE, the COE uses "other student educational outcomes in assessing compliance with the standard."
Although the confidence interval was adopted to measure outcomes at schools with few students taking the NAVLE, it is applied to all programs with students taking the exam.
Asked why, the COE stated, "That was a decision of the COE at that time. All decisions regarding standards and their revisions go through a process that requires stakeholder input. The Council welcomes and considers input from all stakeholders .... The Council routinely does an in-depth review of three to four standards each year."
Applying the confidence interval to pass rates
Stanford's Walther said that using the confidence interval could be a good way to assess pass rates for programs with small numbers of students taking the NAVLE.
To illustrate, he said, suppose five students took the exam and three passed. It would take only one student "having a bad day" to dramatically throw off the pass rate. "So, the pass rate has a lot of chance error in it," he reasoned.
The confidence interval takes that into account, he elaborated, by computing a range of plausible values for p (p being the chance that eligible students from the school will pass). "Getting an interval for p is like extrapolating to a larger class," he explained, because p is the probability that any given student from the school will pass, regardless of the number who participate.
When asked whether the situation would be different if the graduating class included a majority of students for whom English or French (in which the NAVLE is offered) is a second language, Walther, in effect, said yes.
"If the five students self-select because of language skills … this can throw off the conclusions in a major way," he said. "In essence, we are now selecting a group of students that have a higher chance of passing to take the test. Then the above calculation doesn't give a valid confidence interval anymore, and it's not easy to correct that."
Stark, at UC Berkeley, said that applying the binomial confidence interval to pass rates doesn't make sense regardless of whether participation is high or low.
"If you know the population, why are you calculating a confidence rate?" he said. "There is no rationale for applying this approach."
He sketched out what he called a "cartoon" in which the binomial confidence interval would be appropriate. In this scenario, every student goes into the exam with a coin. Each tosses the coin. Heads, they pass. Tails, they fail. Every student uses the same coin.
"The binomial confidence interval is a confidence interval for the probability that that coin lands heads," he said. "That's not for the rate of passing, and it's in a crazy cartoon where every student has the same chance of passing, which we know is not true. Some students know the material better. Some are better test takers."
Stark said the confidence interval could make sense in a group of test takers selected at random. But in the real world, the test takers are not a random sample. "They are the students who chose to take the test, and in general, those are probably the ones who think they're going to pass," he said.
A closer look at the math
Another issue is one of fairness, according to Dr. Mark Rishniw, an adjunct professor and researcher at Cornell University's College of Veterinary Medicine, where he sometimes teaches statistical analysis. He said that applying the confidence interval to extrapolate pass rates for schools with few students taking the NAVLE — the impetus for using the probability calculation in the first place — gives those programs an advantage.
To demonstrate, Rishniw provided two examples using "the confidence interval of a proportion" equation at VassarStats.net, a free computation website.
At College A, 120 out of 200 students pass the test.
At College B, three out of five students pass.
In each case, the pass rate is 60%.
Under the COE's alternative analysis for schools with a raw pass rate below 80%, College A has a confidence interval with a lower limit of 53% and an upper limit of 66%. So it falls short of the 85% mark that the upper limit must reach.
College B, on the other hand, has a confidence interval with a lower limit of 23% and an upper limit that is almost 90%, meeting the standard.
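Rishniw's contrast is easy to reproduce. The sketch below uses the Clopper-Pearson ("exact") formula named in the COE standard rather than the online calculator he used, so the endpoints come out slightly different from the figures quoted above (different calculators implement slightly different interval formulas), but the pattern is the same: the five-student class gets a far wider interval, and a much higher upper limit, than the 200-student class.

```python
from scipy.stats import beta

def exact_binomial_ci(passed, took, confidence=0.95):
    # Clopper-Pearson ("exact" binomial) interval for an observed pass rate
    alpha = 1.0 - confidence
    lower = beta.ppf(alpha / 2, passed, took - passed + 1) if passed > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, passed + 1, took - passed) if passed < took else 1.0
    return lower, upper

for school, passed, took in [("College A", 120, 200), ("College B", 3, 5)]:
    lower, upper = exact_binomial_ci(passed, took)
    # Both schools have a 60% raw pass rate, but only the tiny class
    # produces an upper limit that clears the 85% mark.
    print(f"{school}: {passed}/{took} passed -> "
          f"95% CI {lower:.0%} to {upper:.0%}, upper limit >= 85%? {upper >= 0.85}")
```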
"If a school has very, very few students taking the test, then they're much more likely to have a really reasonable chance of qualifying for accreditation because their confidence interval is so broad," said Rishniw, who is also the director of research at the Veterinary Information Network, an online community for the profession and parent of VIN News.
In contrast, the bigger the sample, the narrower the confidence interval. In general, for most programs with many test takers, it's unlikely the confidence interval will make a significant statistical difference affecting accreditation, Rishniw said. But it worked for UA.
Because its pass rate was 79.4%, the confidence interval could be applied. With 107 UA students taking the test and 85 passing, the confidence interval yields a lower limit of 70% and an upper limit of 86%, which is just above the COE standard. If one more student had failed and the raw pass rate dropped to 78.5%, the program would still meet the standard using the confidence interval.
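Plugging UA's numbers into the same kind of calculation shows why the footnote rescues a 79.4% raw pass rate. This sketch assumes a recent version of SciPy, whose binomtest result can report the exact binomial interval directly; the COE has not described precisely how it runs or rounds its own figures.

```python
from scipy.stats import binomtest

# UA's reported result (85 of 107 passing) and the hypothetical case of one more failure
for passed in (85, 84):
    ci = binomtest(passed, 107).proportion_ci(confidence_level=0.95, method="exact")
    print(f"{passed}/107 passed ({passed / 107:.1%}): "
          f"95% CI {ci.low:.0%} to {ci.high:.0%}, upper limit >= 85%? {ci.high >= 0.85}")
```

In both cases the upper limit clears 85%, which is why one additional failure would not have changed the outcome under the footnote.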
What do other accreditors do?
VIN News contacted accreditors of dental, medical, nursing and pharmacy schools in the U.S. to learn whether they use national licensing exam pass rates in their accreditation standards and, if so, how. None uses the exact binomial confidence interval in its standards.
The Commission on Dental Accreditation, the only accrediting organization for dental medicine schools, is the most distinct from the COE in that it doesn't use pass rates as a measure of program performance, according to a review of standards on its website.
The Liaison Committee on Medical Education (LCME), which accredits programs that grant doctor of medicine degrees, does include pass rates in its outcomes assessment standards, but it treats them differently from the veterinary education accreditor.
The LCME includes in its standards student performance on the first two parts of the three-part U.S. Medical Licensing Examination. The LCME determines whether a medical education program's performance on the examinations is within or outside of a "target" range based on national aggregate performance on these examinations over the most recent two-year period, according to LCME Co-Secretary Veronica Catanese.
She said these pass rates are not "hard and fast," nor are they a "trigger." Rather, they serve as a "dipstick" to help programs assess trends in student performance and to identify areas that warrant a closer look to identify and address the factors contributing to the performance concerns.
The organizations whose approach to the licensing exam pass rate most closely parallels the veterinary accreditor's are three that accredit nursing schools.
All specify an 80% pass rate on the national licensing exam. Of the three, the Accreditation Commission for Education in Nursing (ACEN) is the only nursing accreditor recognized to accredit programs domestically and internationally, making it most comparable to the COE.
One way the ACEN standard differs from the COE's, though, is that it applies to a variety of licensing exams, not just one, according to Kathy Chappell, ACEN's chief executive officer. It "encompasses any required examination that is required for nursing practice in that country (or, if none is required, the peers describe the process for licensure in that country)," Chappell explained by email.
On the pharmacy side, the Accreditation Council for Pharmacy Education uses a "first-time passing rate" on its licensing exam "as an annual monitoring parameter," said Greg Boyer, the organization's associate executive director.
Any program with a passing rate below a certain level — a level determined by the mean performance of all programs — "is asked to do a root cause analysis and explain steps to be taken to improve," he said.
Programs that continue to fall below the standard are required to meet with the accreditation board to discuss steps for improvement. And starting this July, falling below that two-standard-deviation threshold in five years of any seven-year period "will result in withdrawal of accreditation," he said.
The pharmacy school accreditor offers recognition to international programs, but unlike the COE, it does not require non-U.S. programs to adhere to prescribed standards, such as passing a licensing exam at a specified rate. The accreditation also does not qualify students to take the U.S. pharmacy licensing exam nor make them eligible for federal student loans.
Rather, it is "primarily a quality improvement process," Boyer said. "It provides a means for programs to demonstrate that they have engaged an outside regulatory body in an evaluation process," he said, using criteria "that were developed with input from international pharmacy education experts."