AI vs Embryologists: A Trial Funded by the AI Manufacturer Couldn't Show Its Own Product Works
Last updated: February 2026
Journal Club #4. We read the latest fertility research so you don’t have to.
The Paper
“Deep learning versus manual morphology-based embryo selection in IVF: a randomized, double-blind noninferiority trial”
Illingworth PJ, Venetis C, Gardner DK, Nelson SM, Berntsen J, Larman MG, et al. Nature Medicine, 2024;30(11):3114–3120.
Read the full paper on PubMed →
DOI: 10.1038/s41591-024-03166-5 | Trial registration: ANZCTR 379161
Why This Matters
Some clinics now offer AI-powered embryo selection as an add-on, typically $500 to $1,500 per cycle. The pitch: a deep learning algorithm analyzes time-lapse images of your embryos and picks the one most likely to implant. Faster than a human. More consistent. Data-driven.
This is the first large randomized trial testing whether that pitch holds up. 1,066 patients across 14 IVF clinics in Australia, the UK, Denmark, and Sweden. The AI system: iDAScore v1.2, made by Vitrolife.
The question for patients is direct: if your clinic charges extra for AI embryo selection, are you paying for better outcomes or better marketing?
What “Noninferiority” Means
This trial wasn’t designed to show AI is better than embryologists. It tried to show AI is “not meaningfully worse.” That’s a noninferiority design. The researchers set a bar: AI pregnancy rates could be up to 5 percentage points lower than human selection and still be considered acceptable. The AI failed to clear even that bar.
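The decision rule behind a noninferiority trial is simpler than the terminology suggests. A minimal sketch, using the trial's pre-specified -5 point margin and the ITT confidence interval it reported (illustrative only, not the trial's actual analysis code):

```python
# Noninferiority check: the AI passes only if the worst plausible deficit
# (the lower bound of the 95% CI) stays above the pre-specified margin.
# Margin and CI values are taken from the trial's reported ITT result.

def noninferior(ci_lower_pct: float, margin_pct: float = -5.0) -> bool:
    """True if the lower CI bound does not cross the margin."""
    return ci_lower_pct > margin_pct

# Reported ITT difference: -1.7 points, 95% CI (-7.7 to 4.3)
print(noninferior(-7.7))  # False: -7.7 crosses the -5 point margin
```

Note that the point estimate (-1.7) is irrelevant to the verdict; only the lower bound of the interval is tested against the margin.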
Key Findings
Clinical pregnancy rates: AI did not demonstrate noninferiority
| Analysis | iDAScore (AI) | Morphology (embryologist) | Difference (95% CI) |
|---|---|---|---|
| ITT | 46.5% (248/533) | 48.2% (257/533) | -1.7% (-7.7 to 4.3) |
| Per-protocol | 47.4% (237/500) | 48.8% (245/502) | -1.4% (-7.6 to 4.8) |
The lower bound of the confidence interval (-7.7%) crossed the pre-specified -5% noninferiority margin. In plain terms: the trial could not rule out that AI selection is meaningfully worse than a trained embryologist.
Live birth rates trended lower with AI
| Outcome | iDAScore | Morphology | Risk difference (95% CI) |
|---|---|---|---|
| Live birth rate | 39.8% (212/533) | 43.5% (232/533) | -3.9% (-9.9 to 2.2) |
Not statistically significant (P=0.24). But a 3.7 percentage point gap in live births is not a number patients can afford to ignore. For every 27 women who use AI selection instead of a human embryologist, the point estimate suggests one fewer baby.
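The "one fewer baby for every 27 women" figure is straightforward arithmetic on the trial's raw counts; a quick sketch (illustrative back-of-envelope math, not an analysis from the paper):

```python
# Back-of-envelope "number needed to harm" from the live birth point estimate.
# Raw counts are taken from the trial's reported live birth data.

live_birth_ai = 212 / 533        # ~39.8% with AI selection
live_birth_manual = 232 / 533    # ~43.5% with embryologist selection
gap = live_birth_manual - live_birth_ai   # ~3.7 percentage points

# One fewer live birth per 1/gap patients, at the point estimate
print(round(1 / gap))  # → 27
```

Keep in mind this is the point estimate only: the confidence interval spans from a substantial AI deficit to a small AI advantage, which is exactly why the trial calls the result inconclusive rather than negative.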
Frozen transfers: AI performed significantly worse
| Transfer type | iDAScore | Morphology | P value |
|---|---|---|---|
| Fresh | 48.1% | 44.5% | 0.35 |
| Freeze-all | 49.5% | 61.3% | 0.032 |
This subgroup analysis was post-hoc, so treat it as hypothesis-generating. But the gap is 11.8 percentage points in frozen transfers. Freeze-all protocols are increasingly common across European clinics. If AI struggles specifically with embryos that have been vitrified, that matters for how most patients will encounter this technology.
AI was faster. That’s all it proved.
| | iDAScore | Embryologist |
|---|---|---|
| Assessment time | 21.3 seconds | 208.3 seconds |
Ten times faster. Undeniable. But "faster" is a benefit for the clinic's workflow, not for your pregnancy rate. A blastocyst assessment that takes three and a half minutes instead of twenty seconds has no effect on your cycle timeline. You won't notice the difference. Your embryo might.
When AI and embryologists agreed, outcomes were similar
In 65.8% of cases, the algorithm and the embryologist picked the same embryo. When they disagreed (34.2% of cases), pregnancy rates were slightly lower with the AI choice: 44.7% versus 48.3%.
Who Paid for This
Vitrolife funded the trial. Vitrolife manufactures iDAScore. Vitrolife also manufactures the EmbryoScope time-lapse incubators that iDAScore requires to run.
Two co-authors (Berntsen and Larman) are Vitrolife employees who own company shares and hold pending iDAScore patents. A third (Hardarson) was previously employed by Vitrolife and is an iDAScore patent holder. Other authors received Vitrolife research grants, speaking fees, and travel funding.
The manufacturer designed a trial for its own product, with its own employees as co-authors, and still could not demonstrate that the AI was noninferior to a human. The trial’s own conclusion: “This study was not able to demonstrate noninferiority of deep learning for clinical pregnancy rate.”
The Editorial That Followed
Nature Medicine published an editorial alongside the trial: “The inconvenient reality of AI-assisted embryo selection in IVF” (Kieslinger et al., 2024). The editors were direct.
They described AI in IVF as being “hyped for its potential” while noting that “inflated expectations of new technologies are not always justified.” They placed AI embryo selection on a Gartner hype cycle, calling time-lapse systems with or without AI “non-essential extra treatments” that are “not proven to be effective.”
Their sharpest observation: the fertility industry sells innovations “in the absence of high-quality evidence, during the peak of inflated expectations.” This trial, they argued, is what happens when you actually test the hype.
What This Means for Patients
If your clinic offers AI embryo selection as an add-on, ask what evidence supports it. This is the largest randomized trial to date, and the AI did not match a trained embryologist. “Our software uses deep learning” is a technology description, not a clinical outcome. You wouldn’t pay extra for a surgeon who operates faster but not better. The same logic applies here.
The frozen transfer result deserves attention. Most single embryo transfer protocols in Europe now involve a freeze-all approach. The one subgroup where AI significantly underperformed is the one most patients will actually experience.
A power problem makes this worse, not better. The trial was designed expecting a 35.4% pregnancy rate in the control group. The actual rate was 48.2%. That mismatch means the trial was underpowered: to properly test noninferiority at real-world pregnancy rates, you’d need roughly 7,800 patients. Nobody is running that trial. The existing data is what patients have to work with, and it doesn’t support the add-on.
Limitations
- The trial used iDAScore v1.2 specifically. Newer versions or competing AI systems may perform differently, though none have published noninferiority data from randomized trials.
- All 14 clinics used EmbryoScope time-lapse incubators (Vitrolife hardware). Results may not generalize to AI systems running on different platforms.
- The frozen transfer subgroup finding is post-hoc and needs prospective confirmation. The interaction was statistically significant (P=0.022), but the subgroups were not pre-specified.
- The trial enrolled women under 42 with at least two blastocysts on day 5. Results may differ for older patients or those with fewer embryos.
Practical Takeaway
AI embryo selection is an add-on that costs $500 to $1,500 per cycle, runs only on the manufacturer’s own hardware, and failed to demonstrate noninferiority to a trained embryologist in the manufacturer’s own trial. The AI is faster. It is not better. Speed benefits the lab, not the patient.
Before paying for it, ask your clinic: is there a published randomized trial showing this AI improves live birth rates? As of today, the answer is no.
This is part of EuroFertile’s Journal Club. Summaries of recent fertility research, written for patients, not doctors. Browse all research summaries →
Comparing clinics across Europe? Get matched to clinics in your budget → | Estimate your costs →