In a few minutes, AI does what medical teams needed months to achieve

In a quiet corner of medical research, a new kind of assistant has begun rewriting the timetable of scientific progress.

Researchers testing artificial intelligence on real pregnancy data say they have watched a process that once dragged on for months unfold in minutes — with results good enough to compete with seasoned data scientists.

From months of code to minutes of computation

In most medical labs, the slowest step is rarely the experiment itself. The bottleneck is the data. Files arrive in different formats. Variables are messy. Code has to be written, checked, corrected, and written again. Building a predictive model can stretch across an entire season before anyone knows whether it actually works.

A US team from the University of California, San Francisco, and Wayne State University decided to test whether modern AI tools could tackle that bottleneck. Their question was simple: could AI systems that generate code from plain language instructions build reliable medical prediction models on their own?

The researchers focused on a high-stakes problem: predicting the risk of preterm birth using data from more than 1,000 pregnancies. Human teams and AI systems were handed the same datasets and the same mission.

What usually demands days of coding from experienced programmers was compressed into a few minutes of automated scripting.

Several AI tools produced functional analysis scripts in R or Python almost instantly. That speed let the researchers run multiple model variations one after another, without assembling large technical teams or spending weeks in debugging cycles.

The contrast with earlier work on the same data was sharp. During previous international competitions on these pregnancy datasets, more than 100 expert teams took around three months to build their best models. Turning those results into a peer-reviewed publication then took nearly two additional years.

The medical challenge behind the experiment

This was not a toy example picked just to showcase AI. Preterm birth is the leading cause of neonatal death and a major driver of lifelong motor and cognitive difficulties. In the United States alone, roughly 1,000 babies are born too early every day.

The research group assembled microbiological and clinical data from about 1,200 pregnant women, pooled from nine separate studies. Bringing these datasets together is already a difficult technical step: variables differ, measurement protocols vary, and missing information is common.

Traditional statistics can struggle to draw strong signals from such large, complex, and noisy data. Modern machine learning models often do better, but they demand elaborate “pipelines”: sequences of operations that clean, transform, and analyse the data in the right order.

Designing that pipeline — not the mathematics itself — has become one of the main chokepoints in data-driven medicine.

This is exactly where the code-generating AI systems came in. Instead of a programmer spending days writing and revising scripts, the researchers described their goals in natural language. The AI then wrote the code to import the data, handle missing values, select predictors, train models, and evaluate performance.
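To make that concrete, here is a hypothetical example of the kind of script such tools emit when asked, in plain language, to pool cohorts and handle missing values. The column names and numbers are invented for illustration, not taken from the study's data.

```python
import pandas as pd

# Two small cohorts recorded with different variable names, as in pooled studies
cohort_a = pd.DataFrame({"age": [29, 34, 25], "bmi": [22.1, None, 27.4]})
cohort_b = pd.DataFrame({"maternal_age": [31, 27], "bmi": [24.0, 30.2]})

# Harmonise variable names, then pool the studies into one table
cohort_b = cohort_b.rename(columns={"maternal_age": "age"})
pooled = pd.concat([cohort_a, cohort_b], ignore_index=True)

# Fill missing values with the column median, a common first pass
pooled["bmi"] = pooled["bmi"].fillna(pooled["bmi"].median())
print(pooled)
```

A human analyst would normally write and debug these few lines by hand; the point of the experiment was that a natural-language request can produce dozens of such steps in one pass.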

When AI stands shoulder to shoulder with expert teams

The experiment involved eight different AI systems, all capable of generating code. Without human help on the programming steps, four of them produced usable analysis pipelines.

The surprise was not only that the code ran, but that the resulting models performed on par with — and sometimes better than — the top human teams from the earlier competitions. In other words, machines that had never “seen” the problem before could match models that had taken experts months to craft.

One telling example came from a small human group: a master’s student working with a high school pupil. With support from AI tools, they managed to build valid predictive models and prepare a scientific article within a few months. Under normal conditions, that workload would usually be out of reach for such a small, relatively junior team.

How long did the whole project really take?

From the first design of the study to the submission of the scientific paper, the new AI-driven project took around six months. That stretch included not only model building but also checking the analysis, interpreting the results, and drafting the manuscript.

For academic research, where timelines are often measured in years, that pace is unusually brisk. The researchers argue that shaving months off data analysis could allow clinical insights to reach patients sooner, particularly in fast-moving areas like maternal health or infectious disease.

By offloading the most repetitive coding tasks to machines, scientists can focus more attention on what the numbers actually mean.

What generative AI really changes in medical research

The generative AI tools used in this experiment are a specific family of systems. They do not just classify images or predict numbers. They read instructions written in everyday language and respond by writing code, building full analysis workflows from scratch.
For medical researchers, that shift is substantial. Instead of asking, “Can I find someone who knows how to implement this?” the central question becomes, “Am I asking the right scientific question?”

That change shows up in daily work in several ways:

  • Junior scientists can prototype complex models without deep programming skills.
  • Senior researchers can iterate through multiple analytical strategies in the same week.
  • Collaborations between hospitals and data teams become smoother because code is less of a barrier.
  • Data cleaning, often the dullest part of a project, can be partly automated and standardised.

In the study, some large language models were able to design complete pipelines in one go, including data processing, model training, and validation steps. Their performance reached the level of the strongest teams from the earlier global challenge on preterm birth prediction.

The catch: AI still needs a human hand on the wheel

The researchers do not pretend that AI systems are infallible. Not all tools worked as intended. Some produced code that failed immediately. Others ran but returned misleading or logically inconsistent results.

Medical data is fragile. A small error in variable definition or in the way patients are grouped can lead to biased models. Those models might look precise on paper but turn out to be useless, even dangerous, in a clinic.

Human expertise stays central: to phrase good questions, to check for hidden biases, and to reject models that look clever but are clinically meaningless.

Because of that, the team frames generative AI as a force multiplier, not a replacement. The vision is closer to a “data co-pilot” than a fully autonomous researcher. The machine proposes pipelines; the humans test, adjust, and interpret.

Key concepts worth unpacking

For readers less familiar with data science jargon, a few terms help clarify what is happening behind the scenes.

A “predictive model” is simply a mathematical tool that takes in measured information — for example, microbial profiles, age, or medical history — and outputs a probability, such as the chance of a preterm birth.

An “analysis pipeline” is the step-by-step recipe that leads from raw data to that probability. It usually includes:

  • data cleaning (fixing errors, standardising formats)
  • feature selection (choosing which variables to include)
  • model training (letting an algorithm learn from past cases)
  • validation (testing performance on unseen data)
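Those four steps can be sketched in a few lines of scikit-learn. This is a minimal illustration on random placeholder data, not the study's actual pipeline: the features, labels, and parameter choices here are assumptions made for the example.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # stand-in clinical/microbial features
X[rng.random(X.shape) < 0.05] = np.nan    # simulate missing measurements
y = rng.integers(0, 2, size=200)          # stand-in outcome: preterm vs term

pipe = Pipeline([
    ("clean", SimpleImputer(strategy="median")),   # data cleaning
    ("select", SelectKBest(f_classif, k=10)),      # feature selection
    ("train", LogisticRegression(max_iter=1000)),  # model training
])

# Validation: performance estimated on folds the model never trained on
scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"mean AUC across folds: {scores.mean():.2f}")
```

Chaining the steps inside one Pipeline object matters: it guarantees that imputation and feature selection are re-learned on each training fold, so the validation score is not contaminated by information from the test data.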

Generative AI is being used not to change the mathematics of prediction, but to automate the construction of that recipe. It writes the instructions that a human analyst would otherwise assemble by hand.

What this could mean in a real maternity ward

Imagine a large hospital caring for thousands of pregnant women each year. Today, teams might record dozens of measurements but only use a fraction in decision-making, because building and maintaining prediction tools is costly.

With mature AI-assisted pipelines, the hospital’s analysts could, at least in principle, test new models quickly: combining microbiome data, blood measures, and clinical notes to flag patients at higher risk of preterm birth. Obstetricians could then monitor those women more closely, adjust care pathways, or enrol them in prevention trials.

That scenario comes with serious conditions. Models would need rigorous external validation, clear communication of uncertainty, and ongoing monitoring for bias. Yet the prospect is no longer theoretical; the recent experiment shows that the technical construction of such tools does not have to be the rate-limiting step.

Risks, benefits and what comes next

The benefits of speeding up data analysis are obvious: faster research cycles, more agile responses to emerging health threats, and wider participation from smaller or less well-funded teams. When code is less of a barrier, scientific questions, not programming skills, set the pace.

Still, new risks appear. Overreliance on AI-generated code could tempt some groups to skip thorough checks. Subtle statistical errors are hard to spot, even for experts, and generative models occasionally produce confident nonsense. Data privacy is a further concern: connecting hospital records to external AI tools must be handled under strict safeguards.

In practice, the most promising path may be a layered approach: AI for rapid prototyping, followed by careful human review and more traditional, slower validation studies. The recent work on preterm birth prediction suggests that this combination can compress months of programming into a few minutes, without discarding the safety net of scientific scrutiny.

As more labs experiment with these tools, the quiet revolution seen in this pregnancy study is likely to spread to oncology, cardiology, and infectious disease. The core idea remains the same: let machines handle the repetitive technical grind, so people can spend their limited time on the parts of medicine that still demand judgement, empathy, and doubt.

Originally posted 2026-03-03 14:47:06.
