COMMENT
21 November 2024

AI could pose pandemic-scale biosecurity risks. Here’s how to make it safer

AI-enabled research might cause immense harm if it is used to design pathogens with worrying new properties. To prevent this, we need better collaboration between governments, AI developers and experts in biosafety and biosecurity.

Jaspreet Pannu, Sarah Gebauer, Greg McKelvey Jr, Anita Cicero & Tom Inglesby

Jaspreet Pannu is a fellow at the Center for Health Security, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, and a postdoctoral scholar in the School of Medicine, Stanford University, California, USA.

Sarah Gebauer is a senior physician policy researcher at RAND, Santa Monica, California, USA.

Greg McKelvey Jr is a senior physician policy researcher and professor of policy analysis at RAND, Arlington, Virginia, USA.

Anita Cicero is deputy director at the Center for Health Security and a senior scientist at the Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA.

Tom Inglesby is director at the Center for Health Security and a professor at the Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA.

Artificial-intelligence models are able to translate an experimental method into code that runs a liquid-handling robot. Credit: Getty

Since July, researchers at Los Alamos National Laboratory in New Mexico have been assessing how the artificial intelligence (AI) model GPT-4o can assist humans with tasks in biological research. In the evaluations — which are being conducted to advance innovations in the biosciences, as well as to understand potential risks — humans ask GPT-4o various questions to help them achieve standard experimental tasks. These include maintaining and propagating cells in vitro; separating cells and other components in a sample using a centrifuge; and introducing foreign genetic material into a host organism.

In these assessments, researchers at Los Alamos are collaborating with OpenAI, the company in San Francisco, California, that developed GPT-4o. The tests are among a handful of efforts aiming to address potential biosafety and biosecurity issues posed by AI models since OpenAI made ChatGPT, a chatbot based on large language models (LLMs), publicly available in November 2022.

We argue that much more is needed.

Three of us investigate how scientific and technological innovations can affect public health and health security at the Johns Hopkins Center for Health Security in Baltimore, Maryland. Two of us research and develop solutions to public-policy challenges at the non-profit think tank RAND, which is headquartered in Santa Monica, California.

Although we see the promise of AI-assisted biological research to improve human health and well-being, this technology is still unpredictable and presents potentially significant risks. We urge governments to move faster to clarify which risks warrant most attention, and to determine what adequate testing and mitigation measures for these potential risks should entail. In short, we call for a more deliberate approach that draws on decades of government and scientific experience in reducing pandemic-scale risks in biological research¹.

Experiments at speed

GPT-4o is a ‘multimodal’ LLM. It can accept text, audio, image and video prompts, and has been trained on vast quantities of these formats scraped from the Internet and elsewhere — data that almost certainly include millions of peer-reviewed studies in biological research. Its abilities are still being tested, but previous work hints at its possible uses in the life sciences. For instance, in 2023, Microsoft (a major investor in OpenAI) published evaluations of GPT-4, an earlier version of GPT-4o, showing that the LLM could provide step-by-step instructions for using the protein-design tool Rosetta to design an antibody that can bind to the spike protein of the coronavirus SARS-CoV-2. It could also translate an experimental protocol into code for a robot that can handle liquids — a capability that is “expected to greatly speed up the automation of biology experiments”².

Also in 2023, researchers at Carnegie Mellon University in Pittsburgh, Pennsylvania, showed that a system using GPT-4, called Coscientist, could design, plan and perform complex experiments, such as chemical syntheses. In this case, the system was able to search documents, write code and control a robotic lab device³. And earlier this month, researchers at Stanford University in California and the Chan Zuckerberg Biohub in San Francisco introduced a Virtual Lab, a team of LLM agents powered by GPT-4o, which designed potent SARS-CoV-2 nanobodies (a type of antibody) with minimal human input⁴.

Automating lab protocols using AI systems would improve scalability and reduce costs. Credit: Nattapon Malee/Getty

OpenAI released GPT-4o in May, and is expected to release its successor, GPT-5, in the coming months. Most other leading AI companies have similarly improved their models. So far, assessments have focused mainly on individual LLMs operating in isolation. But AI developers expect combinations of AI tools, including LLMs, robotics and automation technologies, to be able to conduct experiments — such as those involving the manipulation, design and synthesis of drug candidates, toxins or stretches of DNA — with minimal human involvement.

These advances promise to transform biomedical research. But they could also bring significant biosafety and biosecurity risks⁵. Indeed, several governments worldwide have taken steps to try to mitigate such risks of cutting-edge AI models (see ‘Racing to keep up’). In 2023, for example, the US government secured voluntary commitments from 15 leading AI companies to manage the risks posed by the technology. Later that year, US President Joe Biden signed an Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. Among other things, this requires companies to notify the government before they release models that are trained on “primarily biological sequence data” and that use “a quantity of computing power greater than 10²³ integer or floating-point operations.”
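To give a sense of scale for the compute threshold quoted above, the sketch below compares a rough estimate of training compute with the figure of 10²³ operations. It assumes the widely used rule of thumb that training a dense model costs about 6 operations per parameter per training token; that approximation, the function names and the example model sizes are all illustrative assumptions and are not taken from the Executive Order.

```python
# Reporting threshold quoted in the Executive Order for models trained
# primarily on biological sequence data.
BIO_THRESHOLD_OPS = 1e23


def estimate_training_ops(n_parameters: float, n_tokens: float) -> float:
    """Rough training compute using the ~6 * N * D rule of thumb.

    The rule of thumb is an assumption of this sketch, not part of the order.
    """
    return 6.0 * n_parameters * n_tokens


def exceeds_bio_threshold(n_parameters: float, n_tokens: float) -> bool:
    """True if the estimated compute is strictly greater than the threshold."""
    return estimate_training_ops(n_parameters, n_tokens) > BIO_THRESHOLD_OPS


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to illustrate the arithmetic.
    for params, tokens in [(1e9, 1e12), (1e10, 1e12), (1e11, 3e12)]:
        ops = estimate_training_ops(params, tokens)
        flag = exceeds_bio_threshold(params, tokens)
        print(f"{params:.0e} params, {tokens:.0e} tokens: {ops:.1e} ops, notify={flag}")
```

Under these assumptions, only the largest of the three hypothetical models crosses the line. A real determination would depend on how the operations are actually counted and on whether the training data are "primarily biological sequence data".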

Racing to keep up

Since OpenAI made the chatbot ChatGPT publicly available in November 2022, governments and researchers in industry and academia have been trying to mitigate the risks of cutting-edge AI models.

21 July 2023: The US White House secures voluntary commitments from seven AI companies to test AI models for biosecurity and cybersecurity risks before releasing models. (Another eight companies agreed to commitments on 12 September 2023.)

26 July 2023: An industry body to promote the safe and responsible development of cutting-edge AI systems is established, called the Frontier Model Forum.

30 October 2023: US President Joe Biden signs an Executive Order on the Safe, Secure and Trustworthy Development and Use of AI.

1 November 2023: At the UK AI Safety Summit, 29 governments sign the Bletchley Declaration, which recognizes AI risks in “domains such as cybersecurity and biotechnology.”

2 November 2023: The UK and US AI safety institutes are announced. The UK AI Safety Institute is subsequently set up with nearly US$130 million in funding. (The US AI Safety Institute later receives funds of $10 million.)

8 March 2024: More than 170 scientists agree to voluntary commitments for the responsible use of AI for biodesign; implementation is yet to happen.

21–22 May 2024: At the AI Seoul Summit, 16 companies agree to the Frontier AI Safety Commitments, stating that they will publish “a safety framework focused on severe risks” before the February 2025 AI Summit in Paris.

20–21 November 2024: First meeting of ten governments participating in the International Network of AI Safety Institutes in San Francisco, California.

10–11 February 2025: France will host the AI Action Summit in Paris. (As of late November 2024, 3 of the 16 AI firms that agreed to publish safety frameworks ahead of this meeting have done so.)

The United Kingdom, the United States, Canada, Japan and Singapore have now established government institutes focused on AI safety to develop standards and tools for risk management. Other countries have committed to doing the same, with those five nations and Australia, France, Kenya and South Korea making up the founding members of an International Network of AI Safety Institutes, together with the European Union, which has established a safety unit in its AI Office.

These are impressive accomplishments in a short time frame, and should be supported. How much risk reduction has been achieved from all this activity, however, is unclear — in part because much of the work of these institutions has not yet been made public.

Safety testing

Separately from considerations of risk, some developers of AI models have tried to determine what factors affect their models’ performance the most. One leading hypothesis follows a scaling law: LLM performance improves with increases in model size, data-set size and computational power⁶. This is partly what influenced the US government’s decision to require AI companies to notify the Department of Commerce before releasing models that use a certain amount of computing power. But scaling laws will not reliably predict what capabilities could arise and when.
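As a hedged illustration, the scaling-law hypothesis is usually written as a set of power laws of the kind reported in ref. 6. Here the test loss L stands in for 'performance' (lower is better), N is the number of model parameters, D is the data-set size and C is the training compute; the constants and exponents are fitted empirically, and the notation is a simplified sketch rather than a full statement of that paper's results.

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```

Each relation is meant to hold only when the other two factors are not the bottleneck. Because these expressions describe an aggregate loss, they say nothing about when a specific capability, such as one relevant to biosecurity, will appear.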

In the meantime — in the absence of government policies on what risks urgently need addressing and how to mitigate them — companies such as OpenAI and Anthropic (also based in San Francisco) have followed evaluation protocols that they have developed in-house. (Many companies with AI systems, including Amazon in Seattle, Washington, Cohere in Canada, Mistral in Paris and xAI in San Francisco, have not yet made biosecurity evaluations of their models publicly available¹.) In these cases, safety testing has entailed automated assessments, including those using multiple-choice questions (see go.nature.com/4tgj3p9); studies in which humans attempt to elicit harmful capabilities from the model being evaluated (known as red teaming; go.nature.com/3z4kg2p); and controlled trials in which individuals or groups are asked to perform a task with or without access to an AI model (uplift studies; go.nature.com/3unhgmr).

In our view, even when companies are conducting their own evaluations, such assessments are problematic. Often, they are too narrowly focused on the development of bioweapons. For instance, the technology company Meta conducted studies to see whether its open-source LLM Llama 3.1 could increase the proliferation of “chemical and biological weapons” (see go.nature.com/3reyqgs). Likewise, the AI company Anthropic has assessed whether its model Claude could answer “advanced bioweapon-relevant questions” (see go.nature.com/48u8tyj).

AI systems might enable the design of virus subtypes that evade immunity. Credit: Getty

The problem with this approach is that there is no publicly visible, agreed definition of ‘bioweapon’. When used in isolation, this term doesn’t differentiate between smaller-scale risks and large-scale ones. Various pathogens and toxins could plausibly be used as weapons, and many are listed in international non-proliferation agreements (see go.nature.com/3utzbw8). But few are likely to lead to the kinds of harm that could affect millions of people. Also, many pathogens, such as influenza and SARS-CoV-2, can cause severe societal disruption, but are not considered bioweapons.

Another issue is that evaluations have tended to focus too much on basic lab tasks. In the assessments being conducted by OpenAI in collaboration with Los Alamos researchers, for example, the capabilities being tested could be needed to develop something nefarious, such as a crop-destroying pathogen. But they are also essential steps for beneficial life-sciences research that do not — on their own — provide cause for alarm.

Added to all this, the evaluations conducted so far are resource-intensive and applicable mainly to LLMs. They generally involve a question-and-answer approach that requires humans to pose the questions or review a model’s answers. Finally, as mentioned earlier, evaluators need to examine how multiple AI systems operate in concert⁷ — something that is currently being requested by the US government but overlooked in industry, because companies are incentivized to test only their own models.

How to prioritize

So what does a better approach look like?

Given that resources are finite and progress in AI is rapid, we urge governments and AI developers to focus first on mitigating those harms that could result in the greatest loss of life and disruption to society. Outbreaks involving transmissible pathogens belong to this category — whether those pathogens affect humans, non-human animals or plants.

In our view, developers of AI models — working with safety and security experts — need to specify which AI capabilities are most likely to lead to this kind of pandemic-scale harm. A list of ‘capabilities of concern’ that various experts generally concur on, even if they disagree on some issues, offers a more robust starting point than does a list generated by individual companies or specialist academic groups.

As a proof of principle, in June, we gathered 17 experts in AI, computational biology, infectious diseases, public health, biosecurity and science policy for a one-day hybrid workshop near Washington DC. The aim was to determine what AI-enabled capabilities in biological research would be most likely to enable a pandemic level of death and disruption — whether caused by a pandemic in humans or a widespread animal or crop disease. Views among workshop participants differed. Still, the majority of the group members rated 7 AI capabilities from a list of 17 as being “moderately likely” or “very likely” to enable new global outbreaks of human, animal or plant pathogens. These are:

Optimizing and generating designs for new virus subtypes that can evade immunity. A study⁸ showing that an AI model can generate viable designs for subtypes of SARS-CoV-2 that can escape human immunity was published in Nature in 2023.

Nature 635, 808-811 (2024)

doi: https://doi.org/10.1038/d41586-024-03815-2

References

1. Pannu, J. et al. Preprint at SSRN https://doi.org/10.2139/ssrn.4873106 (2024).
2. Microsoft Research AI4Science & Microsoft Azure Quantum. Preprint at arXiv https://doi.org/10.48550/arXiv.2311.07361 (2023).
3. Boiko, D. A., MacKnight, R., Kline, B. & Gomes, G. Nature 624, 570–578 (2023).
4. Swanson, K., Wu, W., Bulaong, N. L., Pak, J. E. & Zou, J. Preprint at bioRxiv https://doi.org/10.1101/2024.11.11.623004 (2024).
5. Urbina, F., Lentzos, F., Invernizzi, C. & Ekins, S. Nature Mach. Intell. 4, 189–191 (2022).
6. Kaplan, J. et al. Preprint at arXiv https://doi.org/10.48550/arXiv.2001.08361 (2020).
7. US AI Safety Institute at NIST. Managing Misuse Risk for Dual-Use Foundation Models (US National Institute of Standards and Technology, 2024).
8. Thadani, N. N. et al. Nature 622, 818–825 (2023).
9. Vaishnav, E. D. et al. Nature 603, 455–463 (2022).
10. Rapp, J. T., Bremer, B. J. & Romero, P. A. Nature Chem. Eng. 1, 97–107 (2024).

Competing Interests

A.C. serves as a member of the WHO Technical Advisory Committee on the Responsible Use of the Life Sciences and Dual-Use Research. T.I. serves as chair of the WHO Technical Advisory Committee on the Health Security Interface.
