Reinforcement Learning helps fight antibiotic resistance

May 29, 2021

Antibiotic resistance, a major health issue

In France, antibiotic resistance causes an estimated 12,500 deaths per year, according to a 2015 report submitted to the Ministry of Health. Most of these deaths occur among infants under 12 months and people over 65 years of age. Worldwide, 700,000 people die each year from drug-resistant diseases, including 230,000 from multidrug-resistant tuberculosis.

According to a report by a group of international experts, antibiotic resistance is expected to cause “10 million deaths per year” worldwide by 2050, more than cancer. The deaths would occur mainly in Asia (4.7 million) and Africa (4.1 million). The study forecasts an annual average of 390,000 deaths in Europe and 317,000 in the United States.

New report calls for urgent action to avert antimicrobial resistance crisis

UN, international agencies and experts today released a groundbreaking report demanding immediate, coordinated and…

www.who.int

Evaluation of an antibiotic treatment

David Sontag and his team of researchers have developed an RL algorithm that recommends which antibiotic to prescribe for a urinary tract infection (UTI). The algorithm is meant to guide the physician’s choice: prescribe an antibiotic that works while limiting the development of resistance.

Treatment Policy Learning in Multiobjective Settings with Fully Observed Outcomes

In several medical decision-making problems, such as antibiotic prescription, laboratory testing can provide precise…

arxiv.org

The dataset used

The dataset used to develop the algorithm is the AMR-UTI dataset. The cohort consists of 15,806 microbiological specimens collected from 13,682 women with UTIs between 2007 and 2016. The training set covers 2007 to 2013 and the test set covers 2014 to 2016.

AMR-UTI: Antimicrobial Resistance in Urinary Tract Infections Dataset | MIT Clinical ML

AMR-UTI is a freely accessible dataset, derived from electronic health record (EHR) information on over 80,000 patients…

clinicalml.org
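The year-based split described above can be sketched in a few lines. This is a minimal illustration only: the `Specimen` class, its `specimen_id` and `year` fields, and the toy records are assumptions made for the example, not the actual AMR-UTI schema.

```python
from dataclasses import dataclass


@dataclass
class Specimen:
    specimen_id: int  # hypothetical identifier
    year: int         # hypothetical field: year the specimen was collected


def temporal_split(specimens, last_train_year=2013):
    """Split by collection year so that every test specimen is more recent
    than every training specimen."""
    train = [s for s in specimens if s.year <= last_train_year]
    test = [s for s in specimens if s.year > last_train_year]
    return train, test


# Toy records (one per year from 2007 to 2016), not real AMR-UTI data.
specimens = [Specimen(i, year) for i, year in enumerate(range(2007, 2017))]
train, test = temporal_split(specimens)
print(len(train), "training specimens,", len(test), "test specimens")
```

Splitting by time rather than at random means the model is evaluated on patients seen after the training period, which is closer to how it would be used in practice.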

The implemented RL model

The algorithm directly learns a policy π that maps the input patient data X to a treatment decision chosen from the action space A.

This decision is to give one of four antibiotics: nitrofurantoin (NIT), trimethoprim-sulfamethoxazole (SXT), ciprofloxacin (CIP) and levofloxacin (LVX).

NIT and SXT are first-line (narrow spectrum) antibiotics, while CIP and LVX are second-line (broad spectrum) antibiotics.

Narrow-spectrum (first-line) antibiotics kill only a limited range of bacteria. They can target and kill the bacteria that are causing the disease while leaving other bacteria alive, which may be beneficial. This type of antibiotic is usually prescribed when the doctor knows exactly which bacteria are causing the infection.
Broad-spectrum (second-line) antibiotics are effective against many different bacteria, including some that are resistant to narrower-spectrum antibiotics. This type of antibiotic is prescribed when the doctor is not sure which bacteria are causing the infection or when the illness is caused by several different bacteria.
The use of narrow-spectrum antibiotics limits the development of multi-drug resistant strains of bacteria.

An antibiotic policy to prevent emergence of resistant bacilli — PubMed

Background: Fear of infection in neonatal intensive care units (NICUs) often leads to early use of empiric…

pubmed.ncbi.nlm.nih.gov

The Reward Function

To train their reinforcement learning algorithm, the researchers defined:

A patient (X), represented by the input patient data,
A policy to learn (π),
An action (the antibiotic administered),
A reward (r), computed from the patient’s resistance to the antibiotic and the class of the administered antibiotic (narrow-spectrum first-line or broad-spectrum second-line).
The treatment efficacy vector Y is a function of the patient’s susceptibility to each antibiotic: $Y_i(a) = \mathbb{1}[\text{patient } i \text{ is susceptible to antibiotic } a]$. The treatment cost vector C is a function of the class of the selected antibiotic: $C_i(a) = \mathbb{1}[a \text{ is a second-line antibiotic}]$. The composite treatment reward is a linear combination of the effectiveness and cost of each antibiotic, weighted by the preference $\omega \in [0, 1]$: $r_i(a) = \omega \cdot Y_i(a) + (1 - \omega) \cdot (1 - C_i(a))$.
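To make the reward concrete, here is a small sketch that computes it for a single patient and picks the antibiotic with the highest reward. The function names `reward` and `best_antibiotic` and the toy susceptibility profile are assumptions made for this example. They are not taken from the researchers' code.

```python
ANTIBIOTICS = ("NIT", "SXT", "CIP", "LVX")
SECOND_LINE = {"CIP", "LVX"}  # broad-spectrum antibiotics


def reward(susceptible, antibiotic, omega):
    """Composite reward r = omega * Y + (1 - omega) * (1 - C).

    susceptible: dict mapping antibiotic -> True if the patient is susceptible.
    omega: preference weight in [0, 1]; higher values favor efficacy.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie in [0, 1]")
    y = 1.0 if susceptible[antibiotic] else 0.0         # efficacy Y
    c = 1.0 if antibiotic in SECOND_LINE else 0.0       # cost C
    return omega * y + (1.0 - omega) * (1.0 - c)


def best_antibiotic(susceptible, omega):
    """Antibiotic with the highest reward (ties go to the first in ANTIBIOTICS)."""
    return max(ANTIBIOTICS, key=lambda a: reward(susceptible, a, omega))


# Toy patient: resistant to both first-line drugs, susceptible to second-line.
patient = {"NIT": False, "SXT": False, "CIP": True, "LVX": True}
for omega in (0.25, 0.75):
    print(omega, best_antibiotic(patient, omega))
```

For this patient a first-line drug earns 1 − ω and an effective second-line drug earns ω, so the preferred choice switches from first-line to second-line once ω exceeds 0.5. This is how ω trades off avoiding broad-spectrum antibiotics against treating effectively.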

The cost function

The cost function chosen by David Sontag’s team to learn the policy π is inspired by previous work on “cost-sensitive classification”. It makes it possible to transform a complex cost function into a simpler function to optimize.

Optimization of the fa(x) function by minimizing the above quantity
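The exact quantity minimized by the researchers is not reproduced in the text here, so the following is only a generic sketch of one common cost-sensitive reduction, not their objective: fit one function f_a(x) per antibiotic to predict the reward, then choose the antibiotic with the highest prediction. To keep it dependency-free, f_a is assumed to be a lookup table over a single discrete feature, in which case the least-squares fit is simply the mean reward per feature value. The feature values `"prior"` and `"no_prior"` and the reward numbers are toy assumptions.

```python
from collections import defaultdict

ACTIONS = ("NIT", "SXT", "CIP", "LVX")


def fit_reward_tables(features, rewards):
    """Fit f_a(x) for every action a.

    features: list of hashable feature values x, one per patient.
    rewards:  list of dicts action -> observed reward, one per patient.
    Returns f with f[a][x] = mean reward of action a among patients with
    feature x, which is the least-squares fit for a lookup-table model.
    """
    sums = {a: defaultdict(float) for a in ACTIONS}
    counts = defaultdict(int)
    for x, r in zip(features, rewards):
        counts[x] += 1
        for a in ACTIONS:
            sums[a][x] += r[a]
    return {a: {x: sums[a][x] / counts[x] for x in counts} for a in ACTIONS}


def policy(f, x):
    """Pick the action whose fitted reward is highest for feature value x."""
    return max(ACTIONS, key=lambda a: f[a][x])


# Toy data: the feature is a hypothetical "prior resistance" flag.
features = ["no_prior", "no_prior", "prior", "prior"]
rewards = [
    {"NIT": 1.0, "SXT": 1.0, "CIP": 0.75, "LVX": 0.75},
    {"NIT": 1.0, "SXT": 0.25, "CIP": 0.75, "LVX": 0.75},
    {"NIT": 0.25, "SXT": 0.25, "CIP": 0.75, "LVX": 0.75},
    {"NIT": 1.0, "SXT": 0.25, "CIP": 0.75, "LVX": 0.0},
]
f = fit_reward_tables(features, rewards)
print(policy(f, "no_prior"), policy(f, "prior"))
```

Fitting every f_a on every patient is only possible because the reward of each antibiotic is known for each specimen. With outcomes observed only for the prescribed drug, this sketch would not apply as written.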

The results obtained

The objective of the researchers is to find a policy (the goal on the graph) that decreases both the use of broad-spectrum (second-line) antibiotics and the rate of ineffective antibiotic treatment.
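The two quantities being traded off can be computed as follows for any set of prescribing decisions. This is a sketch under one assumption: a treatment is counted as ineffective when the patient is resistant to the prescribed antibiotic. The `evaluate` function and the four toy patients are made up for illustration.

```python
SECOND_LINE = {"CIP", "LVX"}  # broad-spectrum antibiotics


def evaluate(decisions, susceptibilities):
    """Return (second-line use rate, ineffective treatment rate).

    decisions:        list of prescribed antibiotics, one per patient.
    susceptibilities: list of dicts antibiotic -> True if susceptible.
    """
    n = len(decisions)
    if n == 0:
        raise ValueError("need at least one decision")
    second_line = sum(a in SECOND_LINE for a in decisions)
    ineffective = sum(not s[a] for a, s in zip(decisions, susceptibilities))
    return second_line / n, ineffective / n


# Toy cohort of four patients (not real data).
susceptibilities = [
    {"NIT": True, "SXT": True, "CIP": True, "LVX": True},
    {"NIT": True, "SXT": False, "CIP": True, "LVX": True},
    {"NIT": False, "SXT": False, "CIP": True, "LVX": True},
    {"NIT": True, "SXT": True, "CIP": False, "LVX": False},
]
clinician = ["CIP", "SXT", "CIP", "CIP"]
learned = ["NIT", "NIT", "CIP", "NIT"]

print("clinician:", evaluate(clinician, susceptibilities))
print("learned:  ", evaluate(learned, susceptibilities))
```

A policy is better when it sits lower on both rates at once, which is what a point closer to the goal on the graph represents.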

The trained model (blue circle) performs better than the physicians (red crosses) and the recommendations (diamonds).

Direct Policy Learning varies as a function of ω

In the 1,245 cases in which clinicians opted for second-line antibiotics, the algorithm selected first-line antibiotics 1,014 times (about 81% of those cases) without compromising favorable treatment outcomes.

For uncomplicated UTIs, the algorithm developed by David Sontag and his team reduced second-line antibiotic use by 50% and inappropriate treatment by 20%.

Physicians’ decisions compared to the decisions of the RL algorithm

This article was written from these resources