Reinforcement Learning helps fight antibiotic resistance

Antibiotic resistance, a major health issue

In France, antibiotic resistance causes an estimated 12,500 deaths per year, according to a 2015 report submitted to the Ministry of Health. The majority of deaths affect infants under 12 months and people over 65 years of age. Worldwide, 700,000 people die each year from drug-resistant diseases, including 230,000 from multidrug-resistant tuberculosis.

Evaluation of an antibiotic treatment

David Sontag and his team of researchers have developed an RL algorithm that recommends antibiotic treatments for urinary tract infections. The algorithm is meant to guide the physician's choice: prescribe the right antibiotic while limiting the emergence of resistance.

The dataset used

The dataset used to develop the algorithm is the AMR-UTI Dataset: a cohort of 15,806 microbiological specimens collected from 13,682 women with UTIs between 2007 and 2016. The training set is composed of data from 2007 to 2013, and the test set of data from 2014 to 2016.
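The temporal split described above can be sketched as follows. This is a minimal illustration with hypothetical miniature data, not the actual AMR-UTI schema; the column names are assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the AMR-UTI cohort:
# one row per specimen, with its collection year.
specimens = pd.DataFrame({
    "specimen_id": [1, 2, 3, 4, 5],
    "year":        [2008, 2012, 2013, 2014, 2016],
})

# Temporal split: specimens from 2007-2013 form the training set,
# specimens from 2014-2016 form the test set.
train = specimens[specimens["year"] <= 2013]
test = specimens[specimens["year"] >= 2014]
```

Splitting by time rather than at random mimics the real deployment setting: the policy is trained on past patients and evaluated on future ones.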

The implemented RL model

The algorithm directly learns a policy π (Pi) mapping the input patient data X to an output treatment decision A (the action space).

The Reward Function

To train their reinforcement learning algorithm, the researchers chose:

  • A policy to learn (π),
  • An action (the antibiotic administered),
  • A reward (r) computed from the patient's resistance to the antibiotic and the class of the antibiotic administered (narrow-spectrum/first-line vs. broad-spectrum/second-line).

The treatment efficacy vector Y encodes a patient's susceptibility to each antibiotic, Y_i(a) = 1[patient i is susceptible to antibiotic a]; the treatment cost vector C encodes the class of the selected antibiotic, C_i(a) = 1[a is a second-line antibiotic]; and the composite treatment reward is a linear combination of the efficacy and cost of each antibiotic, weighted by a preference ω ∈ [0, 1]: r_i(a) = ω · Y_i(a) + (1 − ω) · (1 − C_i(a)).
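The composite reward above can be computed directly. Here is a minimal sketch with hypothetical example values for Y and C (not taken from the dataset); the function name is illustrative.

```python
import numpy as np

# Hypothetical example: 2 patients, 2 candidate antibiotics.
# Y[i, a] = 1 if patient i is susceptible to antibiotic a.
Y = np.array([[1, 1],
              [0, 1]])
# C[a] = 1 if antibiotic a is a second-line (broad-spectrum) agent.
C = np.array([0, 1])

def composite_reward(Y, C, omega):
    """r_i(a) = omega * Y_i(a) + (1 - omega) * (1 - C(a))."""
    return omega * Y + (1 - omega) * (1 - C)

r = composite_reward(Y, C, omega=0.5)
# With omega = 0.5: an effective first-line antibiotic scores 1.0,
# an effective second-line or ineffective first-line one scores 0.5,
# and an ineffective second-line one scores 0.0.
```

The preference ω trades off the two objectives: ω close to 1 rewards efficacy above all, while ω close to 0 penalizes broad-spectrum use more heavily.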

The cost function

The cost function chosen by David Sontag's team to learn the policy π is inspired by previous work on cost-sensitive classification. It makes it possible to transform a complex cost function into a simpler function to optimize.

Optimization of the function f_a(x) by minimizing the above quantity
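The cost-sensitive classification idea can be sketched as follows: fit one scorer f_a(x) per action a to predict its reward, then let the policy pick the action with the highest predicted reward. This is a toy illustration with hypothetical random data and simple least-squares scorers, not the team's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (not from the AMR-UTI dataset):
# X: patient features; R[i, a]: reward of prescribing antibiotic a to patient i.
n, d, n_actions = 100, 3, 2
X = rng.normal(size=(n, d))
R = rng.uniform(size=(n, n_actions))

# Cost-sensitive reduction: fit one linear scorer f_a(x) = [x, 1] @ w_a
# per action, by least squares on the rewards.
X1 = np.hstack([X, np.ones((n, 1))])       # add an intercept column
W = np.linalg.lstsq(X1, R, rcond=None)[0]  # one weight column per action

def policy(x):
    """pi(x) = argmax_a f_a(x): pick the antibiotic with the highest predicted reward."""
    scores = np.append(x, 1.0) @ W
    return int(np.argmax(scores))

a = policy(X[0])
```

The reduction replaces a hard policy-optimization problem with per-action supervised regression, which is what makes the objective "simpler to optimize."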

The results obtained

The researchers' objective is to define a policy (the point labeled "goal" on the graph) that reduces both the use of broad-spectrum (second-line) antibiotics and the rate of ineffective antibiotic treatment.

Direct Policy Learning varies as a function of ω
Physicians’ decisions compared to the decisions of the RL algorithm

This article was written from these resources

Diplodocus, interested in the applications of artificial intelligence to healthcare. Twitter: @