Taking causality into account to broaden the scope of AI
According to David Sontag, a researcher at MIT, Artificial Intelligence is increasingly used in medicine, but mostly for diagnostic applications that only require predictions: detection of lung cancer, skin cancer or diabetes.
However, if Artificial Intelligence is to be used decisively in the health field, causality must be taken into account in order to determine the impact of a treatment on a patient’s pathology and the evolution of the disease. Causality will also help determine which treatment is most effective.
“Causal inference is a part of AI and machine learning. And not surprisingly, some of the best research in causal inference & ML is being done by researchers in medical AI such as @suchisaria.” — Thomas G. Dietterich (@tdietterich) May 4, 2019
Causal Inference to determine the effects of a treatment
In the context of a clinical trial of a drug, the question of treatment efficacy can be framed as a causal inference problem using a causal diagram.
The variable X represents the context, in this case the patient’s medical history. The variable T is the treatment:
T = 0, the patient is in the control group,
T = 1, the patient is treated with the drug.
The variable Y is the result of the treatment: Y(0) is the result observed for a patient in the control group and Y(1) is the result observed for a treated patient.
The effects of a treatment are assessed by two quantities: the Average Treatment Effect (ATE) and the Conditional ATE (CATE).
- ATE is used to determine on average which treatment is more effective.
- CATE is used to determine for a patient with a specific medical history which treatment is best.
- When E[Y1 | x, T = 1] - E[Y0 | x, T = 0] ≠ E[Y1] - E[Y0], there is a confounding factor and no valid conclusion can be drawn about the effectiveness of one treatment over another.
- In the case studied below: -0.75 ≠ 0.75. It is impossible to determine which of treatment A or treatment B is more effective in controlling blood glucose.
To avoid Simpson’s paradox, there are three solutions: randomization of the trial, adjustment and re-weighting.
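The adjustment idea can be illustrated with a small simulation. This is only a sketch on synthetic data: the binary confounder `x`, the assignment probabilities and the true effect of +1 are all made-up assumptions, not figures from the talks. It compares a naive treated-versus-control difference with a difference computed inside each stratum of `x` and then averaged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical confounder: x = 1 marks patients with a more severe history.
x = rng.binomial(1, 0.5, size=n)
# Severe patients are far more likely to receive the drug (confounding).
t = rng.binomial(1, np.where(x == 1, 0.9, 0.1))
# Assumed ground truth: the drug adds +1 to the outcome, severity removes 3.
y = 1.0 * t - 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Naive comparison of treated vs. control patients, ignoring x.
naive = y[t == 1].mean() - y[t == 0].mean()

# Adjustment: compare within each stratum of x, then average over P(x).
adjusted = 0.0
for value in (0, 1):
    in_stratum = x == value
    diff = y[in_stratum & (t == 1)].mean() - y[in_stratum & (t == 0)].mean()
    adjusted += diff * in_stratum.mean()

print(f"naive difference:    {naive:+.2f}")
print(f"adjusted difference: {adjusted:+.2f}")
```

In this setup the naive difference comes out negative even though the simulated drug helps every patient, while the adjusted difference recovers a value close to the assumed effect.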
Researchers use machine learning algorithms to estimate the treatment effect (ATE and CATE) by implementing covariate adjustment and propensity score re-weighting.
Machine Learning to estimate the treatment effect
To calculate the CATE and ATE, the researchers first train algorithms that take as inputs X (patient history) and T (treatment) and produce as output Y (observed results).
Classical linear regression models do not work properly. They obtain good accuracy on individuals with observed results but perform poorly on counterfactuals (unobserved results).
On the “effect of model misspecification” diagram, the linear regression model (orange line) fits the observed results (orange dots) perfectly but does not generalize to a new distribution of untreated individuals (blue dots).
Causal inference models are different from classical Machine Learning algorithms because they have to perform well on two different distributions: treated distribution (red solid circles) and counterfactual treated distribution (red dotted circles).
Y1(x) on the “Covariate adjustment” scheme fulfills this requirement. The same reasoning applies for Y0(x) which fits both the “control” distribution (blue solid circles) and the “counterfactual control” distribution (blue dotted circles).
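Covariate adjustment as described above can be sketched in a few lines. The example below is an assumption-laden illustration, not the researchers' code: the data are synthetic, the potential outcomes are invented, and a cubic polynomial fitted with `np.polyfit` stands in for whichever non-linear learner is actually used. One model is fitted per treatment arm, both are evaluated on every patient, and the difference gives an estimated CATE whose mean is the estimated ATE.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical one-dimensional patient history.
x = rng.uniform(-2.0, 2.0, size=n)
# Observational setting: the chance of being treated depends on x.
p_treat = 1.0 / (1.0 + np.exp(-1.5 * x))
t = rng.binomial(1, p_treat)

# Invented potential outcomes; the true CATE is 0.5 + 0.5 * x.
y0 = np.sin(x)
y1 = np.sin(x) + 0.5 + 0.5 * x
# Only the outcome of the arm actually received is observed.
y = np.where(t == 1, y1, y0) + rng.normal(0.0, 0.1, size=n)

# Fit Y1(x) on treated patients and Y0(x) on control patients.
coef1 = np.polyfit(x[t == 1], y[t == 1], deg=3)
coef0 = np.polyfit(x[t == 0], y[t == 0], deg=3)

# Evaluate both models on everyone, including their counterfactual arm.
cate_hat = np.polyval(coef1, x) - np.polyval(coef0, x)
ate_hat = cate_hat.mean()

print(f"estimated ATE: {ate_hat:.2f}")
```

The important step is the last one: each model is asked to predict for patients it never saw in training, which is why it has to fit both the observed and the counterfactual distribution.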
In recent years, researchers have deployed non-linear Machine Learning algorithms that meet this need for adjustment: random forests, Bayesian trees and Gaussian processes. Now, following the example of David Sontag and his teams, they are trying to train models based on neural networks.
The model described in the research paper “Estimating individual treatment effect: generalization bounds and algorithms” is based on a DNN (deep neural network).
The architecture of their model is based on two parts:
- The function Φ takes input only from X to generate a shared representation for T = 0 and T = 1.
- Then they use two branches of different layers (layers in blue and layers in orange) to perform the Y0 and Y1 predictions.
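The two-part architecture in the list above can be sketched as a forward pass in NumPy. This is a simplified, untrained illustration under stated assumptions: the layer sizes are hypothetical, each branch is reduced to a single linear layer, and the names `phi_layer`, `head0`, `head1` and `forward` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

def dense(n_in, n_out):
    """Random weights and zero biases for one untrained layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def relu(z):
    return np.maximum(z, 0.0)

n_features, n_hidden = 10, 16            # hypothetical sizes
phi_layer = dense(n_features, n_hidden)  # shared representation Φ(x)
head0 = dense(n_hidden, 1)               # branch predicting Y0
head1 = dense(n_hidden, 1)               # branch predicting Y1

def forward(x, t):
    w, b = phi_layer
    rep = relu(x @ w + b)                # Φ sees only x, never t
    y0_hat = (rep @ head0[0] + head0[1]).ravel()
    y1_hat = (rep @ head1[0] + head1[1]).ravel()
    # The factual prediction comes from the branch matching the treatment.
    factual = np.where(t == 1, y1_hat, y0_hat)
    return y0_hat, y1_hat, factual

x = rng.normal(size=(5, n_features))
t = np.array([0, 1, 1, 0, 1])
y0_hat, y1_hat, factual = forward(x, t)
cate_hat = y1_hat - y0_hat               # per-patient effect estimate
print(cate_hat)
```

During training, only the factual prediction would be compared with the observed outcome, so each branch learns from its own arm while the representation is shared by both.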
Propensity score re-weighting
Propensity score re-weighting is another tool for estimating ATE. The idea is to transform an observational study into a pseudo-randomized trial by changing the weighting of the samples. In the case below, the weighting of the “control group” (unobserved) points is increased while the weighting of the “treatment group” points is decreased.
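As a rough illustration of re-weighting, the sketch below estimates the ATE by inverse propensity weighting on synthetic data. Everything in it is an assumption made for the example: the binary confounder, the assignment probabilities, the true effect of +1, and the choice to estimate the propensity score as the treated fraction within each stratum of `x`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Same kind of confounded, made-up data as an observational study.
x = rng.binomial(1, 0.5, size=n)
t = rng.binomial(1, np.where(x == 1, 0.9, 0.1))
y = 1.0 * t - 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Propensity score e(x) = P(T = 1 | x), estimated per stratum of x.
e_hat = np.where(x == 1, t[x == 1].mean(), t[x == 0].mean())

# Treated samples get weight 1 / e(x), control samples 1 / (1 - e(x)).
w = np.where(t == 1, 1.0 / e_hat, 1.0 / (1.0 - e_hat))

# Normalized weighted means of each arm, then their difference.
treated_mean = np.sum(w * t * y) / np.sum(w * t)
control_mean = np.sum(w * (1 - t) * y) / np.sum(w * (1 - t))
ate_ipw = treated_mean - control_mean

print(f"re-weighted ATE estimate: {ate_ipw:.2f}")
```

Samples that were unlikely to end up in their group receive large weights, so the weighted treated and control groups have the same distribution of `x`, as they would in a randomized trial.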
Propensity score re-weighting also opens up new possibilities for optimizing algorithms, such as RCFR (“re-weighted counterfactual regression”), developed by David Sontag and his team of researchers.
A weighting function W(x) is added to the cost function of the initial TARNet model. The new cost function to be minimized is: re-weighted regression + regularization based on imbalances.
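The shape of such a cost function can be sketched as follows. This is a loose illustration and not the published objective: the squared difference between weighted group means of the representation is a simple stand-in for the imbalance measure the authors use, and the names `reweighted_objective`, `rep` and `alpha` are hypothetical.

```python
import numpy as np

def reweighted_objective(y, y_hat, rep, t, w, alpha=1.0):
    """Weighted squared error plus a penalty on group imbalance."""
    # Re-weighted regression term on the factual predictions.
    fit = np.mean(w * (y - y_hat) ** 2)
    # Weighted mean of the representation in each treatment group.
    mean_treated = np.average(rep[t == 1], axis=0, weights=w[t == 1])
    mean_control = np.average(rep[t == 0], axis=0, weights=w[t == 0])
    # Stand-in imbalance measure: squared distance between the two means.
    imbalance = np.sum((mean_treated - mean_control) ** 2)
    return fit + alpha * imbalance

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.0, 2.0, 3.0, 5.0])
t = np.array([0, 1, 0, 1])
w = np.ones(4)

balanced = np.array([[0.0], [0.0], [1.0], [1.0]])    # same mean in both groups
imbalanced = np.array([[0.0], [1.0], [0.0], [1.0]])  # groups fully separated

loss_balanced = reweighted_objective(y, y_hat, balanced, t, w, alpha=2.0)
loss_imbalanced = reweighted_objective(y, y_hat, imbalanced, t, w, alpha=2.0)
print(loss_balanced, loss_imbalanced)
```

With identical predictions, the representation that separates treated and control patients is penalized more, which pushes training toward representations where the two groups look alike.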
In the results presented by the researchers, TARNet and counterfactual regression models perform best at estimating ATE and CATE.
This article was written from the following videos…
MIA: David Sontag, Fredrik Johansson, AI for health needs causality
MIT 6.S897 Machine Learning for Healthcare, Spring 2019-Causal Inference Part 1
MIT 6.S897 Machine Learning for Healthcare, Spring 2019-Causal Inference Part 2
More videos about causality and machine learning
Causality and Increasing Model Reliability — SUCHI SARIA
Towards Discovering Causal Representations — Yoshua Bengio