Rethinking drug discovery in the era of digital biology - Daphné Koller

DiploDoc
4 min read · May 20, 2021

The biotechnology revolution and the machine learning revolution

Daphné Koller is a professor in the Department of Computer Science at Stanford University. She is also the co-founder, with Andrew Ng, of Coursera, the pioneering online course company.

Her research focuses on artificial intelligence and its applications in biomedicine. In 2018, she founded the start-up Insitro, which aims to discover new drugs using Machine Learning.

For Daphné Koller, two revolutions are converging: the biotechnology revolution and the Machine Learning revolution.

Machine Learning can identify relationships within the mountains of data generated by biotechnologies, volumes of data that humans could not analyze on their own.

Machine Learning makes it possible to obtain a geometric representation (manifold) from the data of a multitude of patients (feature representation). Geometric proximity in this representation corresponds to biological proximity.
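To make the idea of geometric proximity concrete, here is a minimal sketch in Python. It assumes each patient has already been mapped to a small feature vector by some model. The patient names, the vectors and the choice of Euclidean distance are all invented for illustration and do not come from Insitro's work.

```python
import math

# Hypothetical feature vectors, one per patient (invented values).
patients = {
    "patient_a": [0.90, 0.10, 0.30],
    "patient_b": [0.80, 0.20, 0.35],
    "patient_c": [0.10, 0.90, 0.70],
    "patient_d": [0.15, 0.85, 0.60],
}


def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor(name, embeddings):
    """Return the name of the patient closest to `name` in feature space."""
    query = embeddings[name]
    others = (other for other in embeddings if other != name)
    return min(others, key=lambda other: euclidean(query, embeddings[other]))


for name in patients:
    print(name, "->", nearest_neighbor(name, patients))
```

In this toy example, patients whose vectors lie close together end up paired, which is the sense in which proximity in the representation is meant to reflect biological similarity.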

The revolution in biotechnology is generating more and more data that can be used to train Machine Learning models. Genomic data has recently exploded thanks to the decrease in sequencing costs.

Phenotype data has not grown at the same pace.

Some projects, such as UK Biobank (UK) and All of Us (USA), are beginning to fill this gap in phenotype data. The data collected are varied: blood tests, MRI scans, demographic and environmental factors, and medical records.

Understanding the progression of NASH disease through Machine Learning

In NASH (non-alcoholic steatohepatitis), the accumulation of fat in liver cells causes their progressive destruction, which can lead to serious complications such as cirrhosis.

In the most advanced stages, the disease can also lead to liver cancer or to liver failure, leaving a liver transplant as the patient’s only option. In France, NASH affects one adult in five. Its incidence tends to rise along with that of obesity.

In April 2019, the American pharmaceutical company Gilead Sciences announced the start of a three-year research collaboration with Insitro ($15M upfront plus an additional $35M in the short term) to find new drugs that could reverse, or at least slow, the progression of the disease. If certain goals are met, Insitro will receive up to an additional $1 billion.

The goal of Koller’s research is to identify, from liver biopsies of NASH patients, the genetic drivers that explain the progression of the disease. A convolutional neural network (CNN) is trained to classify the biopsy images into four categories corresponding to symptoms of NASH: fibrosis, steatosis, lobular inflammation and ballooning of hepatocytes.
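As a rough illustration of what such a classifier computes, the sketch below runs the forward pass of a very small convolutional network in plain Python: convolution, ReLU, global average pooling, then a softmax over the four categories. The architecture, the random untrained weights and the 8x8 stand-in image are all assumptions made for this example. Nothing here reflects the actual model used by Insitro.

```python
import math
import random

LABELS = ["fibrosis", "steatosis", "lobular inflammation", "hepatocyte ballooning"]


def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation) of one channel with a square kernel."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_mean(feature_map):
    """ReLU followed by global average pooling: one number per feature map."""
    values = [max(0.0, v) for row in feature_map for v in row]
    return sum(values) / len(values)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(image, kernels, weights, biases):
    """Forward pass: convolution features, then a linear layer and softmax."""
    features = [relu_mean(conv2d(image, kernel)) for kernel in kernels]
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


rng = random.Random(0)
# Random, untrained parameters: the probabilities are meaningless until training.
kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(2)]
weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in LABELS]
biases = [0.0] * len(LABELS)
# Hypothetical 8x8 grayscale patch standing in for a biopsy image.
image = [[rng.random() for _ in range(8)] for _ in range(8)]

probs = predict(image, kernels, weights, biases)
for label, p in zip(LABELS, probs):
    print(f"{label}: {p:.3f}")
```

A real model would have many more layers and filters, and its weights would be learned from labeled biopsy images rather than drawn at random.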

By comparing the CNN’s classification of the biopsy images (phenotype) with the genetic sequencing (DNA, RNA) of the same biopsies, the researchers discovered two new driver mutations that explain the progression of the disease.
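One simple way to picture this kind of comparison is an association test between an image-derived score and the presence of a genetic variant. The sketch below is only an illustration under stated assumptions: the severity scores, the carrier labels and the choice of a permutation test are invented here, and the source does not describe the statistical method actually used.

```python
import random
from statistics import mean

# Invented toy data: one image-derived severity score per biopsy, and whether
# the same patient carries a hypothetical variant.
scores = [0.82, 0.75, 0.91, 0.30, 0.25, 0.40, 0.88, 0.35]
carriers = [True, True, True, False, False, False, True, False]


def mean_difference(scores, carriers):
    """Mean score of carriers minus mean score of non-carriers."""
    with_variant = [s for s, c in zip(scores, carriers) if c]
    without_variant = [s for s, c in zip(scores, carriers) if not c]
    return mean(with_variant) - mean(without_variant)


def permutation_p_value(scores, carriers, n_permutations=2000, seed=0):
    """Estimate how often shuffled labels give a gap at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean_difference(scores, carriers))
    shuffled = list(carriers)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(mean_difference(scores, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


print("difference in mean score:", round(mean_difference(scores, carriers), 3))
print("permutation p-value:", permutation_p_value(scores, carriers))
```

A small p-value suggests the variant and the image-derived phenotype are associated. A genome-wide analysis would repeat such a test across many variants and correct for multiple testing.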

Generating “In Vitro” training data

To train its Machine Learning models, Insitro has chosen an original approach: creating its own “In Vitro” data using iPSC and CRISPR techniques. These techniques give Insitro large and diverse datasets, which in turn make it possible to develop better-performing models.

iPSC techniques reprogram adult cells into stem cells. The resulting cells are called iPS cells, for induced pluripotent stem cells.

CRISPR makes it possible to modify genes in living cells. For example, the two newly discovered mutations involved in the progression of NASH can be introduced into cells in order to study their effect.

Article written from these resources


Written by DiploDoc

Diplodocus interested in the applications of artificial intelligence to healthcare.
