AI is able to identify the structure of proteins to create new drugs

Deep Learning boosts biology research

Biology researchers are increasingly using Deep Learning models to advance two rapidly growing fields of biology: Genomics and Proteomics.

  • Proteomics is the study of proteomes, that is, all the proteins of a cell, tissue, organ or organism. Proteomics could help to unravel the mystery of giant viruses and to discover new drugs.

Machine Learning and Genomics

DeepVariant to predict variants

Many models have been developed in the field of genomics in recent years, among them DeepTarget, DeepChrome and DeepVariant. They are based on architectures such as CNNs (Convolutional Neural Networks), RNNs (Recurrent Neural Networks), LSTMs (Long Short-Term Memory networks), GANs (Generative Adversarial Networks) and Autoencoders (AE).

  • Model outputs: DeepVariant generates examples for all possible combinations of two different alleles (6 combinations).
  • Analysis: the model predictions show that the most likely alleles at this location are the reference allele ‘AT’ and the allele ‘ATATTT’.
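
The enumeration and selection described in the two points above can be sketched as follows. This is only an illustration, not DeepVariant's actual code or API: it assumes four candidate alleles at the site, of which only 'AT' and 'ATATTT' come from the text, and the other two alleles and all the scores are made up.

```python
from itertools import combinations

# Candidate alleles at one site. Only 'AT' (reference) and 'ATATTT' come from
# the text; 'A' and 'ATAT' are hypothetical placeholders.
alleles = ["AT", "ATATTT", "A", "ATAT"]

# Every unordered pair of two different alleles: C(4, 2) = 6 combinations.
pairs = list(combinations(alleles, 2))

# Made-up likelihoods, one per pair; a real model would supply these values.
scores = [0.90, 0.03, 0.02, 0.02, 0.02, 0.01]
likelihood = dict(zip(pairs, scores))

# The most likely pair of alleles is the one with the highest score.
best_pair = max(likelihood, key=likelihood.get)
print(len(pairs), best_pair)  # prints: 6 ('AT', 'ATATTT')
```

With these invented scores, the pair made of the reference allele and 'ATATTT' comes out on top, which mirrors the analysis step above.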

Machine learning and Proteomics

BERTology to discover protein structure

In June 2020, researchers at Salesforce Research published the paper “BERTology Meets Biology: Interpreting Attention in Protein Language Models”, which shows how the Natural Language Processing model BERT can be used in protein structure analysis. BERTology makes it possible to study the three levels of protein structure:

  • Secondary structure: specific protein shapes (alpha helix, beta sheet).
  • Tertiary structure: spatial folding (3D structure, contacts between amino acids, binding sites).

AlphaFold 2, a revolution in the field of biology

CASP competition: comparison of protein structure
2018: AlphaFold / 2020: AlphaFold 2
  • Step 2: the reconstruction of the protein structure from the obtained distance matrix (via gradient descent).
Winner of the CASP 2020 competition: AlphaFold 2
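
The reconstruction step mentioned above (recovering a structure from a distance matrix by gradient descent) can be illustrated with a toy example. This is a minimal sketch and not DeepMind's implementation: the four 3D points, the starting guess, the squared-error loss, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
import math

def pairwise_distances(points):
    """Distance matrix of a list of 3D points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def loss(points, target):
    """Sum over pairs of (current distance - target distance) squared."""
    n = len(points)
    return sum((math.dist(points[i], points[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def gradient_step(points, target, lr):
    """One gradient descent update of every point."""
    n = len(points)
    new_points = []
    for i in range(n):
        grad = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if d == 0.0:
                continue  # gradient undefined for coincident points; skip
            coeff = 2.0 * (d - target[i][j]) / d
            for k in range(3):
                grad[k] += coeff * (points[i][k] - points[j][k])
        new_points.append([points[i][k] - lr * grad[k] for k in range(3)])
    return new_points

# Hypothetical "true" structure, used only to build the target distance matrix.
true_coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
target = pairwise_distances(true_coords)

# Rough starting guess (an assumption); gradient descent refines it.
points = [[0.1, 0.2, -0.1], [0.8, -0.3, 0.2], [1.3, 0.7, 0.3], [0.7, 1.2, 0.8]]
initial_loss = loss(points, target)
for _ in range(2000):
    points = gradient_step(points, target, lr=0.1)
final_loss = loss(points, target)
print(initial_loss, final_loss)
```

A distance matrix only determines a structure up to rotation, translation and mirror image, so the recovered coordinates need not equal `true_coords` even when the loss is close to zero.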

Attention is all you need

BERTology and AlphaFold 2 are both based on neural networks with attention mechanisms (Transformers). NLP models such as GPT-3 and BERT use the same mechanisms to capture, for example, the relationship between a pronoun and a noun in a sentence to be translated.
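
To make the attention mechanism more concrete, here is a minimal sketch of scaled dot-product attention, the core operation of Transformers. The three tokens and their 2D vectors are invented for illustration, and the same vectors are used as queries, keys and values. A real model would first apply learned projections, which are omitted here.

```python
import math

def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q.K / sqrt(d)) applied to V."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights

# Hypothetical tokens and embeddings, chosen so that the pronoun "it"
# points in the same direction as the noun "protein".
tokens = ["the", "protein", "it"]
embeddings = [[1.0, 0.0], [0.0, 2.0], [0.0, 1.0]]

outputs, weights = attention(embeddings, embeddings, embeddings)

# Which token does "it" attend to most strongly?
it_weights = weights[2]
most_attended = tokens[max(range(len(tokens)), key=lambda j: it_weights[j])]
print(most_attended)  # prints: protein
```

In this toy setup the pronoun puts most of its attention weight on the noun, which is the kind of relationship the paragraph above describes.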

AlphaFold 2: from the competition to the fight against COVID-19

In 2020, DeepMind teams used AlphaFold to generate the structures of proteins associated with SARS-CoV-2, the virus that causes COVID-19.


More on DeepVariant

More on BERTology

More on AlphaFold 2

Diplodocus interested in the applications of artificial intelligence to healthcare. Twitter : @