How NLP is used to create labeled datasets
To advance research and train algorithms such as CheXNeXt, researchers need labeled datasets of chest X-rays. CheXpert, MIMIC-CXR, PadChest, ChestX-ray14, and IU X-Ray are the most commonly used datasets.
Manually assigning labels to the images in these datasets is tedious, so researchers build automatic labelers. To create CheXpert, Stanford researchers developed an NLP model that extracts image labels from the de-identified radiology report associated with each X-ray.
This work resulted in a labeled dataset of 224,316 chest X-rays from 65,240 patients who underwent radiographic examinations at Stanford University Medical Center between October 2002 and July 2017. The dataset is open source and lets researchers propose new models to detect lung pathologies such as COVID-19.
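To give a sense of how such a rule-based report labeler works, here is a minimal sketch in Python. The observation keywords, negation cues, and sample report are purely illustrative and far simpler than the rules used by the actual CheXpert labeler.

```python
import re

# Illustrative keyword rules -- NOT the actual CheXpert vocabulary.
OBSERVATIONS = {
    "Cardiomegaly": ["cardiomegaly", "enlarged heart"],
    "Pleural Effusion": ["pleural effusion", "effusion"],
    "Pneumonia": ["pneumonia"],
}
NEGATIONS = ["no ", "without ", "free of "]

def label_report(report: str) -> dict:
    """Assign a positive (1), negative (0), or unmentioned (None) label per observation."""
    text = report.lower()
    labels = {}
    for obs, keywords in OBSERVATIONS.items():
        for kw in keywords:
            idx = text.find(kw)
            if idx == -1:
                continue
            # Very naive negation check: look a few characters back from the mention.
            window = text[max(0, idx - 30):idx]
            labels[obs] = 0 if any(neg in window for neg in NEGATIONS) else 1
            break
        else:
            labels[obs] = None  # observation not mentioned in the report
    return labels

print(label_report("Heart size is enlarged, consistent with cardiomegaly. No pleural effusion."))
# {'Cardiomegaly': 1, 'Pleural Effusion': 0, 'Pneumonia': None}
```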
CheXpert and ChestX-ray14 to diagnose COVID-19
The CheXpert and ChestX-ray14 datasets, enriched with COVID-19 data, allowed IEEE researchers to train two models, CMTNet and ReCoNet, capable of classifying chest X-rays as COVID-19 or non-COVID-19 and of providing visual segmentation of the X-ray to locate anomalies.
- CMTNet
- ReCoNet
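As a rough illustration of the classification half of this task, the sketch below builds a generic binary COVID-19 classifier on a pretrained ResNet-18 backbone with PyTorch. It is not the CMTNet or ReCoNet architecture and omits the segmentation component entirely.

```python
import torch
import torch.nn as nn
from torchvision import models

# Minimal binary classifier for COVID-19 vs. non-COVID-19 chest X-rays.
# NOT the CMTNet or ReCoNet architecture -- just a generic baseline
# built on an ImageNet-pretrained ResNet-18 backbone.
class CovidXrayClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = models.resnet18(weights="IMAGENET1K_V1")  # downloads pretrained weights
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 1)

    def forward(self, x):
        # x: batch of X-ray images resized to 224x224, replicated to 3 channels
        return torch.sigmoid(self.backbone(x))

model = CovidXrayClassifier()
dummy_batch = torch.randn(4, 3, 224, 224)  # 4 fake X-rays for a shape check
print(model(dummy_batch).shape)  # torch.Size([4, 1]) -- probability of COVID-19 per image
```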
BERT: a new era for NLP
The CheXpert research paper indicates that the Stanford researchers used the following NLP libraries to develop this labeler:
- NLTK (Bird, Klein, and Loper 2009), the Natural Language Toolkit, a leading platform for building NLP programs in Python
- the BLLIP parser (Charniak and Johnson 2005; McClosky 2010), a statistical natural language parser available as a Python package
- Stanford CoreNLP (De Marneffe et al. 2014), a natural language processing library in Java.
To carry out the full natural language processing pipeline, researchers also have other NLP libraries at their disposal: spaCy, Gensim, Spark NLP, PyTorch-NLP, scikit-learn, TensorFlow, and Transformers.
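As a small example of the preprocessing these libraries handle, the snippet below uses NLTK to split a report into sentences and word tokens, the kind of step a report labeler performs before applying its extraction rules. The sample report text is invented.

```python
import nltk

# Download the sentence tokenizer models on first run
# (assumes network access; may also require "punkt_tab" on recent NLTK versions).
nltk.download("punkt", quiet=True)

report = (
    "The cardiac silhouette is enlarged. "
    "No focal consolidation, pleural effusion, or pneumothorax is seen."
)

# Split the report into sentences, then into word tokens.
for sentence in nltk.sent_tokenize(report):
    print(nltk.word_tokenize(sentence))
```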
The publication of the BERT model (Bidirectional Encoder Representations from Transformers) by Google in 2018 marked, for many experts, the beginning of a new era in Natural Language Processing (NLP).
BERT builds on the strengths of two earlier models: ELMo (which takes into account the context of each word within the sentence) and OpenAI GPT (whose attention mechanism identifies the most important words in the sentence).
BERT was pre-trained on BookCorpus (800M words) and English-language Wikipedia (2,500M words); pre-training took about 3 days on 16 TPUs. Its two pre-training tasks are predicting masked words and predicting whether one sentence follows another. As an aside, there are two French versions of BERT: CamemBERT and FlauBERT.
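The masked-word prediction task is easy to try with the Hugging Face Transformers library and a pretrained bert-base-uncased checkpoint; the sentence below is just an illustrative example.

```python
from transformers import pipeline

# Masked-word prediction with a pretrained BERT model.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT suggests the most likely tokens for the [MASK] position, with scores.
for prediction in unmasker("The chest X-ray shows a small pleural [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```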
CheXbert: improving the CheXpert labeler
The Stanford researchers who developed the CheXpert labeler have proposed a new, more powerful model, CheXbert, for labeling radiology reports. As its name suggests, CheXbert is based on training a BERT model.
BERT is first trained on the rule-based annotations produced by the CheXpert labeler, and then fine-tuned on a small set of radiologist annotations augmented with automated back-translation.
Thanks to BERT, CheXbert outperforms CheXpert, setting a new state of the art for report labeling on one of the largest chest X-ray datasets.
Researchers can compare the performance of the different labelers (T-auto, CheXpert, and CheXbert) and understand how each interprets the words in a sentence and how it generates a label.
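Conceptually, a CheXbert-style labeler is a BERT encoder topped with one small classification head per observation. The sketch below illustrates that idea only; the observation list, number of classes, and head layout are assumptions made for the example, not the authors' released implementation.

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Sketch of a CheXbert-style labeler: a shared BERT encoder with one
# classification head per observation. Hyperparameters are illustrative.
OBSERVATIONS = ["Cardiomegaly", "Pleural Effusion", "Pneumonia"]
NUM_CLASSES = 4  # e.g. positive / negative / uncertain / not mentioned

class ReportLabeler(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.heads = nn.ModuleList(
            [nn.Linear(self.bert.config.hidden_size, NUM_CLASSES) for _ in OBSERVATIONS]
        )

    def forward(self, input_ids, attention_mask):
        # Use the pooled [CLS] representation of the report as shared features.
        cls = self.bert(input_ids=input_ids, attention_mask=attention_mask).pooler_output
        return [head(cls) for head in self.heads]  # one logit vector per observation

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["Heart size is enlarged. No effusion."], return_tensors="pt")
logits = ReportLabeler()(batch["input_ids"], batch["attention_mask"])
print([l.shape for l in logits])  # three tensors of shape (1, 4)
```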
BigBird for more efficient genome sequencing
At the end of July 2020, Google researchers published a new research paper on arXiv presenting BigBird, a new NLP model that outperforms BERT. BigBird uses a sparse attention mechanism that allows it to process sequences up to 8 times longer than BERT can handle with the same computing power.
One of the applications of BigBird identified by the researchers is DNA sequencing. DNA sequence analysis can be used to identify, diagnose, and potentially find treatments for genetic diseases. It is also used to analyze viruses and develop vaccines.
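BigBird's block-sparse attention is available in the Hugging Face Transformers library. The sketch below feeds a long, DNA-like dummy string through the google/bigbird-roberta-base checkpoint simply to show the longer context window; real genomics work would use a tokenizer and checkpoint trained on DNA data, so the input here is only a stand-in.

```python
from transformers import BigBirdTokenizer, BigBirdModel

# Load BigBird with its block-sparse attention, which is what lets it
# handle much longer sequences than BERT at comparable cost.
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base", attention_type="block_sparse"
)

long_text = "ATGCGTAC " * 400  # dummy stand-in for a long document or DNA-like sequence
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=4096)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)
```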
Going further… Natural Language Processing (NLP) Zero to Hero with TensorFlow
Going further… with BigBird