Are OpenAI ChatGPT and GPT-4 the future of healthcare?

DiploDoc
6 min readMar 31, 2023

--

From the Transformers to the ChatGPT buzz

The Transformer revolution begins with the publication of two research papers in 2017 “Attention is All you need” and in 2018 “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” by Google. Those two papers propelled research in the field of NLP (Natural Language Processing) into a new era.

Initially used for translation tasks, Transformers are quickly used in the field of computer vision. They are widely used in healthcare for : medical chatbots, medical diagnoses, medical image segmentation, protein structure and new drugs discovery.

On July 2, 2022, Open AI publishes a 75-page research paper, “Language Models are Few-Shot Learner” which presents their GPT-3 (Generative Pre-trained Transformer 3) model.

  • GPT-3 is a transformer composed of more than 175 billion parameters. To compare: its predecessor, GPT-2, was based on 1.5 billion parameters. This data corresponds to the number of values ​​that the neural network tries to optimize during its training.
  • To train GPT-3, Open AI researchers used a gigantic dataset that includes billions of documents from the web.
  • Zero-shot, one-shot, few-shot learning are fine-tuning methods used to optimize GPT-3 performance. The model can perform on some tasks from very few examples.
Few Shot Learning vs traditional fine-tuning
ChatGPT : differences between GPT-3 and the other Transformers

GPT-3 is at the origin of the research boom in the field of LLMs. These new models now monopolize the researchers of the major players in artificial intelligence: Meta, Microsoft, Baidu, Hugging Face, Nvidia, DeepMind, Google.

State of AI 2022 (slide 34)-shorturl.at/impOS
ChatGPT : What is a LLM ?

GPT-3 is also at the origin of ChatGPT launched in November 2022, which had a resounding echo in the AI ​​community but also in the media and the industry. OpenAI states on its blog that ChatGPT is based on a GPT-3.5 series model, trained on Microsoft Azure AI.

ChatGPT : what is the difference between GPT-3 and ChatGPT?

To provide general public access to ChatGPT, Open AI has developed an LLM called InstructGPT. This model is optimized via supervised learning like GPT-3 but also via the RLHF “Reinforcement Learning via Human Feedback”.

Step 1 : supervised training I Step 2 &Step 3 : RHLF

The model is learning from the appreciation of its predictions by human labelers.This technique reduces the risk of model errors. To learn more about Reinforcement Learning from Human Feedback (RHLF), you can check out this article from Hugging Face.

ChatGPT : “Act as a doctor “ !

The ChatGPT model is capable of generating human-like text. By providing it with a prompt, it can generate responses that continue the conversation or complete the initial prompt. On GitHub you will find a variety of prompts that can be used with ChatGPT.

Doctor Prompts
ChatGPT : Doctor Prompt results

Some doctors have been busy testing and tricking ChatGPT. Among them Doctor Mike who finds that ChatGPT often answers his questions correctly except when the patient’s diagnosis requires more context such as blood test results or the patient’s medical history. GPT-4 makes it possible to take into account of the context in an astonishing way.

GPT-4 applied to medicine

On March 22, 2023, OpenAI unveiled the most recent version of its language models, GPT-4.

GPT-4 is a multimodal LLM (Large Language Model) capable of accepting both text and images as input and producing text as output. GPT-4 is the model on which the augmented version of ChatGPT was developed: ChatGPT Plus.

For some experts in Artificial Intelligence, GPT-4 is a first step towards AGI: general artificial intelligence, this artificial intelligence capable of solving a wide variety of problems, just like what humans do.

On March 20, 2023, Microsoft and Open AI published a research paper “GPT-4 on Medical Challenge Problems” which shows the astonishing performances of their new model for making a diagnosis or training doctors. GPT-4 outperforms GPT-3.5 models as well as models specifically adapted to medical knowledge like Med-PaLM and Flan-PaLM 540B.

To assess the ability of GPT-4 to make diagnoses, the researchers used publicly available MedQA, PubMedQA, MedMCQA, and MMLU datasets that contain questions based on medical literature and clinical cases. They also used questions from the exams that certify doctors in the United States: the United States Medical Licensing Examination (USMLE) officially published by the National Board of Medical Examiners (NBME). Here is an example of a medical question posed to GPT-4 and the answer of the model.

GPT-4 diagnosis

On all these medical datasets, GPT-4 performs better than other GPT-3.5 and Flan-PaLM 540B models. It widens the gap significantly in terms of correct answers compared to the previous models.

Performances on USMLE
Performances on MedQA, PubMedQA, MedMCQA et MMLU

GPT-4 is able to make a diagnosis and justify its answer without having seen the images of an endoscopy specified in the prompt.

GPT-4 is able to explain his diagnosis but also to justify why he rejected the other answers of the prompt. He can also accurately explain to a medical student his misdiagnoses.

This model can also modify the results of the laboratories slightly compared to the initial prompt which has the immediate effect of modifying the diagnosis. This use of GPT-4 potentially makes it a very effective doctor training tool.

According to the researchers, GPT-4 and its successors could provide healthcare professionals with detailed analyzes and help in developing differential diagnoses from patient history and laboratory results. They insist however on the fact that the accuracy of the results provided by the models depends on the quality of the prompt, ethnic and demographic biases contained in the datasets used for training the models.

They warn that LLMs deprived of the relevant information to make the right diagnoses could have harmful consequences on the health of patients. Il is necessary to be extremely careful about their uses and regularly confront them with the empirical knowledge of doctors.

Resources used to write this article

  • Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367

--

--

DiploDoc
DiploDoc

Written by DiploDoc

Diplodocus interested in the applications of artificial intelligence to healthcare.

No responses yet