How Machine Learning helps fight Type 2 diabetes

May 12, 2021

Risk stratification

Risk stratification allows healthcare providers to identify the right level of care and services for distinct subgroups of patients.

It involves assigning a risk status to patients for a particular condition, then using this information to guide care and optimize health care spending. Risk stratification is used to assess risks such as:
- morbidity risks for premature infants.
- admission of patients to a coronary care unit.
- the likelihood of readmission to hospital.

Traditional risk stratification with a scoring grid

Traditionally, risk was estimated using scoring grids. These grids, which were not widely used by caregivers, have been replaced by machine learning algorithms that take numerous input variables.

Machine learning models are easier to develop and easier for medical staff to adopt. They are also more accurate: instead of relying on a few questions, they are trained on thousands of variables.

Type 2 diabetes

In 2019, 1 in 11 people worldwide, or 463 million people, had diabetes (including 3.3 million people in France).

90% of people with diabetes have type 2. Type 2 diabetes is a disease characterized by chronic hyperglycemia, that is, a high level of glucose (sugar) in the blood.

The disease usually appears after the age of 40 and is diagnosed at an average age close to 65. Prevalence is highest between 75 and 79 years of age, with 20% of men and 14% of women treated for this disease.

Type 2 diabetes (T2D) is affecting more and more young people, including adolescents and even children. Nutritional imbalances and a sedentary lifestyle are increasingly contributing to the “spread” of T2D.

Machine Learning to predict the risk of T2D

David Sontag, a researcher and professor at MIT, explains in his lecture how machine learning has replaced scoring grids in the analysis of risk factors for type 2 diabetes.

Population-Level Prediction of Type 2 Diabetes From Claims Data and Analysis of Risk Factors … (pubmed.ncbi.nlm.nih.gov)

“We present a new approach to population health, in which data-driven predictive models are learned for outcomes such as…”

Risk stratification with Machine Learning (42,000 input variables)

From the claims, pharmaceutical records, and laboratory results of 4.1 million individuals between 2005 and 2009, 42,000 variables were selected that describe the complete health status and history of each individual.

Machine learning was then used to select predictor variables and train the model on three periods: 2009–2011, 2010–2012, and 2011–2013.

The model is a logistic regression with L1 regularization. L1 regularization performs variable selection: it assigns a zero weight to input variables that carry little predictive signal and a non-zero weight to the useful ones.
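To make the selection effect concrete, here is a minimal sketch using scikit-learn on synthetic data. It is not the study's pipeline or data: the dataset size, the number of informative features, and the regularization strength `C` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for patient data (NOT the study's data):
# 2,000 rows and 200 input variables, of which only 10 carry signal.
X, y = make_classification(
    n_samples=2000,
    n_features=200,
    n_informative=10,
    n_redundant=0,
    random_state=0,
)
# Put all variables on the same scale so the penalty treats them equally.
X = StandardScaler().fit_transform(X)

# The L1 penalty needs a solver that supports it, such as "liblinear" or "saga".
# C is the inverse regularization strength: a smaller C zeroes out more weights.
# The value 0.02 is an arbitrary choice for this sketch.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.02)
model.fit(X, y)

weights = model.coef_.ravel()
selected = np.flatnonzero(weights)  # indices of variables with non-zero weight
print(f"{selected.size} of {weights.size} variables kept a non-zero weight")
```

Lowering `C` strengthens the penalty and shrinks the list of selected variables; raising it lets more variables back in.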

After training, 769 variables were selected as predictive. The model identifies sleep apnea, shortness of breath, and esophageal reflux as risk factors for type 2 diabetes.

Machine learning makes it possible to generate hypotheses about risk factors for a disease. It also makes it possible to assess risk within a population and to set up appropriate prevention policies.

Non-stationarity of health data

David Sontag also emphasizes that health data are non-stationary: they change over time. Data collection systems break down, and input variables may change. It is therefore necessary to continually evaluate a model's performance on newly collected data.

Good practice: build the test dataset from data collected in a period later than the dates of the training and validation datasets.
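One possible way to apply this is to split by date rather than at random. The sketch below uses pandas on a tiny made-up table; the column names, the cut-off dates, and the `temporal_split` helper are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Hypothetical records: one row per observation, with its collection date.
records = pd.DataFrame(
    {
        "patient_id": range(8),
        "collected_on": pd.to_datetime(
            ["2009-03-01", "2009-11-15", "2010-06-30", "2010-12-01",
             "2011-04-20", "2011-10-05", "2012-02-14", "2012-09-09"]
        ),
        "label": [0, 1, 0, 0, 1, 0, 1, 0],
    }
)


def temporal_split(df, date_col, train_end, valid_end):
    """Split rows by date: train <= train_end < validation <= valid_end < test."""
    train_end = pd.Timestamp(train_end)
    valid_end = pd.Timestamp(valid_end)
    train = df[df[date_col] <= train_end]
    valid = df[(df[date_col] > train_end) & (df[date_col] <= valid_end)]
    test = df[df[date_col] > valid_end]
    return train, valid, test


# Cut-off dates are arbitrary choices for this example.
train, valid, test = temporal_split(
    records, "collected_on", "2010-12-31", "2011-12-31"
)
print(len(train), len(valid), len(test))
```

Because every test row is newer than every training and validation row, the test score reflects how the model behaves on data it could not have seen at training time, which is closer to how it would be used in practice.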

This article was written from these resources…
MIT 6.S897 Machine Learning for Healthcare, Spring 2020 - Lessons 4 & 5: Risk Stratification