Risk stratification
Risk stratification allows healthcare providers to identify the right level of care and services for distinct subgroups of patients.
It involves assigning a risk status to patients for a particular condition, then using this information to guide care and optimize healthcare spending. Risk stratification is used, for example, to assess:
- morbidity risks for premature infants.
- whether a patient should be admitted to a coronary care unit.
- the likelihood of readmission to hospital.
Traditionally, risk was estimated using scoring grids. These grids, which were not widely used by caregivers, have been replaced by machine learning algorithms that take numerous input variables.
Machine learning models are easier to develop and easier for medical staff to adopt. They can also be more accurate: instead of relying on a few questions, they are trained on thousands of variables.
Type 2 diabetes
In 2019, 1 in 11 adults worldwide, or 463 million people, had diabetes (including 3.3 million people in France).
About 90% of people with diabetes have type 2. Type 2 diabetes is a disease characterized by chronic hyperglycemia, that is, a persistently high level of glucose (sugar) in the blood.
The disease usually appears after the age of 40 and is diagnosed at an average age close to 65. Prevalence is highest between 75 and 79 years of age, when 20% of men and 14% of women are treated for the disease.
Type 2 diabetes (T2D) is affecting more and more young people, including adolescents and even children. Nutritional imbalances and sedentary lifestyles are increasingly contributing to the “spread” of T2D.
Machine Learning to predict the risk of T2D
David Sontag, a researcher and professor at MIT, explains in his lecture how machine learning has replaced scoring grids in the analysis of risk factors for type 2 diabetes.
From the claims, pharmaceutical records and laboratory results of 4.1 million individuals between 2005 and 2009, 42,000 variables were derived that describe the complete health status and history of each individual.
Machine learning was then used to select predictor variables and train the model on these periods: 2009–2011, 2010–2012 and 2011–2013.
The model used is a logistic regression with L1 regularization. L1 regularization performs variable selection: it drives the weights of uninformative input variables to exactly zero and leaves non-zero weights on the useful ones.
After training, 769 variables were selected as predictive. The model identifies sleep apnea, shortness of breath and esophageal reflux as risk factors for type 2 diabetes.
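The selection effect of L1 regularization can be illustrated with a small sketch in plain Python. This is not the model from the lecture: the toy data, the function names (fit_l1_logistic, soft_threshold, make_toy_data), the penalty strength and the optimizer (proximal gradient descent, with no intercept for brevity) are all assumptions chosen for illustration. Only the first two of the five synthetic features influence the outcome.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within +/- threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def gradient(X, y, w):
    """Gradient of the average logistic loss (no intercept, for brevity)."""
    n, d = len(X), len(w)
    grad = [0.0] * d
    for xi, yi in zip(X, y):
        residual = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j in range(d):
            grad[j] += residual * xi[j] / n
    return grad


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=300):
    """Proximal gradient descent for L1-penalised logistic regression."""
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = gradient(X, y, w)
        # Gradient step on the loss, then soft-thresholding for the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w


def make_toy_data(n=400, seed=0):
    """Hypothetical data: 5 features in {-1, +1}, only the first two matter."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.choice([-1.0, 1.0]) for _ in range(5)]
        p = sigmoid(2.0 * xi[0] - 1.5 * xi[1])
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_toy_data()
# Smallest penalty for which every weight stays at zero.
lam_max = max(abs(g) for g in gradient(X, y, [0.0] * 5))
weights = fit_l1_logistic(X, y, lam=0.3 * lam_max)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("weights:", [round(wj, 3) for wj in weights])
print("selected feature indices:", selected)
```

With a penalty of this size, the uninformative features typically end up with a weight of exactly zero, while a larger penalty removes more variables and a penalty of lam_max removes them all. On real data, the penalty strength would be chosen by validation rather than fixed by hand.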
Machine learning makes it possible to generate hypotheses about the risk factors for a disease. It also makes it possible to evaluate risk within a population and to set up appropriate prevention policies.
Non-stationarity of health data
David Sontag also emphasizes that health data is non-stationary: it changes over time. Data collection systems break down and input variables may change, so a model's performance must be continually re-evaluated on newly collected data.
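One simple way to carry out this kind of monitoring is to recompute a performance metric for each period of newly collected data and flag the periods where it falls below a baseline. The sketch below is only an illustration under assumptions: the metric (AUC), the function names, the tolerance and the records themselves are hypothetical and do not come from the lecture.

```python
from collections import defaultdict


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as 0.5. Returns None when a class is missing.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_period(records):
    """Group (period, label, score) records by period and compute AUC for each."""
    grouped = defaultdict(lambda: ([], []))
    for period, label, score in records:
        grouped[period][0].append(label)
        grouped[period][1].append(score)
    return {p: auc(labs, scs) for p, (labs, scs) in sorted(grouped.items())}


def degraded_periods(auc_per_period, baseline, tolerance=0.05):
    """Periods whose AUC is more than `tolerance` below the baseline."""
    return [
        p for p, a in auc_per_period.items()
        if a is not None and a < baseline - tolerance
    ]


# Made-up records: (year of collection, true outcome, predicted risk).
records = [
    (2011, 1, 0.9), (2011, 0, 0.2), (2011, 1, 0.8), (2011, 0, 0.3),
    (2013, 1, 0.6), (2013, 0, 0.7), (2013, 1, 0.8), (2013, 0, 0.4),
]
per_period = auc_by_period(records)
flagged = degraded_periods(per_period, baseline=per_period[2011])
print(per_period)
print("periods needing review:", flagged)
```

A flagged period does not say why performance dropped. It only signals that the data or the population may have shifted and that the model should be reviewed or retrained.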
This article is based on the following resource:
MIT 6.S897 Machine Learning for Healthcare, Spring 2020, Lessons 4 & 5: Risk Stratification