MIMIC-III: clinical data available to researchers
The MIMIC-III database is available free of charge. The dataset covers approximately 47,000 unique patients and more than 650,000 diagnoses. Each patient record carries a rich set of attributes, including demographics, medication lists, diagnostic history and other potentially predictive medical characteristics.
MIMIC-III integrates comprehensive, “de-identified” clinical data from patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts, and makes it available to researchers worldwide under a data use agreement.
Open access to the data allows clinical studies to be replicated and improved in ways that would not otherwise be possible. Publications on arXiv show that researchers make intensive use of both MIMIC-II and MIMIC-III.
Model creation from MIMIC-III Clinical Database
Using the MIMIC-III Clinical Database, researchers can develop a logistic regression model to assess the impact of demographics on hospital mortality.
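To make the idea of a logistic regression on demographics concrete, here is a minimal sketch. It does not touch MIMIC-III itself: the cohort is synthetic, the two features (age in decades and a male/female flag) and the "true" coefficients are invented for illustration, and the model is fitted with a plain gradient descent written with the standard library only. In practice one would more likely rely on an established statistics or machine learning library.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_cohort(n, seed=0):
    """Build fake (features, died) pairs. NOT real MIMIC-III data:
    the coefficients used to draw the labels are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 90)
        male = rng.randint(0, 1)
        # Features: age centered at 60 and expressed in decades, sex flag.
        x = [(age - 60.0) / 10.0, float(male)]
        p_death = sigmoid(-2.0 + 0.5 * x[0] + 0.2 * x[1])
        died = 1 if rng.random() < p_death else 0
        rows.append((x, died))
    return rows


def fit_logistic(rows, lr=0.5, epochs=800):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    n_features = len(rows[0][0])
    bias, weights = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * n_features
        for x, y in rows:
            # Prediction error for this patient: predicted probability minus label.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return bias, weights


cohort = make_synthetic_cohort(1000)
bias, weights = fit_logistic(cohort)

# exp(coefficient) is the odds ratio associated with a one-unit increase.
odds_ratios = [math.exp(w) for w in weights]
print("odds ratio per decade of age:", round(odds_ratios[0], 2))
print("odds ratio male vs female:", round(odds_ratios[1], 2))
```

An odds ratio above 1 for a feature means the odds of death rise with that feature in the fitted model. On real data the same reading applies, but the features would come from the patient and admission records rather than from a random generator.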
Other algorithms can also be trained on MIMIC-III, such as decision trees, random forests, support vector machines (SVM) and neural networks, as in the example available on Kaggle.
Diversity and typology of clinical data
Every health care provider and every medical procedure can generate data that may be used to train machine learning models:
- Demographic data.
- Data on vital signs.
- Data related to physician prescriptions.
- Data generated by laboratory tests.
- Microbiology data.
- All data extracted from written notes: hospital discharge, emergency room notes, radiologists’ notes, laboratory reports.
- Data extracted from invoices from health care providers (dentists, home care aides, etc.).
- Data extracted from administration notes: transfers, services.
- Radiological images.
- Data generated by cell phones or connected watches (quantified-self data).
There are two types of clinical data:
- Structured data: coherently organized as tables with rows and columns (as in MIMIC-III).
- Unstructured data: typically processed with deep learning algorithms (CNN, RNN). It includes:
  - Clinical text: quite different from ordinary written language (heavy use of acronyms).
  - Images: MRI, scans and 3D ultrasound.
  - Signals: measurements from a sensor, usually at regular time intervals.
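The difference between these kinds of data can be sketched in a few lines. Everything below is hypothetical and only meant as an illustration: the `LabResult` fields, the tiny acronym dictionary and the heart rate values are invented, and none of them come from MIMIC-III or from a clinical vocabulary.

```python
import re
from dataclasses import dataclass


# Structured data: fixed, named fields, one record per table row.
@dataclass
class LabResult:
    patient_id: int
    test_name: str
    value: float
    unit: str


# Clinical text is dense with acronyms. This mapping is a toy example,
# not a real clinical vocabulary.
ACRONYMS = {
    "pt": "patient",
    "hx": "history",
    "htn": "hypertension",
    "sob": "shortness of breath",
}


def expand_acronyms(note):
    """Replace each known acronym (case-insensitive) with its long form."""
    def replace(match):
        word = match.group(0)
        return ACRONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, note)


def sample_times(n_samples, start_s=0.0, interval_s=1.0):
    """A regularly sampled signal needs no stored timestamps:
    they follow from the sample index and the sampling interval."""
    return [start_s + i * interval_s for i in range(n_samples)]


row = LabResult(patient_id=1, test_name="creatinine", value=1.1, unit="mg/dL")
note = "Pt with hx of HTN presents with SOB."
heart_rate = [72, 75, 74, 78]  # one invented reading per minute

print(row)
print(expand_acronyms(note))
print(list(zip(sample_times(len(heart_rate), interval_s=60.0), heart_rate)))
```

The structured record can be queried directly by field, whereas the note and the signal need a preprocessing step (here, acronym expansion and timestamp reconstruction) before a model can use them.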
This article was written from these resources…