Machine learning can identify newly diagnosed patients with CLL at high risk of infection

Nat Commun. 2020 Jan 17;11(1):363. doi: 10.1038/s41467-019-14225-8.

Abstract

Infections have become the major cause of morbidity and mortality among patients with chronic lymphocytic leukemia (CLL) due to immune dysfunction and cytotoxic CLL treatment. Yet, predictive models for infection are missing. In this work, we develop the CLL Treatment-Infection Model (CLL-TIM), which identifies patients at risk of infection or CLL treatment within 2 years of diagnosis, as validated on both internal and external cohorts. CLL-TIM is an ensemble algorithm composed of 28 machine learning algorithms based on data from 4,149 patients with CLL. The model handles heterogeneous data, including the high rates of missing data expected in the real-world setting, and achieves a precision of 72% and a recall of 75%. To address concerns regarding the use of complex machine learning algorithms in the clinic, CLL-TIM provides explainable predictions for each patient with CLL through uncertainty estimates and personalized risk factors.
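The abstract does not say how CLL-TIM combines its 28 member algorithms or how its uncertainty estimates are computed. As a hypothetical illustration only, the Python sketch below shows one generic way an ensemble classifier can produce a per-patient prediction with an uncertainty signal, and how precision and recall are defined. The names `ensemble_predict`, `precision_recall` and `f1_score`, the soft-voting rule, the 0.5 threshold and the use of member disagreement as an uncertainty proxy are all assumptions, not the authors' method.

```python
from statistics import mean


def ensemble_predict(member_probs, threshold=0.5):
    """Hypothetical soft-voting ensemble for a single patient.

    member_probs: predicted risk probabilities, one per member model.
    Returns (label, mean_prob, disagreement), where disagreement is the
    fraction of members whose own vote differs from the ensemble label.
    Using disagreement as an uncertainty proxy is an assumption here.
    """
    p = mean(member_probs)
    label = int(p >= threshold)
    votes = [int(q >= threshold) for q in member_probs]
    disagreement = sum(v != label for v in votes) / len(votes)
    return label, p, disagreement


def precision_recall(y_true, y_pred):
    """Standard definitions for binary labels (1 = at risk).

    precision = TP / (TP + FP): share of flagged patients who were truly at risk.
    recall    = TP / (TP + FN): share of truly at-risk patients who were flagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Three hypothetical members; two vote "at risk", one does not.
    print(ensemble_predict([0.8, 0.6, 0.4]))
    # Toy labels: TP = 3, FP = 2, FN = 1.
    print(precision_recall([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 1]))
    # F1 implied by the reported precision (72%) and recall (75%).
    print(round(f1_score(0.72, 0.75), 3))
```

For the toy labels, precision is 3/5 = 0.6 and recall is 3/4 = 0.75. Plugging the reported precision of 72% and recall of 75% into the harmonic mean gives an F1 of about 0.73; this figure is derived arithmetic, not a value stated in the abstract.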

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Algorithms
  • Antineoplastic Agents / therapeutic use
  • Benchmarking
  • Cohort Studies
  • Databases, Factual
  • Female
  • Humans
  • Infections / diagnosis*
  • Infections / etiology
  • Kaplan-Meier Estimate
  • Leukemia, Lymphocytic, Chronic, B-Cell / complications*
  • Leukemia, Lymphocytic, Chronic, B-Cell / drug therapy
  • Machine Learning*
  • Male
  • Middle Aged
  • Risk Factors*

Substances

  • Antineoplastic Agents