October 23, 2017—Researchers at Cincinnati Children’s Hospital Medical Center are combining machine learning and systems biology to distinguish disease severity in idiopathic pulmonary fibrosis (IPF), a lung disease that kills an estimated 40,000 people annually in the United States.

Using computational approaches, the team classified patients into six subgroups based on their gene expression profiles and identified candidate genes that could be used for early diagnosis or to develop new, possibly personalized, treatments. Their findings were published October 20 in the journal BMC Pulmonary Medicine.

Figure: Patients with idiopathic pulmonary fibrosis, stratified by disease severity, have different gene expression profiles (BMC Pulmonary Medicine). The researchers hope to build on this finding to tailor therapy or to find new therapies.

Physicians use a patient’s medical history, radiology reports and tissue examinations to diagnose IPF and determine the severity of the disease. However, factors underlying the progression of the disease—and how to best treat individual patients—remain unclear. Some of these clues could be hidden in our genes.

The research team, led by Anil Goud Jegga, DVM (Biomedical Informatics), in collaboration with the lab of Satish K. Madala, PhD (Pulmonary Medicine), is using data-driven clustering analysis to unearth these clues. Because the gene expression profiles of patients with IPF are as diverse as the disease's clinical presentations, there is a lot to discover.

IPF is an irreversible and fatal disease in which tissue deep in the lungs becomes increasingly thick and stiff, or scarred, over time. Some patients experience a slow, steady loss of lung function over five or more years. Others progress rapidly and die within one year of diagnosis.

Photo: Anil Jegga is using machine learning and systems biology to tackle idiopathic pulmonary fibrosis.

“It’s very heterogeneous,” says Jegga. “IPF patients exhibit anywhere from mild to severe reduction in lung function. We wanted to see if heterogeneous gene expression patterns are related with disease severity in IPF. By looking for patterns, we could clearly see the correlation between phenotype or clinical symptoms and gene expression profiles.”

“We used publicly available gene expression profiles in lung tissues from 131 IPF patients and 12 healthy individuals,” adds Yunguan Wang, a graduate student and first author of the study, “and then used computational approaches to separate the patients into six subgroups based on their gene expression profiles.” The resulting subgroups differed in disease severity, as reflected in the patients’ lung function measurements.
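The release does not say which clustering method the study used. To make "separating patients into subgroups based on their gene expression profiles" concrete, the Python sketch below substitutes plain k-means with a deterministic farthest-first start. This is an assumed stand-in chosen for illustration, not the study's method. Each patient is represented as a list of expression values. The two-gene toy profiles, the choice of three clusters and the function names `kmeans` and `farthest_first_centroids` are all hypothetical. The study itself worked with 131 patients, many more genes and six subgroups.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical sketch: k-means stands in for the study's (unspecified) clustering method.


def farthest_first_centroids(profiles, k):
    """Deterministic seeding: start from the first profile, then repeatedly
    pick the profile farthest from every centroid chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(profiles, k, max_iterations=100):
    """Group expression profiles (one list of values per patient) into k
    clusters and return one cluster label per patient."""
    centroids = farthest_first_centroids(profiles, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each patient joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in profiles
        ]
        if new_labels == labels:
            break  # converged: no patient changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy two-gene "profiles" for six hypothetical patients (illustration only, not study data).
profiles = [
    [1.0, 1.2], [0.9, 1.1],
    [5.0, 5.1], [5.2, 4.9],
    [9.0, 1.0], [8.8, 1.2],
]
labels = kmeans(profiles, k=3)
print(labels)  # [0, 0, 2, 2, 1, 1]
```

The label numbers themselves carry no meaning; what matters is that patients with similar profiles end up sharing a label. Farthest-first seeding is used here only to keep the toy example deterministic.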

Next, using the open-source ToppGene web app, the team prioritized new candidate genes most likely to be associated with IPF. “There are 12 or 13 genes reported to be genetically associated with IPF. Identifying additional candidate genes will enable discovery of biological pathways that could potentially be used for early diagnosis or targeted with novel treatments,” says Jegga. “We used the ‘known’ genes to ‘train’ our machine learning-based system, discovered new genes, and ranked the dysregulated genes in IPF subgroups based on their similarity to known genes associated with IPF—it’s guilt by association.”
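ToppGene's actual scoring is more involved and is not described in this article. As a simplified, hypothetical illustration of "guilt by association", the sketch below represents each gene as a set of annotation terms. It scores each candidate by its average Jaccard overlap with the known "training" genes, then ranks candidates from most to least similar. Jaccard similarity is an assumed stand-in for ToppGene's method. The gene names, annotation terms and the functions `jaccard` and `rank_candidates` are invented placeholders.

```python
# Hypothetical sketch of similarity-based gene ranking; not ToppGene's algorithm.


def jaccard(a, b):
    """Fraction of annotation terms two genes share (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(known, candidates):
    """Score each candidate by its mean similarity to the known ("training")
    genes and return (gene, score) pairs, most similar first."""
    scores = {}
    for gene, terms in candidates.items():
        sims = [jaccard(terms, known_terms) for known_terms in known.values()]
        scores[gene] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical genes and annotation terms, for illustration only.
known = {
    "KNOWN_1": {"term_a", "term_b", "term_c"},
    "KNOWN_2": {"term_b", "term_c", "term_d"},
}
candidates = {
    "CAND_X": {"term_b", "term_c"},
    "CAND_Y": {"term_d", "term_e"},
    "CAND_Z": {"term_f"},
}
ranking = rank_candidates(known, candidates)
print([gene for gene, _ in ranking])  # ['CAND_X', 'CAND_Y', 'CAND_Z']
```

CAND_X shares most of its annotations with both known genes, so it ranks first. CAND_Z shares none and ranks last, which is the "guilt by association" intuition in miniature.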

Defining disease subtypes solely by severity can be misleading, because different molecular pathways can produce very similar phenotypes. Data-driven analyses such as clustering start not from what is already known but from the patterns in the data themselves.

Identifying IPF subgroups can help make more personalized treatment possible. Currently, two drugs, pirfenidone and nintedanib, are approved by the FDA for the treatment of IPF. However, these drugs only slow disease progression; they neither stop nor reverse lung fibrosis. They are also not effective in all patients with IPF.

“Instead of giving all IPF patients the same drug, can we find drugs for different subgroups?” says Jegga. “As we analyze the data sets, we can potentially discover pharmacologically tractable molecular pathways correlated with disease severity, which in turn could enable tailoring of therapy or lead to new therapies for IPF.”

In a previous paper, Jegga’s and Madala’s teams found elevated activity of a particular protein, Hsp90 (heat shock protein 90), in fibroblasts of IPF patients. Fibroblasts are connective-tissue cells that produce collagen and other fibers, and their activation contributes to the scarring seen in IPF. The teams also identified Hsp90 inhibitors as a potential therapy to stop fibroblast activation in IPF.


Contact Information

Jill Williams, 513-803-0520, jill.williams@cchmc.org


Data-Driven Analyses Enable Characterization of Idiopathic Pulmonary Fibrosis