|Computational Bioinformatics and Machine Learning Models to Identify the Diseasome and Neurological Disease Comorbidities|
|Md Habibur Rahman|
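|Place of Conferral||Institute of Automation, Chinese Academy of Sciences|
|Keyword||Disease comorbidity identification; survival analysis; bioinformatics; machine learning; type 2 diabetes; neurological diseases; glioblastoma; pathways; gene ontology; proteins|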
In a patient with a disease, a comorbidity is a second (or further) disease occurring in the same patient at the same time. A comorbidity can complicate a standard treatment given for the primary disease or cause it to fail. Thus, compared with individuals who have a single disease, individuals with comorbidities can (depending on the diseases involved) have a higher risk of severe illness or mortality. The study of disease comorbidity interactions using multi-omics, disease-gene association (i.e., diseasome) and molecular data has improved our knowledge of the pathogenic mechanisms of many diseases and led to significant advances in diagnosis, prognosis, and treatment. However, as the global burden of disease has increased, disease comorbidity has become a major clinical and biomedical problem. Identifying and characterizing comorbidity interactions is important not only for understanding complex pathophysiologies, but also for the rational and creative design of pharmacotherapies, and for patient self-management, health care utilization and treatment strategy.
Due to shared risk factors (including genetic, molecular, environmental, and lifestyle-based factors), certain comorbidities, including cancers, are more likely to occur in the same patient. Because the etiology of non-infectious diseases is typically complex and their risk factors tend to overlap, the biological basis and molecular mechanisms that underlie their comorbidity are still poorly understood. This complexity not only makes the molecular mechanisms of individual diseases elusive and difficult to study, but makes comorbidity interactions even more challenging to characterize. Moreover, most comorbidity studies have relied on a single type of data (clinical, molecular, or phenotypic) to identify how comorbid diseases interact. In this study, we designed and developed a bioinformatics and machine learning approach that can identify important mediators of comorbidity interactions using genetic, multi-omics and molecular-level data. Our research focuses on developing network-based and machine-learning-based bioinformatics models to identify disease comorbidities. We applied these models in two projects. The first identifies comorbidity interactions between type 2 diabetes (T2D) and neurological diseases (NDs). The second identifies comorbidity interactions between central nervous system (CNS) disorders (also referred to here as NDs) and glioblastoma, a type of CNS cancer, and examines how these interactions may affect the survival of the cancer patients involved.
We first proposed a high-throughput, network-based quantitative bioinformatics pipeline using agnostic approaches to identify molecular biomarkers for type 2 diabetes that are linked to the progression of neurological diseases. We used transcriptomic gene expression datasets from control and disease-affected tissues of T2D and ND patients for comparison. We applied linear models for microarray data (LIMMA) to these datasets and identified differentially expressed genes (DEGs) by comparing affected and control individuals. In total, 197 DEGs were common to both the T2D and the ND datasets, of which 99 were up-regulated and 98 were down-regulated in affected individuals. These overlapping DEGs (i.e., those seen in both T2D and ND datasets) revealed the involvement of significant cell-signaling-associated molecular pathways. These were then used to extract the most significant gene ontology (GO) terms. The critical or 'hub' proteins in the identified pathways were determined using protein-protein interaction analysis; many of these hub proteins have not previously been described as playing a role in these diseases. To reveal some of the transcriptional and post-transcriptional regulators of the DEGs, we used DEG-transcription factor (TF) interaction analysis and DEG-microRNA (miRNA) interaction analysis, respectively. We validated these results against gold-benchmark databases and the literature, which clarified which genes and pathways had previously been linked to NDs or T2D and which are novel. Thus, our transcriptomic data analysis has identified novel potential links between ND and T2D pathologies that may underlie comorbidity interactions, and these links may include potential targets for therapeutic intervention.
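The overlap step described above can be illustrated with a minimal Python sketch. It assumes that per-gene statistics (log2 fold change and adjusted p-value) have already been produced by a differential expression tool such as LIMMA. The gene names, the cutoffs, and the helper names `GeneStat`, `call_degs` and `shared_degs` are hypothetical and chosen only for illustration; the thresholds used in the thesis are not stated in this abstract. The sketch also assumes that a shared DEG must change in the same direction in both diseases.

```python
from typing import Dict, NamedTuple, Set, Tuple


class GeneStat(NamedTuple):
    log_fc: float  # log2 fold change, disease vs. control
    adj_p: float   # multiple-testing-adjusted p-value


def call_degs(stats: Dict[str, GeneStat],
              lfc_cutoff: float = 1.0,
              p_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Split one dataset into up- and down-regulated DEGs (illustrative cutoffs)."""
    up = {g for g, s in stats.items()
          if s.adj_p < p_cutoff and s.log_fc >= lfc_cutoff}
    down = {g for g, s in stats.items()
            if s.adj_p < p_cutoff and s.log_fc <= -lfc_cutoff}
    return up, down


def shared_degs(a: Dict[str, GeneStat],
                b: Dict[str, GeneStat],
                lfc_cutoff: float = 1.0,
                p_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return DEGs that are significant and concordant in both datasets."""
    up_a, down_a = call_degs(a, lfc_cutoff, p_cutoff)
    up_b, down_b = call_degs(b, lfc_cutoff, p_cutoff)
    return up_a & up_b, down_a & down_b


# Toy statistics with placeholder gene names (not data from the thesis).
t2d = {
    "GENE_A": GeneStat(1.8, 0.001),
    "GENE_B": GeneStat(-1.4, 0.01),
    "GENE_C": GeneStat(2.1, 0.20),   # not significant in this dataset
    "GENE_D": GeneStat(1.2, 0.03),
}
nd = {
    "GENE_A": GeneStat(1.1, 0.02),
    "GENE_B": GeneStat(-2.0, 0.004),
    "GENE_C": GeneStat(1.9, 0.001),
    "GENE_D": GeneStat(-1.5, 0.01),  # opposite direction, so not shared
}

shared_up, shared_down = shared_degs(t2d, nd)
print("shared up-regulated:", sorted(shared_up))
print("shared down-regulated:", sorted(shared_down))
```

In this toy example only GENE_A (up) and GENE_B (down) survive, because GENE_C is not significant in one dataset and GENE_D changes in opposite directions.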
In this network-based bioinformatics approach, we identified only novel biological processes involved in disease comorbidity; their semantic similarity was not determined. Semantic similarity measures compute the similarity of gene ontology terms and gene products to assess their proximity in terms of disease concepts. Thus, in further computational analyses, we determined the semantic similarity between T2D and NDs. For this, we designed a bioinformatics pipeline that analyses and combines gene expression, GO and molecular pathway data by incorporating Gene Set Enrichment Analysis and semantic similarity. To reduce bias and maximize the power of this approach, we used several publicly available datasets for T2D and NDs from different sources and cell types. We also computed the proximity between T2D and neurological pathologies using gene and GO term semantic similarity, which enhances the identification and characterization of comorbidity interactions beyond simply identifying novel biological processes involved in each disease. We validated the results against gold-benchmark databases and the literature.
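To make the idea of GO-based proximity concrete, here is a minimal sketch of one simple semantic similarity scheme. The abstract does not say which measure the thesis uses, so this is an assumed stand-in: term similarity is the Jaccard overlap of ancestor sets in the ontology graph, and two diseases are compared with a best-match average over their enriched terms. The tiny ontology, the term names, and the function names are all hypothetical.

```python
from typing import Dict, Set

# Toy ontology (child -> parents); a stand-in for the real GO graph.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "signaling": {"root"},
    "metabolism": {"root"},
    "insulin_signaling": {"signaling"},
    "synaptic_signaling": {"signaling"},
    "glucose_metabolism": {"metabolism"},
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus every term reachable via parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a: str, b: str, parents: Dict[str, Set[str]]) -> float:
    """Jaccard overlap of the two terms' ancestor sets (1.0 for identical terms)."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def best_match_average(terms_a: Set[str], terms_b: Set[str],
                       parents: Dict[str, Set[str]]) -> float:
    """Average, over both directions, of each term's best match in the other set."""
    if not terms_a or not terms_b:
        return 0.0
    best_a = [max(term_similarity(a, b, parents) for b in terms_b) for a in terms_a]
    best_b = [max(term_similarity(b, a, parents) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Hypothetical enriched term sets for two diseases.
t2d_terms = {"insulin_signaling", "glucose_metabolism"}
nd_terms = {"synaptic_signaling"}

print(term_similarity("insulin_signaling", "synaptic_signaling", PARENTS))
print(best_match_average(t2d_terms, nd_terms, PARENTS))
```

Information-content-based measures (for example Resnik or Lin similarity) are common alternatives; they additionally require term annotation frequencies, which this sketch omits.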
Finally, we developed machine learning models and turned to the identification of cancer comorbidity with NDs and to survival prediction in cancer patients, using bioinformatics and machine learning approaches. Glioblastoma is a common malignant brain tumor with a high mortality rate that often presents as a comorbidity with NDs. We employed a quantitative analytical bioinformatics framework to uncover shared genes and cell signaling pathways that can link NDs and glioblastoma. We acquired datasets from the National Center for Biotechnology Information (NCBI) and The Cancer Genome Atlas (TCGA), drawn from studies comparing normal tissue with diseased or glioblastoma tissue. After identifying differentially expressed genes (DEGs) with our framework, we performed disease-gene association network, signaling pathway, GO enrichment, and protein-protein interaction (PPI) network analyses to predict the function of these DEGs. We extended our study to evaluate which clinical factors and genes play significant roles in determining survival time in glioblastoma (GBM) patients, using a Cox proportional hazards (Cox PH) model and the product-limit (PL) estimator in both univariate and multivariate analyses. In this study, 177 DEGs (129 up-regulated and 48 down-regulated) were identified. Among these, 54 genes were associated with an effect on patient survival. Diseasome networks, molecular pathways, ontological pathways, PPI networks, and survival analysis of the significant genes all indicate ways in which NDs may influence the establishment, growth, or progression of glioblastoma. The shared DEGs identified here may also serve as biomarkers for glioblastoma prognosis and as potential targets for therapies. We also validated all of our identified signature genes and pathways against the gold-benchmark databases dbGaP, OMIM and OMIM Expanded, and through literature review. These results provide further evidence supporting the involvement of our identified genes in the pathological processes underlying glioblastoma progression. This work has the potential to support the development of new diagnostic approaches and to lead to the design of new treatments.
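The product-limit (Kaplan-Meier) estimator mentioned above can be sketched in a few lines of Python. This is an illustrative implementation on made-up follow-up data, not the analysis code used in the thesis; the function name `kaplan_meier` and the toy values are assumptions. At each observed event time t, the survival estimate is multiplied by (1 - d/n), where d is the number of deaths at t and n is the number of patients still at risk just before t.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns (event_time, estimated_survival_probability) pairs in time order.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    # Only times at which at least one death was observed change the estimate.
    event_times = sorted({t for t, e in zip(times, events) if e})

    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before t (censored at t count as at risk).
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy follow-up data in months (hypothetical, not from the thesis).
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(toy_times, toy_events):
    print(f"t={t}: S(t) is about {s:.2f}")
```

For the toy data the estimate steps down to roughly 0.80, 0.60 and 0.30 at months 2, 3 and 5. In practice, curves for groups defined by a gene's expression level would be compared (for example with a log-rank test), and a Cox PH model would be fitted with an established survival analysis library rather than hand-written code.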
|Md Habibur Rahman. Computational Bioinformatics and Machine Learning Models to Identify the Diseasome and Neurological Disease Comorbidities[D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences, 2020.|
|Files in This Item:|
|Md Habibur Rahman.pd (12630KB)||Thesis||Open access||CC BY-NC-SA||Application Full Text|