Shylin Joe S
M.Sc. Bioinformatics
Duration
1 yr 9 mo
Research Thesis
Title
Advanced Statistical Approach for Metagenomics Analysis Addressing Data Heterogeneity and Covariates
Objectives
i. To develop a statistical approach for core microbiome identification and differential abundance analysis that accounts for data heterogeneity and covariates.
ii. To develop a web application implementing the proposed approach.
Abstract
Metagenomics is the direct genetic analysis of the genomes contained in an environmental sample. Data heterogeneity and covariates are two of the main challenges in the statistical analysis of metagenomics data. The core microbiome comprises the microbial taxa that are consistently present in a particular environment; it helps maintain plant health, ecosystem stability, and various biological functions. Differential abundance analysis aims to identify taxa whose abundances differ significantly across conditions. Several tools and packages are available for core microbiome identification and differential abundance analysis, but each has its own limitations. This study addresses these gaps by introducing a new approach for core microbiome identification and differential abundance analysis and implementing it in a user-friendly web tool. Arabidopsis thaliana core root microbiome data were used as the demonstration dataset. The approach comprises multiple phases: filtering, normalization, exploratory analysis, diversity analysis, core microbiome identification, significance testing of the identified core, differential abundance analysis, covariate adjustment, and visualization of results. To mitigate data heterogeneity, five filtering methods (abundance, occurrence, abundance and occurrence, membership, and hard cut-off filters) and eleven normalization methods (TMM, TMMwsp, RLE, GMPR, TSS, CSS, CLR, SRS, upperquartile, rrarefy, and invlogit) are provided. Identifying the core microbiome by group reveals condition-specific microbial patterns, which improves biological understanding, clarifies functional significance, and supports targeted applications. Four statistical methods (F-test, Kruskal-Wallis test, Levene’s test, and Fligner-Killeen test) were implemented to test the significance of the identified core.
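The filtering, normalization, and group-wise core identification steps described above can be illustrated with a small NumPy sketch. It is not CoreMDA's code. The combined abundance-and-occurrence rule, the thresholds (min_count, min_prevalence, detection, prevalence), the CLR pseudocount, the function names, and the toy count table are all hypothetical choices made for illustration. Of the eleven listed normalizations, only TSS and CLR are shown.

```python
import numpy as np


def filter_abundance_occurrence(counts, min_count=5, min_prevalence=0.2):
    """Keep taxa (rows) with >= min_count reads in >= min_prevalence of samples.

    Hypothetical rule and thresholds, not CoreMDA's settings."""
    counts = np.asarray(counts, dtype=float)
    keep = (counts >= min_count).mean(axis=1) >= min_prevalence
    return counts[keep], keep


def tss(counts):
    """Total-sum scaling: divide each sample (column) by its library size."""
    return counts / counts.sum(axis=0, keepdims=True)


def clr(counts, pseudocount=0.5):
    """Centred log-ratio per sample; the (assumed) pseudocount avoids log(0)."""
    logx = np.log(counts + pseudocount)
    return logx - logx.mean(axis=0, keepdims=True)


def core_by_group(rel_abund, groups, detection=0.001, prevalence=0.8):
    """A taxon is 'core' in a group if its relative abundance exceeds
    `detection` in at least `prevalence` of that group's samples.
    Detection/prevalence thresholds are hypothetical."""
    groups = np.asarray(groups)
    core = {}
    for g in np.unique(groups):
        in_group = rel_abund[:, groups == g]
        core[str(g)] = (in_group > detection).mean(axis=1) >= prevalence
    return core


# Hypothetical toy data: rows = taxa, columns = samples.
counts = np.array([
    [120, 98, 110, 130, 105, 90],  # taxon A: abundant in every sample
    [40, 35, 50, 0, 1, 0],         # taxon B: mainly in group "WT"
    [0, 0, 2, 60, 55, 70],         # taxon C: mainly in group "mut"
    [0, 1, 0, 0, 3, 0],            # taxon D: rare
])
groups = ["WT", "WT", "WT", "mut", "mut", "mut"]

filtered, keep = filter_abundance_occurrence(counts)
rel = tss(filtered)
core = core_by_group(rel, groups)
print("kept taxa:", keep)
print("core by group:", core)
```

With these toy counts, taxon D fails the filter, taxa A and B meet the prevalence rule in the "WT" group, and taxa A and C meet it in the "mut" group, which shows how identifying the core by group can surface condition-specific taxa that a single pooled core would miss.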
Further, the tool supports exploratory analysis (boxplot, density plot, and MDS plot) and diversity analysis, including alpha diversity (richness, evenness, and the Shannon and Simpson indices) and beta diversity (Bray-Curtis, Jaccard, and Euclidean distances). For differential abundance analysis, several statistical tests (exact test, quasi-likelihood ratio test, and quasi-likelihood F test) are provided, along with options for covariate adjustment and multiple testing correction. Finally, a web tool for Core Microbiome Identification and Differential Abundance analysis (CoreMDA) has been developed and is freely accessible at https://dabin-iasri.shinyapps.io/CoreMDA/. This interactive tool allows researchers to upload their own datasets and perform core microbiome identification and differential abundance analysis through customized workflows tailored to their objectives.
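The alpha and beta diversity measures listed above, together with one common multiple testing correction, can be written in a few lines of NumPy. This is a hedged sketch, not the tool's implementation. It assumes Pielou's J as the evenness measure, the Gini-Simpson form (1 - Σp²) for the Simpson index, presence/absence Jaccard distance, and Benjamini-Hochberg as the correction; the abstract does not state which variants CoreMDA uses. The count-model tests (exact and quasi-likelihood tests) are not reproduced here.

```python
import numpy as np


def richness(x):
    """Number of taxa with a non-zero count."""
    return int(np.count_nonzero(x))


def shannon(x):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with p_i > 0."""
    x = np.asarray(x, dtype=float)
    p = x[x > 0] / x.sum()
    return float(-(p * np.log(p)).sum())


def simpson(x):
    """Gini-Simpson index 1 - sum(p_i^2) (assumed variant)."""
    p = np.asarray(x, dtype=float) / np.sum(x)
    return float(1.0 - (p ** 2).sum())


def pielou_evenness(x):
    """Pielou's evenness J = H / ln(S) (assumed evenness measure)."""
    s = richness(x)
    return float(shannon(x) / np.log(s)) if s > 1 else 0.0


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u - v| / sum(u + v)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.abs(u - v).sum() / (u + v).sum())


def jaccard(u, v):
    """Presence/absence Jaccard distance: 1 - |A and B| / |A or B|."""
    a, b = np.asarray(u) > 0, np.asarray(v) > 0
    union = np.logical_or(a, b).sum()
    return float(1.0 - np.logical_and(a, b).sum() / union) if union else 0.0


def euclidean(u, v):
    """Euclidean distance between two abundance profiles."""
    return float(np.linalg.norm(np.asarray(u, float) - np.asarray(v, float)))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


# Hypothetical samples (counts per taxon).
s1 = np.array([10, 10, 10, 10])
s2 = np.array([30, 10, 0, 0])
for name, s in [("s1", s1), ("s2", s2)]:
    print(name, richness(s), shannon(s), simpson(s), pielou_evenness(s))
print(bray_curtis(s1, s2), jaccard(s1, s2), euclidean(s1, s2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

A perfectly even sample (s1) reaches the maximum Shannon value ln(4) and an evenness of 1, while s2 has lower richness and evenness. The Benjamini-Hochberg step is shown because it is a widely used default for controlling the false discovery rate across many taxa; other corrections could be substituted.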