Shylin Joe S

M.Sc. Bioinformatics

22006

Dr. Sudhir Shrivastava

Duration

1 yr 9 mo

Research Thesis

Title

Advanced Statistical Approach for Metagenomics Analysis Addressing Data Heterogeneity and Covariates

Objectives

i. To develop a statistical approach for core microbiome identification and differential abundance analysis considering data heterogeneity and covariates.

ii. To develop a web application for the proposed approach.

Abstract

Metagenomics is the direct genetic analysis of genomes contained within an environmental sample. Data heterogeneity and covariates are two main challenges in the statistical analysis of metagenomics data. The core microbiome comprises the microbial taxa that are consistently present in a particular environment and that maintain plant health, ecosystem stability, and various biological functions. Differential abundance analysis aims to identify taxa whose abundances vary significantly across conditions. Several tools and packages are available for core microbiome identification and differential abundance analysis, but each has its own limitations. This study addresses these gaps by introducing an innovative approach for core microbiome identification and differential abundance analysis and implementing it as a user-friendly web tool. Arabidopsis thaliana core root microbiome data have been used as a demo dataset. The developed approach comprises multiple phases: filtering, normalization, exploratory analysis, diversity analysis, core microbiome identification, testing the significance of the identified core, differential abundance analysis, adjusting for the effects of covariates, and visualization of results. To mitigate data heterogeneity, five filtering methods (abundance, occurrence, abundance and occurrence, membership, and hard cut-off filter) and eleven normalization methods (TMM, TMMwsp, RLE, GMPR, TSS, CSS, CLR, SRS, upperquartile, rrarefy, and invlogit) are provided. By revealing condition-specific microbial patterns, identifying the core microbiome by group improves biological understanding, functional significance, and targeted applications. The significance of the identified core can be tested using four statistical methods: the F-test, Kruskal-Wallis test, Levene’s test, and Fligner-Killeen test.
Further, the tool supports exploratory analysis (boxplot, density plot, and MDS plot) and diversity analysis, covering alpha diversity (richness, evenness, Shannon and Simpson indices) and beta diversity (Bray-Curtis, Jaccard, and Euclidean). For differential abundance analysis, three statistical tests (the exact test, the quasi-likelihood ratio test, and the quasi-likelihood F test) are provided, along with options for covariate adjustment and multiple testing correction. Finally, a web tool for Core Microbiome Identification and Differential abundance analysis (CoreMDA) has been developed and is freely accessible at https://dabin-iasri.shinyapps.io/CoreMDA/. This interactive tool allows researchers to upload their datasets and perform core microbiome identification and differential abundance analysis using customized workflows based on user-defined objectives.
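As a rough illustration of three of the steps named in the abstract (TSS normalization, an abundance-and-occurrence core filter, and the Shannon and Bray-Curtis measures), the following minimal Python sketch may help. It is not CoreMDA's code: the toy count table, the function names, and the thresholds (1% relative abundance, 75% occurrence) are all hypothetical and chosen only for demonstration.

```python
import math

# Hypothetical toy count table (taxon -> counts in four samples); not thesis data.
counts = {
    "taxon_A": [120, 90, 150, 80],
    "taxon_B": [0, 5, 0, 2],
    "taxon_C": [30, 40, 0, 60],
}


def tss_normalize(table):
    """Total sum scaling: divide each count by its sample's total count."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[j] for row in table.values()) for j in range(n_samples)]
    return {
        taxon: [c / totals[j] if totals[j] else 0.0 for j, c in enumerate(row)]
        for taxon, row in table.items()
    }


def core_taxa(rel_abund, min_abundance=0.01, min_occurrence=0.75):
    """Keep taxa reaching min_abundance in at least min_occurrence of samples.

    Both thresholds are assumed values for illustration only.
    """
    core = []
    for taxon, row in rel_abund.items():
        occurrence = sum(v >= min_abundance for v in row) / len(row)
        if occurrence >= min_occurrence:
            core.append(taxon)
    return core


def shannon(sample_counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(sample_counts)
    props = [c / total for c in sample_counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two samples' count vectors."""
    denom = sum(x) + sum(y)
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0


if __name__ == "__main__":
    rel = tss_normalize(counts)
    print("core taxa:", core_taxa(rel))
    sample_1 = [row[0] for row in counts.values()]
    sample_2 = [row[1] for row in counts.values()]
    print("Shannon (sample 1):", round(shannon(sample_1), 3))
    print("Bray-Curtis (samples 1 vs 2):", round(bray_curtis(sample_1, sample_2), 3))
```

In this toy table, taxon_B is excluded because it reaches the assumed abundance threshold in only half of the samples. Running the same filter separately on the samples of each group would give a group-wise core, in the spirit of the condition-specific identification described above.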

Resources (1)

CoreMDA: Core Microbiome Identification and Differential abundance analysis

Tool 2025

Academic Details

Program

MSc

Roll Number

22006

Batch Year

2023-24

Fellowship

Institute

Admission

Nov 2023

Completion

Aug 2025