Research Thesis
Title
Development of a GWAS analysis method for detection of epistasis in crops
Objectives
i) To develop an approach for epistasis detection in GWAS using learning techniques
ii) To validate the developed approach for detection of epistatic interactions in crops
iii) To develop a prediction tool using the proposed strategy
Abstract
Epistasis, the interaction between genetic variants at two or more loci within or between genes, plays a critical role in shaping the genetic architecture of complex traits. However, detecting epistatic interactions remains a significant challenge in genome-wide association studies (GWAS), particularly in crop species, where high-dimensional genomic data and polygenic regulation complicate signal identification. Existing epistasis detection approaches have mostly addressed only one of the two basic constraints of the problem: computational efficiency or detection power. The present study instead develops a balanced, machine-learning-based two-stage framework for robust epistasis detection in crops that reduces the computational burden without compromising detection power, and assesses the utility of the detected interactions in genomic prediction. In the first stage (the shortlisting stage), a comparative analysis of machine learning algorithms, namely AdaBoost, Artificial Neural Networks, Random Forest, Stepwise Regression, Ridge Regression, LASSO, and Elastic Net, on simulated datasets identified ridge regression as the most effective for shortlisting marginal SNPs and random forest as the most effective for non-marginal SNPs. In the second stage (the epistasis detection stage), the shortlisted SNPs were subjected to information-theoretic (Information Gain, Maximal Information Coefficient) and statistical (Chi-square) interaction tests, which uncovered both marginal and non-marginal epistatic interactions. Benchmarking against existing tools (MACOED, BEAM, BOOST) showed that the proposed method achieved higher power and accuracy. The approach was also biologically validated on a Glycine max dataset comprising 2,662 accessions, with days to flowering as the trait of interest. The analysis identified 102 marginal and 107 non-marginal epistatic interactions, including both intra- and inter-chromosomal links, highlighting the polygenic and networked regulation of flowering time.
Importantly, genomic prediction models showed that incorporating epistatic loci improved accuracy from 78.07% (main-effect loci only) to 87.36%, underscoring the importance of accounting for epistasis in trait prediction. For practical application, the framework was packaged as a user-friendly R package named EpiFusion, with dedicated modules for shortlisting and interaction detection. Overall, the study highlights the value of combining machine learning and statistical modeling for epistasis detection in crops. By providing a flexible, efficient, and biologically relevant framework, the developed approach advances methodological capabilities in GWAS and contributes to precision breeding by strengthening the predictive power of genomic selection strategies.
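As an illustration of the second-stage interaction tests named in the abstract, the following minimal Python sketch scores every pair of already-shortlisted SNPs by information gain and by a Pearson chi-square statistic. It is not the EpiFusion implementation or its API: the function names, the toy genotypes, and the binary phenotype are all assumptions made for illustration, and a quantitative trait such as days to flowering would first have to be discretized before a test of this form could be applied.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(snp_a, snp_b, phenotype):
    """IG = H(Y) - H(Y | A, B), using the joint genotype of two SNPs."""
    n = len(phenotype)
    groups = {}
    for a, b, y in zip(snp_a, snp_b, phenotype):
        groups.setdefault((a, b), []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(phenotype) - conditional


def chi_square(snp_a, snp_b, phenotype):
    """Pearson chi-square statistic of the joint-genotype x phenotype table."""
    n = len(phenotype)
    joint = list(zip(snp_a, snp_b))
    observed = Counter(zip(joint, phenotype))
    row_totals = Counter(joint)
    col_totals = Counter(phenotype)
    stat = 0.0
    for genotype, row in row_totals.items():
        for label, col in col_totals.items():
            expected = row * col / n
            stat += (observed.get((genotype, label), 0) - expected) ** 2 / expected
    return stat


def rank_pairs(genotypes, phenotype, shortlisted):
    """Score all pairs of shortlisted SNP indices; highest information gain first."""
    scored = []
    for i, j in combinations(shortlisted, 2):
        ig = information_gain(genotypes[i], genotypes[j], phenotype)
        chi2 = chi_square(genotypes[i], genotypes[j], phenotype)
        scored.append(((i, j), ig, chi2))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: the phenotype is the XOR of SNP 0 and SNP 1, so neither
# SNP has a marginal effect, while SNP 2 is unrelated to the phenotype.
genotypes = [
    [0, 0, 1, 1, 0, 0, 1, 1],  # SNP 0
    [0, 1, 0, 1, 0, 1, 0, 1],  # SNP 1
    [0, 0, 0, 0, 1, 1, 1, 1],  # SNP 2
]
phenotype = [0, 1, 1, 0, 0, 1, 1, 0]

ranked = rank_pairs(genotypes, phenotype, shortlisted=[0, 1, 2])
for pair, ig, chi2 in ranked:
    print(pair, round(ig, 3), round(chi2, 3))
```

In this toy case the pair (0, 1) is a purely non-marginal interaction: each SNP alone carries no information about the phenotype, yet the joint genotype determines it completely. Restricting the pairwise search to a shortlist is what keeps the number of tested pairs manageable, which is the computational motivation for the first stage.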
Publications (1)
Comparative analysis of machine learning models for shortlisting SNPs to facilitate detection of marginal epistasis in GWAS
Dasmandal, T., Sinha, D., Rai, A., Mishra, D. C., & Archak, S.