Parinita Das
Ph.D. Bioinformatics
Publications
4
Duration
4 yr
Research Thesis
Title
A Study on Development of Artificial Intelligence-Based Methodology for Identification of Copy Number Variation in Crops
Objectives
1. To develop a machine learning/deep learning-based model for identifying copy number variations (CNVs)
2. To develop a web-based tool/R package using the proposed strategy
3. To apply the developed approach to build a CNV atlas
Abstract
Copy number variations (CNVs), which encompass deletions and duplications of DNA segments, are critical genomic features that influence gene expression, adaptation, and phenotypic variation. These structural variations play a pivotal role in genome evolution, trait expression, and environmental adaptability across plants. This research introduces MLDeCNV, a novel machine learning-based framework for accurate detection and interpretation of CNVs in next-generation sequencing (NGS) data. Traditional CNV detection methods often struggle with small CNVs and with CNVs in regions of low read-depth signal, which leads to incomplete detection. To overcome these challenges, MLDeCNV integrates 32 features derived from NGS data and combines outputs from multiple CNV detection tools with experimental validation using PCR and array comparative genomic hybridization (aCGH). A key aspect of the framework is the SMOTE-Tomek Links data-balancing technique, which improves the model's accuracy by addressing the class imbalance commonly found in CNV prediction. MLDeCNV outperforms existing CNV detection tools such as Delly, CNVnator, and Manta and performs robustly across species, including rice, Arabidopsis, and pomegranate, with an AUC of 0.96. The study also demonstrates the practical utility of MLDeCNV through a web-based tool that accepts standard genomic inputs and offers an intuitive interface for classifying candidates as deletion, duplication, or no CNV. Additionally, the research presents a genome-wide analysis of CNVs in black pepper and bitter gourd, uncovering thousands of CNVs and mapping them to critical agronomic traits such as stress resilience and pathogen defense.
This work contributes significantly to the field of agricultural genomics, showing how CNVs can be leveraged for crop improvement, marker-assisted breeding, and understanding species adaptation. The study’s findings underscore the value of integrating CNV data with genome-wide association studies (GWAS) to identify important loci linked to key traits, positioning MLDeCNV as a valuable resource for advancing genomic research in agriculture and evolutionary biology.
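The data-balancing step named in the abstract can be illustrated with a small, pure-Python sketch. This is not the MLDeCNV implementation: the function names, the two toy features, and the choice to remove both members of each Tomek link are assumptions made for illustration only. The idea is to oversample each under-represented class by interpolating between nearby samples of that class (SMOTE), then to drop pairs of mutual nearest neighbours that carry different labels (Tomek links).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(members, n_new, k=3, rng=None):
    """Create n_new synthetic points for one class (needs >= 2 members).

    Each synthetic point lies on the segment between a randomly chosen
    member and one of its k nearest neighbours from the same class.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(members)
        neighbours = sorted(
            (p for p in members if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def remove_tomek_links(points, labels):
    """Drop both members of every mutual-nearest-neighbour pair whose labels differ."""
    def nearest(i):
        return min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )

    nn = [nearest(i) for i in range(len(points))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and labels[i] != labels[j]:
            drop.update((i, j))
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


def smote_tomek(points, labels, rng=None):
    """Oversample every class up to the largest class size, then clean Tomek links."""
    rng = rng or random.Random(0)
    by_class = {}
    for p, label in zip(points, labels):
        by_class.setdefault(label, []).append(p)
    target = max(len(members) for members in by_class.values())
    out_points, out_labels = list(points), list(labels)
    for label, members in by_class.items():
        new = smote(members, target - len(members), rng=rng)
        out_points.extend(new)
        out_labels.extend([label] * len(new))
    return remove_tomek_links(out_points, out_labels)


# Toy data: two hypothetical features per candidate region.
points = [(10.0, 1.0), (10.2, 1.1), (10.4, 0.9), (10.6, 1.0),
          (10.8, 1.2), (11.0, 1.0), (0.0, 0.5), (1.0, 0.4)]
labels = ["no_cnv"] * 6 + ["deletion"] * 2
balanced_points, balanced_labels = smote_tomek(points, labels)
```

In practice a maintained library implementation would normally be preferred over a hand-rolled version; the sketch only shows what the two stages do to the class distribution before a classifier is trained.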
Publications (4)
A comprehensive review on genomic resources in medicinally and industrially important major spices for future breeding programs: status, utility and challenges.
Das, Parinita, Chandra, Tilak, Negi, Ankita, Jaiswal, Sarika, Iquebal, Mir Asif, Rai, Anil and Kumar, Dinesh
Comprehensive Analysis of Copy Number Variation in diverse Bitter Gourd accessions.
Das, Parinita, Jaiswal, Sarika, Iquebal, MA, Angadi, UB, Kumar, Dinesh.
Genome-wide identification of copy number variation in black pepper and development of its atlas.
Das, Parinita, Sheeja TE, Saha, Bibek, Fayad A, Chandra, Tilak, Angadi, UB, Shivakumar MS, Muhammed Azharudheen TP, Jaiswal, Sarika, Iquebal, Mir Asif, Kumar, Dinesh.
MLDeCNV: A machine-learning approach for accurate detection of copy number variants from whole genome sequencing
Das, Parinita, Saha, Bibek, Sharma, Nitesh Kumar, Iquebal, Mir Asif, Papanicolaou, Alexie, Angadi, U B, Kumar, Dinesh, Jaiswal, Sarika
Resources (2)
Testimonial
"I'm deeply grateful to ICAR-IASRI for shaping my professional and personal growth. Here, I've learned Bioinformatics from the very basics to a higher level, which truly helped me to secure my career."