Publications
Browse our research publications and academic works
Development of HapMap Database and Visualization Tool for Tea
Author: Dipankar Mandal
2022-23
Integrating GWAS Module with HtP-DAP for SNP-trait Associations Mining
Author: Surapuram Aswini
2022-23
Genome-wide association studies (GWAS) are a key methodology for identifying genetic variants associated with traits in organisms. These studies are important for understanding the genetic basis of complex traits, which can aid in improving crop performance, human health, and livestock breeding. This thesis integrates a GWAS analysis tool with the existing phenomics data analysis platform, HtP-DAP, to enhance and streamline GWAS analysis workflows. The tool addresses key challenges in GWAS by offering preprocessing capabilities, including data filtering based on allele frequency thresholds, imputation of missing genotypic data, and file conversion to ensure compatibility with various analysis pipelines. A major feature of the tool is its set of relatedness analysis functions, which include kinship estimation, principal component analysis (PCA), and multi-dimensional scaling (MDS). These analyses provide insight into the underlying genetic architecture of populations, facilitating more accurate GWAS results. The GWAS analysis itself is flexible, supporting both single-locus models, which test individual markers for trait associations, and multi-locus models, which examine interactions between multiple markers. Result visualization is a key component of the tool: users can generate Manhattan plots to highlight significant associations, circular Manhattan plots for a more compact genome-wide view, and Q-Q plots to assess the quality of the GWAS results. These outputs also help present results in a form suitable for publication or further research. The tool's backend uses the GAPIT R package, known for its efficiency and scalability in handling large genomic datasets.
GAPIT enables the seamless execution of GWAS analyses by managing the computational load, thus ensuring that the tool performs optimally even with large-scale datasets. By incorporating this GWAS tool within the HtP-DAP platform, this study bridges the gap between phenotypic data from high-throughput phenotyping and genotypic data from modern genomic studies. The integration facilitates a holistic approach to genetic research, allowing users to move from data collection to meaningful biological insights within a single platform.
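As an illustration of the allele-frequency filtering step mentioned in the abstract above, the following is a minimal Python sketch, not the thesis implementation (which runs on the GAPIT R package). The function names, the 0/1/2 genotype coding, the toy data, and the 0.05 threshold are all assumptions made for this example.

```python
def minor_allele_frequency(genotypes):
    """Return the minor allele frequency for one biallelic SNP.

    `genotypes` holds per-sample alternate-allele counts (0, 1, or 2);
    None marks a missing call and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    # Each diploid sample contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(snp_matrix, threshold=0.05):
    """Keep SNPs whose minor allele frequency is at least `threshold`."""
    return {
        snp_id: calls
        for snp_id, calls in snp_matrix.items()
        if minor_allele_frequency(calls) >= threshold
    }


# Hypothetical toy data: three SNPs genotyped in five samples.
snps = {
    "snp1": [0, 1, 2, 1, 0],     # alternate allele frequency 4/10
    "snp2": [0, 0, 0, 0, 0],     # monomorphic, carries no association signal
    "snp3": [2, 2, None, 2, 1],  # one missing call, 7/8 alternate alleles
}
kept = filter_by_maf(snps, threshold=0.05)
print(sorted(kept))
```

Monomorphic or very rare markers such as `snp2` are dropped because they offer little statistical power in an association test; the appropriate threshold depends on the sample size.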
Development of Computational Tool for Mining Intron Length Polymorphism Markers and Designing Primers
Author: Soumya Shivamurti
2022-23
Standardizing Workflow for Identifying Stress-Tolerance Contributing Non-Coding RNAs in Vigna and Developing a Comprehensive ncRNA Database For Legumes
Author: Ashok S
2022-23
A Study on Machine Learning Based Approach for long non-coding RNA Subcellular Localization Prediction
Author: Baibhav Kumar
2019-20
Web Tool for CRISPR/Cas9 off-target prediction in Plants
Author: Abhishek Anand
2021-22
Discovery of Molecular Markers and Development of Database for Rice Bean
Author: Ravi
2021-22
Novel and efficient Pipeline for Metagenomics Binning
Author: Subham Ghosh
2021-22
Metagenomics delves into the examination of microorganisms, and a pivotal aspect of this field involves piecing together the genetic makeup of distinct organisms. This task proves challenging due to the complexities of isolating and cloning certain organisms under in-vitro conditions. Metagenomics is alternatively termed environmental genomics, eco-genomics, or community genomics. To reconstruct the fragmented sequences obtained from shotgun sequencing, the process relies heavily on genome assembly. However, a significant hurdle arises when attempting to segregate and reassemble genomes from various organisms. The abundance of these genomes and the intermingling of genomic reads present a formidable challenge. Shotgun sequencing produces genomic reads that contain fragments originating from the genomes of diverse microorganisms. To facilitate reconstruction, it becomes imperative to classify these reads into separate bins corresponding to distinct microorganisms. For this purpose, various clustering techniques have emerged for the categorization of these intertwined genomes. These techniques encompass binning, boosting, bagging, and stacking. Among these, binning has become the most widely used. In other words, genomes are categorized into operational taxonomic units (OTUs) to facilitate subsequent taxonomic profiling and functional analysis. This process of OTU clustering is commonly referred to as binning. Binning employs a variety of clustering methods such as k-means, k-medoids, Hidden Markov Models (HMM), and hierarchical clustering. However, each of these clustering approaches comes with its own limitations and drawbacks, and there is no research on motif-based binning among the existing methods.
Here, an approach to metagenomic binning is presented that constructs a frequency table of motifs or segments using gapped local alignment, with the constraint that segments must not overlap during alignment. K-means, PAM, and DBSCAN clustering are applied to cluster the contigs based on the segments and motifs, with K-means performing best. The Rand index for this approach tends to 1, indicating that it is well suited to metagenomic binning, and it also performs better than the existing binning tools MaxBin and MetaBAT. The approach leaves considerable scope for extension: more advanced clustering methods could replace simple K-means, and GC content and tetranucleotide frequency could be added as features to improve performance. The approach also highlights mutations and conserved regions, which are important for understanding evolutionary biology.
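The abstract above evaluates binning with the Rand index. The following is a minimal Python sketch of that metric, written for illustration only; the function name and the toy labels are assumptions, and the thesis may have computed the index with a library routine instead.

```python
from itertools import combinations


def rand_index(true_labels, predicted_labels):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    together, or both put them apart. A value of 1.0 means identical
    partitions, regardless of how the clusters are named.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_labels[i] == predicted_labels[j]
        agreements += same_true == same_pred
        total_pairs += 1
    # With fewer than two items there are no pairs to disagree on.
    return agreements / total_pairs if total_pairs else 1.0


# Hypothetical example: four contigs from two source genomes.
truth = ["genomeA", "genomeA", "genomeB", "genomeB"]
perfect_bins = [0, 0, 1, 1]
poor_bins = [0, 0, 0, 1]
print(rand_index(truth, perfect_bins), rand_index(truth, poor_bins))
```

Because the index compares pairs rather than label names, it can score a binning tool against reference genomes even when the bin identifiers are arbitrary.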
A Semi-Supervised Approach For Binning Of Metagenomics Data
Author: Deeksha P M
2021-22
Computational Intelligence in the Discovery of Natural Products from agriculturally important Metagenomics Data
Author: Sharanbasappa
2020-21
Microorganisms are diverse, invisible, and ecologically important organisms that drive biosphere activities, while also imposing constraints on agriculture in the form of plant diseases. This work explores metagenomics, a field that moves beyond traditional limitations through the application of high-throughput sequencing methods. Agriculturally significant metagenomics, characterized by direct DNA sequencing from soil, plants, and cattle, reveals previously hidden microbial diversity and genetic components. Despite this, the reconstruction of individual genomes from the complex mixture of DNA sequences remains a challenging task. The process of binning, which groups sequences from diverse microorganisms, lays the foundation for the identification of Natural Products (NPs) by clustering genomes or taxonomically related groups. This preliminary step is pivotal for extracting valuable insights from the wealth of metagenomic data. Natural Products (NPs) are organic compounds synthesized by living organisms, including bacteria, fungi, plants, and marine life. NPs have a vast range of applications, from medicine to agriculture, and often arise from biosynthetic gene clusters (BGCs) within microbial genomes. Identifying these NPs and their associated BGCs is a central task in metagenomics, offering the prospect of discovering novel compounds with potential agricultural applications. Computational intelligence techniques facilitate the efficient analysis of metagenomics data and the prediction of NPs, and have emerged as indispensable tools in this endeavor. This study introduces innovative approaches for binning metagenomics data and identifying NPs, and applies these methodologies to agriculturally significant metagenomics datasets.
Two novel binning strategies were introduced, Deep Embedded Clustering (DEC) and Variational Autoencoders (VAE); both outperformed the existing unsupervised methods and were on par with semi-supervised techniques, with DEC excelling in cluster quality and VAE demonstrating a high silhouette index. For NP identification from bins of metagenomics data, this research presents a comprehensive approach to effective BGC identification. The study focuses on five classes of Natural Products (NPs): Polyketide Synthase (PKS), Non-Ribosomal Peptide Synthetase (NRPS), Ribosomally synthesized and Post-translationally modified Peptides (RiPP), Terpenes, and Hybrid PKS-NRPS. Data were gathered from the MIBiG database in GBK format. Protein sequences from each file were extracted, and sequences under the same BGC ID were combined. Physicochemical properties were calculated, and sequence embeddings were generated using NLP techniques like CountVec, TFIDF, and Word2Vec specific to each NP class. An integrated feature matrix was created by merging physicochemical properties and generated embeddings. This matrix was then used for training and testing nine ML models: Logistic Regression (LR), Naïve Bayes (NB), Decision Tree (DT), Random Forests (RF), K-Nearest Neighbors (KNN), Extreme Gradient Boosting (XGBoost), Support Vector Machines (SVM), Artificial Neural Networks (ANN), and Categorical Boosting (CatBoost). The study compared training with and without SMOTE data balancing and employed grid search for parameter optimization. This led to six datasets and 54 models. The LR model, using TFIDF with SMOTE, emerged as the most effective, achieving an accuracy of 0.96, AUC of 0.9912, and other strong metrics. With the proposed approach, we developed an AI-based tool called NaturePred (http://login1.cabgrid.res.in:5101/) for NP class prediction and protein physicochemical property calculation.
Applied to a real agriculturally important metagenomics dataset collected from the mustard soil rhizosphere in the Mau district of UP, the approach reveals a rich presence (more than 40%) of Ribosomally synthesized and Post-translationally modified Peptides (RiPPs), signalling robust plant-microbiome interactions and soil health. By combining innovative binning strategies, advanced NLP techniques, and machine learning, this study lays a foundation for future advancements in agriculture and microbial research. The integration of AI tools, exemplified by NaturePred, promises to unlock untapped agricultural potential by revealing the biosynthetic capacity hidden within microbial genomes.
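The count of six datasets and 54 models reported in the abstract above is consistent with crossing three embeddings, two balancing settings, and nine classifiers. The Python sketch below enumerates that grid; this reading of the experimental design is an assumption based on the numbers in the abstract, and the short labels are illustrative names only, not identifiers from the thesis.

```python
from itertools import product

# Assumed factors, taken from the techniques named in the abstract.
EMBEDDINGS = ["CountVec", "TFIDF", "Word2Vec"]
BALANCING = ["SMOTE", "no SMOTE"]
MODELS = ["LR", "NB", "DT", "RF", "KNN", "XGBoost", "SVM", "ANN", "CatBoost"]

# One dataset per (embedding, balancing) combination.
datasets = list(product(EMBEDDINGS, BALANCING))

# One trained model per (dataset, classifier) combination.
experiments = [
    (embedding, balancing, model)
    for (embedding, balancing) in datasets
    for model in MODELS
]

print(len(datasets), len(experiments))
```

Under this assumed design, the best configuration reported in the abstract corresponds to the tuple `("TFIDF", "SMOTE", "LR")`.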
Development of an approach for identification of core microbiome
Author: Sorna A M
2021-22
Development Of Tool For Selective Sweep Analysis Using Artificial Intelligence
Author: Abhik Sarkar
2021-22