Publications
Browse our research publications and academic works
Development of web-based tool to identify polymorphic microsatellite markers for RAD-seq data
Author: Madhusudhan CM
2020-21
CRISPR/Cas9 off targets prediction in Plants using Deep Learning
Author: Chandini B C
2020-21
Identification and Characterization of Lnc RNA in Ricebean (Vigna umbellata)
Author: Bibek Saha
2019-20
Ricebean (Vigna umbellata) is a Kharif-season annual legume whose seeds are consumed as a pulse. It is considered a minor legume because it is grown in limited areas as an intercrop with maize and sorghum, mostly in northern India (mainly Uttarakhand) and north-eastern India (mainly Assam). Its seed contains a good amount of protein and other nutrients. The protein-coding RNAs of the developing seed stages are largely regulated by non-coding RNAs, specifically long non-coding RNAs. Long non-coding RNAs (lncRNAs) are a large and diverse class of transcribed RNA molecules that are longer than 200 bp, have an ORF<100 bp, and do not encode proteins. They are one type of regulatory non-coding RNA. LncRNAs regulate gene expression through DNA methylation and chromatin remodeling, and in some cases they act as miRNA (microRNA) sponges to enhance the expression of mRNAs targeted by miRNAs (Tay et al., 2014). LncRNAs are thought to have a wide range of functions in cellular and developmental processes. An lncRNA may be positioned beside protein-coding genes or between genes, or it may overlap with coding genes. Hardly any work has been reported on the identification of lncRNAs in Ricebean. This study aims to identify lncRNAs and annotate their targets for the developing stages of Ricebean seed. A total of 906 novel lncRNAs were identified. Of these, 82 lncRNAs target 15 miRNAs, and different lncRNAs were observed to share similar miRNA targets. These 15 miRNAs in turn had 15 mRNA targets. Lastly, the 15 mRNAs were annotated and found to regulate different biological, cellular and metabolic processes in the developmental stages of Ricebean seed. A web resource, ‘RbLncDB’, has also been developed in the present study to help future researchers working on the Ricebean seed transcriptome.
Keywords: Vigna umbellata; long non-coding RNAs; micro RNA; Reference assembly; lncRNA targeted miRNA.
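The size criteria quoted in the abstract above (transcript longer than 200 bp, ORF shorter than 100 bp) can be illustrated with a minimal sketch. The function names, the forward-strand-only ORF scan and the ATG-to-stop ORF definition are assumptions made for illustration; the abstract does not describe the actual filtering pipeline, which would normally also include a coding-potential assessment.

```python
# Sketch of the lncRNA size criteria stated in the abstract.
# Assumptions (not from the source): ORFs run from ATG to the first in-frame
# stop codon, and only the three forward-strand reading frames are scanned.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Return the length (nt, stop codon included) of the longest forward ORF."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def is_lncrna_candidate(seq: str, min_length: int = 200, max_orf: int = 100) -> bool:
    """Apply the two thresholds: length > min_length and longest ORF < max_orf."""
    return len(seq) > min_length and longest_orf_length(seq) < max_orf
```

Both thresholds are parameters, so the same check can be rerun with stricter or looser cut-offs.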
Phylogenetic Marker Genes Based Approach for Binning of Metagenomics Data
Author: Asif Ali V K
2019-20
The study of microbes traditionally focused on single species in pure culture, which made complex microbial communities very difficult to interpret. Metagenomics enables us to investigate microbes in their natural environments, within the complex communities in which they normally live. Sequence binning is one of the important steps of metagenomic data analysis, producing meaningful 'bins' or groups. There are several techniques for grouping, among which binning is the most widely used. Binning refers to the classification of DNA sequences into clusters that may represent an individual genome or genomes from taxonomically related microorganisms. Binning can use any of several clustering techniques, such as K-Means, DBSCAN, spectral clustering and hierarchical clustering, but each has its own drawbacks. In the past, only a few efforts have used single-copy phylogenetic marker genes for clustering metagenomic data. Phylogenetic marker genes are universal, single-copy, protein-encoding genes that are rarely subject to horizontal gene transfer (HGT), and they have been used to delineate prokaryotic species accurately and consistently. In this research, a semi-supervised clustering approach is adopted to cluster metagenomic data using marker genes. First, contigs harbouring marker genes are identified by running Prodigal, FetchMG and USEARCH sequentially. Then K-Means clustering is applied to the metagenomic data after it has been reduced to two dimensions using the BH-TSNE algorithm. Finally, the generated clusters are corrected, with the help of spectral clustering, based on the sequences harbouring marker genes.
K-Means clustering alone generated 8 clusters with a Rand index of 0.973, an F1 score of 0.71 and an overall accuracy of 0.9 for a 10s genome dataset, using tetranucleotide frequency as the initial input feature matrix. Cluster correction resulted in 10 clusters with a Rand index of 0.981, an F1 score of 0.91 and an overall accuracy of 0.95 for the same dataset. In a nutshell, cluster correction using sequences harbouring marker genes produced better clustering results.
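Two quantities named in the abstract above, the tetranucleotide frequency feature vector and the Rand index, can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, reverse-complement tetramers are not merged (the abstract does not say whether the study did so), and the dimensionality reduction and clustering steps are omitted.

```python
from itertools import combinations, product

# All 256 possible 4-mers over the DNA alphabet, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(contig: str) -> list:
    """Return the 256-dimensional relative 4-mer frequency vector of a contig."""
    contig = contig.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        if kmer in counts:  # windows containing N or other codes are skipped
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


def rand_index(true_labels, predicted_labels) -> float:
    """Fraction of item pairs on which two labelings agree (same vs. different)."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) < 2:
        raise ValueError("need two labelings of equal length with >= 2 items")
    pairs = list(combinations(range(len(true_labels)), 2))
    agreements = sum(
        (true_labels[i] == true_labels[j])
        == (predicted_labels[i] == predicted_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The Rand index depends only on which pairs of contigs are grouped together, so it is unaffected by how the cluster labels themselves are numbered.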
Deep Learning for Predicting Breeding Value using High Throughput Genotyping and Phenotyping
Author: Lal Dhari Patel
2019-20
Accurate estimation of breeding value is of key importance in a crop breeding program. Traditionally, statistical methods have been widely used to predict breeding values from genotypic effects. These methods usually assume that genotypic effects are independently distributed and follow a prior distribution such as the Gaussian. Such assumptions may limit the prediction of breeding values from high-throughput genotyping data, which carries very precise genotypic information. At the same time, harnessing this precise genotypic information warrants equally precise phenotyping. Precise phenotyping is laborious, expensive and sometimes impossible with conventional methods. To overcome these limitations, the present work proposes the use of deep learning to predict breeding value by exploiting the full potential of high-throughput genotyping in conjunction with high-throughput phenotyping. A deep learning-based CNN model was trained to predict breeding value using high-throughput genotyping and phenotyping data from a wheat dataset consisting of 184 RILs, each with 3121 filtered SNPs. Altogether, data for six traits under two environments (controlled and drought conditions) were used for the prediction of breeding value. First, the whole dataset was randomly divided into a training dataset and a testing dataset. The CNN models were trained on the training dataset, which contains 80% of the total data, and the remaining 20% was used for testing. Two parameters were used for testing and evaluation of the deep learning model training.
The trained and tested deep learning model was compared with existing statistical models, i.e., GBLUP (genomic best linear unbiased prediction), rrBLUP (ridge regression best linear unbiased prediction) and Bayesian LASSO (Bayesian Least Absolute Shrinkage and Selection Operator). The results show that the deep learning model performs better than the statistical methods considered.
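The random 80/20 division described in the abstract above can be sketched as follows for 184 lines. The function name, the fixed seed and the rounding rule are assumptions for illustration; the abstract does not state how the split was implemented.

```python
import random


def train_test_split_indices(n_samples: int, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle sample indices and split them into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 184 RILs, as stated in the abstract; each index would select one RIL's
# SNP vector and trait values in a real analysis.
train_idx, test_idx = train_test_split_indices(184)
```

With these assumptions, 184 lines split into 147 for training and 37 for testing, since 20% of 184 is 36.8 and is rounded to the nearest whole line.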
Identification and Characterization of bZIP and Dof gene families from developing seeds of Vigna umbellata
Author: Shivdarshan S Jirli
2019-20
Prediction of enzymes involved in bioremediation using aquatic Metagenomes
Author: Chandana V
2019-20
Development of a deep learning based methodology for functional protein classification
Author: Bulbul Ahmed
2016-17
Cereals are staple crops widely cultivated across the world. They are highly nutritious, rich in vitamins, minerals, carbohydrates, fats, oils, proteins and fibers, but low in essential amino acids such as lysine. Cereal crops belong to the Poaceae family and are widely used in the production of flour, bread, rice, cakes, corn, etc. Other by-products of these crops include beverages and wine. Moreover, consumption of these crops reduces the risk of coronary heart disease, diabetes, colon cancer, diverticular disease, etc. India is the third-largest cereal producer after China and the USA, and its production could increase by 4.9% from the base year 2020 to 2027. The production of these crops is strongly affected by biotic and abiotic stresses, which adversely affect crop growth and development and result in crop loss and, in turn, economic loss. Hence, it is necessary to understand and study the genes involved in order to minimize the effects of biotic and abiotic stresses. Under stress, genes adapt and produce proteins that can tolerate such changes by altering signalling pathways in protein-protein interaction. Finding these proteins is highly expensive and time-consuming and requires a highly experienced person. To reduce cost and time, rapid classification and prediction of such proteins using computational approaches is required. Furthermore, these proteins are complex and high-dimensional, which makes them very difficult to study using conventional approaches. This study applied different machine learning techniques (namely, support vector machine and random forest) and deep learning (long short-term memory) to develop classification models for abiotic stress (heat, cold, salinity and drought) protein sequences from the Poaceae family.
Also, an activation function, Gaussian Error Linear Unit with Sigmoid function (SiELU), has been developed for use in a deep learning model and is shown to increase the efficiency of the model. Lastly, a web-based tool for prediction of stress-associated proteins from the Poaceae family has been developed, implementing the proposed long short-term memory deep learning methodology with the developed activation function, i.e., SiELU, and tuning of other hyper-parameters.
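The abstract above does not give the formula for SiELU, so it is not reproduced here. As background only, the sketch below shows the two standard ingredients its name refers to: the exact Gaussian Error Linear Unit, GELU(x) = x·Φ(x), and the widely used sigmoid-based approximation x·σ(1.702x). Whether SiELU corresponds to either form is an assumption the source does not confirm.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_sigmoid_approx(x: float) -> float:
    """Common sigmoid approximation of GELU: x * sigmoid(1.702 * x)."""
    return x * sigmoid(1.702 * x)


# Compare the two forms on a few inputs.
for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(value, round(gelu(value), 4), round(gelu_sigmoid_approx(value), 4))
```

The two curves stay close across this range, which is why the sigmoid form is often used as a cheaper substitute for the exact one.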
Development of Big Data Analytics Based Methods for Genome Assembly and Annotation
Author: Amit Kairi
2014-15
The study on “Development of Big Data Analytics Based Methods for Genome Assembly and Annotation” was carried out in the Centre for Agricultural Bioinformatics (CABin), ICAR-Indian Agricultural Statistics Research Institute (IASRI), New Delhi during the years 2014-2020. In the present study, genome assembly and annotation procedures have been critically reviewed to develop new approaches that may reduce time complexity while improving output quality. Big Data analytics-based techniques have been used to devise new approaches and compare them with existing algorithms, so as to judge the quality of the outcome in terms of genome assembly and annotation. In this chapter, a brief introduction is given to sequencing techniques, genome assembly procedures with their merits and demerits, annotation, and Big Data, along with the motivation and objectives of the study.
Development of Robust Methods for Genomic Selection
Author: Neeraj Budhlakoti
2014-15
Development of Integrated Index for Genomic Selection
Author: Md Asif Khan
2015-16
Study on Differential expression of Coding and Non-Coding RNAs and Post-Translational Modifications in wheat rust resistance
Author: Parinita Das
2018-19
Bread wheat (2n=42), belonging to the family Poaceae, is one of the most important food crops globally and is known as “the King of Cereals”. It is extensively cultivated across the world for its seeds, as a staple food for humans as well as for livestock feed. Due to global warming and climate change, biotic and abiotic stresses are the most pressing emerging problems. Stripe rust of wheat, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the largest biotic stress factors among them, limiting wheat production worldwide. Despite the efficiency of fungicide treatments, genetic resistance is considered the most economical and environmentally friendly way to control the disease. The present study is based on paired-end reads of control and Pst-treated leaf samples of two near-isogenic lines of wheat, generated using HiSeq 4000. The study aims at identifying differentially expressed genes (DEGs) in response to wheat stripe rust between the susceptible line PBW343 and the resistant line FLW29 and at their transcriptional profiling; at identifying and characterizing lncRNAs under stress conditions, transcription factors, pathways, etc.; and lastly at predicting PTM sites in the translated DEGs. A total of 164095 transcripts, 409 differentially expressed genes, 1503 transcription factors, and several protein domains and families were identified from the reference-based assembly. Myeloblastosis-related proteins (MYB), WRKY domain and basic helix-loop-helix (bHLH) proteins found in this study are reported to be associated with plant tolerance against biotic stress. A total of 6807 putative lncRNAs were identified at three time points, i.e., 12 hpi, 48 hpi and 72 hpi, of which only 13 are differentially expressed. These are related to pathogenesis-related protein 1 and disease resistance protein RGA2. These findings should facilitate the development of effective strategies for breeding resistant wheat varieties to obtain better control of stripe rust.
Lastly, the PTM study was carried out at the sequence level on the proteins encoded by the DEGs. More palmitoylation sites were found in downregulated genes than in upregulated ones, indicating that palmitoylation may play an important role in the disease resistance mechanism.