Publications
Browse our research publications and academic works
Identification of drought Responsive long non-coding RNAs in Pearl millet
Author: Baibhav Kumar
2017-18
Pearl millet (Pennisetum glaucum L., 2n = 14) is an annual C4 grass of the family Poaceae and subfamily Panicoideae. It is also known as bajra, cattail millet, bulrush millet and dark millet. Pearl millet is the world’s fifth most important cereal crop and is grown mostly by poor and marginal farmers in the arid and semi-arid tropics of Asia and Africa, because it needs less intensive agronomic practices such as lower fertilizer and irrigation inputs. It is cultivated mainly for its grain but is also used as feed for cattle, poultry, fish, etc. Pearl millet performs well under drought conditions in which most cereals, such as wheat, rice and maize, fail. Understanding the molecular mechanisms of its response to adverse conditions is therefore important. Key candidate genes controlling drought response in Indian pearl millet have been discovered and reported, but its lncRNAs remain unexplored. Expression of candidate genes is controlled by microRNAs (miRNAs), transcription factors (TFs) and lncRNAs. This study aims to identify drought-responsive lncRNAs and to develop a web genomic resource giving user-friendly access to the findings, which is otherwise lacking for this crop. A total of 879 lncRNAs were identified, of which 209 (leaf control, leaf treated), 198 (leaf treated, root treated), 115 (leaf control, leaf treated) and 194 (root control, root treated) were differentially expressed. Two lncRNAs were found to be potential target mimics of three miRNAs from the miRBase database. Gene ontology analysis revealed that drought-responsive lncRNAs are involved in biological processes such as ‘metabolic process’ and ‘cellular process’, molecular functions such as ‘binding’ and ‘catalytic activity’, and cellular components such as ‘cell’, ‘cell part’ and ‘membrane part’.
The lncRNA-miRNA-mRNA network plays a vital role in stress-responsive mechanisms through its activities in hormone signal transduction, response to stress, response to auxin and transcription factor activity. Only four lncRNAs matched lncRNAs in the plant lncRNA database CANTATAdb, which indicates that lncRNAs are poorly conserved among species. All this information has been catalogued in the pearl millet drought responsive long non-coding RNA database (PMDlncRDB), freely accessible at https://webtom.cabgrid.res.in/pmdlncrdb. The information in PMDlncRDB can be used in pearl millet improvement programmes aimed at higher production and at combating drought.
Identification and Characterization of circular RNAs in Legumes
Author: Tanwy Dasmandal
2017-18
Leguminous crops rank next to cereals in importance. Their production and productivity have a major impact on the country’s economy as well as on the economic status of stakeholder farmers. Moreover, growing leguminous crops helps to ensure nutritional security. In this context, two leguminous crops, chickpea and soybean, and their transcriptome data were considered in the present study. With the advent of next-generation sequencing (NGS) technologies, it has become feasible to unravel the underlying complex mechanisms at the genome level. Although coding genes play a major role in various abiotic and biotic stress mechanisms, several non-coding RNAs regulate the genes responsible for stress tolerance. However, the identification and characterization of circular RNAs (circRNAs), a type of non-coding RNA, and of circRNA-miRNA-mRNA networks in legumes have not been fully explored. Hence, the present study, ‘Identification and characterization of circular RNAs in legumes’, was taken up with two objectives: (i) to identify and characterize circRNAs responsible for biotic and abiotic stress tolerance in leguminous crops and (ii) to study the relationship among circRNAs, miRNAs and mRNAs. Transcriptome data for these crops were collected from the public domain (NCBI, Ensembl, etc.), and the algorithm implemented in CIRI (CircRNA Identifier) was used to identify circRNAs under drought (abiotic) and wilt (biotic) stress conditions. The circRNAs were characterized through their differential expression under both drought and wilt stress. The differentially expressed (DE) circRNAs were then probed for their role in circRNA-miRNA-mRNA interactions. Finally, the genes identified from the network were studied for their function in stress tolerance mechanisms.
The results revealed 200 and 285 circRNAs under control and drought stress conditions, respectively, in chickpea; 57 and 66 circRNAs under control and drought stress conditions in soybean; and 48 and 75 circRNAs under control and wilt stress conditions in soybean. The numbers of DEcircRNAs were 44, 23 and 24 in chickpea-drought, soybean-drought and soybean-wilt, respectively. These DEcircRNAs were found to act as sponges for 40 (chickpea-drought), 17 (soybean-drought) and 10 (soybean-wilt) miRNAs. These miRNAs, in turn, were found to target 145 (chickpea-drought), 281 (soybean-drought) and 275 (soybean-wilt) mRNAs. Gene ontology (GO) analysis of the mRNAs showed that they are involved in biological processes such as ‘metabolic process’ and ‘cellular process’, molecular functions such as ‘binding’ and ‘catalytic activity’, and cellular components such as ‘cell’, ‘cell part’ and ‘membrane part’. Thus the circRNA-miRNA-mRNA network plays a vital role in stress-responsive mechanisms through its activities in hormone signal transduction, response to stress, response to auxin and transcription factor activity.
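The circRNA-miRNA-mRNA relationship described in this abstract can be pictured as a two-step lookup: each circRNA is linked to the miRNAs it is predicted to sponge, and each miRNA to the mRNAs it is predicted to target. The following is a minimal sketch of that data structure only. All identifiers (`circ_001`, `miR-A`, `gene_1`, and so on) and the function name `build_network` are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only; none come from the study.
sponge_pairs = [            # (circRNA, miRNA it is predicted to sponge)
    ("circ_001", "miR-A"),
    ("circ_001", "miR-B"),
    ("circ_002", "miR-B"),
]
target_pairs = [            # (miRNA, mRNA it is predicted to target)
    ("miR-A", "gene_1"),
    ("miR-B", "gene_2"),
    ("miR-B", "gene_3"),
]


def build_network(sponge_pairs, target_pairs):
    """Map each circRNA to its sponged miRNAs and their target mRNAs."""
    mirna_to_mrnas = defaultdict(set)
    for mirna, mrna in target_pairs:
        mirna_to_mrnas[mirna].add(mrna)

    network = defaultdict(dict)
    for circ, mirna in sponge_pairs:
        # A miRNA with no known target maps to an empty list.
        network[circ][mirna] = sorted(mirna_to_mrnas.get(mirna, ()))
    return {circ: dict(edges) for circ, edges in network.items()}


network = build_network(sponge_pairs, target_pairs)
for circ, edges in network.items():
    print(circ, edges)
```

The mRNAs collected at the end of each path are the ones that would be passed on to a functional (GO) analysis.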
A Deep Clustering based binning approach for Metagenomic data
Author: Sharanbasappa
2018-19
Development of web based tool for computation of genomic signatures
Author: Mailarlinga
2018-19
Prediction and Analysis of abiotic Stress Responsive lncRNAs in Lens culinaris
Author: Naveenkumar HS
2017-18
Lentil (Lens culinaris) is an annual, bushy plant of the family Leguminosae, known for its ‘lens-shaped seeds’. It is famous as the poor man's nourishment and is commonly known as masoor dal, dal, etc. The plant is 40 cm tall and diploid (2n = 2x = 14), with a genome size of 4 Gbp. Its ability to fix atmospheric nitrogen makes it very important in soil health and wealth management. Crop production in lentil is affected by both biotic and abiotic factors. Drought and heat stress are the major environmental stresses affecting plants, resulting in reduced productivity and crop losses. The present study is based on paired-end reads of control and stress-affected leaf transcriptomes of lentil under both drought and heat stress, generated with Illumina HiSeq 2000 technology. The study aims to identify long non-coding RNAs (lncRNAs) in lentil leaf tissue under both drought and heat stress through transcriptional profiling. Target prediction for these lncRNAs and knowledge of their regulatory functions in this crop are still lacking. De novo transcriptome assembly was carried out using the Trinity assembler. Under drought stress, a total of 112,210 transcripts, 2,155 lncRNAs and 177 differentially expressed lncRNAs were found, whereas under heat stress a total of 106,723 transcripts and 3,447 lncRNAs were found, with no differential expression. miRNAs such as miR6167 and miR1134 were found to interact with the putative lncRNAs. All this information can be further utilized for genetic improvement, through marker design and validation, in the development of new improved cultivars. The target prediction of lncRNAs can also be a valuable genomic resource in the endeavour to develop drought- and heat-tolerant varieties for higher productivity of lentil.
Computational Intelligence in the estimation of CRISPR Cas9 cleavage sites
Author: Jutan Das
2017-18
The CRISPR-Cas9 system is one of the most significant genome editing techniques of recent times because of its high potential to modify specific target genes and genomic regions that are complementary to the designed guide RNA (sgRNA). Different sgRNAs are designed from the target sequence to manipulate the desired genomic sites accurately. However, CRISPR-Cas9 still suffers from off-target effects. Here, I developed three machine learning techniques (Artificial Neural Network, Support Vector Machine and Random Forest) to estimate the CRISPR-Cas9 sites cleaved by a given sgRNA. All of these models were developed on plant datasets, which were overlooked in previous studies. The models were trained and tested on collected on-target and off-target datasets from different plant species. For ANNs I developed six models in total (ANN1-Logistic, ANN1-Tanh, ANN1-ReLU, ANN2-Logistic, ANN2-Tanh and ANN2-ReLU); among these, ANN1-ReLU gave the best results. I also developed four SVM models (SVM-Linear, SVM-Polynomial, SVM-Gaussian and SVM-Sigmoid), of which SVM-Linear performed better than the other three. I demonstrate that the random forest model attains the best performance on the plant dataset among all developed models, with an average area under the ROC curve (AUC) of 99.0%. I also show that the predictions of my models are more precise than those of another available method (CRISTA). Additional analyses were carried out to explore the underlying reasons from different perspectives.
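The models in this abstract are compared by area under the ROC curve (AUC). As a reminder of what that number measures, here is a minimal sketch that computes AUC as the fraction of (cleaved, non-cleaved) pairs in which the cleaved site receives the higher score, with ties counted as half. The labels and scores are made-up illustrative values, not data from the thesis, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for a positive (e.g. cleaved site), 0 for a negative.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # a tie counts as half
    return wins / (len(pos) * len(neg))


# Illustrative values only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 99.0% means that almost every positive site was scored above almost every negative one.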
Development of Non-B DNA Database for Rice and Maize
Author: Mohan Babu H S
2016-17
Apart from the normal canonical B-DNA form, nucleic acids have been found to adopt many other forms that are biologically functional. With this in mind, a database of non-B DNA was created for rice and maize. The chromosome sequences and gene information were collected from the NCBI database for rice (Oryza sativa Japonica Group) and maize (Zea mays). The seven major non-B forms of DNA, i.e., A-DNA, Z-DNA, G-quadruplex motifs, inverted repeats, direct repeats, mirror repeats and short tandem repeats, were predicted in the rice and maize chromosomes using the non-B DNA motif search tool, which is freely available over the internet. The results were used to create the database using the WAMP stack on the Windows operating system. The database architecture has three tiers, with the clients at the top, the web server in the middle and the MySQL database at the bottom. Because the maize chromosomes exceed the analysis limit of the motif search algorithm, a BioPerl script using the SeqIO module was used to divide the chromosomal sequences into subsequences before predicting the motifs. Window analysis was then done to recover motifs that might have been missed at the flanking regions of the subsequences. The interface was created using the client-side languages HTML (Hypertext Markup Language), CSS (Cascading Style Sheets) and JavaScript. PHP (Hypertext Preprocessor) was used as the server-side scripting language. MySQL was used to implement the general search option, which retrieves the motif data, and the advanced search option, in which the user can search by gene ID, accession number or description of the protein product encoded by the gene. Along with these search options, the interface includes a menu for crop-wise and chromosome-wise statistics of non-B DNA motifs. The glossary menu gives definitions of the technical terms used in the project.
A link is provided to the NCBI Genome Data Viewer to visualize the motif location on the genome. Links are also provided to the user manual, the index files of rice and maize, and the tools and resources used in the research project. Since these motifs are involved in critical cellular functions, their study may be important for understanding economically important physiological phenomena in other crops and animals of agricultural importance, and the database can be extended further to meet this objective.
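The splitting and window analysis described in this abstract address a general problem: when a long chromosome is cut into pieces, a motif that straddles a cut is invisible to a search run on each piece separately. The sketch below illustrates one way to handle this, assuming overlapping pieces. It is written in Python rather than the BioPerl used in the project, and it uses a widely used simplified G-quadruplex regular expression as a stand-in; it is not the rule set of the non-B DNA motif search tool. The function names, chunk sizes and test sequence are hypothetical.

```python
import re

# Simplified G-quadruplex pattern: four runs of >=3 G separated by 1-7 bases.
# This is an illustrative stand-in, not the tool's actual motif definition.
G4 = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def chunk_with_overlap(seq, chunk_size, overlap):
    """Yield (offset, subsequence); neighbouring pieces share `overlap` bases."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    for start in range(0, len(seq), step):
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # the last piece already reaches the end of the sequence


def find_motifs(seq, chunk_size, overlap):
    """Search each piece and report hits in whole-sequence coordinates."""
    hits = set()  # a set removes hits seen twice in overlapping pieces
    for offset, piece in chunk_with_overlap(seq, chunk_size, overlap):
        for match in G4.finditer(piece):
            hits.add((offset + match.start(), offset + match.end()))
    return sorted(hits)


# A 15-base motif placed across the boundary at position 30.
motif = "GGGAGGGAGGGAGGG"
seq = "T" * 20 + motif + "T" * 20

no_overlap = find_motifs(seq, chunk_size=30, overlap=0)
with_overlap = find_motifs(seq, chunk_size=30, overlap=15)
print(no_overlap, with_overlap)
```

In this sketch the overlap has to be at least one base shorter than the longest motif of interest for every boundary-spanning motif to fall entirely inside some piece.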
An Ensemble Based Clustering Approach for Metagenomics Data
Author: Dipro Sinha
2016-17
Study on Genomic Sequence Segmentation using Statistical and Computational Approach
Author: Arfa Anjum
2015-16
Some Investigation on Selection of Informative Genes using Gene Expression Data
Author: Nitesh Kumar Sharma
2017-18
Informative gene selection from high-dimensional gene expression data has emerged as an important area of research in agri-genomics. Different gene selection techniques have been developed in recent times based on the relevancy of genes to the class and the redundancy among genes. The most popular techniques for informative gene selection are Maximum Relevancy and Minimum Redundancy (MRMR) and Support Vector Machine Recursive Feature Elimination (SVM-RFE). However, these methodologies have some drawbacks. One major drawback is that they ignore spurious relations between genes and the trait under study. In this study, a methodology for informative gene selection has been developed that takes care of such spurious relations by implementing the bootstrap technique along with SVM-RFE and MRMR. The performance of these gene selection techniques has been analyzed through the classification accuracy of an SVM model with a linear kernel, built using the selected informative genes as predictors. The developed method was compared against three well-known existing techniques for gene selection, viz. Boot-MRMR, SVM-RFE and MRMR. On the basis of various evaluation measures, the developed methodology was observed to perform better than these techniques and to select fewer but more informative genes. For proper implementation and dissemination of the developed methodology, a user-friendly web-based tool named “Informative Gene Selection Tool (IGST)” has been developed using state-of-the-art technology. This study will provide a practical guide for selecting informative genes from high-dimensional expression data to enhance molecular breeding programmes in agricultural science.
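The idea of using the bootstrap to guard against spurious gene-trait relations can be illustrated as follows: rank the genes on many resampled versions of the data and keep those that rank highly in most resamples, not just in the original sample. The sketch below is a simplified illustration under stated assumptions. It replaces SVM-RFE and MRMR with a plain standardized mean difference as the ranking score, and the gene names, expression values and parameters are invented for the example.

```python
import random
from statistics import mean, pstdev


def relevance(values, labels):
    """Absolute class-mean difference scaled by overall spread.

    A deliberately simple stand-in for the SVM-RFE / MRMR scores.
    """
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    spread = pstdev(values) or 1.0  # avoid division by zero
    return abs(mean(a) - mean(b)) / spread


def bootstrap_selection(expr, labels, top_k, n_boot=200, seed=0):
    """Return {gene: fraction of resamples in which it ranked in the top_k}.

    expr maps each gene to its expression values, one per sample.
    """
    rng = random.Random(seed)
    n = len(labels)
    counts = {gene: 0 for gene in expr}
    done = 0
    while done < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        boot_labels = [labels[i] for i in idx]
        if len(set(boot_labels)) < 2:
            continue  # resample lacks one class; draw again
        scores = {
            gene: relevance([values[i] for i in idx], boot_labels)
            for gene, values in expr.items()
        }
        for gene in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            counts[gene] += 1
        done += 1
    return {gene: c / n_boot for gene, c in counts.items()}


# Invented data: one gene separates the classes, two do not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "gene_signal": [10.0, 11.0, 10.5, 11.5, 1.0, 2.0, 1.5, 2.5],
    "gene_noise_a": [5.0, 6.2, 4.8, 5.9, 5.1, 6.0, 4.9, 6.1],
    "gene_noise_b": [3.0, 7.0, 4.0, 6.0, 5.0, 5.5, 6.5, 4.5],
}
freq = bootstrap_selection(expr, labels, top_k=1)
print(freq)
```

Genes whose apparent relevance depends on a few particular samples tend to have a low selection frequency, which is the sense in which resampling filters out spurious associations.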
Development of Integrated Model for Genomic Selection
Author: Sayanti G Majumdar
2014-15
Genomic Selection (GS) is a recent approach for efficient breeding of animals and plants, and it has been used globally in recent years to increase agricultural production and productivity. It is suitable for selecting complex quantitative traits, allowing efficient selection of breeding material after prediction of the Genomic Estimated Breeding Values (GEBVs) of the target species. The accuracy of GEBV estimation depends on a large number of factors, including the training population, the genetic architecture of the target species and the statistical model. The accuracy with which breeding parents are selected also varies with the GS model chosen, according to its assumptions and its treatment of marker effects. The first step of GS is feature (marker) selection. Several models for feature selection in GS are available in the literature. However, their applicability depends on many factors, including the extent of additive and epistatic effects in the breeding population. There is therefore a strong need to evaluate the performance of these models and feature selection techniques under different genetic architectures. Furthermore, the statistical models available for estimating GEBVs are not robust across genetic architectures and are dataset dependent. Some models perform well for additive genetic architecture and others for non-additive genetic architecture, but an estimator that captures both effects simultaneously is lacking. This study was therefore conducted to develop a robust estimator able to capture both additive and non-additive effects efficiently. First, the performance of linear (additive-effect) models and non-linear (epistatic-effect) models was evaluated through a simulation study. In general, SpAM was found to outperform all other additive-effect models considered in this study.
However, in the case of low heritability and high epistatic effect, HSIC LASSO outperformed all competing models. Therefore, a robust integrated estimator has been developed by combining these two efficient additive and non-additive models, i.e., SpAM and HSIC LASSO, respectively. Further, four different methods of estimating the error variance were evaluated; the error variance is used to estimate the weights in the developed Integrated Model. The performance of the proposed model has been evaluated on the basis of prediction accuracy, fraction of correctly selected features and redundancy rate, along with their standard errors of the mean, and has been compared with that of SpAM and HSIC LASSO on the same criteria. The newly developed estimator was found to be superior in performance and has been demonstrated to be robust against any genetic architecture of the datasets. The Integrated Model was also evaluated with 2%, 5% and 10% of the genotypic data imputed, and its performance was found to be comparable to that on the complete dataset.
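The abstract states that the Integrated Model combines SpAM and HSIC LASSO using weights estimated from the error variance, but it does not give the weighting formula. As a rough illustration of how such a combination can work, the sketch below assumes inverse-variance weighting, in which the model with the smaller estimated error variance receives the larger weight. This weighting rule, the function names and all numbers are assumptions made for illustration and may differ from the method actually developed.

```python
def inverse_variance_weights(var_a, var_b):
    """Weights proportional to 1/variance, normalized to sum to one."""
    if var_a <= 0 or var_b <= 0:
        raise ValueError("error variances must be positive")
    inv_a, inv_b = 1.0 / var_a, 1.0 / var_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total


def combine_predictions(pred_a, pred_b, var_a, var_b):
    """Weighted average of two models' predicted breeding values."""
    w_a, w_b = inverse_variance_weights(var_a, var_b)
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Invented predictions from an additive and a non-additive model.
additive_pred = [1.0, 2.0, 3.0]
epistatic_pred = [2.0, 2.0, 5.0]

# Assumed error variances: the additive model is the more precise one here.
combined = combine_predictions(additive_pred, epistatic_pred,
                               var_a=1.0, var_b=3.0)
print(combined)
```

With these assumed variances the additive model receives weight 0.75 and the non-additive model 0.25, so the combined prediction leans toward whichever model fits the data more tightly.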
Development of transcriptome based web-genomic resources for drought responsiveness in black pepper
Author: Ankita Negi
2016-17