Publications
Browse our research publications and academic works
Development of Bioinformatics Framework for Enhanced Utilization of Genomic Resources in Sesame (Sesamum indicum L.)
Author: Supriya Purru
2014-15
Sesame (Sesamum indicum L.) is an ancient diploid (2n) dicotyledonous oilseed crop belonging to the family Pedaliaceae. Sesame, as a source of high-quality oil, is valued for its stability, nutritional value and resistance to rancidity and is often referred to as the “Queen of oil seeds”. Recent advances in whole genome sequencing, particularly the evolution of next generation sequencing (NGS) technologies, have led to the availability of large amounts of genomic data covering the entire genome. Genomic information on the Pedaliaceae family is quite limited, as genomes from this family have not previously been sequenced. Hence, it is essential to explore the sesame plant with state-of-the-art sequencing technology to improve quality and yield. In the present investigation, whole genome sequencing of the Swetha variety of sesame generated 30 Gb of data and provided 85x coverage using different NGS technologies. The high-quality reads were assembled into 16 linkage groups, which covered ~340 Mb of the estimated genome size and showed 96.32% genome coverage. A total of 24579 genes were predicted from the whole genome; these were classified into 59 functional groups and found to be involved in pathways such as fatty acid biosynthesis, purine metabolism, and starch and sucrose metabolism. Furthermore, a total of 8244 genes had significant matches in PlantTFDB corresponding to 76 transcription factor families, of which FAR1 (1132) was the most abundant, followed by MADS (517), bHLH (499) and NAC (420). Microsatellite mining identified 90378 SSRs, including di-, tri-, tetra-, penta- and hexanucleotide repeats, with one SSR present per 1.76 kb of genome. The distribution of SSRs varies from one linkage group to another; however, the density is similar in all linkage groups, ranging from one SSR per 1.64 kb to one per 1.90 kb. Furthermore, 119710 SNPs were identified from the whole genome using CLC Genomics Workbench.
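The microsatellite mining step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `find_ssrs` and the minimum repeat thresholds are assumptions chosen for illustration, and only perfect repeats of di- to hexanucleotide motifs are reported.

```python
import re

# Minimum number of repeat units per motif length. These thresholds are
# illustrative assumptions, not values taken from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    start is a 0-based index; hits are grouped by motif length.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Dividing the sequence length by the number of hits gives the "one SSR per N kb" density figure of the kind quoted in the abstract.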
The whole transcriptome assembly resulted in 16,612 transcripts with an N50 value of 905 bp. A total of 16,548 unigenes were predicted, among which 15,438 unigenes were assigned a total of 58 gene ontology (GO) terms in three ontologies, namely biological process (23), molecular function (16) and cellular component (19). Moreover, 1716 unigenes had significant matches in the KEGG database and were assigned to 22 KEGG pathways such as translation, energy metabolism, carbohydrate metabolism and amino acid metabolism. Whole genome bisulphite sequencing analysis was carried out to obtain genome-wide methylation patterns in sesame, and the results suggested that 48% of DNA methylation was in the CG context, 26% in the CHG context and 25% in the CHH context. MicroRNA prediction and target identification revealed that the predicted miRNAs of sesame target genes involved in regulation of transcription, proteolysis, ATP binding, signal transduction and the brassinosteroid-mediated signaling pathway. The present study enhances the availability of genomic resources for sesame and provides a significant amount of information that could serve as a valuable basis for future studies. The data generated and analyzed in the present investigation are made available through the Sesame Genomic Information Resource (SGIR), an integrated genomic resource for the sesame crop that is freely accessible at http://202.141.12.147/sgir/. SGIR houses sesame genes with associated annotations, unigenes, microsatellite markers, SNPs and genome-wide methylated regions, and also provides a comprehensive set of query and display options. Furthermore, SGIR will continue to enhance the genomic information of sesame to benefit users in the discovery, use and combination of modules. The newly identified genes, markers and primers are expected to help sesame breeders in developing marker tags for traits of economic importance, thereby bringing about greater efficiency in marker-assisted selection programs.
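The CG, CHG and CHH contexts reported for the bisulphite sequencing results are defined by the two bases that follow a cytosine, where H stands for any base other than G. The sketch below shows that classification in Python. It is a simplified illustration, not the analysis used in the study: the function names are hypothetical, only the forward strand is considered, and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter


def methylation_context(seq, pos):
    """Classify the cytosine at 0-based index pos as "CG", "CHG" or "CHH".

    Returns None if seq[pos] is not a cytosine or if the sequence ends
    before the context can be determined.
    """
    seq = seq.upper()
    if pos >= len(seq) or seq[pos] != "C":
        return None
    if pos + 1 >= len(seq):
        return None
    if seq[pos + 1] == "G":
        return "CG"
    if pos + 2 >= len(seq):
        return None
    # H is any base other than G, so the third base decides CHG vs CHH.
    return "CHG" if seq[pos + 2] == "G" else "CHH"


def count_contexts(seq):
    """Tally the context of every classifiable cytosine on the forward strand."""
    counts = Counter()
    for pos in range(len(seq)):
        context = methylation_context(seq, pos)
        if context is not None:
            counts[context] += 1
    return counts


if __name__ == "__main__":
    print(count_contexts("ACGTCAGTCTTA"))
```

A full analysis would also classify cytosines on the reverse strand and would count only those positions called as methylated.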
Development of Transcriptome based webgenomic resource of small cardamom (Elettaria cardamomum Maton)
Author: Aamir Khan
2015-16
Transcriptome analysis of moisture stress responsive genes in Lathyrus sativus using RNA
Author: Sneha Murmu
2015-16
Lathyrus has great agronomic importance: it is grown for both human consumption and livestock feed, is well adapted to arid conditions, and is one of the hardiest pulses known to date. With the growing popularity of RNA-Seq, a number of data analysis methods and pipelines have been developed for transcriptome analysis. There is a common assumption that substantial gains in the quality of results occur as read length increases and when paired-end (PE) reads are used. Currently, however, there is no clear consensus on best practices for SOLiD short-read single-end data, which makes the choice of an appropriate method a daunting task, especially for a basic user. Hence, a comparative study of RNA-Seq analysis tools was carried out, comparing the commercial CLC bio Genomics Workbench with the open-source tools Velvet-Oases (de novo approach) and TopHat-Cufflinks (reference-genome based approach), with the aim of understanding and assisting the selection of such methods for SOLiD transcriptome data. Velvet-Oases and TopHat-Cufflinks were chosen to carry out the transcriptome analysis of Lathyrus sativus based on their performance on a Glycine max test dataset. Drought negatively impacts plant growth and crop productivity around the world. Understanding the molecular mechanisms of the drought response is important for improving drought tolerance using molecular techniques. In this study, we found 57 differentially expressed genes with the de novo approach and 140 with the reference-genome based approach. The findings of this study are expected to facilitate the choice of optimal tools for the analysis of short-read SOLiD transcriptome data. The results are also expected to provide an improved understanding and identification of sources of resistance against moisture stress for future genetic research in this hitherto under-researched, valuable legume crop.
Bioinformatics tool for analysis of Crop DNA Fingerprints
Author: Shweta Kumari
2015-16
Development of Webserver for Discovery of Polymorphic Microsatellite DNA Markers
Author: Ritwika Das
2015-16
Microsatellite or SSR markers are codominant DNA markers useful for various biological studies. Polymerase slippage during DNA replication, or slipped strand mispairing, is the main cause of variation in the number of repeat units, resulting in length polymorphism in SSR markers. The present scenario calls for bulk marker mining from the whole genome, or from a specific location on a chromosome, along with primer generation. In vitro polymorphism discovery is limited by both time and cost, hence the need for in silico approaches. PolyMorphPredict was developed using Perl (64 bit, version 5), R (version 3.0) and Java (version 7) and is deployed on an Apache server. The user interface is built using JavaScript and HTML and integrates MISA and Primer3 for SSR marker mining and primer generation, respectively. In-house Perl scripts were developed for string/pattern matching to detect polymorphic markers. The server consists of three major modules: SSR Mining, Primer Generation and Polymorphism Detection. At present it includes two whole genome sequences, viz., sugarbeet and rice. It detects polymorphic markers for both self-designed primers and externally supplied primers, with graphical visualization developed in R. It is freely available to the scientific community at http://webtom.cabgrid.res.in/polypoly/. The webserver-based e-PCR was revalidated using re-sequencing data of four Indian rice varieties (Co-36, Dubraj, Co-39 and Cauvery) from the 3K genome project, IRRI, Philippines; the results were concordant, which supports the reliability of the developed webserver. PolyMorphPredict is a user-friendly, cost-effective and time-saving web-based tool.
It has applications in DUS testing for variety identification and management, MAS/GAS, QTL and gene mapping, germplasm improvement, traceability of products and produce, tracing of hybridization and introgression events, differentiation of edible derived varieties/initial varieties, marker development for screening mapping populations for genes involved in several key traits, identification and differentiation of fungi, parentage testing in animals and fish, breed signatures of animals and fish, and genome finishing.
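The idea behind e-PCR based polymorphism detection, as described in the abstract above, can be sketched as follows: locate the same primer pair in two or more genotypes and compare the predicted amplicon lengths. This Python sketch is an illustration under simplifying assumptions (exact primer matches, a single forward-strand template per genotype). It is not the PolyMorphPredict implementation, and all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon_length(template, fwd, rev):
    """Predicted product length for an exact-match primer pair, or None.

    fwd must occur on the given strand; rev is written 5'->3' and must
    anneal downstream, so its reverse complement is searched for.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start == -1:
        return None
    rev_site = revcomp(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


def is_polymorphic(templates, fwd, rev):
    """Compare amplicon lengths across genotypes given as {name: sequence}.

    Returns (True/False, {name: length or None}); the marker is called
    polymorphic when at least two different product lengths are found.
    """
    lengths = {name: amplicon_length(seq, fwd, rev) for name, seq in templates.items()}
    observed = {length for length in lengths.values() if length is not None}
    return len(observed) > 1, lengths


if __name__ == "__main__":
    # Two toy genotypes that differ only in the number of CA repeats.
    genotypes = {
        "variety_a": "TT" + "GATTACAG" + "CA" * 8 + "CCTGAAGT" + "AA",
        "variety_b": "TT" + "GATTACAG" + "CA" * 11 + "CCTGAAGT" + "AA",
    }
    print(is_polymorphic(genotypes, "GATTACAG", "ACTTCAGG"))
```

A practical tool would also tolerate primer mismatches and search both strands; the exact-match search here only keeps the example short.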
Prediction of miRNA related to late blight disease of potato
Author: Animesh Kumar
2014-15
An in-silico study on synteny between cattle and buffalo genome
Author: Nalinikanta Choudhury
2014-15
Study on change-points in Genomic Sequences
Author: Anubhav Roy
2014-15
Codon Usage Bias Based Comparative Genome Analysis of Rhizobium Species
Author: Priyanka Guha Majumdar
2014-15
Identification and characterization of Enhanced Disease Susceptibility 1 (EDS1) in Solanum melongena using in silico analysis
Author: Soumya Sharma
2014-15
Brinjal is an important vegetable crop of India. Despite its importance, little molecular and genetic information is available for brinjal. As a vegetable crop, brinjal suffers major crop losses due to pests and diseases. Recent advances in genetic engineering technologies, for example target-specific genome editing using CRISPR/Cas9, have given the scientific community a powerful tool for modulating genomes to obtain desired phenotypes. A gene must be characterized in detail, including its function, before it can be considered a suitable candidate for genome editing. Enhanced Disease Susceptibility 1 (EDS1), a key regulator of plant defense, could become a suitable target for gene editing to manipulate host resistance. Arabidopsis thaliana EDS1 (AtEDS1) is the most deeply studied EDS1 protein, with a crystal structure in the PDB (4NFU). In the present study, an attempt was made to extract detailed information on this protein in brinjal. The brinjal EDS1 (SmEDS1) protein coding sequence was extracted from the brinjal genome database using tblastn with AtEDS1 as the query sequence. The predicted coding sequence was refined with the help of transcriptome assembly data. The SmEDS1 gene was found in contig Sme2.5_09498.1 of the eggplant draft genome assembly. The gene has a 1806 nucleotide long coding sequence encoding a 602 amino acid long protein. The gene has three exons, the most common architecture among EDS1 protein-coding genes from other families. Comparative analysis of the SmEDS1 protein with EDS1 proteins from 46 other species showed strong sequence and structural conservation of this protein among plants.
Incongruence between the sequence-based and structure-based phylogenetic trees was observed. This could be attributed to the difference between global alignment and conservation of sequence signatures, or it could be explained by the fact that sequences are related by phylogeny whereas structures are related by constraints on their function and regulation. Deeper analyses of phylogenetic relationships based on EDS1 will facilitate its genetic manipulation for agronomic purposes.
Deciphering genes associated with root wilt by RNA-Seq approach in coconut (Cocos nucifera)
Author: Sandeep Kumar Verma
2014-15
Development of transcriptome signature of different stages of lac insect (Kerria lacca)
Author: Bulbul Ahmed
2014-15
Shellac, a natural resin produced mainly in South East Asian countries, is the resinous substance produced by the lac insect (Kerria lacca). It has wide applications in the surface coating, adhesive, electrical, pharmaceutical, confectionery and cosmetics industries, among others. Paired-end Illumina data for four stages of Kerria lacca, viz., larva, crawler, fertilized female and adult, were obtained from the public domain and used to identify and annotate genes differentially expressed at different stages of the lac insect. De novo assembly was done using Trinity, followed by abundance estimation and identification of differentially expressed genes. The assembly resulted in 157017 contigs with an N50 value of 1374 bp. All pairwise combinations of stages were compared to identify differentially expressed genes. Further, stage-specific differentially expressed genes were identified on the basis of fold change value and a p-value of less than 0.05. Across the four stages, 40 signature differentially expressed genes were identified. The numbers of pathways identified at the larva, crawler, fertilized female and adult stages were 3, 7, 20 and 3, respectively. A total of 80,761 SSRs, 110289 SNPs and 10657 indels were also identified from the four stages combined. The genomic enrichment of the lac insect from this study, in terms of marker discovery, stage-specific genes and pathway studies targeting commercially important biomolecules, would be useful for lac breeders. Our study reports the first large-scale genomic resource for lac. Since the whole genome of the lac insect is not yet available, the findings of this study will supplement the genomic information from whole genome sequencing in future.
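For readers unfamiliar with the N50 statistic quoted for the assembly above, the following Python sketch shows the standard definition: the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy input are illustrative assumptions and are unrelated to the study's data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the total
        # assembly length defines N50.
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(contig_lengths))
```

In practice the lengths would be read from the assembled FASTA file; a list of integers keeps the example self-contained.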