Research Thesis
Title
Development of Bioinformatics Framework for Enhanced Utilization of Genomic Resources in Sesame (Sesamum indicum L.)
Objectives
1. To develop a pipeline for the structural organization of the sesame genome.
2. To visualize trait-marker associations and their relationship with expressed regions, in order to identify marker tags for marker-assisted selection.
3. To study some of the epigenetic mechanisms involved in the sesame genome.
4. To develop a user-friendly information system on the Indian sesame genome.
Abstract
Sesame (Sesamum indicum L.) is an ancient diploid (2n) dicotyledonous oilseed crop belonging to the family Pedaliaceae. As a source of high-quality oil, sesame is valued for its stability, nutritional value and resistance to rancidity, and is often referred to as the “Queen of oil seeds”. Recent advances in whole genome sequencing, particularly the evolution of next-generation sequencing (NGS) technologies, have made available large amounts of genomic data covering entire genomes. Genomic information on the Pedaliaceae family is quite limited, as genomes from this family have not previously been sequenced. Hence, it is essential to explore the sesame plant with state-of-the-art sequencing technology in order to improve its quality and yield. In the present investigation, whole genome sequencing of the sesame variety Swetha generated 30 Gb of data and provided 85x coverage using different NGS technologies. The high-quality reads were assembled into 16 linkage groups, which covered ~340 Mb of the estimated genome size and showed 96.32% genome coverage. A total of 24,579 genes were predicted from the whole genome; these were classified into 59 functional groups and found to be involved in pathways such as fatty acid biosynthesis, purine metabolism, and starch and sucrose metabolism. Furthermore, a total of 8,244 genes had significant matches in PlantTFDB, corresponding to 76 transcription factor (TF) families, of which FAR1 (1,132) was the most abundant, followed by MADS (517), bHLH (499) and NAC (420). Microsatellite mining identified 90,378 SSRs, comprising di-, tri-, tetra-, penta- and hexanucleotide repeats, with one SSR per 1.76 kb of genome. The distribution of SSRs varies from one linkage group to another; however, the density is similar across all linkage groups, ranging from one SSR per 1.64 kb to one per 1.90 kb. Furthermore, 119,710 SNPs were identified from the whole genome using CLC Genomics Workbench.
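To make the microsatellite mining step concrete, the following is a minimal Python sketch of how perfect di- to hexanucleotide SSRs could be detected in a sequence and how a density in kb per SSR could be derived. It is not the tool or the settings used in this study: the function names and the minimum repeat counts in MIN_REPEATS are illustrative assumptions.

```python
# Minimum number of tandem repeats per motif length.
# ASSUMPTION: these thresholds are illustrative, not the settings used in the thesis.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for k in sorted(min_repeats):  # try shorter motifs first
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases, and motifs such as 'ATAT'
            # that are really repeats of a shorter unit.
            if len(motif) < k or set(motif) - set("ACGT") or is_reducible(motif):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                hits.append((i, motif, count))
                i += count * k - 1  # jump to the last base of the SSR
                break
        i += 1
    return hits


def ssr_density_kb(sequence_length_bp, n_ssrs):
    """Average sequence length, in kb, per SSR."""
    return sequence_length_bp / n_ssrs / 1000


# Toy sequence (hypothetical) containing (AT)8 and (GAA)5.
toy = "GGC" + "AT" * 8 + "CCGT" + "GAA" * 5 + "TTC"
ssrs = find_ssrs(toy)
```

On the toy sequence, find_ssrs reports the (AT)8 and (GAA)5 tracts with their 0-based start positions. Applied per linkage group, ssr_density_kb would give the kind of kb-per-SSR figure quoted above, although the exact counts depend on the thresholds chosen.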
The whole transcriptome assembly resulted in 16,612 transcripts with an N50 value of 905 bp. A total of 16,548 unigenes were predicted, of which 15,438 were assigned a total of 58 gene ontology (GO) terms in three ontologies, namely biological process (23), molecular function (16) and cellular component (19). Moreover, 1,716 unigenes had significant matches in the KEGG database and were assigned to 22 KEGG pathways, such as translation, energy metabolism, carbohydrate metabolism and amino acid metabolism. Whole genome bisulphite sequencing was carried out to obtain genome-wide methylation patterns in sesame, and the results suggested that 48% of DNA methylation was in the CG context, 26% in the CHG context and 25% in the CHH context. MicroRNA prediction and target identification revealed that the predicted miRNAs of sesame target genes involved in regulation of transcription, proteolysis, ATP binding, signal transduction and the brassinosteroid-mediated signaling pathway. The present study enhances the availability of genomic resources for sesame and provides a substantial amount of information that could serve as a valuable basis for future studies. The data generated and analyzed in the present investigation are made available through the Sesame Genomic Information Resource (SGIR), an integrated genomic resource for the sesame crop that is freely accessible at http://202.141.12.147/sgir/. SGIR houses sesame genes with associated annotations, unigenes, microsatellite markers, SNPs and genome-wide methylated regions, and also provides a comprehensive set of query and display options. Furthermore, SGIR will continue to expand its genomic information on sesame to benefit users in the discovery, use and combination of modules. The newly identified genes, markers and primers are expected to help sesame breeders develop marker tags for traits of economic importance, thereby bringing about greater efficiency in marker-assisted selection programs.
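The N50 value quoted for the transcriptome assembly is a standard contiguity statistic: the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the usual calculation is given below; the function name and the example lengths are hypothetical and are not data from this study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical transcript lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

In this example the total is 54 bp, and the three longest sequences (10 + 9 + 8 = 27 bp) reach half of it, so the N50 is 8.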
Publications (3)
GinMicrosatDb: a genome-wide microsatellite markers database for sesame (Sesamum indicum L.)
Supriya Purru, Sarika Sahu, Saurabh Rai, Rao AR and Bhat KV.
Transcriptome Sequencing of sesame (Sesamum indicum L.) using Illumina Platform.
P. Supriya, A. R. Rao and K. V. Bhat.
Computational identification of microRNAs and their target genes in sesame (Sesamum indicum L.)
Supriya Purru, Animesh Kumar, Sunil Archak and KV Bhat