Soumya Sharma
PhD in Bioinformatics
Research Thesis
Title
Development of database of genes and gene families responsible for nutritional traits in field crops
Objectives
1. Collation and curation of genes and gene families responsible for nutritional traits in selected field crops.
2. Development of a methodology for classification of nutrition-related genes.
3. Development of detailed interactive database/web resources by compiling obtained information.
Abstract
Nutritional insecurity is a major challenge in developing countries, which are largely dependent on cereal-based diets. Soil and plant scientists have accumulated much information on the concentration of minerals in the leaves of food crops. Major problems with food plants have been attributed to their lower-than-desired protein concentration, inadequate essential amino acid ratios in plant proteins, and the low digestibility of plant proteins and carbohydrates. Nutritionally dense crops offer an inexpensive and sustainable solution to the problem of malnutrition. A comprehensive search strategy was followed to obtain the genes responsible for nutritional traits in plants. Genes for mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis were retrieved from four databases (GenBank, EnsemblPlants, Gramene and UniProt) using advanced searches that combined gene ontology keywords for specific nutrients, plants, crops and their nutrient-related roles with Boolean operators (OR/AND). A total of 7695 sequences for mineral transportation, 1480 sequences for vitamin biosynthesis and 2583 sequences for essential amino acids were obtained. This study was oriented towards the application and comparison of different machine learning techniques (namely support vector machine, random forest, naïve Bayes and k-nearest neighbour) for developing classification models for gene sequences related to nutritional traits (mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis) in flowering plants. First, the machine learning techniques were applied to develop three binary classification models, one each for mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis genes. Afterwards, three multiclass classification models for mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis genes were developed using each of the four classifiers.
Five-fold cross-validation was performed to compare the performance of the four classifiers independently, and the results suggested that random forest, SVM and KNN performed best for both binary and multiclass classification. The performance of naïve Bayes was comparatively lower. Finally, a database of gene sequences related to nutritional traits (mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis) in flowering plants was developed.
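The five-fold cross-validation used to compare classifiers can be illustrated with a minimal sketch. This is not the thesis code: the abstract does not say how sequences were encoded, so the amino acid composition features, the synthetic toy sequences, and all function names here are assumptions made for illustration. Only a k-nearest-neighbour classifier is shown; any of the other three classifiers could be passed in through the same `predict` interface.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 amino acids (assumed feature encoding)."""
    counts = Counter(seq)
    n = max(len(seq), 1)
    return [counts.get(a, 0) / n for a in AMINO_ACIDS]

def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % n_folds].append(i)
    return folds

def cross_val_accuracy(X, y, predict, n_folds=5):
    """Mean accuracy on the held-out fold, averaged over all folds."""
    scores = []
    for test_idx in stratified_folds(y, n_folds):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def toy_sequences(n_per_class=20, length=60, seed=1):
    """Synthetic stand-in data: two classes with different residue biases."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for label, favoured in (("positive", "KRH"), ("negative", "DEL")):
        for _ in range(n_per_class):
            seq = "".join(
                rng.choice(favoured) if rng.random() < 0.5 else rng.choice(AMINO_ACIDS)
                for _ in range(length)
            )
            seqs.append(seq)
            labels.append(label)
    return seqs, labels

seqs, labels = toy_sequences()
X = [composition(s) for s in seqs]
acc = cross_val_accuracy(X, labels, knn_predict)
print(f"KNN mean 5-fold accuracy on toy data: {acc:.2f}")
```

Running the same `cross_val_accuracy` call with a different `predict` function on identical folds is one simple way to compare classifiers on an equal footing, which is the kind of comparison the abstract describes.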
Publications (1)
Comparison of supervised machine learning techniques in classifying vitamin biosynthesis genes.
Sharma, S., Archak, S., Majumdar, S. G., Mishra, D. C., Rai, A.