Soumya Sharma
PhD
Alumni • Class of 2016-17

PhD in Bioinformatics

10778
Dr. Sunil Archak

Research Thesis

Title

Development of database of genes and gene families responsible for nutritional traits in field crops

Objectives

1. Collation and curation of genes and gene families responsible for nutritional traits in selected field crops.
2. Development of a methodology for classification of nutrition-related genes.
3. Development of detailed interactive database/web resources by compiling the obtained information.

Abstract

Nutritional insecurity is a major challenge in developing countries, which depend largely on cereal-based diets. Soil and plant scientists have accumulated much information on the concentration of minerals in the leaves of food crops. The major problems with food plants have been attributed to lower-than-desired protein concentration, inadequate essential amino acid ratios in plant proteins, and low digestibility of plant proteins and carbohydrates. Nutritionally dense crops offer an inexpensive and sustainable solution to the problem of malnutrition. A comprehensive search strategy was followed to obtain the genes responsible for nutritional traits in plants. Genes for mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis were retrieved from four databases (GenBank, EnsemblPlants, Gramene and UniProt) using advanced searches that combined gene ontology keywords for specific nutrients, plants, crops and their nutrient-related roles with Boolean operators such as OR and AND. A total of 7695 sequences for mineral transportation, 1480 sequences for vitamin biosynthesis and 2583 sequences for essential amino acids were obtained. The study applied and compared four machine learning techniques (support vector machine, random forest, naïve Bayes and k-nearest neighbour) to develop classification models for nutritional-trait-related gene sequences (mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis) in flowering plants. First, the techniques were used to develop three binary classification models, one each for mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis genes. Afterwards, three multiclass classification models for mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis genes were developed using each of the four classifiers.
Five-fold cross-validation was performed to compare the performance of the four classifiers independently. The results suggested that random forest, SVM and KNN performed best for both binary and multiclass classification, while the performance of naïve Bayes was comparatively lower. Finally, a database of nutritional-trait-related gene sequences (mineral transportation, vitamin biosynthesis and essential amino acid biosynthesis) in flowering plants was developed.
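
The classifier comparison described in the abstract can be illustrated with a minimal sketch of 5-fold cross-validation. Everything specific here is an assumption made for illustration: the abstract does not state how sequences were encoded, so overlapping 2-mer counts are used as a stand-in; the sequences are synthetic toy data, not thesis data; and only k-nearest neighbour and naïve Bayes are implemented (in plain Python), since SVM and random forest would normally come from a library. The function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

K = 2  # k-mer length: an assumption for this sketch, not taken from the thesis
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_counts(seq):
    """Turn a DNA sequence into a fixed-length vector of overlapping k-mer counts."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    return [counts[kmer] for kmer in KMERS]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """k-nearest neighbour: majority vote among the closest training vectors."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def naive_bayes_predict(train_X, train_y, x):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        totals = [sum(col) + 1 for col in zip(*rows)]  # smoothed per-feature totals
        denom = sum(totals)
        score = math.log(len(rows) / len(train_y))  # log prior
        score += sum(c * math.log(t / denom) for c, t in zip(x, totals))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(predict, X, y, n_folds=5, seed=0):
    """Mean accuracy over n_folds held-out folds (plain shuffle, not stratified)."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / n_folds


# Synthetic toy data (NOT from the thesis): class 1 is GC-rich, class 0 is AT-rich.
rng = random.Random(1)
sequences = ["".join(rng.choice("GGCCAT") for _ in range(60)) for _ in range(20)]
sequences += ["".join(rng.choice("AATTGC") for _ in range(60)) for _ in range(20)]
labels = [1] * 20 + [0] * 20

X = [kmer_counts(s) for s in sequences]
results = {
    "knn": k_fold_accuracy(knn_predict, X, labels),
    "naive_bayes": k_fold_accuracy(naive_bayes_predict, X, labels),
}
for name, accuracy in results.items():
    print(f"{name}: mean 5-fold accuracy = {accuracy:.2f}")
```

Each classifier is scored on the same five folds, so the mean accuracies are directly comparable; a multiclass variant would only need labels with more than two values.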

Publications (1)

Comparison of supervised machine learning techniques in classifying vitamin biosynthesis genes.

Sharma, S., Archak, S., Majumdar, S. G., Mishra, D. C., Rai, A.

Journal of the Indian Society of Agricultural Statistics, 2022 • NAAS: 5.46 • IF: 0.00

Academic Details

Program: PhD
Roll Number: 10778
Batch Year: 2016-17
Fellowship: DBT JRF
Completion: Jul 2023