Nalini K Choudhury
PhD
Alumni • Class of 2016-17

Ph.D. Bioinformatics

Scientist at ICAR-Central Institute for Fisheries Education, Mumbai
10779
Dr. A. R. Rao

2

Publications

6 yr 5 mo

Duration

Research Thesis

Title

Identification of Deep Learning Models to study microbial diversity in North Indian River Systems

Objectives

1. To study deep learning model-based procedures for analyzing the microbial communities of the major north Indian river system.
2. To compare the performance of deep learning model-based procedures with existing procedures for metagenome data analysis.
3. To develop a user-friendly interface incorporating the developed and existing procedures for metagenome analysis.

Abstract

The Ganga and the Yamuna rivers constitute the major north Indian river system. These rivers play an important role in irrigation, fishing, transportation, health, etc. They also function as sinks of high microbial density and diversity. The type and abundance of these microbial populations enable several bioremedial and biogeochemical studies, including metagenomics. With the advent of high-throughput technologies, a large amount of metagenomic data has become available in the public domain. Processing such large volumes of metagenome data to classify unknown/unclassified microbes into known groups such as bacteria, archaea, fungi, viruses, and others has therefore become a challenge. At the same time, machine learning and deep learning techniques have emerged as major tools for handling very large datasets. In river system metagenomics, and in north Indian river metagenomics in particular, the application of such learning techniques to the binning/classification of unknown microbial populations is yet to be fully explored. Further, from the users' point of view, there is great demand for online servers with tools/pipelines for analyzing metagenomic data. Thus, the present investigation was carried out with the objectives to: i) study deep learning model-based procedures for analyzing the microbial communities of the major north Indian river system, ii) compare the performance of deep learning model-based procedures with existing procedures for metagenome data analysis, and iii) develop a user-friendly interface incorporating the developed and existing procedures for metagenome analysis. To achieve these objectives, river sediment samples collected by ICAR-CIFRI, Barrackpore, from three sites each at Kanpur and Farakka (the Ganga) and Delhi (the Yamuna) were used.
The raw metagenome data generated from the collected samples were pre-processed for quality checks, and metagenome assembly was subsequently carried out to obtain contigs and scaffolds. BLAST was initially applied to the scaffolds to identify the number of known microbial classes. Broadly five classes were found to be present in the metagenome data. The remaining unclassified scaffolds were kept as a separate group. The entire metagenomic data, with the extracted features, were subjected to iterative K-means clustering to classify the microbes into five categories. The identified group/class labels, along with the extracted feature data, were used to train and test the machine learning (SVM, RF, GBDT, XGBoost, AdaBoost) and deep learning (BiLSTM) models. A 10-fold cross-validation technique was employed to assess the performance of the classifiers in terms of metrics such as sensitivity, specificity, and accuracy. A comparison of classifier performance showed that Random Forest (RF) achieved higher accuracy (89%) than the other classifiers. A software package based on RF is available at https://github.com/Nalinikanta7/metagenomics. The results also revealed that Acetobacter, Achromobacter, Bacteroidetes, Fadolivirus, Indivirus, Gaeumannomyces, Phoenix, Strongyloides, Halobacterium, Haloferax, Halogeometricum, and Halosimplex are the most abundant in the metagenome data. Further, 66 percent of the unknown microbes were classified into the five identified known categories. The deep learning models showed accuracies in the range of 87 to 89 percent for the analysis of metagenomic data. Thus, a web server, “The Deep Machine in river metagenome”, based on deep learning models, has also been developed for users to analyze river metagenome data at cabgrid.iasri.res.in/deepmachine.
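The evaluation step described in the abstract (10-fold cross-validation scored by sensitivity, specificity, and accuracy) can be illustrated with a minimal sketch. This is not the thesis code: the function names `k_fold_indices` and `one_vs_rest_metrics`, the fixed shuffle seed, the one-vs-rest treatment of the five classes, and the toy labels are all assumptions made purely for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, n_folds: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices once and return one (train, test) pair per fold.

    Hypothetical helper: the thesis does not state how folds were drawn.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def one_vs_rest_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                        positive: str) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for one class against all others."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy labels for illustration only; these are not results from the thesis.
y_true = ["bacteria", "bacteria", "archaea", "virus", "fungi", "bacteria"]
y_pred = ["bacteria", "archaea", "archaea", "virus", "bacteria", "bacteria"]
print(one_vs_rest_metrics(y_true, y_pred, positive="bacteria"))
print(len(k_fold_indices(n_samples=25, n_folds=10)))
```

In such a setup, a classifier would be fitted on each `train` index list and scored on the matching `test` list, with the per-fold metrics averaged at the end. Whether the thesis used stratified folds or macro-averaged per-class metrics is not stated in the abstract.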

Publications (2)

An Improved Machine Learning-Based Approach to Assess the Microbial Diversity in Major North Indian River Ecosystems

Choudhury, N.; Sahu, T.K.; Rao, A.R.; Rout, A.K.; Behera, B.K.

MDPI Genes 2023 NAAS: 8.80 IF: 2.80
A Metagenomic Insight into Assessment of Microbial Diversity in the River Ganga at Two Locations for Sustainable Development

Choudhury, Nalini Kanta, Sahu, Tanmaya Kumar, Rao, A.R., Behera, B.K.

Journal of Community Mobilization and Sustainable Development 2023 NAAS: 5.02

Testimonial

"It’s been a truly great experience at IASRI, gaining hands-on expertise in Bioinformatics, Agricultural Statistics and Computer Application with supportive, inspiring faculty. The institute’s emphasis on real-world research has sharpened my skills and broadened my vision for impactful work in agri-technology. Looking ahead, I feel confident and motivated to contribute to future innovations, especially by integrating AI, data science, and genomics for sustainable solutions. IASRI has prepared me well for the next step in my agricultural research journey."

Academic Details

Program: PhD
Roll Number: 10779
Batch Year: 2016-17
Fellowship: Institute
Admission: Aug 2016
Completion: Feb 2023