Jutan Das
PhD
Alumni • Class of 2019-20

Ph.D. Bioinformatics

Assistant Professor, Kalinga University, Raipur
11468
Dr. Sarika

Publications
1

Duration
5 yr 10 mo

Research Thesis

Title

Development of artificial intelligence-based fish specific long non-coding RNA biomarkers discovery tool and web genomic resource

Objectives

I. To develop an Artificial Intelligence-based classifier for the identification of long non-coding RNAs for fish.
II. To develop a web-based tool for the detection of long non-coding RNA biomarkers in fish using Artificial Intelligence.
III. To develop a web genomic resource of common carp.

Abstract

Long noncoding RNAs (lncRNAs) are a subclass of RNA molecules longer than 200 nucleotides that do not encode proteins, yet they play crucial roles in regulating gene expression and most cellular processes. Prediction and functional characterization of lncRNAs are urgently needed but computationally challenging, especially in less-studied organisms such as fish. This work bridges the gap by creating state-of-the-art machine learning (ML) and deep learning (DL) models for identifying and analyzing lncRNAs in fish, contributing to the improvement of aquaculture through genetic insight. A carefully curated dataset from the Ensembl database contained equal numbers of lncRNA and coding RNA sequences from 14 fish species, totaling 48,006 sequences. This dataset was enriched through a comprehensive feature extraction process that combined traditional sequence-based techniques with advanced embedding-based techniques such as TF-IDF (Term Frequency-Inverse Document Frequency) to ensure a strong representation of the biological information inherent in the RNA sequences. Six features extracted from the sequences were used for training and testing the ML models, while TF-IDF was used for the DL models. For ML, two feature selection techniques were mainly used, Random Forest (RF) and Univariate Selection (Mutual Information), along with their combination, RFMI (Random Forest intersect Mutual Information). In total, twelve machine learning methods, seven deep learning methods, and three hybrid methods were employed to classify the lncRNAs. In rigorous evaluation, the Light Gradient Boosting Machine (LGBM) model with RFMI feature selection on 45 features outperformed the others, achieving an accuracy of 98.36%.
The effectiveness of the LGBM model was further validated by comparative analysis against six popular lncRNA prediction tools using an independent dataset derived from a fish species (Salmo trutta) not included in the training set. This independent validation underscores the robustness and accuracy of the model in real-world scenarios. This work also introduces FishLncPred, a user-friendly web server available at http://46.202.167.198:5000/ that was developed to facilitate the real-time prediction of lncRNA biomarkers in fish. This tool uses the trained LGBM model to give predictions and downloadable results for user-submitted sequences, making lncRNA identification much easier for researchers in aquaculture. To validate the practicality of this classifier, a case study was conducted on the economically important fish species common carp (Cyprinus carpio). A total of 33,990 lncRNAs and 22,854 circular RNAs (circRNAs) were identified. The classifier was further applied to identify lncRNAs in common carp from RNA-seq data. This application not only validated the utility of the classifier but also provided insights into the RNA regulatory mechanisms in common carp. In parallel with the prediction tool, a comprehensive genomic resource called CCncRNAdb, available at http://backlin.cabgrid.res.in/ccncrnadb/, was developed for the common carp. CCncRNAdb harbors the identified lncRNAs and circRNAs, making it a useful resource for the scientific community to fuel further research in fish genomics. In conclusion, this research significantly advances the computational identification and functional analysis of lncRNAs in fish, providing tools and resources that improve the understanding of their roles in aquaculture. The output of this research could support more resilient and productive aquaculture practices and the development of more sustainable fish farming techniques.
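
The abstract mentions TF-IDF as the embedding-based representation of RNA sequences but does not say how sequences were tokenized. The following is only a minimal sketch of the general idea, assuming overlapping k-mers are treated as the "terms" and each sequence as a "document". The function names, the choice of k = 3, and the toy sequences are illustrative assumptions, not taken from the thesis.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split an RNA sequence into overlapping k-mers (assumed tokenization)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf(sequences, k=3):
    """Return one {k-mer: TF-IDF weight} dict per input sequence.

    Uses the plain textbook form: tf = count / total k-mers in the sequence,
    idf = log(N / number of sequences containing the k-mer).
    """
    docs = [Counter(kmers(s, k)) for s in sequences]
    n_docs = len(docs)
    # Document frequency: in how many sequences does each k-mer occur?
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in doc.items()
        })
    return vectors


# Toy sequences, invented purely for illustration.
toy_sequences = ["AUGGCUAUGG", "AUGCCCGGGA", "UUUUAUGGCU"]
vectors = tfidf(toy_sequences, k=3)
```

A k-mer that occurs in every sequence (here "AUG") gets weight zero, while one confined to a single sequence is weighted up. Library implementations often use a smoothed variant of the idf term, so their values would differ from this sketch.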

Publications (1)

Genome-wide identification and characterization of tissue specific long non-coding RNAs and circular RNAs in Common carp (Cyprinus carpio L.)

Das, Jutan; Kumar, Baibhav; Saha, Bibek; Jaiswal, Sarika; Iquebal, Mir Asif; Angadi, UB; Kumar, Dinesh

Frontiers in Genetics, 2023 • NAAS: 8.80 • IF: 2.80

Academic Details

Program
PhD
Roll Number
11468
Batch Year
2019-20
Fellowship
Institute
Admission
Aug 2019
Completion
Jun 2025