Kabilan S
M.Sc. Bioinformatics
Publications
1
Duration
1 yr 8 mo
Research Thesis
Title
Quality control of label-free proteomics expression data considering missing values and heterogeneity
Objectives
1. To develop an approach for obtaining the best combination of normalization and imputation methods for quality control of label-free proteomics expression data
2. To develop an R package for the developed approach
Abstract
Proteins are important biomolecules that perform various physiological tasks such as metabolic catalysis, energy conservation, host defence and signalling. Proteomics is the large-scale study of the proteins expressed in a cell. LC-MS is an indispensable tool for protein identification and quantification because of its improved coverage, sensitivity, and high throughput. The most popular approach for LC-MS-based protein analysis is the label-free bottom-up approach, in which unlabelled proteins are proteolytically digested into peptides by specific proteases such as trypsin and then analysed. Label-free LC-MS proteomics datasets often suffer from data heterogeneity and missing values. Various normalization and imputation methods are widely used to remove these biases, but there is no standard criterion for selecting a suitable combination of normalization and imputation methods for a given proteomics expression dataset. This study aims to develop an approach for finding a suitable combination of normalization and imputation methods for label-free proteomics expression data, based on quality control measures such as PCV, PEV, and PMAD. A standard benchmark dataset, based on a highly complex yeast lysate sample spiked with different levels of the UPS1 protein standard, was used. Three popular normalization methods (VSN, LOESS, and RLR) and three efficient imputation methods (k-NN, LLS, and SVD) were chosen and paired with each other, giving a total of nine combinations. The developed approach identified LLS imputation with LOESS normalization as the suitable combination. In most cases, this combination identified more significant proteins in differential expression analysis than the other combinations.
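The pairing step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation or the lfproQC API. The normalizers and imputers are simple stand-ins (median/mean scaling and row-mean/row-minimum filling) rather than VSN, LOESS, RLR, k-NN, LLS, or SVD. PCV, PEV, and PMAD are taken here to be the within-group coefficient of variation, variance, and median absolute deviation, averaged over proteins and then over sample groups; this is one common formulation and may differ from the definitions used in the study. Normalizing before imputing is also an assumed order. All function names and the toy data are hypothetical.

```python
import itertools
import numpy as np


def pooled_metrics(data, groups):
    """Return (PCV, PEV, PMAD) for a complete matrix (rows = proteins, columns = samples).

    Assumed definitions: per-protein statistics are computed within each
    sample group, averaged over proteins, then averaged over groups.
    """
    groups = np.asarray(groups)
    cvs, evs, mads = [], [], []
    for g in np.unique(groups):
        block = data[:, groups == g]
        mean = block.mean(axis=1)
        sd = block.std(axis=1, ddof=1)
        med = np.median(block, axis=1, keepdims=True)
        cvs.append(np.mean(sd / mean))
        evs.append(np.mean(sd ** 2))
        mads.append(np.mean(np.median(np.abs(block - med), axis=1)))
    return float(np.mean(cvs)), float(np.mean(evs)), float(np.mean(mads))


# Stand-in normalizers (NOT VSN/LOESS/RLR): align each sample's centre
# to the overall centre of the matrix, ignoring missing values.
def median_scale(x):
    return x - np.nanmedian(x, axis=0, keepdims=True) + np.nanmedian(x)


def mean_scale(x):
    return x - np.nanmean(x, axis=0, keepdims=True) + np.nanmean(x)


# Stand-in imputers (NOT k-NN/LLS/SVD): fill gaps from the same protein's row.
def fill_row_mean(x):
    return np.where(np.isnan(x), np.nanmean(x, axis=1, keepdims=True), x)


def fill_row_min(x):
    return np.where(np.isnan(x), np.nanmin(x, axis=1, keepdims=True), x)


normalizers = {"median": median_scale, "mean": mean_scale}
imputers = {"row_mean": fill_row_mean, "row_min": fill_row_min}

# Toy log-intensity matrix: 50 proteins, two groups of three samples.
rng = np.random.default_rng(0)
groups = ["A", "A", "A", "B", "B", "B"]
level = rng.normal(20.0, 2.0, size=(50, 1))      # protein abundance
offset = rng.normal(0.0, 0.5, size=(1, 6))       # per-sample bias
toy = level + offset + rng.normal(0.0, 0.3, size=(50, 6))
gaps = rng.random(toy.shape) < 0.1               # about 10% missing values
gaps[:, 0] = False                               # keep one observed value per protein
toy[gaps] = np.nan

# Evaluate every normalizer/imputer pairing (assumed order: normalize, then impute).
results = {}
for (n_name, norm), (i_name, imp) in itertools.product(
    normalizers.items(), imputers.items()
):
    results[(n_name, i_name)] = pooled_metrics(imp(norm(toy)), groups)

# Tabulate the combinations, sorted by PCV for readability only.
for combo, (pcv, pev, pmad) in sorted(results.items(), key=lambda kv: kv[1][0]):
    print(combo, f"PCV={pcv:.4f} PEV={pev:.4f} PMAD={pmad:.4f}")
```

With the real methods substituted in, the same loop would cover all nine combinations. How the three measures are weighed against each other to choose a final combination is not specified in this abstract, so the sketch only tabulates them.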
Judged by NRMSE scores, the performance of the developed approach remained consistent even when missing values were generated artificially in the dataset in three different ways. An R package named ‘lfproQC’ was developed for the proposed approach and will be deposited in the CRAN repository. The package can be used to find the best combination of normalization and imputation methods for any label-free proteomics expression dataset, which supports efficient downstream analysis.
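The NRMSE check mentioned above can be illustrated with a small Python sketch: hide known values, impute them, and compare the imputed entries with the truth. Several assumptions apply. NRMSE is computed here as the root mean squared error over the hidden entries divided by the standard deviation of their true values, which is one common convention and may not be the one used in the study. Values are removed completely at random, whereas the abstract does not state which three mechanisms were used. Row-mean filling stands in for the actual imputation methods. All names and data are hypothetical.

```python
import numpy as np


def nrmse(truth, imputed, mask):
    """Normalized RMSE over the artificially hidden entries (assumed definition)."""
    diff = imputed[mask] - truth[mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(truth[mask])))


def fill_with_row_mean(x):
    """Stand-in imputer: replace each gap with the mean of the observed values in its row."""
    return np.where(np.isnan(x), np.nanmean(x, axis=1, keepdims=True), x)


# Complete toy matrix: 100 proteins x 6 samples on a log-intensity scale.
rng = np.random.default_rng(1)
level = rng.normal(20.0, 2.0, size=(100, 1))
complete = level + rng.normal(0.0, 0.3, size=(100, 6))

# Hide about 10% of the entries at random, keeping one observed value per row.
mask = rng.random(complete.shape) < 0.1
mask[:, 0] = False
with_gaps = complete.copy()
with_gaps[mask] = np.nan

score = nrmse(complete, fill_with_row_mean(with_gaps), mask)
print(f"NRMSE of the stand-in imputer: {score:.3f}")
```

A lower score means the imputed values sit closer to the hidden true values. Repeating the calculation for each missingness pattern and each method combination would give the kind of consistency check the abstract describes.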
Publications (1)
A Statistical Approach for Identifying the Best Combination of Normalization and Imputation Methods for Label-Free Proteomics Expression Data.
Sakthivel, K., Lal, S. B., Srivastava, S., Chaturvedi, K. K., Khan, Y. J., Mishra, D. C., ... & Jha, G. K.
Testimonial
"Very important period in my life to improve my knowledge and skills which will be really helpful for myself to serve the society."