Confident peptide-spectrum matching in the absence of ground truth data
Mass spectrometry (MS)-based proteomics plays a pivotal role in elucidating complex biological systems by identifying and quantifying peptides and proteins. A central challenge in this field is the accurate matching of experimental MS/MS spectra to theior originating peptides. Sequence database searching has been the cornerstone for accomplishing this task; however, the absence of a ground truth presents significant challenges in distinguishing correct from incorrect peptide-spectrum matches (PSMs).
In this presentation, I will begin by providing an overview of the sequence database searching approach for matching MS/MS spectra to peptides. Following this, I will introduce the concept of target-decoy competition, a robust statistical method for controlling the false discovery rate (FDR) during spectrum annotation. This strategy is particularly useful for filtering out incorrect PSMs when a ground truth is lacking.
Lastly, I will describe how semi-supervised learning techniques can be harnessed to improve the accuracy and sensitivity of spectrum annotation. By combining traditional methods with advanced machine learning algorithms, this provides a powerful approach for PSM validation that not only minimizes false positives but also increases the detection of true positive matches.
The slides of the presentation are available here.
Speaker
Wout Bittremieux is professor at the Department of Computer Science.
Time and Place
Monday 13/11/2023 at 12:45pm in M.A.143
Registration
Participation is free, but registration is compulsory. Make sure to fill in this form.
References and Related Reading
- An Introduction to Mass Spectrometry-Based Proteomics
- A face in the crowd: Recognizing peptides through database search
- Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry
- A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics
- Semi-supervised learning for peptide identification from shotgun proteomics datasets
- Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning