
Bioinformatics Tools and Databases for Post-Translational Modifications (PTM) Analysis

Introduction to Post-Translational Modifications (PTMs)

Post-translational modifications (PTMs) are covalent chemical changes that proteins undergo after they are synthesized in the cell. Many of them act like switches, activating or deactivating a protein's function. They can also change a protein's location in the cell or alter how it interacts with other molecules. For example, adding a small chemical group such as a phosphate can help a protein bind a specific partner or change its shape. Through such effects, PTMs influence key processes such as cell growth, programmed cell death, and DNA repair. PTMs occur in all cells and play a major role in regulating biological processes. However, mapping PTMs experimentally is difficult and resource-intensive, which has driven the development of computational methods for predicting and analyzing them. Scientists now combine experimental methods and computational tools to study PTMs, improving our understanding of how cells respond to various signals.

Common Bioinformatics Tools for PTM Identification and Prediction

A wide array of bioinformatics tools has been developed to help researchers identify and predict PTMs with precision and efficiency. These tools can be broadly grouped into three categories, each designed to tackle different aspects of the problem.

Sequence-Based Prediction Tools

These methods primarily analyze a protein's amino acid sequence to locate potential modification sites. They rely on recognizing sequence patterns, or motifs, that are known to be associated with modifications, such as a particular arrangement of amino acids that can signal a site for glycosylation or phosphorylation. Tools in this category often compare the target protein sequence with a library of experimentally validated sequences from public databases and, by matching these patterns, predict where modifications may occur. The advantage of this approach is its simplicity and speed. Its main limitation is that it ignores structural context, such as whether a residue is buried inside the folded protein and therefore inaccessible to modifying enzymes.
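As a concrete illustration of motif matching, the N-linked glycosylation sequon N-X-S/T (where X is any residue except proline) can be located with a regular expression. The sketch below is a minimal example, not the algorithm of any particular tool; the function name `find_sequons` and the test sequence are hypothetical. Note that a sequon is necessary but not sufficient for N-glycosylation: many sequons are never glycosylated, which is exactly the kind of context that sequence-only methods miss.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead (?=...) lets overlapping sequons all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based Asn position, matched tripeptide) for every sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(find_sequons("MKNGTAANPSLLNNSTQ"))
```

For the toy sequence, the sketch reports Asn positions 3, 13, and 14; the "NPS" starting at position 8 is skipped because of the proline. The lookahead is used so that overlapping sequons such as "NNST" are both reported.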

Structure-Based Prediction Tools

While sequence-based methods focus on the linear arrangement of amino acids, structure-based tools consider the three-dimensional conformation of proteins. These tools evaluate how the physical arrangement of a protein, including its folds and loops, might affect the accessibility of certain residues for modification. They take into account factors like the protein's surface exposure and the spatial relationships between different regions. This detailed view helps in predicting PTMs more accurately because the three-dimensional context often influences whether a particular site is available for modification by enzymes. As a result, structure-based tools can provide deeper insights, particularly for modifications that are heavily influenced by the protein's conformation.
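One structural feature that such tools commonly use is solvent exposure. The sketch below assumes that per-residue solvent-accessible surface areas (ASA) have already been computed from a PDB structure with a program such as DSSP or FreeSASA; it converts ASA to relative solvent accessibility (RSA) and keeps exposed Ser, Thr, and Tyr residues. The maximum-ASA values are the theoretical ones reported by Tien et al. (2013), and the 0.25 exposure cutoff is a common but arbitrary choice; both are assumptions to check and adapt.

```python
# Theoretical maximum solvent-accessible surface areas in Å^2 (Tien et al.,
# 2013), listed only for the phospho-acceptor residues used in this sketch.
MAX_ASA = {"S": 155.0, "T": 172.0, "Y": 263.0}


def relative_accessibility(residue: str, asa: float) -> float:
    """Relative solvent accessibility: observed ASA divided by the maximum."""
    return asa / MAX_ASA[residue]


def exposed_sites(residues, threshold: float = 0.25):
    """Return (position, residue, RSA) for S/T/Y residues with RSA >= threshold.

    `residues` is an iterable of (position, one-letter code, ASA in Å^2),
    for example parsed from DSSP or FreeSASA output for a PDB structure.
    """
    hits = []
    for pos, aa, asa in residues:
        if aa in MAX_ASA:  # skip residues that are not S, T, or Y
            rsa = relative_accessibility(aa, asa)
            if rsa >= threshold:
                hits.append((pos, aa, round(rsa, 2)))
    return hits
```

For the input `[(10, "S", 120.0), (25, "T", 20.0), (40, "Y", 131.5)]`, only Ser10 (RSA about 0.77) and Tyr40 (RSA 0.50) pass the cutoff; the largely buried Thr25 (RSA about 0.12) is filtered out.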

Machine Learning and AI-Driven Tools

Many recent advances in PTM prediction leverage machine learning and artificial intelligence to integrate sequence and structural information. These tools are trained on large datasets containing both positive examples (known modification sites) and negative examples (sites with no known modification). By learning the subtle patterns that distinguish modified from unmodified sites, these algorithms can often predict modification sites more accurately than motif matching alone. Commonly used methods include artificial neural networks (ANNs), support vector machines (SVMs), and hidden Markov models (HMMs).
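The sketch below illustrates the general workflow rather than any published predictor: sequence windows centered on a candidate residue are one-hot encoded, and a classifier is fitted to labeled positive and negative windows. For brevity it uses plain logistic regression trained by stochastic gradient descent as a stand-in for the ANNs or SVMs mentioned above. The training windows are invented toy data in which an arginine pair upstream of the serine, loosely modeled on the R-R-X-S/T consensus of kinases such as PKA, marks the positives.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY_"  # 20 amino acids plus '_' for terminal padding


def one_hot(window: str) -> list[float]:
    """Flatten a sequence window into a one-hot feature vector."""
    vec = [0.0] * (len(window) * len(ALPHABET))
    for i, aa in enumerate(window):
        vec[i * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return vec


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(windows: list[str], labels: list[int],
                   epochs: int = 200, lr: float = 0.1):
    """Fit logistic-regression weights by stochastic gradient descent on log-loss."""
    w = [0.0] * len(one_hot(windows[0]))
    b = 0.0
    for _ in range(epochs):
        for window, y in zip(windows, labels):
            x = one_hot(window)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - y  # derivative of the log-loss with respect to the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def score(window: str, w: list[float], b: float) -> float:
    """Predicted probability that the central residue is modified."""
    x = one_hot(window)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Invented 7-residue windows with the candidate Ser in the middle.
    train = ["RRASLGK", "RRGSPEL", "RRLSAVD", "GLASPEV", "DEASLGK", "AAPSGLE"]
    labels = [1, 1, 1, 0, 0, 0]
    w, b = train_logistic(train, labels)
    print(round(score("RRESMAT", w, b), 3), round(score("DDGSAVE", w, b), 3))
```

On this toy set, the unseen window "RRESMAT" scores higher than "DDGSAVE", because every positive training example pushes up the weights for arginine at the first two window positions. A real predictor would need thousands of windows, a separate test set, and careful selection of negative sites.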

General PTM Databases

General PTM databases serve as comprehensive repositories that collate a wide range of PTM information across multiple organisms and modification types. These databases are invaluable for researchers because they provide a one-stop resource to access verified data on protein modifications.

Swiss-Prot (UniProt) (https://www.uniprot.org/): Swiss-Prot, the manually reviewed section of the UniProt Knowledgebase, offers a non-redundant, expert-curated collection of proteins whose annotations include both experimentally validated and predicted modifications. This resource is particularly useful because it not only lists PTM sites but also provides details such as protein function, structure, and interaction partners.
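UniProt entries can also be retrieved programmatically. The sketch below assumes the JSON layout returned by the UniProt REST service at rest.uniprot.org, in which each entry has a `features` list whose items carry a `type`, a `location` with `start` and `end` values, and a `description`; verify these field names against the current API documentation before relying on them. The function names are hypothetical.

```python
import json
import urllib.request

# UniProtKB feature types that describe PTM-related annotations.
PTM_FEATURE_TYPES = {"Modified residue", "Glycosylation", "Lipidation",
                     "Cross-link", "Disulfide bond"}


def fetch_entry(accession: str) -> dict:
    """Download one UniProtKB entry as JSON (requires network access)."""
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.json"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def ptm_features(entry: dict) -> list[tuple[str, int, str]]:
    """Return (feature type, start position, description) for PTM features."""
    sites = []
    for feature in entry.get("features", []):
        if feature.get("type") in PTM_FEATURE_TYPES:
            start = feature["location"]["start"].get("value")
            sites.append((feature["type"], start, feature.get("description", "")))
    return sites
```

For example, `ptm_features(fetch_entry("P04637"))` would list the annotated modified residues, glycosylation sites, and cross-links of human p53. Keeping the download and the parsing in separate functions makes the parser easy to test on a saved entry without network access.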

The Human Protein Reference Database (HPRD) (https://ngdc.cncb.ac.cn/databasecommons/database/id/1383): HPRD focuses on human proteins and contains detailed records on over 30,000 proteins and nearly 100,000 modification events. HPRD is distinguished by its manual curation process, where experts review and verify the data, ensuring that the information is accurate and reliable. This level of detail assists researchers in mapping out complex cellular processes, such as signaling pathways that govern cell growth or programmed cell death.

dbPTM (https://ngdc.cncb.ac.cn/databasecommons/database/id/343): dbPTM aggregates PTM data from multiple sources, including other public databases and the scientific literature. It covers a broad spectrum of modification types, making it an excellent starting point for researchers exploring a diverse array of PTMs. By integrating data from different sources, dbPTM gives users broader coverage than any single source provides.
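The core idea of aggregating sites from several sources can be sketched as merging records on a shared key. The example below is not dbPTM's actual pipeline; it assumes each source has already been converted to records with hypothetical `accession`, `position`, and `ptm_type` fields, and that all positions refer to the same canonical sequence. Sites reported by several sources collapse into one entry that lists its supporting sources, which doubles as a simple measure of evidence.

```python
from collections import defaultdict


def merge_ptm_records(*sources):
    """Merge PTM site records from several sources.

    Each source is a (source name, list of record dicts) pair. Records are keyed
    on (accession, position, lower-cased PTM type); the result maps each unique
    site to the sorted list of sources that report it.
    """
    merged = defaultdict(set)
    for source_name, records in sources:
        for rec in records:
            key = (rec["accession"], rec["position"], rec["ptm_type"].lower())
            merged[key].add(source_name)
    return {key: sorted(names) for key, names in merged.items()}


if __name__ == "__main__":
    # Hypothetical records; accessions and sites are placeholders.
    curated = ("curated_db", [
        {"accession": "ACC1", "position": 15, "ptm_type": "Phosphorylation"},
    ])
    literature = ("literature", [
        {"accession": "ACC1", "position": 15, "ptm_type": "phosphorylation"},
        {"accession": "ACC1", "position": 120, "ptm_type": "Acetylation"},
    ])
    for site, support in merge_ptm_records(curated, literature).items():
        print(site, support)
```

Normalizing the modification name (here, only lower-casing) matters in practice, because different resources spell the same PTM differently; mismatched isoform numbering is another source of spurious duplicates.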


Figure 1. dbPTM database for protein post-translational modifications (PTMs). (From https://biomics.lab.nycu.edu.tw/dbPTM/index.php)

Specific PTM Databases and Glycosylation

PTMs play a vital role in regulating protein function, and glycosylation is one of the most abundant and functionally significant types. Glycosylation involves the attachment of sugar molecules to proteins, influencing their stability, activity, and localization. To facilitate the comprehensive study of glycosylation and other PTMs, several specialized databases have been developed. These resources provide researchers with curated data on glycosylation sites, glycan structures, and related biological functions.

Phosphorylation Databases

Phosphorylation, the addition of a phosphate group to a protein, is one of the most well-studied PTMs. Several databases specialize in documenting phosphorylation sites, providing insights into kinase-substrate relationships and signaling pathways. Notable phosphorylation databases include:

  • PhosphoSitePlus (https://www.phosphosite.org/homeAction.action): A comprehensive resource that covers phosphorylation, ubiquitination, and acetylation events, with curated data from experimental studies.
  • Phospho.ELM (http://phospho.elm.eu.org/): Contains manually annotated phosphorylation sites and supports motif-based predictions to identify potential phosphorylation targets.
  • PHOSIDA (https://ngdc.cncb.ac.cn/databasecommons/database/id/610): Offers large-scale phosphorylation data derived from mass spectrometry experiments, supporting cross-species analysis.
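Phosphorylation resources and motif-based predictors often describe a site by a fixed-length window of flanking residues centered on the modified residue. The helper below builds such a window. The ±7-residue width, the `_` padding at protein termini, and the lower-case central residue are assumptions modeled on a convention seen in some phosphorylation site downloads, so check the exact format of the database you are matching against; the function name is hypothetical.

```python
def site_window(sequence: str, position: int, flank: int = 7) -> str:
    """Return the 2*flank+1 residues centred on a 1-based site.

    Positions beyond either protein terminus are padded with '_', and the
    modified residue itself is written in lower case.
    """
    idx = position - 1
    if not 0 <= idx < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    padded = "_" * flank + sequence.upper() + "_" * flank
    # In the padded string the site sits at idx + flank, so the window starts at idx.
    window = padded[idx:idx + 2 * flank + 1]
    return window[:flank] + window[flank].lower() + window[flank + 1:]
```

For the toy sequence "ACDEFGHIKLMNPQRSTVWY", the window around Ser16 is `KLMNPQRsTVWY___`, with three padding characters because the site lies near the C-terminus.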

Glycosylation Databases

Glycosylation databases provide essential information on the types, structures, and functional roles of glycans. Some focus on specific glycosylation types, while others cover a broader spectrum of glycan modifications. Key databases include:

  • UniCarbKB (https://ngdc.cncb.ac.cn/databasecommons/database/id/185): A comprehensive database that curates glycan structures, glycoproteins, and their biological sources. It also provides experimental data from mass spectrometry and glycan sequencing studies.
  • GlyGen (https://www.glygen.org/): Integrates glycoscience data, offering extensive information on glycans, glycoproteins, and enzymes involved in glycosylation.
  • GlycoSuiteDB (https://ngdc.cncb.ac.cn/databasecommons/database/id/5421): Specializes in annotated glycan structures and glycosylation sites, with detailed mass spectrometry data. This database is particularly valuable for biomarker discovery and therapeutic glycoprotein characterization.
  • O-GLYCBASE (https://services.healthtech.dtu.dk/datasets/OglycBase/): Focuses on O-linked glycosylation, including mucin-type O-GalNAc and O-GlcNAc modifications, which regulate numerous cellular processes. It provides curated datasets to support the identification of O-linked glycosylation sites.
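Several of these resources pair glycan structures with mass spectrometry data. As a small worked example, the theoretical monoisotopic mass of a glycan can be computed from its monosaccharide composition and compared with observed peaks. The residue masses below were derived from the elemental formulas given in the comments; the function names and composition keys are hypothetical and are not any database's API.

```python
# Monoisotopic residue masses in Da (free monosaccharide minus one water),
# computed from the elemental formulas given in the comments.
RESIDUE_MASS = {
    "Hex": 162.05282,     # C6H10O5, e.g. mannose, galactose, glucose
    "HexNAc": 203.07937,  # C8H13NO5, e.g. GlcNAc, GalNAc
    "dHex": 146.05791,    # C6H10O4, e.g. fucose
    "NeuAc": 291.09542,   # C11H17NO8, N-acetylneuraminic acid
}
WATER = 18.01056  # H2O, present at the reducing end of a free glycan


def glycan_mass(composition: dict[str, int]) -> float:
    """Monoisotopic mass of a free (reducing-end) glycan from its composition."""
    return sum(RESIDUE_MASS[unit] * count
               for unit, count in composition.items()) + WATER


def glycan_mass_shift(composition: dict[str, int]) -> float:
    """Mass added to a peptide by an attached glycan (residue masses only)."""
    return glycan_mass(composition) - WATER
```

For the high-mannose composition HexNAc2Hex5 (Man5GlcNAc2), `glycan_mass({"HexNAc": 2, "Hex": 5})` gives about 1234.433 Da. When the glycan is attached to a peptide, the mass shift is the sum of the residue masses alone, because a water is lost when the glycosidic linkage forms.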

Organism-Specific PTM Databases

Organism-specific PTM databases are tailored to capture the unique biological contexts of individual species, offering refined insights that are essential for accurate predictions and meaningful biological interpretations. Unlike general PTM databases, these resources incorporate species-specific data derived from experimental studies and computational predictions. This allows researchers to account for variations in protein sequences, enzyme activities, and regulatory mechanisms across different organisms.

Bacterial, Yeast, and Arabidopsis PTM Resources

  • NetPhosBac and NetPhosYeast (https://services.healthtech.dtu.dk/services/NetPhosYeast-1.0/): Specialized tools designed to predict phosphorylation sites in bacterial and yeast proteins, respectively. Bacteria and yeast exhibit distinct phosphorylation patterns, often regulated by kinases different from those in mammalian systems. NetPhosBac and NetPhosYeast are trained on species-specific datasets, which improves their accuracy and provides insights into bacterial signaling pathways and yeast cell cycle regulation.
  • PhosPhAT (https://ngdc.cncb.ac.cn/databasecommons/database/id/448): PhosPhAT is dedicated to the phosphorylation sites in Arabidopsis thaliana, a model plant species. This database integrates mass spectrometry data with predictive algorithms to map phosphorylation networks in plants. PhosPhAT has become invaluable for understanding plant-specific responses to environmental stress and hormonal signaling.

Human and Mammalian PTM Databases

  • PhosphoSitePlus (https://www.phosphosite.org/homeAction.action): PhosphoSitePlus is a comprehensive resource focused on human phosphorylation, ubiquitination, and acetylation sites. With detailed annotations from both experimental and high-throughput studies, it serves as a foundational platform for studying human disease mechanisms and developing targeted therapies.
  • The Human Protein Reference Database (HPRD) (https://ngdc.cncb.ac.cn/databasecommons/database/id/1383): HPRD includes information on protein-protein interactions, modifications, and disease associations. By curating experimental data from literature, HPRD facilitates a deeper understanding of human PTM networks and their roles in health and disease.
  • dbPTM (https://ngdc.cncb.ac.cn/databasecommons/database/id/343): dbPTM provides extensive PTM data across various species, including humans, mice, and other mammalian models. It enables cross-species comparisons to identify conserved PTM sites and predict potential regulatory mechanisms in mammalian systems.
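Cross-species comparison of conserved PTM sites usually starts from a sequence alignment of orthologs. The sketch below is a hypothetical helper, not part of dbPTM: given a pairwise alignment as two gapped strings (produced by any aligner), it maps a site position in one protein onto the other and checks whether the same residue is present there. The aligned sequences in the example are toy data.

```python
def map_position(aligned_a: str, aligned_b: str, pos_a: int):
    """Map a 1-based position in sequence A to sequence B through a pairwise
    alignment given as two equal-length gapped strings ('-' = gap).

    Returns (position in B, residue in B), or None if B has a gap there.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    i_a = i_b = 0  # residue counters for A and B
    for col_a, col_b in zip(aligned_a, aligned_b):
        if col_a != "-":
            i_a += 1
        if col_b != "-":
            i_b += 1
        if col_a != "-" and i_a == pos_a:
            return None if col_b == "-" else (i_b, col_b)
    raise ValueError("position beyond the end of sequence A")


def is_conserved(aligned_a: str, aligned_b: str, pos_a: int) -> bool:
    """True if the residue at pos_a in A is aligned to an identical residue in B."""
    residue_a = aligned_a.replace("-", "")[pos_a - 1]
    hit = map_position(aligned_a, aligned_b, pos_a)
    return hit is not None and hit[1] == residue_a
```

With A = "MKS-TLRSP" and B = "MKSATL-TP", Ser3 maps to a conserved Ser3, Arg6 falls in a gap, and Ser7 aligns to Thr7. A strict identity check reports the last case as not conserved, although a Ser-to-Thr change preserves a phospho-acceptor residue, so a real analysis might treat it more leniently.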

Plant-Specific PTM Databases

  • Plant PTM Viewer (https://www.vanbreusegemlab.be/research/plant-ptm-viewer): Plant PTM Viewer is a curated database that compiles PTM data from multiple plant species, focusing on phosphorylation, acetylation, and glycosylation events. It provides researchers with insights into how plants regulate growth, stress responses, and metabolic pathways through PTMs.
  • P3DB (Plant Protein Phosphorylation Database) (https://www.p3db.org/): P3DB offers plant-specific phosphorylation data derived from large-scale proteomics studies. By cataloging phosphoproteins and their corresponding kinases, P3DB supports the identification of signaling cascades involved in plant development and environmental adaptation.

Comparative Evaluation of Key PTM Databases

| Database | PTM Types Covered | Data Source | Key Features | Limitations |
| --- | --- | --- | --- | --- |
| Swiss-Prot (UniProt) | Broad spectrum of PTMs | Expert-curated experimental data | Comprehensive, non-redundant; integrates functional annotations | Primarily focuses on well-characterized proteins |
| HPRD | Protein interactions and PTMs (e.g., phosphorylation) | Manually curated literature data | Extensive human-specific PTM entries; linked with auxiliary tools (e.g., PhosphoMotif Finder) | Limited to human proteins |
| dbPTM | Phosphorylation, glycosylation, ubiquitination, etc. | Literature curation and experimental data | Wide coverage across many PTM types; includes prediction modules | May contain outdated information due to reliance on literature mining |
| Phospho.ELM | Phosphorylation (serine, threonine, tyrosine) | Manually curated from scientific literature | Detailed kinase-substrate interactions; motif-based prediction data | No recent updates; newer databases have emerged |
| PhosphoSitePlus | Phosphorylation, ubiquitination, acetylation | Mass spectrometry experiments, curated experimental data | High-confidence, human-specific modification data; detailed kinase-substrate information | Limited data for non-human species |
| GlycoSuiteDB | Glycosylation | Mass spectrometry and literature mining | Detailed glycan structures and linkage information; cross-referenced with related resources | Limited to glycosylation events; update frequency can be variable |
| O-GLYCBASE | O-linked glycosylation | Curated from SWISS-PROT, PIR, and other sequence databases | Focused repository for O-linked glycosylation sites with verified data | Narrower coverage than general PTM databases |

Applications of PTM Analysis in Biomedical Research

  • Elucidate Signal Transduction Pathways: By mapping phosphorylation and glycosylation events, researchers can unravel the complex regulatory networks that govern cellular processes.
  • Identify Disease Biomarkers: Aberrant PTM patterns are frequently implicated in diseases such as cancer, neurodegeneration, and metabolic disorders. Predictive models and databases enable the discovery of novel biomarkers for diagnostic and therapeutic applications.
  • Facilitate Drug Development: Kinases and other PTM-modifying enzymes are prominent targets in pharmaceutical research. Comprehensive PTM analyses inform the design of inhibitors and modulators with high specificity and efficacy.
  • Advance Personalized Medicine: The integration of PTM data with genomic and transcriptomic profiles paves the way for precision medicine, wherein therapeutic interventions are tailored to individual molecular signatures.

Case Study

Investigation and identification of functional post-translational modification sites associated with drug binding and protein-protein interactions

Journal: BMC Systems Biology

Published: 2017

DOI: 10.1186/s12918-017-0506-1

Background

Protein PTMs are critical regulators of protein function, influencing folding, stability, and interactions. Despite advances in MS-based proteomics, the local effects of PTM sites—especially near drug-binding and protein-protein interaction (PPI) interfaces—remain poorly understood, even though many therapeutic effects of protein-based drugs are mediated by PTMs.

Purpose

The study aimed to investigate and structurally characterize functional PTM sites that affect drug-target binding and PPIs, thereby elucidating the molecular mechanisms by which PTMs regulate protein function.

Methods

  • Data Integration: Experimentally verified PTM sites were obtained from dbPTM and mapped to three-dimensional (3D) structures from the Protein Data Bank (PDB).
  • Structural Characterization: Five properties were analyzed: spatial amino acid composition, neighboring residue orientations, secondary structure, local charge environment, and solvent-accessible surface area.
  • Mapping to Functional Sites: PTM sites were cross-referenced with known drug binding and PPI interfaces using tools like PoseView and molecular docking (iGEMDOCK), as well as domain information from Pfam and 3D interaction data from 3DID.
  • Platform Development: An integrated analytical platform, CruxPTM, was developed for 3D visualization and detailed exploration of PTM sites.
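Two of the five properties above, spatial amino acid composition and local charge environment, can be approximated from atomic coordinates. The sketch below is not the study's implementation: it assumes one C-alpha coordinate per residue, an 8 Å neighborhood radius, and simplified side-chain charges (Lys and Arg +1, Asp and Glu -1, His neutral), all of which are illustrative choices; the function names are hypothetical.

```python
import math

# Simplified side-chain charges near neutral pH (His treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def neighbours(site: int, residues, radius: float = 8.0):
    """Residues whose C-alpha lies within `radius` Å of the site's C-alpha.

    `residues` is a list of (residue number, one-letter code, (x, y, z)).
    """
    centre = next((r[2] for r in residues if r[0] == site), None)
    if centre is None:
        raise ValueError(f"residue {site} not found")
    return [r for r in residues
            if r[0] != site and math.dist(r[2], centre) <= radius]


def local_environment(site: int, residues, radius: float = 8.0):
    """Return (amino-acid counts, net charge) of the residues around a site."""
    counts: dict[str, int] = {}
    charge = 0
    for _, aa, _ in neighbours(site, residues, radius):
        counts[aa] = counts.get(aa, 0) + 1
        charge += CHARGE.get(aa, 0)
    return counts, charge
```

For a Ser site whose neighbors within 8 Å are one Lys, one Arg, and one Glu, the sketch reports a net local charge of +1; shrinking the radius so that the Arg drops out changes the charge to 0. Because the descriptors depend on the cutoff, the radius should be reported alongside them.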

Figure 2. Flowchart of the analyses performed in this study.

Results

  • Over 25,000 PTM sites were mapped to proteins with known structures, with 1,785 modified sites directly observed on 3D structures.
  • The study identified 1,917 PTM sites that may affect PPIs and 3,951 PTM sites associated with drug-target binding.
  • Case studies (e.g., on IGF-1R, human serum albumin, urease, p21, RhoGDI1, and Tau) demonstrated how specific PTMs modulate binding affinities and interactions, affecting both drug efficacy and protein complex formation.

Conclusion

This work delineates the structural correlations between PTM sites and functional interfaces for drug binding and PPIs. The CruxPTM platform enables researchers to visualize and analyze these relationships, potentially enhancing our understanding of PTM functions in biological processes and improving drug design strategies.
