How to interpret the relationships of PTMs from BioGRID's data


In the BioGRID database, PTMREL is a file that describes the relationships of the PTMs (post-translational modifications) tabulated in the PTMTAB file.

I have several issues with this file.

  1. Foremost, I am not sure how to interpret the 'relationships'.

There are two columns 'Relationship' and 'Identity' that supposedly describe the relationships of PTMs.

Relationship. A plain text descriptor of the type of relation represented. (Unique values: ['-', 'kinase', 'phosphatase'])

Identity. A plain text descriptor of additional identity details if available. (Unique values: ['catalytic', 'regulatory', 'PTM'])

Let's say a gene has a relationship (in the PTMTAB file, the column 'Has relationship' is True) and, in the PTMREL file, the column 'Relationship' is 'kinase'. Does that mean that the gene is a kinase, or that it is phosphorylated by a kinase? Also, what does it mean if the gene is additionally 'catalytic', 'regulatory' or 'PTM' (column 'Identity' in the PTMREL file)?

Here's BioGRID's wiki page describing (but not detailing) the format of the PTMREL file: https://wiki.thebiogrid.org/doku.php/biogrid_ptmtab_ptmrel

  2. If PTMREL contains data associated with the PTMTAB file, why not simply combine it with the PTMTAB file? Without knowing how to interpret the relationships of the PTMs, users cannot combine the two files themselves either.

  3. Additionally, PTMs are included in 'BioGRID TAB 2.0' too, and the number of PTMs (for a species) in that file is not the same as in the PTMTAB file(!).
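On the second point, one possible obstacle to simply appending PTMREL columns to PTMTAB is that a single PTM may carry several relationship rows, which would make the combination a one-to-many join. The sketch below illustrates such a join under stated assumptions: that both files are tab-delimited and share a 'PTM ID' key column (neither is confirmed by the wiki page quoted above), and that the rows shown are invented for illustration and imply nothing about how the relationships should be interpreted.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature files. "PTM ID" as the shared key is an assumption;
# "Has relationship", "Relationship" and "Identity" are the columns named in
# the question. All rows are made up.
PTMTAB = """PTM ID\tHas relationship
1\tTrue
2\tFalse
"""
PTMREL = """PTM ID\tRelationship\tIdentity
1\t-\tPTM
1\tkinase\tcatalytic
"""


def read_tsv(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def attach_relationships(ptm_rows, rel_rows, key="PTM ID"):
    """Left join: each PTM row keeps a (possibly empty) list of relationship rows."""
    by_key = defaultdict(list)
    for rel in rel_rows:
        by_key[rel[key]].append(rel)
    return [dict(ptm, relationships=by_key.get(ptm[key], [])) for ptm in ptm_rows]


merged = attach_relationships(read_tsv(PTMTAB), read_tsv(PTMREL))
for row in merged:
    pairs = [(r["Relationship"], r["Identity"]) for r in row["relationships"]]
    print(row["PTM ID"], row["Has relationship"], pairs)
```

Keeping the relationship rows as a nested list avoids duplicating the PTM row once per relationship, which a flat merged table would require.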

I understand that BioGRID provides this data 'WITHOUT ANY WARRANTY'. Still, since BioGRID is the most unified and constantly updated resource of gene interaction networks, users should be told the basics of how to use the data.

UPDATE:

After some data exploration, I could guess the meaning of the data. If Relationship is '-', Identity is always 'PTM', so these are the PTMs that do not have known relationships. If Relationship is 'kinase' or 'phosphatase', Identity is 'catalytic' or 'regulatory'. Here, I am still not sure what 'catalytic' and 'regulatory' mean.
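The kind of exploration described in the update can be reproduced with a simple cross-tabulation of the two columns. This is only a sketch: it assumes the file is tab-delimited with a header row, and the sample rows are invented to mirror the pattern reported above rather than copied from a BioGRID release.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt in PTMREL layout. Only the two column names come from
# the question; the rows are made up for illustration.
SAMPLE_PTMREL = """Relationship\tIdentity
-\tPTM
-\tPTM
kinase\tcatalytic
kinase\tregulatory
phosphatase\tcatalytic
"""


def cross_tabulate(handle):
    """Count how often each (Relationship, Identity) pair occurs."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter((row["Relationship"], row["Identity"]) for row in reader)


counts = cross_tabulate(io.StringIO(SAMPLE_PTMREL))
for (relationship, identity), n in sorted(counts.items()):
    print(f"{relationship:12s} {identity:12s} {n}")

# Identity values that ever co-occur with the '-' relationship.
dash_identities = {identity for (relationship, identity) in counts if relationship == "-"}
print(dash_identities)
```

To run it on a real download, replace the `io.StringIO(...)` argument with an open file handle for the PTMREL file.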


How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers

Histones comprise the major protein component of chromatin, the scaffold in which the eukaryotic genome is packaged, and are subject to many types of post-translational modifications (PTMs), especially on their flexible tails. These modifications may constitute a 'histone code' and could be used to manage epigenetic information that helps extend the genetic message beyond DNA sequences. This proposed code, read in part by histone PTM–binding 'effector' modules and their associated complexes, is predicted to define unique functional states of chromatin and/or regulate various chromatin-templated processes. A wealth of structural and functional data show how chromatin effector modules target their cognate covalent histone modifications. Here we summarize key features in molecular recognition of histone PTMs by a diverse family of 'reader pockets', highlighting specific readout mechanisms for individual marks, common themes and insights into the downstream functional consequences of the interactions. Changes in these interactions may have far-reaching implications for human biology and disease, notably cancer.



1 INTRODUCTION

The biomedical literature contains a vast amount of information that captures a great deal of knowledge about biology and biomedicine. However, identifying relevant information about gene or protein function with respect to any given process or disease can be a monumental task because of the sheer volume of data contained in the scientific literature and the fact that much of the literature is not open access. Automated extraction of key data elements from publications cannot be easily achieved due to the unstructured free-form text that comprises most of the biomedical literature. Moreover, relevant data are often only present in nontext elements such as figures, tables, and supplementary information. A fundamental goal of biomedical data curation is to convert text-, table-, and figure-based experimental information from the literature into consistent structured records that can be easily accessed in standardized formats for computational analyses.

To address these issues, the BioGRID resource was started in 2006 with the focused goal of comprehensively curating all available biological interaction data generated in the budding yeast Saccharomyces cerevisiae (1, 2). Discrete interactions between macromolecules and/or functional genetic elements form the basis for all biological systems and collectively form highly interconnected auto-regulated networks that imbue system-level properties on cells, tissues, and organisms. These discrete types are readily exploited in computational approaches to model network behavior (3). Budding yeast represented an ideal test case for curation of interaction data because of the implementation of high-throughput genetic and proteomic techniques that enabled the first genome-wide analyses of genetic and protein interactions in any species (4). Since then, the BioGRID has expanded coverage to include interaction data for all major model organisms and humans, as well as many other less well-studied species, more than 70 species in all. As of October 2020, BioGRID contains over 1.93 million protein and genetic interactions curated from more than 63,000 publications (Figure 1a, Table 1). With respect to specific species, the BioGRID currently contains over 755,000 interactions for budding yeast, 79,000 interactions for fission yeast, 670,000 interactions for human, 29,000 interactions for worm, 78,000 interactions for fly, and 300,000 interactions for all other organisms (Figure 1b). BioGRID has also expanded its curation strategy to include other types of data that are relevant to biological interactions. For example, BioGRID records more than 515,000 unique protein post-translational modifications (PTMs) and over 28,000 interactions between drugs or other chemicals and their protein targets. BioGRID also now curates gene-phenotype relationships from genome-wide CRISPR screens. The BioGRID record structure, database architecture, and curation pipeline have been described in detail elsewhere (5).

Species          Type   Gene or protein nodes  Interactions (redundant)  Interactions (nonredundant)  Publications
S. cerevisiae    P      7,077                  175,968                   117,040                      9,699
S. cerevisiae    G      5,962                  579,323                   443,268                      9,392
S. cerevisiae    P + G  7,332                  755,291                   546,582                      16,037
S. pombe         P      3,571                  17,529                    13,109                       1,624
S. pombe         G      3,627                  62,313                    52,449                       2,003
S. pombe         P + G  4,651                  79,842                    64,418                       2,874
A. thaliana      P      10,737                 58,424                    50,756                       2,307
A. thaliana      G      306                    351                       283                          155
A. thaliana      P + G  10,787                 58,775                    50,952                       2,387
C. elegans       P      6,804                  27,225                    26,119                       207
C. elegans       G      1,136                  2,344                     2,277                        36
C. elegans       P + G  7,164                  29,569                    28,353                       226
D. melanogaster  P      9,226                  64,404                    54,507                       3,697
D. melanogaster  G      3,147                  14,476                    10,141                       4,395
D. melanogaster  P + G  9,546                  78,880                    63,290                       7,166
M. musculus      P      15,990                 80,510                    72,965                       4,213
M. musculus      G      355                    394                       349                          198
M. musculus      P + G  16,039                 80,904                    73,236                       4,351
H. sapiens       P      25,722                 663,179                   506,159                      32,168
H. sapiens       G      3,767                  9,181                     9,055                        351
H. sapiens       P + G  26,126                 672,360                   514,501                      32,310
Other            P      22,634                 57,877                    49,945                       3,279
Other            G      4,536                  171,910                   170,197                      118
Other            P + G  25,045                 229,787                   219,323                      3,359
All              P      76,711                 1,088,662                 841,748                      51,784
All              G      22,191                 839,711                   687,467                      16,448
All              P + G  81,183                 1,928,373                 1,511,287                    63,083
  • Note: Data are compiled from BioGRID release 4.1.190 of October 1, 2020.
  • Abbreviations: G, genetic interaction; P, physical interaction.
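As a quick sanity check on the table, the redundant interaction counts appear to be additive across interaction types: for every row group, the P + G value equals the sum of the P and G values. The snippet below verifies this using numbers copied from the table. It makes no such claim for the node, nonredundant or publication columns, and the dictionary name is illustrative only.

```python
# Redundant interaction counts copied from the table: (P, G, P + G).
REDUNDANT = {
    "S. cerevisiae": (175_968, 579_323, 755_291),
    "S. pombe": (17_529, 62_313, 79_842),
    "A. thaliana": (58_424, 351, 58_775),
    "C. elegans": (27_225, 2_344, 29_569),
    "D. melanogaster": (64_404, 14_476, 78_880),
    "M. musculus": (80_510, 394, 80_904),
    "H. sapiens": (663_179, 9_181, 672_360),
    "Other": (57_877, 171_910, 229_787),
    "All": (1_088_662, 839_711, 1_928_373),
}

# Collect any row group where physical + genetic does not match the combined count.
mismatches = {
    species: (p, g, total)
    for species, (p, g, total) in REDUNDANT.items()
    if p + g != total
}
print(mismatches)
```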

Due to the vast extent of the human biomedical literature, BioGRID has in part taken a biological process- and/or disease-focused approach in order to build curation depth in critical areas of human biology. These themed curation projects include the ubiquitin proteasome system (UPS), chromatin modification, autophagy, glioblastoma, Fanconi anemia and, most recently, the SARS-CoV-2 coronavirus that is the causative agent of the COVID-19 pandemic (6). A curated gene/protein list is developed by domain experts for each project to guide the literature curation strategy. These focused projects have driven much of the extensive growth of the human interaction curation in BioGRID over the past 10 years and are discussed in detail below. The themed curation efforts are complemented by a dedicated on-going effort to curate all large-scale human interaction datasets.

In this review, we describe the content and functionality of BioGRID, with an emphasis on the newest developments. We highlight recent interaction curation for the UPS and for the emergent coronaviruses that cause severe acute respiratory syndromes, including SARS-CoV-2. We also describe a new extension of BioGRID, named the Open Repository of CRISPR Screens (ORCS), which currently contains over 1,042 annotated CRISPR phenotype screens. The large collection of curated interaction data in BioGRID represents a unified resource for integrative network analyses by computational biologists and enables efficient mining of the biomedical literature by researchers interested in specific genes or proteins.


SEARCH FEATURES

The primary method of data access for BioGRID is via the web-based search interface. Combined JavaScript, PHP and Cascading Style Sheets (CSS) enable an interface that is easy both to interpret and to navigate. BioGRID is supported by all main standards-compliant web browsers. Searches may be based on a wide range of supported identifiers, including gene name, ORF name, PubMed ID and free text. All genes/proteins retrieved by the query are listed in tabular format and are internally hyperlinked to allow rapid recursive searches. The BioGRID search interface retrieves the results, compiles interaction redundancies often found in large datasets and/or in combined multiple datasets, and provides an annotation-rich results page for further investigation (Figure 1). Annotation features include descriptions of gene/protein function and GO biological process, molecular function and cellular compartment terms (26).


DATA CURATION

Curation for BioGRID is performed by a dedicated team of PhD-level curators. A web-based interaction management system (IMS) is used to build prioritized publication queues for different projects and facilitate the curation process through structured pull-down menus. The history of all curated data is tracked to each individual curator. Curators also help guide direct deposition by authors, which is particularly useful for pre-publication annotation of large-scale datasets and allows immediate public release of the data upon publication.

Within the past 2 years, BioGRID curators have begun to use text-mining tools to prioritize the relevant literature for each curation project (18). In turn, BioGRID supports the text-mining community by providing a gold-standard collection of manually curated interactions for the BioCreative challenge (19–22), a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. We have also established collaborations with WormBase (23) and the development team for the Textpresso text-mining tool (24). For example, the curation queue for the Wnt-signaling network is prioritized based on text-mining results by Textpresso support vector machine (SVM) analyses, and ‘Textpresso for Wnt’ has also been set up as a text-mining interface to facilitate our curation. The overall curation pipeline of BioGRID is illustrated in Figure 1.

Figure 1. BioGRID curation pipeline. The curation workflow consists of three major steps: (i) triage of the literature of interest by text-mining tools and/or interaction-directed PubMed queries; (ii) curation, annotation and tracking of interaction data through the web-based IMS; and (iii) monthly public release of interaction data records.

BioGRID actively collaborates with the extensive MOD community on different aspects of curation. For example, in collaboration with SGD, BioGRID curators have used the Yeast Phenotype Ontology (YPO) developed at SGD to assign structured phenotypes to over 200 000 budding and fission yeast genetic interactions. Collaborations are also underway with WormBase (23), ZFIN (25), FlyBase (26), MGI (27) and CGD (28) to coordinate interaction curation, and thereby leverage expertise and in-house MOD data that are relevant to biological interactions. For example, GO evidence codes generated by the MODs are often derived from publications that are likely to contain interaction data. Collaborations with the MODs have also led to an improved curation approach for higher organisms by implementation of species-specific phenotype ontologies and the broadening of interaction terms to capture more complex genetic interaction data. The different biology of various organisms used in biomedical research presents a formidable challenge in the annotation and interpretation of genetic interactions, and in the reconciliation of structured phenotypes across all species. In order to meet this challenge, in conjunction with WormBase (23), and supported by other MODs such as SGD (13), CGD (28), PomBase (14), FlyBase (26), TAIR (15) and ZFIN (25), we have developed a universal genetic interaction (GI) ontology that enables the annotation of more complex phenotypic outcomes associated with genetic interactions from higher organisms. The genetic interaction ontology has been submitted to the PSI-MI editorial committee (29) and will be made publicly available with the next official PSI-MI ontology release.


Abstract

Various posttranslational modifications (PTMs) participate in nearly all aspects of biological processes by regulating protein functions, and aberrant states of PTMs are frequently implicated in human diseases. Therefore, an integral resource of PTM–disease associations (PDAs) would be a great help for both academic research and clinical use. In this work, we reported PTMD, a well-curated database containing PTMs that are associated with human diseases. We manually collected 1950 known PDAs in 749 proteins for 23 types of PTMs and 275 types of diseases from the literature. Database analyses show that phosphorylation has the largest number of disease associations, whereas neurologic diseases have the largest number of PTM associations. We classified all known PDAs into six classes according to the PTM status in diseases and demonstrated that the upregulation and presence of PTM events account for a predominant proportion of disease-associated PTM events. By reconstructing a disease–gene network, we observed that breast cancers have the largest number of associated PTMs and AKT1 has the largest number of PTMs connected to diseases. Finally, the PTMD database was developed with detailed annotations and can be a useful resource for further analyzing the relations between PTMs and human diseases. PTMD is freely accessible at http://ptmd.biocuckoo.org.


Visualizing Post-Translational Modifications in Protein Interaction Networks Using PTMOracle

Post-translational modifications (PTMs) of proteins act as key regulators of protein activity, including the regulation of protein-protein interactions (PPIs). However, exploring functional links between PTMs and PPIs can be difficult. PTMOracle is a Cytoscape app that facilitates the co-visualization and co-analysis of PTMs in the context of PPI networks. PTMOracle also allows extensive data to be integrated and co-analyzed, allowing the role of domains, motifs, and disordered regions to be considered. Here, we describe several PTMOracle protocols investigating complex PTM-associated relationships and their role in PPIs. This is assisted by OraclePainter for coloring proteins by the modifications present and visualizing these in the context of networks, by OracleTools for cross-matching PTMs with sequence feature for all nodes in the network, and by OracleResults for exploring specific proteins and visualizing their PTMs in the context of protein sequences. This unit aims to demonstrate how PTMOracle can be used to systematically explore network visualizations and generate testable hypotheses regarding the functional role of PTMs in PPIs, and how the results can be analyzed to better understand the regulatory role of PTMs in PPIs. © 2019 by John Wiley & Sons, Inc.


Comprehensive structural analysis of mutant nucleosomes containing lysine to glutamine (KQ) substitutions in the H3 and H4 histone-fold domains

Post-translational modifications (PTMs) of histones play important roles in regulating the structure and function of chromatin in eukaryotes. Although histone PTMs were considered to mainly occur at the N-terminal tails of histones, recent studies have revealed that PTMs also exist in the histone-fold domains, which are commonly shared among the core histones H2A, H2B, H3, and H4. The lysine residue is a major target for histone PTM, and the lysine to glutamine (KQ) substitution is known to mimic the acetylated states of specific histone lysine residues in vivo. Human histones H3 and H4 contain 11 lysine residues in their histone-fold domains (five for H3 and six for H4), and eight of these lysine residues are known to be targets for acetylation. In the present study, we prepared 11 mutant nucleosomes, in which each of the lysine residues of the H3 and H4 histone-fold domains was replaced by glutamine: H3 K56Q, H3 K64Q, H3 K79Q, H3 K115Q, H3 K122Q, H4 K31Q, H4 K44Q, H4 K59Q, H4 K77Q, H4 K79Q, and H4 K91Q. The crystal structures of these mutant nucleosomes were determined at 2.4-3.5 Å resolutions. Some of these amino acid substitutions altered the local protein-DNA interactions and the interactions between amino acid residues within the nucleosome. Interestingly, the C-terminal region of H2A was significantly disordered in the nucleosome containing H4 K44Q. These results provide an important structural basis for understanding how histone modifications and mutations affect chromatin structure and function.

