Where does PPI data come from?
PPI databases don't generate their own interaction data — they aggregate data from experiments published in primary research papers, then add computational predictions on top. This is crucial to understand: the interactions you see in STRING or BioGRID were measured by biologists using specific experimental methods, each with their own strengths and limitations.
The main evidence sources are listed below. The first five are experimental methods; the last is computational:
Y2H (Yeast Two-Hybrid)
Tests binary interactions in yeast cells. High-throughput but relatively high false-positive rate (~30–50% for genome-scale screens). Best for discovering new direct interactions. Can miss weak/transient ones.
AP-MS (Affinity Purification–Mass Spectrometry)
Pulls down one protein and identifies all co-purifying partners by mass spectrometry. Detects complexes, not just binary pairs. Very sensitive but can't distinguish direct vs indirect contacts within the complex.
Co-immunoprecipitation (Co-IP)
Uses an antibody to pull down one protein; western blot confirms the partner. Typically low-throughput (one interaction at a time) but very specific and biologically relevant. Gold standard for validating an interaction.
Biophysical (SPR, ITC, FRET)
Surface Plasmon Resonance (SPR) and Isothermal Titration Calorimetry (ITC) directly measure binding affinity (SPR also yields kinetics, ITC thermodynamics), while FRET reports molecular proximity between two labelled proteins. Very high-quality evidence, but low-throughput. These are the interactions databases flag as highest-confidence experimental evidence.
Proximity labelling (BioID, APEX)
An enzyme fused to a bait protein biotinylates all nearby proteins. Captures transient and weak interactions that Y2H or Co-IP might miss. Increasingly important for synaptic and nuclear PPI studies.
Computational prediction
STRING's "text mining" and "co-expression" channels, its three genomic-context channels (neighbourhood, gene fusion, co-occurrence), and evidence transferred between species by homology are computational, not experimental. They are useful but should be weighted lower. Many databases allow you to turn these off and view experimental-only networks.
When STRING reports an interaction with a confidence score of 0.9, this doesn't mean the interaction is true with 90% probability. It means the combined evidence from multiple channels supports this interaction being real. A score of 0.9 based purely on text mining is far less reliable than 0.7 based on two independent experimental methods. Always check which evidence channels contribute to a score.
STRING: the go-to PPI database
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is maintained by a consortium of the Swiss Institute of Bioinformatics, the Novo Nordisk Foundation Center for Protein Research in Copenhagen, and EMBL Heidelberg, and is updated roughly every 2–3 years (current version: STRING v12.0, released 2023). Its coverage is vast: the v11.5 release listed 14,094 organisms, about 67.6 million proteins, and more than 20 billion scored associations. The human-focused subset of well-evidenced interactions is much smaller and more practically useful.
STRING is one of the most widely used PPI databases in neuroscience publications because it is freely accessible, returns visual networks immediately, and integrates multiple evidence channels into a single combined score. It also links directly to GO enrichment analysis and pathway databases (KEGG, Reactome) from the same interface.
STRING's seven evidence channels
Every edge in STRING has a score between 0 and 1 for each of seven evidence channels. The combined score is calculated by combining these channel scores probabilistically, after correcting each for the probability of observing an interaction by chance. Understanding each channel tells you why an interaction is shown:
| Channel | What it measures | Type | Reliability for PPI |
|---|---|---|---|
| Neighbourhood | Genes physically adjacent on genome in other organisms — suggests co-evolution of function | Computational | Low–medium; indirect |
| Gene fusion | The two genes are fused into one in some other organism, implying functional linkage | Computational | Medium; rare but reliable when present |
| Co-occurrence | Both genes are present or absent together across species (phylogenetic profiling) | Computational | Low–medium |
| Co-expression | Genes show correlated expression patterns across conditions/tissues (e.g. from RNA-seq datasets) | Computational | Medium; correlation ≠ interaction |
| Text mining | Both gene names co-occur in PubMed abstracts and full-text articles | Computational | Low–medium; prone to false positives from review articles |
| Databases | Curated pathway and complex annotations imported from resources such as KEGG and Reactome | Curated knowledge | High; human-curated, but membership of a shared pathway does not prove physical binding |
| Experiments | Experimental interaction data (Y2H, AP-MS, Co-IP, biophysical measurements) imported from primary interaction databases such as BioGRID, IntAct, MINT, DIP, and HPRD | Experimental | Very high; most reliable channel |
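The probabilistic combination of channel scores can be sketched in a few lines of Python. This is a simplified illustration, not STRING's own code: the prior of 0.041 is an assumption based on the value commonly quoted in STRING's documentation, the function name is hypothetical, and the homology correction that STRING applies to the text-mining and co-occurrence channels is omitted.

```python
# Simplified sketch of a STRING-style combined score.
# ASSUMPTION: prior probability of a random interaction = 0.041.
# OMITTED: STRING's homology correction for text mining and co-occurrence.
PRIOR = 0.041


def combine_channel_scores(channel_scores, prior=PRIOR):
    """Combine per-channel scores (each 0-1) into one combined score."""
    prob_no_evidence = 1.0
    for score in channel_scores:
        # Remove the prior from each channel so it is not counted repeatedly.
        without_prior = max(score - prior, 0.0) / (1.0 - prior)
        prob_no_evidence *= 1.0 - without_prior
    combined = 1.0 - prob_no_evidence
    # Add the prior back exactly once.
    return combined + prior * (1.0 - combined)


# One strong channel versus two moderate, independent channels.
single = combine_channel_scores([0.9])
double = combine_channel_scores([0.5, 0.5])
```

With a single channel the combined score equals that channel's score; two moderate channels reinforce each other and combine to a value above either one alone.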
For a publication-quality PPI analysis, many researchers set STRING to show only the "Databases" and "Experiments" channels (and optionally "Co-expression" if looking for functional associations). This dramatically reduces false positives. In STRING's settings panel, deselect "Text mining", "Neighbourhood", "Gene fusion", and "Co-occurrence" to get a cleaner network. You'll typically see far fewer — but much more reliable — interactions.
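If you work from a downloaded STRING table rather than the web interface, a similar filter can be approximated in code. The sketch below assumes each edge is a dictionary carrying per-channel columns named `escore` (experiments) and `dscore` (databases), as in STRING's tab-separated network output; the protein names are placeholders. It keeps an edge when either channel alone reaches the cut-off, which is a simplification: the web interface instead recomputes the combined score from the selected channels.

```python
def experimental_or_curated(edges, min_score=0.7):
    """Keep edges whose experiments or databases channel alone reaches min_score."""
    kept = []
    for edge in edges:
        best = max(edge.get("escore", 0.0), edge.get("dscore", 0.0))
        if best >= min_score:
            kept.append(edge)
    return kept


# Placeholder edges (hypothetical proteins P1-P4), tscore = text mining.
toy_edges = [
    {"a": "P1", "b": "P2", "escore": 0.85, "dscore": 0.0, "tscore": 0.30},
    {"a": "P1", "b": "P3", "escore": 0.0, "dscore": 0.0, "tscore": 0.92},
    {"a": "P2", "b": "P4", "escore": 0.20, "dscore": 0.90, "tscore": 0.10},
]
reliable = experimental_or_curated(toy_edges)
```

The second edge is dropped despite its high text-mining score, which is exactly the kind of edge the channel filter is meant to remove.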
Reading STRING output: an annotated guide
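When you run a STRING query (e.g. enter "APP" in the search box, select Homo sapiens, set confidence to 0.7), you get a visual network, in which nodes are proteins and edges are scored associations coloured by evidence channel, and a results table listing each protein pair with its per-channel and combined scores.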
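The same query can be expressed as a request to STRING's REST API. The sketch below only builds the URL and does not contact the server. It assumes the `network` method with the `identifiers`, `species`, `required_score` (scaled 0–1000), and `caller_identity` parameters described in STRING's API documentation; the function name and the caller identity string are made up for illustration.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"


def string_network_url(identifiers, species=9606, confidence=0.7, fmt="tsv"):
    """Build a STRING network-API URL (9606 is the NCBI taxon ID for human)."""
    params = {
        # Multiple identifiers are separated by a carriage return (%0D).
        "identifiers": "\r".join(identifiers),
        "species": species,
        # The API expects the confidence threshold on a 0-1000 scale.
        "required_score": int(round(confidence * 1000)),
        "caller_identity": "ppi_tutorial_example",  # hypothetical identifier
    }
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"


url = string_network_url(["APP"])
```

Fetching `url` (for example with `urllib.request.urlopen`) should return one tab-separated row per edge; check the current API documentation before relying on exact column names.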
Choosing a confidence threshold
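The most important decision when using STRING is your confidence threshold. STRING offers four preset cut-offs on the combined score: 0.15 (low), 0.4 (medium), 0.7 (high), and 0.9 (highest). Raising the threshold removes weakly supported edges at the cost of coverage; lowering it does the opposite.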
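To see the trade-off on your own data, you can count how many edges survive at each of STRING's preset cut-offs (0.15, 0.4, 0.7, 0.9). The scores below are invented purely to illustrate the calculation; the function name is hypothetical.

```python
STRING_PRESETS = {"low": 0.15, "medium": 0.4, "high": 0.7, "highest": 0.9}


def edges_retained(combined_scores, presets=STRING_PRESETS):
    """Count edges whose combined score meets each preset threshold."""
    return {
        name: sum(1 for score in combined_scores if score >= cutoff)
        for name, cutoff in presets.items()
    }


# Invented combined scores for six edges (illustration only).
toy_scores = [0.16, 0.42, 0.55, 0.71, 0.93, 0.99]
retained = edges_retained(toy_scores)
```

On real networks the drop between "medium" and "high" is often the largest, which is why 0.7 is a common compromise.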
BioGRID: experimentally focused
BioGRID (Biological General Repository for Interaction Datasets) is a curated repository of protein and genetic interactions, maintained by the Tyers lab at Université de Montréal. Unlike STRING, BioGRID does not include computational predictions — every interaction in BioGRID was detected experimentally and manually extracted from a primary paper by a trained curator.
As of 2024, BioGRID contains over 2.5 million interactions across ~70 species. For the human interactome specifically, BioGRID is arguably the highest-quality resource because of its strict experimental curation standard. However, because it doesn't use prediction, its coverage is lower than STRING.
When to use BioGRID
Use BioGRID when you need to know the experimental method behind an interaction. It lets you filter by detection method (Y2H, Co-IP, AP-MS, etc.) and by throughput (high-throughput vs low-throughput). This is invaluable when critically evaluating the evidence for a specific interaction before proposing follow-up experiments.
BioGRID vs STRING in practice
STRING imports BioGRID records as one of the sources for its "Experiments" channel, so interactions in BioGRID will generally appear in STRING too, and STRING will give them a higher combined score if they're also supported by co-expression or text mining. To verify a specific interaction, check BioGRID directly for the source paper.
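A quick way to compare the two resources for a protein of interest is to treat each interaction as an unordered pair and take set differences. The sketch below assumes you have already extracted partner pairs from each database; the gene names are placeholders and the function names are hypothetical.

```python
def undirected(pairs):
    """Represent each interaction as an unordered pair so A-B equals B-A."""
    return {frozenset(pair) for pair in pairs}


def compare_sources(string_pairs, biogrid_pairs):
    """Split interactions into shared, STRING-only, and BioGRID-only sets."""
    s, b = undirected(string_pairs), undirected(biogrid_pairs)
    return {"both": s & b, "string_only": s - b, "biogrid_only": b - s}


# Placeholder gene names, for illustration only.
string_pairs = [("GENE1", "GENE2"), ("GENE1", "GENE3")]
biogrid_pairs = [("GENE2", "GENE1"), ("GENE1", "GENE4")]
overlap = compare_sources(string_pairs, biogrid_pairs)
```

Pairs that land in `string_only` are the ones worth inspecting channel by channel, since they may rest on predicted rather than experimental evidence.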
How to query BioGRID for a specific interaction
Navigate to thebiogrid.org
Go to the BioGRID website and use the search bar. Enter your protein of interest — for example, SNCA — and select Homo sapiens from the organism dropdown.
Review the interaction table
BioGRID returns a table listing each interaction partner, the detection method, the throughput category (high vs low), and a link to the PubMed source paper. Unlike STRING's visual network, BioGRID presents raw interaction records — one row per experimentally detected interaction, with full provenance.
Filter by detection method
Use the "Detection Method" filter to show only, say, Co-IP interactions. For neurodegeneration research, Co-IP and Co-localisation in human brain tissue carry more biological weight than interactions detected only in overexpression systems. This level of filtering is not available in STRING.
Cross-reference with STRING
For interactions that appear in BioGRID but not at the top of your STRING output, check the STRING evidence view for that specific protein pair (enter both gene names in STRING's multiple-protein search) to see exactly which channels contributed. If STRING shows it mainly via text mining but BioGRID confirms it experimentally, you should trust BioGRID's assessment.
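The filtering step above can also be applied to a downloaded BioGRID file. The sketch below assumes a tab-delimited export with columns named `Experimental System` and `Throughput`, as in BioGRID's tab-delimited download formats, and that Co-IP with western blot readout is recorded as "Affinity Capture-Western". Verify both against the header of your own file. The rows are placeholders, not real interactions.

```python
import csv
import io


def filter_biogrid(rows, system=None, throughput=None):
    """Keep rows matching an experimental system and/or throughput category."""
    kept = []
    for row in rows:
        if system and row["Experimental System"] != system:
            continue
        # ASSUMPTION: a record may list several categories separated by "|".
        if throughput and throughput not in row["Throughput"].split("|"):
            continue
        kept.append(row)
    return kept


# Placeholder records (BAIT1/PREY* are hypothetical names).
TOY_TAB = (
    "Official Symbol Interactor A\tOfficial Symbol Interactor B\t"
    "Experimental System\tThroughput\n"
    "BAIT1\tPREY1\tAffinity Capture-Western\tLow Throughput\n"
    "BAIT1\tPREY2\tTwo-hybrid\tHigh Throughput\n"
    "BAIT1\tPREY3\tAffinity Capture-MS\tHigh Throughput\n"
)
rows = list(csv.DictReader(io.StringIO(TOY_TAB), delimiter="\t"))
low_tp_coip = filter_biogrid(
    rows, system="Affinity Capture-Western", throughput="Low Throughput"
)
```

For a real download, replace `io.StringIO(TOY_TAB)` with an open file handle.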
IntAct, MINT, DIP — the specialist databases
Beyond STRING and BioGRID, several other curated databases are worth knowing. You will see them cited in papers and listed among the primary interaction databases from which STRING imports experimental evidence:
IntAct, hosted at the European Bioinformatics Institute (EMBL-EBI), is one of the most rigorously curated PPI databases. All interactions are curated from primary literature and distributed in the HUPO PSI-MI standard formats (PSI-MI XML and MITAB), so each interaction record carries a defined experimental detection method, first author, and PubMed ID. IntAct is a core member of the IMEx Consortium, which ensures consistent curation standards across member databases. For verification purposes, IntAct is often the most authoritative source.
MINT (Molecular INTeraction database) focuses on experimentally verified functional interactions. It is now maintained alongside IntAct under the IMEx Consortium, and in practice its data is merged into IntAct for most query purposes. This is useful to know when reading older papers that cite MINT as a data source: the data is still accessible via the IntAct portal.
DIP (Database of Interacting Proteins) is one of the oldest PPI databases, developed at UCLA. It maintains a high-quality, manually curated core dataset and has particularly good coverage of yeast two-hybrid data. STRING lists DIP among the sources for its "Experiments" channel. For most current analyses, STRING effectively subsumes DIP, but DIP can be useful if you want to specifically understand the Y2H evidence base for an interaction.
HPRD focuses exclusively on human proteins and integrates PPI data with post-translational modifications, disease associations, and subcellular localisation. It was last updated in 2010 and is no longer actively maintained — however, because many early neuroscience PPI papers used HPRD, it's still referenced in the literature. Be cautious when citing HPRD data, as it predates many high-throughput proteomics datasets that have since updated our understanding of the human interactome.
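Because IntAct and the other IMEx databases export records in the tab-delimited MITAB format, the provenance of an interaction can be read programmatically. The sketch below assumes the 15-column MITAB 2.5 layout; the short column names are my own labels, and the example line uses placeholder accessions and a placeholder PubMed ID rather than a real record.

```python
# Column order of MITAB 2.5 (the labels are shorthand chosen for this sketch).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_method", "first_author", "publication_ids",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_ids", "confidence",
]


def parse_mitab_line(line):
    """Map the first 15 tab-separated fields of a MITAB line to named columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    # zip() ignores any extra columns present in later MITAB versions.
    return dict(zip(MITAB25_COLUMNS, fields))


def pubmed_ids(record):
    """Extract PubMed IDs from the '|'-separated publication field."""
    entries = record["publication_ids"].split("|")
    return [e.split(":", 1)[1] for e in entries if e.startswith("pubmed:")]


# Placeholder record: the accessions and PubMed ID are NOT real.
example = "\t".join([
    "uniprotkb:P00000", "uniprotkb:P00001", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Example et al. (2000)",
    "pubmed:00000000", "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "-", "-",
])
record = parse_mitab_line(example)
```

The `detection_method` and `publication_ids` fields are the two to check first when verifying an interaction.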
Choosing the right database: a comparison
| Feature | STRING | BioGRID | IntAct | DIP |
|---|---|---|---|---|
| Visual network output | ✓ | ~ | ✗ | ✗ |
| Computational predictions | ✓ | ✗ | ✗ | ✗ |
| Experimental data only option | ✓ (filter channels) | ✓ | ✓ | ✓ |
| Per-interaction method info | ~ (via evidence view) | ✓ | ✓ | ✓ |
| Integrated GO enrichment | ✓ | ✗ | ✗ | ✗ |
| Multi-species coverage | ✓ 14,094 organisms | ✓ ~70 organisms | ✓ | ~ |
| Last updated | 2023 (v12) | Ongoing | Ongoing | 2014 (limited) |
| Best for… | Initial exploration, visual network, GO overview | Verifying experimental evidence | High-quality curation, method details | Y2H evidence, historical data |
For most neuroscience PPI analyses: (1) use STRING with "Databases + Experiments" channels at confidence 0.7 to generate your network; (2) verify key hub interactions in BioGRID to confirm experimental evidence and detection method; (3) export the network to Metascape or Cytoscape for enrichment analysis and publication-quality figures. This is the workflow you'll see used in the worked examples in Chapter 5.
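Steps (2) and (3) of this workflow can be prepared with a short script: rank nodes by degree to pick the hubs worth verifying, then write the network in Cytoscape's simple interaction format (SIF). This is a minimal sketch with placeholder node names. The relation label "pp" is a common convention for protein-protein edges, not a requirement, and the function names are hypothetical.

```python
from collections import Counter


def top_hubs(edges, n=3):
    """Return the n nodes with the most interaction partners (highest degree)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [node for node, _ in degree.most_common(n)]


def to_sif(edges, relation="pp"):
    """Serialise edges as SIF lines: source, relation type, target."""
    return "\n".join(f"{a}\t{relation}\t{b}" for a, b in edges)


# Placeholder network (hypothetical node names).
toy_network = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("A", "B")]
hubs = top_hubs(toy_network, n=1)
sif_text = to_sif(toy_network)
```

Saving `sif_text` to a file ending in `.sif` gives a network that Cytoscape can import directly; the hubs in `hubs` are the interactions to check in BioGRID first.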