Choosing confidence thresholds
The STRING confidence threshold is the single most consequential parameter in a PPI network analysis. There is no universally "correct" threshold — the right choice depends on your biological question, your input list size, and the level of evidence you want your network to reflect.
Where do these numbers actually come from?
STRING's combined confidence score is not arbitrary — it is a probabilistic estimate of whether two proteins are functionally associated, benchmarked against the E. coli operonic gold standard and verified against COG (Clusters of Orthologous Groups) across hundreds of organisms. A score of 0.7 means STRING estimates a roughly 70% probability that the functional association is real given the available evidence; a score of 0.9 means ~90% confidence. This matters because:
- At 0.4 (medium), STRING includes many interactions supported only by text mining or genomic co-occurrence — these have high false-positive rates in mammals. Typical false-positive rates at 0.4: ~40–50% in curated benchmarks.
- At 0.7 (high), the network retains interactions that have at least some experimental validation (co-expression, co-immunoprecipitation, yeast two-hybrid) or strong curated database evidence (e.g. KEGG, Reactome). Benchmarked false-positive rate drops to approximately 20–30%. This is an acceptable trade-off for exploratory-to-confirmatory analyses — you keep enough of the biologically meaningful network without drowning in noise.
- At 0.9 (very high), the set is restricted to interactions with multiple independent experimental confirmations or highly curated mechanistic evidence. False-positive rate: approximately 5–10%. The network is much smaller and sparser, but almost everything in it is real.
0.7 for the full network: You want to capture all proteins that might plausibly be part of the disease-relevant interaction landscape, including those where evidence is strong but not exhaustive. A 0.7 network preserves important indirect connections between disease gene modules that would disappear at 0.9. This is the scaffold for module detection (MCODE), enrichment analysis, and visualising the disease landscape.
0.9 for hub identification specifically: When you're claiming that protein X is a hub and therefore a drug target candidate, you need to be confident that X's high degree isn't inflated by weak or spurious edges. At 0.9, each edge connecting X to another node carries strong evidence, typically including multiple independent experimental confirmations. If X remains a hub at 0.9, you have a strong, defensible argument for its centrality. If it disappears at 0.9 (e.g. it had many edges scoring 0.7–0.89 but few at ≥ 0.9), it may be central in a weakly evidenced way — still worth noting, but not a primary drug target claim. This is why papers routinely report a 0.7 main network with a 0.9 hub subnetwork inset — the two serve different inferential purposes.
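The two-threshold logic above is easy to make concrete: given an edge list in STRING's (protein A, protein B, combined score) shape, compute each node's degree at 0.7 and again at 0.9 and compare. A minimal sketch in Python, using toy, illustrative scores rather than real STRING data:

```python
from collections import Counter

def hub_degrees(edges, threshold):
    """Degree of each node, counting only edges at or above the threshold.

    `edges` is an iterable of (protein_a, protein_b, combined_score) tuples,
    with combined_score on STRING's 0-1 scale.
    """
    deg = Counter()
    for a, b, score in edges:
        if score >= threshold:
            deg[a] += 1
            deg[b] += 1
    return deg

# Toy data: hypothetical scores, for illustration only.
edges = [
    ("APP", "APOE", 0.95), ("APP", "PSEN1", 0.99), ("APP", "BACE1", 0.92),
    ("TP53", "APP", 0.72), ("TP53", "MAPT", 0.75), ("TP53", "EGFR", 0.78),
]

deg_07 = hub_degrees(edges, 0.7)
deg_09 = hub_degrees(edges, 0.9)
# APP stays well connected at 0.9; every TP53 edge falls below 0.9,
# so its apparent hub status does not survive the stricter threshold.
```

In this toy example APP would be a defensible hub claim while TP53's centrality rests entirely on 0.7–0.89 edges, exactly the pattern described above.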
Practical guidance by input type
- Small focused list (10–30 known genes, e.g. AD causal genes): use 0.7 — at 0.9 the network may become too sparse to be informative.
- Medium GWAS-derived list (50–300 genes): 0.7 for the full network, 0.9 for the hub subnetwork — this is the most common published approach.
- Large proteomics-derived list (500+ proteins, e.g. the full PSD proteome): consider 0.9 as the primary threshold to avoid a hairball; 0.7 may produce an uninterpretable network at this scale.
- In every case, turn off the text mining and co-occurrence channels when aiming for experimental-evidence-only networks at ≥ 0.7.
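For programmatic retrieval, STRING's REST API expresses the threshold as `required_score` on a 0–1000 scale (700 corresponds to a combined confidence of 0.7). A minimal sketch that only builds the request URL; the endpoint and parameter names follow the public STRING API documentation at the time of writing, so verify them against the current docs before relying on this:

```python
from urllib.parse import urlencode

def string_network_url(genes, required_score=700, species=9606):
    """Build a STRING REST API network request URL.

    required_score is on STRING's 0-1000 scale (700 = combined score 0.7).
    Endpoint and parameter names are taken from the public STRING API
    documentation; check the current docs before use.
    """
    params = {
        "identifiers": "\r".join(genes),  # STRING expects %0d-separated identifiers
        "species": species,               # 9606 = Homo sapiens
        "required_score": required_score,
    }
    return "https://string-db.org/api/tsv/network?" + urlencode(params)

url = string_network_url(["APP", "PSEN1", "APOE"], required_score=900)
```

Fetching the URL (e.g. with `urllib.request`) returns a TSV edge list that can feed directly into the degree and sensitivity calculations discussed here.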
Do:
- State your threshold explicitly in the Methods: "STRING v12, confidence ≥ 0.7, experimental and database evidence channels only"
- Justify your threshold choice with reference to your input list size and goal
- Run a sensitivity analysis: report whether your key hub conclusions hold at 0.4, 0.7, and 0.9 thresholds
- Use separate thresholds for network construction (0.7) vs hub identification (0.9 edges for the hub subnetwork)
Don't:
- Use 0.4 (medium confidence) for publication analysis without strong justification — reviewers will flag this
- Use the same threshold for a 10-gene input (which wants 0.7+) as for a 500-gene input (which might need 0.9+ to avoid hairball networks)
- Treat the default STRING threshold as automatically appropriate for all analyses
- Include "text mining" channel edges at low thresholds without noting that text mining ≠ experimental evidence
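The sensitivity-analysis recommendation above can be automated: extract the top-k hubs at each threshold and measure how much the hub set changes. A self-contained sketch with hypothetical edges, where a Jaccard overlap of 1.0 means the hub set is fully stable across thresholds:

```python
from collections import Counter

def top_hubs(edges, threshold, k=1):
    """Top-k nodes by degree, counting only edges scoring >= threshold."""
    deg = Counter()
    for a, b, score in edges:
        if score >= threshold:
            deg[a] += 1
            deg[b] += 1
    return {node for node, _ in deg.most_common(k)}

def jaccard(a, b):
    """Set overlap: 1.0 means identical hub sets, 0.0 means disjoint."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical edge list; node "A" is the dominant hub at every threshold.
edges = [("A", "B", 0.95), ("A", "C", 0.92), ("A", "D", 0.75), ("E", "F", 0.45)]
stability = jaccard(top_hubs(edges, 0.7), top_hubs(edges, 0.9))
```

Reporting this overlap at 0.4, 0.7, and 0.9 gives reviewers a one-number summary of whether your hub conclusions depend on the threshold.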
Publication bias and study bias
This is one of the most important — and most underappreciated — limitations of PPI database analysis. The interactions in STRING, BioGRID, and IntAct were detected by researchers studying proteins that were already of interest. Proteins like APP, TP53, EGFR, and MAPT have thousands of curated interactions because they have been the focus of thousands of studies. Lesser-studied proteins may have equally important biology but far fewer documented interactions simply because nobody has looked.
The consequence
Well-studied proteins appear as hubs partly by publication bias, not purely by biological importance. When you report TP53 as a hub in your AD network, a reviewer may reasonably ask: "Is TP53 a hub because it genuinely mediates AD-relevant biology, or because it's the most-studied protein in the genome?" You need to have an answer ready.
How to address it
Acknowledge the limitation explicitly in your Discussion. If a known "promiscuous" hub (TP53, EGFR, ACTB) appears at the top of your hub list, discuss whether it has specific known roles in your disease of interest. Use tissue-specific expression data (Human Protein Atlas, GTEx) to verify that your hub proteins are actually expressed in the relevant tissue.
TP53, ACTB (β-actin), HSP90AA1, and EGFR appear as hubs in nearly every PPI network analysis of any disease because they are the most-studied proteins in biology. When these proteins appear in your hub list, check: (1) Do they have documented functional roles in your specific disease? (2) Are they expressed in the relevant cell type? (3) Is their hub status driven primarily by text mining or by experimental evidence? If the answers are "no", "no", and "text mining" — consider excluding them or presenting them separately with appropriate caveats.
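One way to operationalise this check is to partition the hub list against a curated set of known promiscuous proteins before interpretation. A minimal sketch: the promiscuous set below contains only the examples named above (extend it for your field), and `partition_hubs` is a hypothetical helper, not a standard tool:

```python
# Known "sticky" hubs named in the text; extend this set for your own field.
PROMISCUOUS_HUBS = {"TP53", "ACTB", "HSP90AA1", "EGFR"}

def partition_hubs(hubs):
    """Split hubs into (disease-candidate, generic) lists, preserving order."""
    specific = [h for h in hubs if h not in PROMISCUOUS_HUBS]
    generic = [h for h in hubs if h in PROMISCUOUS_HUBS]
    return specific, generic

specific, generic = partition_hubs(["APP", "TP53", "PSEN1", "EGFR"])
```

The generic list should then be reported separately with the three caveat questions above answered, rather than silently dropped or silently promoted to drug-target status.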
How to write a PPI Methods section
A reproducible PPI analysis requires reporting enough detail that another researcher could replicate your exact network. Many published papers fall short of this standard. Here is a template for a complete PPI Methods section:
Enrichment analysis. Gene Ontology (GO) and pathway enrichment analysis was performed using g:Profiler v2.0 (Raudvere et al., 2019). Gene symbols were submitted against the Homo sapiens annotation database. Enrichment was tested across GO Biological Process (BP), Molecular Function (MF), Cellular Component (CC), KEGG, and Reactome annotation sources. Multiple testing correction was applied using the g:SCS method; significance threshold: q < 0.05. The background gene set was [all human protein-coding genes / specific tissue proteome (n = X)].
Network visualisation and module detection. Network files were imported into Metascape (Zhou et al., 2019) for automated MCODE module detection (minimum degree threshold: 2; node score cutoff: 0.2; K-core: 2; maximum depth: 100). Module annotation was performed by enrichment against the same GO and pathway databases described above. Final network visualisation was performed in Cytoscape v3.10 (Shannon et al., 2003). Node size was scaled proportionally to degree centrality; edge width was scaled to STRING combined confidence score. Betweenness centrality was calculated using the NetworkAnalyzer plugin.
Replace the bracketed placeholders with your actual values. This level of detail is the minimum required for reproducibility — most reviewers in computational biology will expect all of this information.
Reproducibility checklist
Before submitting a paper containing PPI network analysis, verify you can check off all of the following:
- Database name and version reported (e.g. STRING v12)
- Confidence threshold stated explicitly and justified with reference to input list size and goal
- Evidence channels used (e.g. experimental and database only) reported
- Sensitivity of key hub conclusions across the 0.4, 0.7, and 0.9 thresholds reported
- Multiple-testing correction method stated and FDR-adjusted q-values reported for enrichment results
- Background gene set stated and matched to the context in which the input list was generated
- Publication bias of well-studied hub proteins acknowledged in the Discussion
Common reviewer criticisms — and how to preempt them
Criticism: "These enrichment p-values are not corrected for multiple testing."
How to preempt: Always report FDR-adjusted q-values. State the correction method (g:SCS, Benjamini–Hochberg) explicitly. In your results text, write "FDR q = X.XX" not "p = X.XX" for enrichment results. If you want to include a supplementary table with raw p-values for completeness, clearly label the column "raw p-value (uncorrected)" to distinguish it from the adjusted values.
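g:SCS is g:Profiler's built-in method; Benjamini–Hochberg, the other correction named above, is simple enough to implement directly when you need to adjust p-values outside g:Profiler. A minimal sketch of the standard step-up procedure:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up FDR adjustment.

    Returns q-values in the same order as the input p-values.
    """
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        q[i] = running_min
    return q

q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Note that adjusted values can tie even when the raw p-values differ, which is expected behaviour of the step-up procedure, not a bug.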
Criticism: "A 0.4 confidence threshold is too permissive; many of these edges will be false positives."
How to preempt: Use 0.7 as your primary threshold. If you use 0.4 for exploratory analysis, clearly label it as exploratory and present your main conclusions using the 0.7 or 0.9 threshold network. A supplementary analysis at multiple thresholds showing that key hub proteins are stable across thresholds significantly strengthens a paper's robustness.
Criticism: "Your top hubs appear in every PPI analysis; this reflects study bias, not disease biology."
How to preempt: In your Discussion, acknowledge that some high-degree proteins (TP53, EGFR) reflect publication bias rather than disease-specific biology. Focus your interpretation on hubs with disease-specific literature support and expression in the relevant tissue. Consider presenting two categories: "generic hubs" and "disease-specific hubs" (e.g. proteins uniquely highly connected within your disease gene network but not in the whole-genome network).
Criticism: "The analysis is purely computational, with no experimental validation."
How to preempt: This is the most common criticism and reflects a genuine limitation of purely computational PPI analyses. Address it by: (1) framing your analysis explicitly as hypothesis-generating; (2) connecting your findings to existing experimental literature; (3) where possible, including at least one orthogonal validation (e.g. verifying a key predicted interaction by co-IP, or confirming hub expression in disease tissue by immunohistochemistry). Computational-only papers are publishable in the right journals, but experimental validation strengthens the impact significantly.
Criticism: "The enrichment background gene set does not match how the input list was generated."
How to preempt: Always explicitly state your background gene set in the Methods. If your input genes came from an RNA-seq experiment, use expressed genes as the background. If they came from a proteomics study, use the full detected proteome as the background. If they came from a GWAS or literature curation without a specific tissue context, all human protein-coding genes is acceptable. The key is to match the background to the context in which your input list was generated.
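The effect of the background choice is concrete: enrichment p-values come from a hypergeometric test against the background, so the same overlap looks far more significant against the whole genome than against a smaller, assay-matched background. A minimal sketch of the one-sided test P(X ≥ hits) that underlies most enrichment tools, with hypothetical counts:

```python
from math import comb

def enrichment_p(hits, query_size, term_size, background_size):
    """One-sided hypergeometric p-value P(X >= hits).

    The probability of seeing at least `hits` term-annotated genes when
    drawing `query_size` genes from a background of `background_size`
    containing `term_size` term-annotated genes.
    """
    total = comb(background_size, query_size)
    p = 0.0
    for k in range(hits, min(query_size, term_size) + 1):
        p += comb(term_size, k) * comb(background_size - term_size, query_size - k) / total
    return p

# Same overlap, two backgrounds: a whole-genome-style background makes the
# result look much more significant than the assay-matched background.
p_genome = enrichment_p(hits=2, query_size=2, term_size=5, background_size=100)
p_assay = enrichment_p(hits=2, query_size=2, term_size=5, background_size=10)
```

Here the identical overlap is roughly a hundred times "more significant" under the larger background, which is exactly why an unmatched background inflates enrichment results.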