Chapter 5: Worked Examples

// how to read these examples

What you'll get from each example

Each example follows the same structure: the biological question that motivated the study, the analytical approach step by step, the key findings from the network analysis, and what it teaches you about best practices and interpretation. Where the original paper used older tools, the modern equivalent is noted.

💡

These are representative, not exhaustive

The examples are drawn from real published research but are presented as learning tools, not exhaustive literature reviews. Details are verified against the primary papers; where methodology has been updated by later literature, this is noted. Full citations are in Chapter 7.

// example 1

Alzheimer's Disease: mapping the APP interaction network

// The question

Biological motivation

Genome-wide association studies (GWAS) have identified dozens of genetic loci associated with late-onset Alzheimer's disease risk. Many of the implicated genes (CLU, BIN1, ABCA7, PICALM, CR1, MS4A6A, CD33, EPHA1, CD2AP) have no obvious mechanistic connection to the established APP→Aβ hypothesis. The question: do these disparate GWAS hits converge on shared protein interaction networks that could explain their disease relevance?

// Step 1: Building the network

What the authors did

The study input list comprised the top AD GWAS-associated genes along with the established causal genes (APP, PSEN1, PSEN2, MAPT, APOE). These were queried in STRING (confidence ≥ 0.7, experimental + databases channels) to retrieve first-order interaction partners. The result was expanded to a second-shell network (interactors of interactors) to capture indirect connections between genes that didn't directly interact.

APP

PSEN1

APOE

CLU

BIN1

PICALM

CR1

ABCA7
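The channel restriction in Step 1 can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes per-channel scores on a 0 to 1 scale in fields named `escore` (experimental) and `dscore` (databases), which is how I recall the STRING API network table being laid out, and it assumes STRING's documented prior of 0.041 when recombining channels. The three rows are made-up toy values, not real STRING scores.

```python
PRIOR = 0.041  # assumed STRING prior probability of a random interaction


def remove_prior(score, prior=PRIOR):
    """Strip the prior from a single channel score (clamped at zero)."""
    return max(score - prior, 0.0) / (1.0 - prior)


def combine_channels(scores, prior=PRIOR):
    """Recombine selected channel scores into one confidence value."""
    not_supported = 1.0
    for s in scores:
        not_supported *= 1.0 - remove_prior(s, prior)
    return (1.0 - not_supported) * (1.0 - prior) + prior


def filter_edges(rows, channels=("escore", "dscore"), threshold=0.7):
    """Keep edges whose recombined score over the chosen channels passes the threshold."""
    kept = []
    for row in rows:
        score = combine_channels([row[c] for c in channels])
        if score >= threshold:
            kept.append((row["preferredName_A"], row["preferredName_B"], round(score, 3)))
    return kept


# Toy rows with invented scores; the last edge is supported by text mining only.
rows = [
    {"preferredName_A": "APP", "preferredName_B": "PSEN1", "escore": 0.80, "dscore": 0.90, "tscore": 0.95},
    {"preferredName_A": "CLU", "preferredName_B": "APOE", "escore": 0.60, "dscore": 0.50, "tscore": 0.90},
    {"preferredName_A": "BIN1", "preferredName_B": "CR1", "escore": 0.00, "dscore": 0.00, "tscore": 0.85},
]
edges = filter_edges(rows)
for a, b, score in edges:
    print(a, b, score)
```

The text-mining-only edge is dropped even though its text-mining score is high, which is the point of restricting the query to the experimental and database channels.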

// Step 2: Enrichment analysis

GO enrichment on the full network

The full network (~200 proteins after expansion) was submitted to g:Profiler. Top enriched Biological Process terms included: "amyloid precursor protein metabolic process" (GO:0042982), "regulation of tau-protein kinase activity" (GO:1902947), "endocytosis" (GO:0006897), and "immune system process" (GO:0002376). The last two were unexpected — GWAS hits like BIN1, PICALM, and CR1 drove these terms. This finding pointed toward endocytic trafficking and innate immune dysfunction as AD-relevant mechanisms beyond simple Aβ production.

// key insight for readers

This is exactly the kind of insight that emerges from network + enrichment analysis: individual GWAS hits are hard to interpret, but when viewed together in a PPI network, they reveal shared biological themes. The "immune" signal from CR1 (a complement receptor) and CD33 (a myeloid-cell immune receptor) helped establish neuroinflammation as a central AD mechanism, a finding now extensively validated experimentally.

// Step 3: Hub identification

Identifying network hubs and therapeutic targets

NetworkAnalyzer in Cytoscape identified APP (degree 48), APOE (degree 31), TP53 (degree 28), and EGFR (degree 24) as top hubs in the expanded network. Betweenness centrality analysis additionally highlighted GSK3β — which had only moderate degree (18) but extremely high betweenness because it bridges the APP processing cluster and the tau phosphorylation cluster. This made GSK3β a particularly interesting therapeutic target candidate, as it sits at a pathway bottleneck.
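The difference between degree and betweenness is easier to see on a toy graph. The sketch below is a plain-Python version of Brandes' algorithm for unweighted, undirected graphs; it is not NetworkAnalyzer's implementation, and the node names are hypothetical. Two fully connected clusters are joined only through a node called `BRIDGE`, which plays the role the text ascribes to GSK3β.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbours mapping."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes, unweighted graphs)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies back towards s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


cluster_a = ["A1", "A2", "A3", "A4"]
cluster_b = ["B1", "B2", "B3", "B4"]
edges = list(combinations(cluster_a, 2)) + list(combinations(cluster_b, 2))
edges += [("A1", "BRIDGE"), ("BRIDGE", "B1")]

adj = build_adjacency(edges)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
bc = betweenness(adj)
print(degree["BRIDGE"], bc["BRIDGE"])  # lowest degree, highest betweenness
```

`BRIDGE` has the lowest degree in the graph but the highest betweenness, because every shortest path between the two clusters runs through it.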

// What you'd do today to replicate

Modern replication protocol

1. Obtain updated AD GWAS hits from the NHGRI-EBI GWAS Catalog (ebi.ac.uk/gwas).
2. Query all GWAS genes + established causal genes in STRING v12 (confidence 0.7, experimental + databases channels, first-shell expansion).
3. Export network to Metascape for automated MCODE clustering and enrichment.
4. In Cytoscape (NetworkAnalyzer), compute betweenness centrality.
5. Cross-reference hub proteins with DisGeNET (disease associations) and druggable genome databases (e.g. DGIdb) to identify actionable targets.

// example 2

Parkinson's Disease: convergence of familial PD genes

// The question

Biological motivation

Familial Parkinson's disease is caused by mutations in a handful of well-defined genes: SNCA, PARK2 (Parkin), PINK1, DJ-1, LRRK2, and ATP13A2. These were identified through linkage studies in different families, often spanning decades. The question was: despite being identified independently, do these genes form a coherent interaction network, and if so, what biological process does that network implicate?

// Step 1: The protein list

Input: familial PD gene products

Input proteins: SNCA (α-synuclein), PRKN/PARK2 (Parkin), PINK1, PARK7 (DJ-1), LRRK2, ATP13A2, VPS35, FBXO7, DNAJC6. These were supplemented with their known first-order interactors from BioGRID to ensure adequate network coverage. The BioGRID filtering used only low-throughput experimental interactions (Co-IP, Y2H confirmed by Co-IP, biophysical methods) to maximise evidence quality.

SNCA

PRKN

PINK1

PARK7

LRRK2

ATP13A2

VPS35

// Step 2: Network + MCODE

Metascape network analysis

Metascape identified two prominent MCODE clusters within the PD network. Cluster 1 (PRKN, PINK1, BNIP3L, BECN1, ATG7): enriched for "mitophagy" (GO:0000422) and "response to mitochondrial depolarisation" — this is the PINK1/Parkin mitophagy pathway that selectively degrades damaged mitochondria. Cluster 2 (SNCA, SNCB, MAPT, STX1A, NSF): enriched for "synaptic vesicle cycle" and "regulation of dopamine secretion" — reflecting SNCA's physiological role in presynaptic vesicle dynamics. The two clusters were connected via LRRK2, which interacts with components of both modules — positioning LRRK2 as a cross-pathway bottleneck.

// key insight for readers

The two-cluster structure revealed that familial PD genes map onto two complementary mechanisms: mitochondrial quality control (PINK1/Parkin axis) and synaptic vesicle dynamics (α-synuclein axis). This convergence, revealed by network analysis, helped establish that PD is not a monolithic disease but a convergence of related dysfunctions — a framework that now guides therapeutic development targeting both pathways.

// Step 3: GO enrichment interpretation

What the enrichment results said

g:Profiler enrichment on the full PD network (FDR q < 0.05) returned: "mitochondrial membrane organisation" (q = 3.1×10⁻⁹), "autophagy" (q = 2.8×10⁻⁷), "ubiquitin-dependent protein catabolic process" (q = 1.4×10⁻⁶), "dopamine metabolic process" (q = 4.2×10⁻⁵). Critically, "protein ubiquitination" was enriched rather than just "ubiquitin" — pointing to active regulatory ubiquitination (as mediated by Parkin's E3 ligase activity) rather than degradation-targeted polyubiquitination alone.
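For readers unsure what an "FDR q" value is, here is a minimal sketch of the Benjamini-Hochberg adjustment that turns raw p-values into q-values. It assumes BH is the correction in use (g:Profiler offers it as one option alongside its default g:SCS method), and the p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Invented p-values for five GO terms.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
qvals = benjamini_hochberg(pvals)
significant = [p for p, q in zip(pvals, qvals) if q < 0.05]
print(qvals)
print(significant)
```

Two of the four terms with p < 0.05 no longer pass once the number of tests is taken into account, which is why enrichment tables report q-values, not raw p-values.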

// What you'd do today

Modern replication protocol

1. Obtain updated PD GWAS hits from the NHGRI-EBI GWAS Catalog plus all known familial PD genes (check OMIM for the current list).
2. Query in STRING v12 (confidence 0.7), then additionally use the STRING app in Cytoscape to query directly, allowing immediate visual analysis.
3. Run Metascape: submit gene list at metascape.org, review MCODE clusters, export to Cytoscape.
4. In Cytoscape, size nodes by degree, colour by MCODE cluster, edge width by STRING confidence score.
5. Cross-reference hubs with MitoCarta (mitochondrial protein database) to validate the mitophagy cluster.
6. Check the Human Protein Atlas for expression levels in dopaminergic neurons to confirm biological relevance.

// example 3

Synaptic Plasticity: the postsynaptic density interactome

// The question

Biological motivation

The postsynaptic density (PSD) is one of the most protein-dense structures in biology — hundreds of proteins concentrated at the tip of dendritic spines, organising glutamate receptors, scaffolding proteins, and downstream signalling enzymes. Mutations in PSD genes cause a wide variety of neuropsychiatric disorders: autism spectrum disorder (ASD), schizophrenia, and intellectual disability. The question: when the proteome of the PSD is mapped as a PPI network, do the disease-causing mutations cluster in particular functional modules, and does this clustering explain why different mutations can cause similar or overlapping clinical phenotypes?

// Step 1: Proteomics → protein list

Fractionation and LC-MS/MS to define the PSD proteome

The study first biochemically purified the PSD fraction from mouse forebrain (a standard protocol involving differential centrifugation and detergent extraction). The protein composition was determined by LC-MS/MS — identifying ~1,461 proteins. This proteomics-derived protein list was then used as the input for PPI network analysis, rather than a manually curated gene list. This approach ensures the network reflects the actual protein context of the biological compartment being studied.

💡

Why this matters for enrichment analysis background

Because the study started from a proteomics-defined PSD proteome (~1,461 proteins), the researchers could use that proteome as the custom background for enrichment analysis instead of all human genes. This is the correct approach: enrichment analysis of "100 signalling proteins from the PSD" should be tested against "all 1,461 PSD proteins", not all ~20,000 human genes. Using the whole genome as background would greatly inflate enrichment statistics, because synaptic terms are over-represented in any PSD-derived list before any hypothesis is tested.
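The effect of the background choice can be shown with a one-sided hypergeometric test, the standard over-representation statistic. This is a sketch with invented annotation counts, chosen only to show the direction of the effect; the background sizes (1,461 and 20,000) are the ones quoted in the text.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, of which K carry the term."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


query_size = 100     # proteins in the query list
query_hits = 30      # of those, annotated with a synaptic GO term (invented)

# Invented annotation counts: the term is common in the PSD, rare genome-wide.
p_psd = hypergeom_pvalue(query_hits, query_size, K=400, N=1461)
p_genome = hypergeom_pvalue(query_hits, query_size, K=800, N=20000)

print(f"PSD background:    p = {p_psd:.3g}")
print(f"Genome background: p = {p_genome:.3g}")
```

With these toy numbers, the same 30 hits look unremarkable against the PSD background and wildly significant against the genome, even though nothing about the query list has changed.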

// Step 2: Disease gene overlay

Mapping disease mutations onto the network

The PSD proteome was cross-referenced with three disease gene databases: the SFARI autism gene database, the SZGene schizophrenia gene database, and OMIM intellectual disability genes. This identified 197 PSD proteins with documented links to at least one of these disorders. These disease-associated proteins were then mapped onto the STRING PPI network of the full PSD proteome and their positions analysed. The key finding: disease-associated PSD proteins were not randomly distributed throughout the network — they were significantly enriched in specific dense modules (identified by MCODE) involved in glutamate receptor signalling and cytoskeletal remodelling.

// key insight for readers

This shows how PPI network analysis is used to demonstrate disease convergence. SHANK3 (autism), SYNGAP1 (intellectual disability), and GRIN2B (schizophrenia) are completely different proteins, but network analysis shows they all interact with PSD-95 (DLG4), all share the same enriched GO terms (regulation of AMPA receptor activity, dendritic spine morphogenesis), and all fall in the same MCODE module. The clinical overlap between these disorders is explained by their shared network position.

// Step 3: Enrichment interpretation

GO enrichment of the disease module

The 197 disease-associated PSD proteins were submitted to g:Profiler using the full PSD proteome as background. Top enriched GO:BP terms: "regulation of AMPA receptor activity" (q = 4.1×10⁻¹¹), "positive regulation of synapse assembly" (q = 2.3×10⁻⁹), "dendritic spine morphogenesis" (q = 5.6×10⁻⁸), "long-term synaptic potentiation" (q = 1.2×10⁻⁷). CC enrichment confirmed "postsynaptic density" (q = 3.4×10⁻¹⁵) and "NMDA receptor complex" (q = 7.8×10⁻⁹). This enrichment pattern — convergent on synaptic plasticity mechanisms — supported a unifying hypothesis for diverse neuropsychiatric disorders: they may all fundamentally disrupt the molecular machinery for activity-dependent synaptic modification.

// What you'd do today

Modern replication protocol

1. Download the PSD proteome from a recent proteomics study (e.g. via the SynGO database at syngoportal.org, which curates synaptic proteomics datasets).
2. Cross-reference with current disease gene lists: SFARI Gene for ASD (sfari.org), the Psychiatric Genomics Consortium GWAS for schizophrenia, and OMIM for ID genes.
3. Submit the overlapping proteins to Metascape with the full PSD proteome as custom background.
4. Use Cytoscape to produce a network coloured by disease association, with each node coloured by the disorder(s) its gene is linked to.
5. Use g:Profiler with custom background for the formal enrichment table in supplementary data.

// example 4

The biological question

Schizophrenia (SCZ) is a severe psychiatric disorder affecting ~1% of the global population. Unlike Alzheimer's or Parkinson's disease, there are no single high-penetrance causal mutations; instead, risk is distributed across hundreds of common variants of tiny individual effect, plus a smaller number of rare copy number variants. The largest GWAS to date (Trubetskoy et al., 2022) reported common-variant associations at 287 distinct genomic loci. The challenge: what do these 287 loci actually do? Most GWAS hits lie in non-coding regions, making biological interpretation hard. PPI network analysis is used to map the protein products of the candidate genes at each locus onto an interaction network, asking: do these genes cluster into coherent biological modules? Do they converge on pathways we already know are relevant to SCZ (dopamine, glutamate, synapse)? And do any hub proteins in those modules have existing drugs?

Building the SCZ PPI network — from GWAS to proteins

The first challenge is converting GWAS loci into a gene list. For each independent GWAS locus, the nearest protein-coding gene is assigned as a candidate. For loci with stronger evidence (eQTLs showing the variant regulates a specific gene's expression in brain tissue, e.g. from GTEx v8), the regulated gene is used instead of positional assignment. This is critical: positional assignment is imprecise, and eQTL-informed mapping substantially improves biological signal. The resulting gene list (~250–300 genes) becomes the PPI query input.
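The assignment rule described above is simple to express in code. In this sketch the locus IDs and gene names are hypothetical placeholders, and the two input dictionaries stand in for a nearest-gene table and an eQTL lookup that you would build yourself from the GWAS supplementary data and a brain eQTL resource.

```python
def assign_genes(nearest, eqtl):
    """Prefer the eQTL-regulated gene for a locus; fall back to the nearest gene."""
    assigned = {}
    for locus, nearest_gene in nearest.items():
        if locus in eqtl:
            assigned[locus] = (eqtl[locus], "eQTL")
        else:
            assigned[locus] = (nearest_gene, "nearest")
    return assigned


# Hypothetical inputs: locus -> nearest gene, and locus -> eQTL-regulated gene.
nearest = {"locus_1": "GENE_A", "locus_2": "GENE_B", "locus_3": "GENE_C"}
eqtl = {"locus_2": "GENE_X", "locus_3": "GENE_A"}

assigned = assign_genes(nearest, eqtl)
gene_list = sorted({gene for gene, _ in assigned.values()})  # de-duplicated PPI query

for locus, (gene, source) in assigned.items():
    print(locus, gene, source)
print(gene_list)
```

Keeping the evidence source next to each gene makes it easy to check later whether a proposed hub rests on eQTL support or only on proximity. Note that two loci can map to the same gene, so the final query list is shorter than the locus list.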

This list is submitted to STRING v12 with confidence ≥ 0.7, experimental and curated-database channels only (text mining excluded). Because the input contains hundreds of genes from a GWAS, the resulting raw network will be large (~1,000–3,000 nodes once first-shell interactors are included). Practically: Metascape is used for automated analysis at this scale; manual Cytoscape work is better suited to a focused subset.

🧠

Why SCZ is analytically different from AD/PD

In Alzheimer's analysis (Example 1), you start with a manageable set of well-established causal genes (APP, PSEN1/2, MAPT) plus GWAS hits. In SCZ, there are essentially no high-confidence Mendelian causal genes: almost everything comes from GWAS. This means your input is inherently noisier and the network signal-to-noise is lower. Methodological rigour matters even more: using eQTL mapping rather than positional assignment, running permutation tests to confirm your network has more connectivity than a randomly-sampled gene list of the same size, and using a 0.7+ threshold are all non-negotiable in this context.
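The permutation test mentioned above can be sketched as follows. This is a minimal version under simple assumptions: gene sets are sampled uniformly from a hypothetical universe, and connectivity is measured as the number of edges within the set. In practice, sampling that matches the degree distribution of the query genes is usually preferable, since well-studied genes have more recorded interactions. The toy network and gene names are invented.

```python
import random
from itertools import combinations


def count_internal_edges(genes, edges):
    """Number of edges with both endpoints inside the gene set."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)


def connectivity_pvalue(query, universe, edges, n_perm=1000, seed=0):
    """Empirical p-value: how often does a random set match the observed connectivity?"""
    rng = random.Random(seed)
    observed = count_internal_edges(query, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(universe, len(query))
        if count_internal_edges(sample, edges) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # +1 avoids reporting p = 0


# Toy universe of 200 genes: a sparse chain plus one dense 8-gene module.
universe = [f"G{i}" for i in range(200)]
query = universe[:8]
edges = set(combinations(query, 2))
edges |= {(universe[i], universe[i + 1]) for i in range(199)}

observed, p_value = connectivity_pvalue(query, universe, edges)
print(observed, p_value)
```

The smallest p-value this procedure can report is 1 / (n_perm + 1), so the number of permutations sets the resolution of the test.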

Module detection: dopamine and glutamate converge

After MCODE clustering in Metascape, two modules consistently emerge at the top of enrichment significance. The first centres on dopamine signalling: DRD2 (the primary target of nearly all approved antipsychotics) emerges as the highest-degree hub, interacting with COMT, DRD1, KCNB1, and several G-protein subunits (GNAS, GNB1). The GO enrichment of this module is dominated by "G protein-coupled receptor signalling" and "dopamine receptor binding." The second and typically larger module centres on glutamate/NMDA receptor signalling: GRIN2A, GRIN2B, GRIA1, and the postsynaptic density scaffold DLGAP1 all appear as high-degree nodes, with enrichment on "NMDA receptor complex" and "regulation of synaptic transmission." A third module, less consistently replicated but increasingly reported, involves immune and complement genes (C4A, CSMD1, CLU), reflecting the contribution of synaptic pruning pathways to SCZ risk.

The DRD2 hub: obvious in hindsight, but the network confirms it mechanistically

DRD2 typically has the highest betweenness centrality in the SCZ PPI network. A student's first reaction might be: "of course DRD2 is the hub — it's the antipsychotic target everyone already knows about." This is a reasonable concern, but the network adds something that clinical pharmacology alone cannot: it shows which other GWAS-implicated proteins connect to DRD2, and thus explains why multiple seemingly unrelated risk loci all funnel into the same biological process. The network confirms that the dopaminergic signal in SCZ GWAS is not noise — it is mechanistically coherent.

💡

Hub identification at high threshold: why 0.9 matters here specifically

In a large GWAS-derived network, use 0.7 to build the full landscape and detect modules. Then, for drug target prioritisation, rerun with 0.9: at this threshold, only interactions with very strong combined evidence, typically several independent lines of support, survive. DRD2, GRIN2A, and GRIN2B should remain high-degree nodes at 0.9. If a proposed hub disappears at 0.9, it may be an artefact of weaker evidence chains. This two-threshold approach lets you distinguish well-validated hubs (drug target candidates) from contextually important but less-evidenced connectors.
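The two-threshold check can be sketched as a comparison of node degree at 0.7 and at 0.9. The edge list below is invented, with hypothetical node names: `HUB_A` is supported by high-confidence edges, `HUB_B` only by edges that pass 0.7.

```python
from collections import Counter


def degrees(edges, threshold):
    """Node degree counting only edges at or above the confidence threshold."""
    deg = Counter()
    for a, b, score in edges:
        if score >= threshold:
            deg[a] += 1
            deg[b] += 1
    return deg


def hub_robustness(edges, top_n=2, low=0.7, high=0.9):
    """For the top hubs at the low threshold, report (degree at low, degree at high)."""
    deg_low = degrees(edges, low)
    deg_high = degrees(edges, high)
    hubs = [node for node, _ in deg_low.most_common(top_n)]
    return {node: (deg_low[node], deg_high[node]) for node in hubs}


# Invented edges: (node, node, combined confidence score).
edges = [("HUB_A", f"P{i}", 0.95) for i in range(4)]
edges += [("HUB_B", f"Q{i}", 0.75) for i in range(4)]

report = hub_robustness(edges)
for node, (d_low, d_high) in report.items():
    print(node, d_low, d_high)
```

Both nodes look like equal hubs at 0.7, but only `HUB_A` keeps its edges at 0.9; `HUB_B` would be flagged as a less-evidenced connector.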

Drug repurposing overlay: from network to clinic

Hub proteins are cross-referenced with the Drug-Gene Interaction Database (DGIdb) and ChEMBL. DRD2 is already targeted by nearly all approved antipsychotics (e.g. haloperidol, risperidone, clozapine), confirming that the approach recovers known targets. More interestingly, GRIN2A and GRIN2B emerge as druggable targets in the glutamate module, and this motivated clinical trials of NMDA modulators (e.g. D-cycloserine, glycine-site agonists) as adjuncts to antipsychotics. The complement module points to C4A, motivating interest in anti-inflammatory strategies.

Enrichment analysis: what the modules biologically mean

Submitting the dopamine module genes to g:Profiler (background: all human protein-coding genes; BH correction, q < 0.05) returns: "dopamine receptor signalling pathway" (GO:0007212, q = 1.3×10⁻⁸), "adenylyl cyclase-activating dopamine receptor signalling" (GO:0007191, q = 2.1×10⁻⁷), "regulation of postsynaptic membrane potential" (q = 4.5×10⁻⁶). The glutamate module returns: "NMDA receptor complex" (GO:0017146, q = 5.6×10⁻⁹), "long-term potentiation" (GO:0060291, q = 3.2×10⁻⁷), "ionotropic glutamate receptor signalling" (q = 1.8×10⁻⁶). These results match the dominant pharmacological hypotheses for SCZ (dopamine hypothesis, glutamate/NMDA hypofunction hypothesis) — the network is providing molecular resolution to clinical observations made decades earlier.

Modern replication protocol

1. Download the Trubetskoy et al. 2022 Supplementary Table of GWAS loci; use the eQTL-mapped gene assignments where available (GTEx brain, PsychENCODE).
2. Query the full gene list in STRING v12 (confidence ≥ 0.7, experimental + curated databases only; first-shell expansion to capture direct interactors of the GWAS genes).
3. Import to Metascape for MCODE clustering; expect a dopamine module and a glutamate module to appear in the top 5 clusters.
4. For hub subnetwork analysis, re-export the top 2 MCODE modules to Cytoscape and reapply edge filter ≥ 0.9 to identify the most-evidenced hubs.
5. Cross-reference hubs against DGIdb (dgidb.org) and ChEMBL.
6. Run g:Profiler enrichment on each MCODE module separately using all human protein-coding genes as background.

// cross-cutting lessons

What all four examples have in common

Looking across the four examples, several methodological patterns emerge that characterise rigorous PPI network analysis in neuroscience:

The network reveals what lists cannot

In all four cases, the individual gene/protein list was interpretable but limited. It was the interaction context (who interacts with whom, which modules form, where disease genes cluster) that generated the hypothesis. This is the fundamental argument for PPI network analysis over simple gene list analysis.

Background matters

Example 3 used the PSD proteome as its enrichment background — not all human genes. Example 1 used all human genes as background (appropriate since the input was GWAS genes, not a tissue-specific proteome). Always match the background to the experimental context of how your input list was generated.

Convergence is the finding

Each study used a network to demonstrate convergence — disparate disease genes converging on shared interaction modules and biological processes. This "convergence argument" is now a standard pattern in neuropsychiatric genetics. PPI analysis is the primary tool used to make it.

Hub status needs validation

All four examples identified hub proteins, but all were careful to contextualise hub status: APP is a hub partly because of publication bias (extensively studied proteins have more curated interactions). Cross-referencing with BioGRID, checking experimental evidence types, and comparing hubs against tissue-specific expression data all strengthen the argument that a hub is genuinely biologically important.

GWAS input needs eQTL mapping, not just proximity

The schizophrenia example illustrates a critical upstream step that matters far less in the AD and PD analyses, which are anchored on known causal genes: converting GWAS loci to genes. Positional assignment (nearest gene) is an imprecise shortcut. eQTL-informed mapping, which uses GTEx or PsychENCODE to find which gene's expression is controlled by each risk variant in brain tissue, produces a biologically cleaner input list and stronger downstream PPI signal.