Chapter 4: Network Analysis

// section 4.1

Metascape and Cytoscape: analyst and artist

These two tools are often mentioned together in papers because they serve complementary roles in a PPI network analysis workflow. Understanding their distinct purposes prevents confusion about when to use each:

📊

Metascape — the analyst

Takes your gene/protein list, runs enrichment analysis across dozens of databases, constructs a PPI network from STRING data, detects functional modules with MCODE, and semantically clusters GO results. It is automated, opinionated, and fast. The output is biologically meaningful but visually limited — Metascape figures in papers look recognisably "Metascape-like" and are not highly customisable.

→

The handoff

Metascape's network can be exported as node/edge tables or directly as a Cytoscape session file (.cys). The node table contains the coordinates, cluster membership, and GO annotation computed by Metascape. The edge table contains the STRING-derived interactions and scores. Cytoscape reads both tables and reconstructs the network exactly as Metascape laid it out, but now you have full control over every visual property.

🎨

Cytoscape — the artist

Cytoscape (cytoscape.org, Shannon et al., 2003) is an open-source desktop platform for network visualisation and analysis. It doesn't run Metascape-style enrichment analysis or decide which proteins belong in your network; it visualises and analyses networks you bring to it. In Cytoscape you control node size, colour, and shape (each can be mapped to degree, cluster, or any other metadata), edge width and opacity (for example mapped to STRING confidence score), the layout algorithm, labels, and legends. The result is a figure that communicates exactly what you want the reader to see.

🔑

The fundamental distinction

Metascape is for generating biological insight. Cytoscape is for communicating that insight visually. A complete PPI paper figure typically uses both: Metascape's MCODE clusters define the structure; Cytoscape's styling makes that structure legible and publication-ready.

// section 4.2

Metascape network analysis in depth

When you submit a gene list to Metascape and it constructs a PPI network, several analytical steps happen automatically. Understanding them helps you interpret and report the output correctly.

MCODE: detecting protein complexes

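MCODE (Molecular Complex Detection, Bader & Hogue, 2003) identifies densely connected subgraphs within the PPI network. The algorithm first weights each node by the density of its local neighbourhood: it finds the highest k-core among the node and its direct neighbours and multiplies k by the density of that core (the "core-clustering coefficient"). It then grows clusters outward from the highest-weighted seed nodes, adding neighbours whose weight stays within a set fraction of the seed's weight. The resulting dense clusters are reported as "modules".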
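The node-weighting idea can be illustrated with a minimal sketch. It covers only the weighting stage (k of the highest k-core in a node's neighbourhood, multiplied by that core's density) on a toy network with made-up node names. It is not Metascape's or the MCODE plugin's actual code, and the helper names are hypothetical.

```python
def build_adj(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def induced(adj, nodes):
    """Subgraph of adj restricted to the given nodes."""
    nodes = set(nodes)
    return {n: adj[n] & nodes for n in nodes}

def highest_k_core(adj):
    """Return (k, subgraph) for the highest non-empty k-core."""
    best_k, best = 0, {}
    current = {n: set(nb) for n, nb in adj.items()}
    k = 1
    while current:
        # Repeatedly peel off nodes with fewer than k neighbours.
        changed = True
        while changed:
            low = [n for n, nb in current.items() if len(nb) < k]
            changed = bool(low)
            for n in low:
                for m in current.pop(n):
                    if m in current:
                        current[m].discard(n)
        if current:
            best_k, best = k, {n: set(nb) for n, nb in current.items()}
            k += 1
    return best_k, best

def density(adj):
    """Edges present divided by edges possible (no self-loops)."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nb) for nb in adj.values()) / 2
    return edges / (n * (n - 1) / 2)

def mcode_weight(adj, v):
    """k of the highest k-core in v's neighbourhood times that core's density."""
    hood = induced(adj, adj[v] | {v})
    k, core = highest_k_core(hood)
    return k * density(core)

# Toy network: a 4-clique (A, B, C, D) with a pendant node E attached to A.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
             ("B", "D"), ("C", "D"), ("A", "E")]
toy = build_adj(toy_edges)
weights = {v: mcode_weight(toy, v) for v in sorted(toy)}
print(weights)
```

Nodes inside the clique receive a higher weight than the pendant node, which is why a clique-like complex is picked up as a module seed while a loosely attached protein is not.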

In Metascape's output, each MCODE module is presented with: (1) a list of member proteins, (2) a GO enrichment summary for those proteins, and (3) a visual position in the full network where the cluster is highlighted. This is the information you'll use to label your Cytoscape figure — each cluster becomes a labelled region in the final network figure.

[Figure: schematic of a Metascape MCODE output for a typical AD PPI network, showing three functional clusters automatically detected and annotated within a larger PPI network.]

Metascape hub protein analysis

Within the network view, Metascape calculates degree centrality for each node and can highlight the top hub proteins. In the "PPI Enrichment" results, clicking "Network Visualization" shows node size scaled by degree. For Alzheimer's networks, APP, MAPT, and TP53 consistently appear as the largest nodes, which reflects their high degree within the network topology.

// section 4.3

Cytoscape: publication-quality network figures

Cytoscape is an open-source desktop application (download at cytoscape.org). Version 3.x works on Windows, macOS, and Linux. It runs on Java (Java 11 for the 3.8 and 3.9 releases; newer releases need a newer Java, so check the release notes for your version) and needs at least 2 GB RAM for typical neuroscience-scale networks (50–500 nodes). The core application is free and ships with NetworkAnalyzer; a library of plugins called Apps extends its functionality. The most important Apps for PPI work are the STRING app, EnhancedGraphics, and Omics Visualizer.

🎨

Visual Style editor

Map any node/edge attribute to any visual property. Size nodes by degree centrality, colour them by cluster membership, scale edge width by STRING confidence score. Everything is data-driven.

📐

Layout algorithms

Force-directed (organic), hierarchical, circular, grid. For PPI networks, the "organic" force-directed layout usually works best — it naturally positions hub proteins centrally and clusters dense modules.

📊

NetworkAnalyzer

Built-in plugin that computes degree, betweenness centrality, clustering coefficient, average path length, and other metrics for every node. Results are stored as node attributes, ready to drive visual styles.
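To make the metrics concrete, here is a minimal standard-library sketch of three of them (degree, clustering coefficient, average shortest path length) on a toy undirected network. It illustrates the definitions only; it is not NetworkAnalyzer's implementation, and the function names are hypothetical.

```python
from collections import deque

def build_adj(edges):
    """Undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def clustering_coefficient(adj, v):
    """Fraction of possible links among v's neighbours that actually exist."""
    nb = adj[v]
    k = len(nb)
    if k < 2:
        return 0.0
    # Each neighbour-neighbour link is seen twice, once from each end.
    links = sum(1 for a in nb for b in adj[a] if b in nb) / 2
    return links / (k * (k - 1) / 2)

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy network: triangle A-B-C with D hanging off C.
net = build_adj([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
degree = {v: len(nb) for v, nb in net.items()}
clustering = {v: clustering_coefficient(net, v) for v in net}
apl = average_path_length(net)
print(degree, clustering, round(apl, 3))
```

Each result is a per-node dictionary (or a single network-level number), which mirrors how the values end up as columns in Cytoscape's node table.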

🔌

STRING App

Query STRING directly from within Cytoscape without exporting/importing files. Returns the same network the STRING website (string-db.org) would, but opens directly in Cytoscape's analysis environment.
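For comparison, the same kind of query can be expressed as a request to STRING's public REST API. The sketch below only builds the request URL and does not send it. The endpoint and parameter names follow my understanding of the STRING API documentation and should be checked against the current docs; the `caller_identity` value and the function name are placeholders.

```python
from urllib.parse import urlencode

def string_network_url(genes, species=9606, required_score=700):
    """Build a STRING 'network' API URL (TSV output) for a list of gene symbols.

    species=9606 is the NCBI taxonomy ID for human; required_score uses
    STRING's 0-1000 scale, so 700 corresponds to a 0.7 confidence cutoff.
    """
    params = {
        "identifiers": "\r".join(genes),        # identifiers are separated by carriage returns
        "species": species,
        "required_score": required_score,
        "caller_identity": "example_tutorial",  # placeholder name for your script
    }
    return "https://string-db.org/api/tsv/network?" + urlencode(params)

url = string_network_url(["APP", "MAPT", "PSEN1"])
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return a tab-separated edge list that can be imported into Cytoscape like any other edge table.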

🕸️

Cytoscape workflow: from Metascape export to publication figure

Import the Metascape network

In Metascape's network view, click Export → Cytoscape to download a ZIP containing cytoscape_input.xlsx (node and edge tables). In Cytoscape: File → Import → Network from File. Select the edge table; set "Source" and "Target" columns appropriately. Then import node attributes: File → Import → Table from File, attaching to nodes by gene name. Alternatively, open the .cys session file if Metascape generated one — this reconstructs the exact layout automatically.
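Before importing, it can help to sanity-check the edge table. The sketch below assumes the edge sheet has been saved as CSV and uses hypothetical column names (`Source`, `Target`, `Score`); adjust them to whatever headers your export actually contains. It drops self-loops and collapses duplicate undirected edges, keeping the highest score.

```python
import csv
import io

def load_edges(handle, source_col="Source", target_col="Target", score_col="Score"):
    """Read an edge table into {(geneA, geneB): score} with undirected, de-duplicated edges."""
    edges = {}
    for row in csv.DictReader(handle):
        a, b = row[source_col].strip(), row[target_col].strip()
        if a == b:
            continue  # skip self-loops
        key = tuple(sorted((a, b)))  # A-B and B-A are the same undirected edge
        score = float(row[score_col])
        edges[key] = max(score, edges.get(key, 0.0))
    return edges

# In-memory stand-in for an exported edge table (made-up scores).
example_csv = io.StringIO(
    "Source,Target,Score\n"
    "APP,PSEN1,0.95\n"
    "PSEN1,APP,0.90\n"
    "MAPT,CDK5,0.88\n"
    "APP,APP,0.50\n"
)
edges = load_edges(example_csv)
nodes = {gene for pair in edges for gene in pair}
print(len(nodes), "nodes,", len(edges), "edges")
```

If the node and edge counts differ from what Metascape reported, the gene-name column used to attach node attributes is the first thing to check.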

Run NetworkAnalyzer

Go to Tools → Analyze Network. For undirected PPI networks, select "treat network as undirected". This adds degree, betweenness centrality, and clustering coefficient as node attributes visible in the node table. These attributes are now available to use in your visual style — for example, you can map node size to degree so that hub proteins appear larger.

Set up Visual Styles

Open the Style panel (left sidebar). For a clean PPI figure: map Node Size to degree (continuous mapping, larger = higher degree); map Node Fill Color to MCODE cluster membership (discrete mapping, a different colour per cluster); map Edge Width to STRING confidence score (continuous); set edge opacity to about 60% (Cytoscape expresses transparency on a 0–255 scale, so roughly 153). Use a dark background if it matches your paper's figure style, or white for journal figures. Font: rather than accepting the default, choose a clean, legible font for node labels (a monospaced font is one option) and check it against the target journal's figure guidelines.
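The logic behind continuous and discrete mappings is simple enough to sketch in a few lines. This is an illustration of the idea, not Cytoscape's API: the function names, size range, and colour palette are assumptions chosen for the example.

```python
def continuous_mapping(value, lo, hi, out_lo, out_hi):
    """Linearly interpolate value from [lo, hi] to [out_lo, out_hi], clamped at the ends."""
    if hi == lo:
        return (out_lo + out_hi) / 2
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)

def discrete_mapping(key, palette, default="#BBBBBB"):
    """Look up a colour for a category; unclustered nodes fall back to grey."""
    return palette.get(key, default)

# Made-up degrees and cluster labels for three genes.
degrees = {"APP": 12, "MAPT": 7, "CDK5": 2}
clusters = {"APP": "MCODE_1", "MAPT": "MCODE_2"}  # CDK5 is unclustered here
palette = {"MCODE_1": "#1B9E77", "MCODE_2": "#D95F02"}

lo, hi = min(degrees.values()), max(degrees.values())
node_size = {g: continuous_mapping(d, lo, hi, 20, 80) for g, d in degrees.items()}
node_colour = {g: discrete_mapping(clusters.get(g), palette) for g in degrees}

# 60% opacity on a 0-255 transparency scale.
edge_transparency = round(0.60 * 255)
print(node_size, node_colour, edge_transparency)
```

The clamp in `continuous_mapping` matters in practice: a single extreme hub can otherwise compress every other node into the smallest sizes, so it is common to cap the input range below the true maximum.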

Apply a layout

Open the Layout menu and choose an organic (force-directed) layout. The yFiles Organic layout is often regarded as the highest-quality option but is distributed under its own licence; the open-source "Prefuse Force Directed" layout is the best free alternative. After applying the layout, manually adjust nodes that are overlapping or poorly positioned. Cytoscape allows individual node dragging while the rest of the network remains anchored.

Add annotations and export

Use Annotations to add text labels for each MCODE cluster (e.g. "Module 1: APP processing (GO:0042982)"). Export via File → Export → Network to Image. For publications, use PDF or SVG for vector graphics (scalable, and editable in Illustrator or Inkscape); if you need a bitmap format, export PNG at 300 DPI or higher. Add a legend (for example, node size and colour scales) in Cytoscape's annotation layer before exporting.

💡

What editors and reviewers expect in a PPI network figure

A good PPI network figure should be legible at publication size (typically 8–16 cm wide). Nodes should be labelled (gene symbols, not protein names). Edge weight should reflect confidence score. Hub proteins should be visually prominent (larger nodes). If MCODE clusters are shown, they should be colour-coded consistently with the enrichment table. Include a concise legend explaining colour, size, and edge weight scales. Avoid "hairball" networks — if your network has >100 nodes, consider showing only the top N by degree, or separating clusters into panel sub-figures.
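One way to thin a hairball is to keep only the top N nodes by degree and the edges among them. A minimal sketch, with a hypothetical function name and toy node labels:

```python
def top_n_subnetwork(edges, n):
    """Keep the n highest-degree nodes (degree measured in the full network)
    and return only the edges whose two endpoints both survive."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    # Ties are broken alphabetically so the result is reproducible.
    ranked = sorted(degree, key=lambda g: (-degree[g], g))
    keep = set(ranked[:n])
    return [(a, b) for a, b in edges if a in keep and b in keep]

full = [("H", "A"), ("H", "B"), ("H", "C"), ("A", "B"), ("C", "D")]
reduced = top_n_subnetwork(full, 3)
print(reduced)
```

Degrees are computed on the full network, so node sizes in the reduced figure still reflect connectivity in the complete analysis; say so in the legend if you take this route.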

// section 4.4

Identifying hub proteins in practice

Hub identification is the most common analytical goal in PPI network analysis. Here is the standard workflow used in neuroscience papers:

Standard hub identification pipeline

Set degree threshold. After running NetworkAnalyzer in Cytoscape, sort nodes by degree descending. The top 10% by degree are conventionally called "hub proteins". For a 50-node network, the top 5 nodes are your hubs. For a 200-node network, the top 20. Some papers define hubs as degree > mean + 2 standard deviations — state your criterion clearly in the Methods.
Confirm with betweenness centrality. Hub proteins typically also have high betweenness. The intersection of "top 10% by degree" and "top 10% by betweenness" gives you the most important hub proteins — nodes that are both highly connected AND structurally essential for network communication. These are your highest-priority candidates for further investigation.
Validate against literature. Cross-reference your identified hubs with disease databases (DisGeNET, OMIM) and existing functional data. For example, APP emerging as a hub in an AD network, or CDK5 as a hub in a tau phosphorylation analysis, is reassuring because both have extensive experimental validation. Unexpected hubs with no prior disease association are interesting candidates for novel target identification.
Report hub proteins in a table. Standard reporting includes: gene symbol, degree, betweenness centrality, known disease associations (from DisGeNET), and relevant GO annotations. This table typically appears in the main text or supplementary data of a paper.
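The first, second, and fourth steps of this pipeline can be sketched in standard-library Python on a toy network. The betweenness function is a compact version of Brandes' algorithm for unweighted graphs (values are unnormalised, so they will not match a tool that rescales them); the network, node names, and helper names are all invented for the example.

```python
from collections import deque
from statistics import mean, pstdev

def build_adj(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted, undirected)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def top_percent(scores, percent=10):
    """Top `percent` of nodes by score (at least one node)."""
    n = max(1, len(scores) * percent // 100)
    return set(sorted(scores, key=lambda g: (-scores[g], g))[:n])

def hubs_by_sd(degree, n_sd=2):
    """Alternative criterion: degree > mean + n_sd standard deviations."""
    cutoff = mean(degree.values()) + n_sd * pstdev(degree.values())
    return {g for g, d in degree.items() if d > cutoff}

# Toy 20-node network: two linked hubs (H, K), each with nine leaf partners.
edges = ([("H", "K")] + [("H", f"h{i}") for i in range(9)]
         + [("K", f"k{i}") for i in range(9)])
adj = build_adj(edges)
degree = {g: len(nb) for g, nb in adj.items()}
bc = betweenness(adj)

hubs = top_percent(degree) & top_percent(bc)  # high degree AND high betweenness
print("gene\tdegree\tbetweenness")
for gene in sorted(hubs):
    print(f"{gene}\t{degree[gene]}\t{bc[gene]:.1f}")
```

In a real analysis the degree and betweenness columns would come from the Cytoscape node table; the disease-association and GO columns of the final report still have to be added from DisGeNET and your enrichment results.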

⚠️

Hub status is network-context dependent

A protein's hub status depends entirely on which proteins you included in your analysis and which confidence threshold you used. TP53 appears as a hub in almost every PPI network because it has an enormous number of curated interactions in STRING. Before concluding that a protein is "a hub in your disease network", check whether it would also appear as a hub in a random set of the same size from the same organism. Some papers report a "hub enrichment score" that compares the degree of their identified hubs to a background distribution — this is good practice.
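One way to run such a background check is a simple permutation test: keep the candidate hub fixed, replace the rest of your gene set with random proteins from the background network, and ask how often the hub's degree is at least as high as observed. The sketch below is one possible approach, not a standard published score; the background network and all names in it are invented.

```python
import random

def build_adj(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_in_set(adj, protein, members):
    """Number of the protein's interaction partners that fall inside the set."""
    return len(adj[protein] & members)

def hub_background_pvalue(adj, protein, gene_set, trials=1000, seed=0):
    """Empirical p-value: how often does a random set of the same size give the
    protein a degree at least as high as in the real gene set?"""
    rng = random.Random(seed)
    others = set(gene_set) - {protein}
    observed = degree_in_set(adj, protein, others)
    pool = [g for g in adj if g != protein]
    hits = 0
    for _ in range(trials):
        sample = set(rng.sample(pool, len(others)))
        if degree_in_set(adj, protein, sample) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)

# Toy background: PROMISC interacts with everything (a TP53-like node);
# SPECIFIC interacts only with the five disease proteins D0..D4 (and PROMISC).
disease = [f"D{i}" for i in range(5)]
background = [f"B{i}" for i in range(53)]
edges = [("PROMISC", g) for g in ["SPECIFIC"] + disease + background]
edges += [("SPECIFIC", d) for d in disease]
adj = build_adj(edges)

gene_set = {"PROMISC", "SPECIFIC", *disease}
obs_promisc, p_promisc = hub_background_pvalue(adj, "PROMISC", gene_set)
obs_specific, p_specific = hub_background_pvalue(adj, "SPECIFIC", gene_set)
print(obs_promisc, p_promisc, obs_specific, p_specific)
```

Both proteins have the same degree within the gene set, but only the specific one stands out against the background: the promiscuous protein reaches that degree in every random set, so its hub status says little about the disease.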
